Skip to content

fix(engine) prevent MVCC race in blockRunWithWaitpoint pending check#3075

Open
ericallam wants to merge 2 commits intomainfrom
fix/waitpoint-mvcc-race
Open

fix(engine) prevent MVCC race in blockRunWithWaitpoint pending check#3075
ericallam wants to merge 2 commits intomainfrom
fix/waitpoint-mvcc-race

Conversation

@ericallam
Copy link
Member

Split the CTE in blockRunWithWaitpoint so the pending waitpoint check
is a separate SQL statement. In READ COMMITTED isolation, each statement
gets its own snapshot, so a separate SELECT sees the latest committed
state from concurrent completeWaitpoint calls.

Previously, the CTE did INSERT + pending check in one statement (one
snapshot). If completeWaitpoint committed between the CTE start and
the SELECT, the SELECT would still see PENDING due to the stale
snapshot. Neither side would enqueue continueRunIfUnblocked, leaving
the run stuck forever.

@changeset-bot
Copy link

changeset-bot bot commented Feb 17, 2026

⚠️ No Changeset found

Latest commit: 86190be

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

Click here to learn what changesets are, and how to add one.

Click here if you're a maintainer who wants to add a changeset to this PR

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Feb 17, 2026

Walkthrough

The change replaces the prior single atomic insert-and-count with a two-step flow: (1) insert blocking connections via a CTE, returning inserted rows; (2) run a separate pendingCheck query that counts pending waitpoints using a fresh MVCC snapshot. The code stops using the insert’s returned pending_count and derives isRunBlocked from the separate pendingCheck result. Extensive comments were added explaining snapshot isolation and why separate statements are required; control flow and scheduling decisions now rely on the independent pending-check outcome.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Description check ⚠️ Warning The PR description explains the technical problem and solution clearly, but the provided description template requires several sections (Checklist, Testing, Changelog, Screenshots) that are missing or incomplete. Add the missing checklist items, testing steps, and changelog section to match the repository's required PR description template format.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately summarizes the main change: fixing an MVCC race condition in the blockRunWithWaitpoint pending check by separating the CTE into distinct SQL statements.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch fix/waitpoint-mvcc-race

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
internal-packages/run-engine/src/engine/systems/waitpointSystem.ts (1)

430-442: Excellent fix for the MVCC snapshot race — minor type nit on BigInt.

The separation into a distinct SQL statement is the correct approach under READ COMMITTED isolation, and the explanatory comment is thorough. One small type issue:

Prisma's $queryRaw returns PostgreSQL bigint (from COUNT(*)) as the JavaScript primitive bigint, not the wrapper object BigInt. This won't cause a runtime error since Number() handles both, but the type annotation is technically inaccurate.

Nit: use lowercase bigint for the primitive type
-      const pendingCheck = await prisma.$queryRaw<{ pending_count: BigInt }[]>`
+      const pendingCheck = await prisma.$queryRaw<{ pending_count: bigint }[]>`
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal-packages/run-engine/src/engine/systems/waitpointSystem.ts` around
lines 430 - 442, The type annotation on the result of prisma.$queryRaw is using
the wrapper BigInt instead of the JS primitive bigint; update the generic on
prisma.$queryRaw from { pending_count: BigInt }[] to { pending_count: bigint }[]
so the type accurately reflects PostgreSQL COUNT(*) returning a primitive
bigint, leaving the surrounding logic (pendingCheck,
Number(pendingCheck.at(0)?.pending_count ?? 0), Prisma.join($waitpoints))
unchanged.
📜 Review details

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7af789b and c603987.

📒 Files selected for processing (1)
  • internal-packages/run-engine/src/engine/systems/waitpointSystem.ts
🧰 Additional context used
📓 Path-based instructions (4)
**/*.{ts,tsx}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

**/*.{ts,tsx}: Use types over interfaces for TypeScript
Avoid using enums; prefer string unions or const objects instead

**/*.{ts,tsx}: Always import tasks from @trigger.dev/sdk, never use @trigger.dev/sdk/v3 or deprecated client.defineJob pattern
Every Trigger.dev task must be exported and have a unique id property with no timeouts in the run function

Files:

  • internal-packages/run-engine/src/engine/systems/waitpointSystem.ts
**/*.{ts,tsx,js,jsx}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

Use function declarations instead of default exports

Import from @trigger.dev/core using subpaths only, never import from root

Files:

  • internal-packages/run-engine/src/engine/systems/waitpointSystem.ts
**/*.ts

📄 CodeRabbit inference engine (.cursor/rules/otel-metrics.mdc)

**/*.ts: When creating or editing OTEL metrics (counters, histograms, gauges), ensure metric attributes have low cardinality by using only enums, booleans, bounded error codes, or bounded shard IDs
Do not use high-cardinality attributes in OTEL metrics such as UUIDs/IDs (envId, userId, runId, projectId, organizationId), unbounded integers (itemCount, batchSize, retryCount), timestamps (createdAt, startTime), or free-form strings (errorMessage, taskName, queueName)
When exporting OTEL metrics via OTLP to Prometheus, be aware that the exporter automatically adds unit suffixes to metric names (e.g., 'my_duration_ms' becomes 'my_duration_ms_milliseconds', 'my_counter' becomes 'my_counter_total'). Account for these transformations when writing Grafana dashboards or Prometheus queries

Files:

  • internal-packages/run-engine/src/engine/systems/waitpointSystem.ts
**/*.{js,ts,jsx,tsx,json,md,yaml,yml}

📄 CodeRabbit inference engine (AGENTS.md)

Format code using Prettier before committing

Files:

  • internal-packages/run-engine/src/engine/systems/waitpointSystem.ts
🧠 Learnings (1)
📚 Learning: 2025-07-12T18:06:04.133Z
Learnt from: matt-aitken
Repo: triggerdotdev/trigger.dev PR: 2264
File: apps/webapp/app/services/runsRepository.server.ts:172-174
Timestamp: 2025-07-12T18:06:04.133Z
Learning: In apps/webapp/app/services/runsRepository.server.ts, the in-memory status filtering after fetching runs from Prisma is intentionally used as a workaround for ClickHouse data delays. This approach is acceptable because the result set is limited to a maximum of 100 runs due to pagination, making the performance impact negligible.

Applied to files:

  • internal-packages/run-engine/src/engine/systems/waitpointSystem.ts
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (27)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (2, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (7, 8)
  • GitHub Check: typecheck / typecheck
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (8, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (4, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (5, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (1, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (3, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (6, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (8, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (3, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (5, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (7, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (6, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (1, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (2, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (4, 8)
  • GitHub Check: e2e / 🧪 CLI v3 tests (ubuntu-latest - npm)
  • GitHub Check: units / packages / 🧪 Unit Tests: Packages (1, 1)
  • GitHub Check: e2e / 🧪 CLI v3 tests (ubuntu-latest - pnpm)
  • GitHub Check: sdk-compat / Node.js 20.20 (ubuntu-latest)
  • GitHub Check: sdk-compat / Bun Runtime
  • GitHub Check: e2e / 🧪 CLI v3 tests (windows-latest - npm)
  • GitHub Check: e2e / 🧪 CLI v3 tests (windows-latest - pnpm)
  • GitHub Check: sdk-compat / Node.js 22.12 (ubuntu-latest)
  • GitHub Check: sdk-compat / Deno Runtime
  • GitHub Check: sdk-compat / Cloudflare Workers
🔇 Additional comments (2)
internal-packages/run-engine/src/engine/systems/waitpointSystem.ts (2)

400-428: Clean separation of the INSERT CTE from the pending check.

The CTE now correctly focuses solely on inserting the blocking associations, with its result intentionally discarded since the pending count is determined by the separate query below. The inline comments clearly explain why the SELECT COUNT(*) is needed (to force PostgreSQL to execute the inserted CTE branch). Looks good.


493-504: Debounce-based safety net is sound.

Even if both blockRunWithWaitpoint (seeing 0 pending) and a concurrent completeWaitpoint both enqueue continueRunIfUnblocked for the same run, the shared job ID continueRunIfUnblocked:${runId} ensures the work is debounced. The two-statement approach combined with this debounce makes the race harmless from both directions.

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@internal-packages/run-engine/src/engine/systems/waitpointSystem.ts`:
- Around line 430-442: The type annotation on the result of prisma.$queryRaw is
using the wrapper BigInt instead of the JS primitive bigint; update the generic
on prisma.$queryRaw from { pending_count: BigInt }[] to { pending_count: bigint
}[] so the type accurately reflects PostgreSQL COUNT(*) returning a primitive
bigint, leaving the surrounding logic (pendingCheck,
Number(pendingCheck.at(0)?.pending_count ?? 0), Prisma.join($waitpoints))
unchanged.

Copy link

@devin-ai-integration devin-ai-integration bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 3 additional findings.

Open in Devin Review

Split the CTE in blockRunWithWaitpoint so the pending waitpoint check
is a separate SQL statement. In READ COMMITTED isolation, each statement
gets its own snapshot, so a separate SELECT sees the latest committed
state from concurrent completeWaitpoint calls.

Previously, the CTE did INSERT + pending check in one statement (one
snapshot). If completeWaitpoint committed between the CTE start and
the SELECT, the SELECT would still see PENDING due to the stale
snapshot. Neither side would enqueue continueRunIfUnblocked, leaving
the run stuck forever.
@ericallam ericallam force-pushed the fix/waitpoint-mvcc-race branch from c603987 to 64ddf12 Compare February 17, 2026 12:22
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
internal-packages/run-engine/src/engine/systems/waitpointSystem.ts (1)

435-435: Nit: use lowercase bigint for the TypeScript primitive type.

BigInt (uppercase) refers to the BigInt constructor/interface, not the primitive type. Prisma's $queryRaw returns the bigint primitive for PostgreSQL COUNT(*) results.

Proposed fix
-      const pendingCheck = await prisma.$queryRaw<{ pending_count: BigInt }[]>`
+      const pendingCheck = await prisma.$queryRaw<{ pending_count: bigint }[]>`
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal-packages/run-engine/src/engine/systems/waitpointSystem.ts` at line
435, The type for the raw query result uses the wrong BigInt identifier; update
the type annotation on the prisma.$queryRaw call (the pendingCheck declaration)
to use the TypeScript primitive `bigint` instead of the `BigInt`
constructor/interface so the inferred type matches PostgreSQL COUNT(*) results
(e.g., change `{ pending_count: BigInt }[]` to `{ pending_count: bigint }[]` for
pendingCheck).
📜 Review details

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c603987 and 64ddf12.

📒 Files selected for processing (1)
  • internal-packages/run-engine/src/engine/systems/waitpointSystem.ts
🧰 Additional context used
📓 Path-based instructions (4)
**/*.{ts,tsx}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

**/*.{ts,tsx}: Use types over interfaces for TypeScript
Avoid using enums; prefer string unions or const objects instead

**/*.{ts,tsx}: Always import tasks from @trigger.dev/sdk, never use @trigger.dev/sdk/v3 or deprecated client.defineJob pattern
Every Trigger.dev task must be exported and have a unique id property with no timeouts in the run function

Files:

  • internal-packages/run-engine/src/engine/systems/waitpointSystem.ts
**/*.{ts,tsx,js,jsx}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

Use function declarations instead of default exports

Import from @trigger.dev/core using subpaths only, never import from root

Files:

  • internal-packages/run-engine/src/engine/systems/waitpointSystem.ts
**/*.ts

📄 CodeRabbit inference engine (.cursor/rules/otel-metrics.mdc)

**/*.ts: When creating or editing OTEL metrics (counters, histograms, gauges), ensure metric attributes have low cardinality by using only enums, booleans, bounded error codes, or bounded shard IDs
Do not use high-cardinality attributes in OTEL metrics such as UUIDs/IDs (envId, userId, runId, projectId, organizationId), unbounded integers (itemCount, batchSize, retryCount), timestamps (createdAt, startTime), or free-form strings (errorMessage, taskName, queueName)
When exporting OTEL metrics via OTLP to Prometheus, be aware that the exporter automatically adds unit suffixes to metric names (e.g., 'my_duration_ms' becomes 'my_duration_ms_milliseconds', 'my_counter' becomes 'my_counter_total'). Account for these transformations when writing Grafana dashboards or Prometheus queries

Files:

  • internal-packages/run-engine/src/engine/systems/waitpointSystem.ts
**/*.{js,ts,jsx,tsx,json,md,yaml,yml}

📄 CodeRabbit inference engine (AGENTS.md)

Format code using Prettier before committing

Files:

  • internal-packages/run-engine/src/engine/systems/waitpointSystem.ts
🧠 Learnings (1)
📚 Learning: 2025-07-12T18:06:04.133Z
Learnt from: matt-aitken
Repo: triggerdotdev/trigger.dev PR: 2264
File: apps/webapp/app/services/runsRepository.server.ts:172-174
Timestamp: 2025-07-12T18:06:04.133Z
Learning: In apps/webapp/app/services/runsRepository.server.ts, the in-memory status filtering after fetching runs from Prisma is intentionally used as a workaround for ClickHouse data delays. This approach is acceptable because the result set is limited to a maximum of 100 runs due to pagination, making the performance impact negligible.

Applied to files:

  • internal-packages/run-engine/src/engine/systems/waitpointSystem.ts
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (27)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (8, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (3, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (7, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (6, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (8, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (5, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (2, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (5, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (6, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (4, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (2, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (7, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (3, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (1, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (4, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (1, 8)
  • GitHub Check: sdk-compat / Cloudflare Workers
  • GitHub Check: units / packages / 🧪 Unit Tests: Packages (1, 1)
  • GitHub Check: e2e / 🧪 CLI v3 tests (ubuntu-latest - pnpm)
  • GitHub Check: sdk-compat / Deno Runtime
  • GitHub Check: e2e / 🧪 CLI v3 tests (windows-latest - npm)
  • GitHub Check: e2e / 🧪 CLI v3 tests (ubuntu-latest - npm)
  • GitHub Check: sdk-compat / Bun Runtime
  • GitHub Check: sdk-compat / Node.js 22.12 (ubuntu-latest)
  • GitHub Check: typecheck / typecheck
  • GitHub Check: e2e / 🧪 CLI v3 tests (windows-latest - pnpm)
  • GitHub Check: Analyze (javascript-typescript)
🔇 Additional comments (1)
internal-packages/run-engine/src/engine/systems/waitpointSystem.ts (1)

400-442: Well-designed fix for the MVCC snapshot race condition.

The separation of the INSERT CTE from the pending-status check into distinct SQL statements is the correct approach under PostgreSQL's READ COMMITTED isolation. The inline comments clearly explain the rationale. 👍

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@internal-packages/run-engine/src/engine/systems/waitpointSystem.ts`:
- Line 435: The type for the raw query result uses the wrong BigInt identifier;
update the type annotation on the prisma.$queryRaw call (the pendingCheck
declaration) to use the TypeScript primitive `bigint` instead of the `BigInt`
constructor/interface so the inferred type matches PostgreSQL COUNT(*) results
(e.g., change `{ pending_count: BigInt }[]` to `{ pending_count: bigint }[]` for
pendingCheck).

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
internal-packages/run-engine/src/engine/systems/waitpointSystem.ts (1)

451-458: Nit: BigInt (wrapper object) vs bigint (primitive) in the type annotation.

Prisma's $queryRaw returns the JavaScript bigint primitive for PostgreSQL bigint/int8 columns. The type annotation on Line 451 uses BigInt (capital B), which is the wrapper object type. This doesn't cause a runtime issue (TypeScript types are erased), but it's technically the wrong type.

Proposed fix
-      const pendingCheck = await prisma.$queryRaw<{ pending_count: BigInt }[]>`
+      const pendingCheck = await prisma.$queryRaw<{ pending_count: bigint }[]>`
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal-packages/run-engine/src/engine/systems/waitpointSystem.ts` around
lines 451 - 458, The type annotation for the result of prisma.$queryRaw uses the
wrapper object `BigInt` instead of the primitive `bigint`; update the generic on
prisma.$queryRaw to use `{ pending_count: bigint }[]` (referencing the
`pendingCheck` variable and the `prisma.$queryRaw` call) so the `pending_count`
property has the correct primitive type and subsequent
`Number(pendingCheck.at(0)?.pending_count ?? 0)` usage (used to compute
`isRunBlocked`) aligns with the actual returned type.
📜 Review details

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 64ddf12 and 86190be.

📒 Files selected for processing (1)
  • internal-packages/run-engine/src/engine/systems/waitpointSystem.ts
🧰 Additional context used
📓 Path-based instructions (4)
**/*.{ts,tsx}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

**/*.{ts,tsx}: Use types over interfaces for TypeScript
Avoid using enums; prefer string unions or const objects instead

**/*.{ts,tsx}: Always import tasks from @trigger.dev/sdk, never use @trigger.dev/sdk/v3 or deprecated client.defineJob pattern
Every Trigger.dev task must be exported and have a unique id property with no timeouts in the run function

Files:

  • internal-packages/run-engine/src/engine/systems/waitpointSystem.ts
**/*.{ts,tsx,js,jsx}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

Use function declarations instead of default exports

Import from @trigger.dev/core using subpaths only, never import from root

Files:

  • internal-packages/run-engine/src/engine/systems/waitpointSystem.ts
**/*.ts

📄 CodeRabbit inference engine (.cursor/rules/otel-metrics.mdc)

**/*.ts: When creating or editing OTEL metrics (counters, histograms, gauges), ensure metric attributes have low cardinality by using only enums, booleans, bounded error codes, or bounded shard IDs
Do not use high-cardinality attributes in OTEL metrics such as UUIDs/IDs (envId, userId, runId, projectId, organizationId), unbounded integers (itemCount, batchSize, retryCount), timestamps (createdAt, startTime), or free-form strings (errorMessage, taskName, queueName)
When exporting OTEL metrics via OTLP to Prometheus, be aware that the exporter automatically adds unit suffixes to metric names (e.g., 'my_duration_ms' becomes 'my_duration_ms_milliseconds', 'my_counter' becomes 'my_counter_total'). Account for these transformations when writing Grafana dashboards or Prometheus queries

Files:

  • internal-packages/run-engine/src/engine/systems/waitpointSystem.ts
**/*.{js,ts,jsx,tsx,json,md,yaml,yml}

📄 CodeRabbit inference engine (AGENTS.md)

Format code using Prettier before committing

Files:

  • internal-packages/run-engine/src/engine/systems/waitpointSystem.ts
🧠 Learnings (1)
📚 Learning: 2025-07-12T18:06:04.133Z
Learnt from: matt-aitken
Repo: triggerdotdev/trigger.dev PR: 2264
File: apps/webapp/app/services/runsRepository.server.ts:172-174
Timestamp: 2025-07-12T18:06:04.133Z
Learning: In apps/webapp/app/services/runsRepository.server.ts, the in-memory status filtering after fetching runs from Prisma is intentionally used as a workaround for ClickHouse data delays. This approach is acceptable because the result set is limited to a maximum of 100 runs due to pagination, making the performance impact negligible.

Applied to files:

  • internal-packages/run-engine/src/engine/systems/waitpointSystem.ts
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (19)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (3, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (2, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (7, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (6, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (8, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (5, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (1, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (1, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (4, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (7, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (2, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (4, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (3, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (6, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (5, 8)
  • GitHub Check: typecheck / typecheck
  • GitHub Check: e2e / 🧪 CLI v3 tests (windows-latest - npm)
  • GitHub Check: units / packages / 🧪 Unit Tests: Packages (1, 1)
  • GitHub Check: e2e / 🧪 CLI v3 tests (windows-latest - pnpm)
🔇 Additional comments (3)
internal-packages/run-engine/src/engine/systems/waitpointSystem.ts (3)

367-383: Well-documented MVCC rationale — this is excellent.

The comment thoroughly explains the snapshot isolation semantics, why the two statements must be separate, and why the pending check covers all requested waitpoint IDs (not just newly inserted ones). This is exactly the kind of documentation that prevents future maintainers from "optimizing" the two statements back into one.


416-444: CTE restructuring looks correct.

The CTE now only handles the INSERT side effects, and the outer SELECT COUNT(*) FROM inserted satisfies the syntactic requirement for a final SELECT while keeping the data-modifying CTEs executable. Discarding the result is intentional since the blocking decision is now made by the separate query.


509-520: Blocking logic correctly derived from the fresh pending check — LGTM.

The isRunBlocked flag now reflects the latest committed state, ensuring that if a concurrent completeWaitpoint committed between the CTE and this check, the run won't get stuck. The debounced continueRunIfUnblocked enqueue is safe even if both this path and completeWaitpoint fire it (idempotent job ID).

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@internal-packages/run-engine/src/engine/systems/waitpointSystem.ts`:
- Around line 451-458: The type annotation for the result of prisma.$queryRaw
uses the wrapper object `BigInt` instead of the primitive `bigint`; update the
generic on prisma.$queryRaw to use `{ pending_count: bigint }[]` (referencing
the `pendingCheck` variable and the `prisma.$queryRaw` call) so the
`pending_count` property has the correct primitive type and subsequent
`Number(pendingCheck.at(0)?.pending_count ?? 0)` usage (used to compute
`isRunBlocked`) aligns with the actual returned type.

@ericallam ericallam marked this pull request as ready for review February 17, 2026 12:58
Copy link

@devin-ai-integration devin-ai-integration bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Devin Review found 1 new potential issue.

View 3 additional findings in Devin Review.

Open in Devin Review

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants