Description
Problem
When a synchronize (push) event triggers the doc-check workflow for a PR that already has an existing task, the action finds the task via the list endpoint and enters waitForTaskActive to poll until it's active + idle. However, if a concurrent workflow run (from a previous push) is still executing its cleanup phase, it can delete the task between the initial list call and a subsequent getTaskById poll.
This results in an unhandled 404 error that crashes the action:
Coder API error: Not Found
statusCode: 404
response: '{"message":"Resource not found or you do not have access to this resource"}'
Stack trace:
at RealCoderClient.request (dist/index.js:26727:13)
at async RealCoderClient.getTaskById (dist/index.js:26780:22)
at async RealCoderClient.waitForTaskActive (dist/index.js:26802:20)
at async CoderTaskAction.run (dist/index.js:27001:7)
Example failure: https://github.com/coder/coder/actions/runs/23586039175/job/68679282993#step:7:65
Root Cause
waitForTaskActive polls getTaskById in a loop but does not handle the case where the task disappears (404) during polling. This is a race condition between concurrent workflow runs for the same PR:
- Run A finds existing task doc-check-{N}, enters waitForTaskActive
- Run B (previous run) reaches its always() cleanup step and deletes doc-check-{N}
- Run A's next getTaskById poll returns 404, action crashes
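The sequence above can be reproduced in isolation with an in-memory stand-in for the API. This is only a sketch: FakeCoderClient, its methods other than getTaskById, the Task shape, and the task name doc-check-1 are hypothetical and are not taken from the action's source.

```typescript
// Hypothetical task shape; the real API returns more fields.
interface Task {
  id: string;
  name: string;
  status: string;
}

// Hypothetical in-memory stand-in for the Coder API client.
class FakeCoderClient {
  private tasks = new Map<string, Task>();

  addTask(task: Task): void {
    this.tasks.set(task.id, task);
  }

  async listTasks(): Promise<Task[]> {
    return Array.from(this.tasks.values());
  }

  async getTaskById(id: string): Promise<Task> {
    const task = this.tasks.get(id);
    if (!task) {
      // Mirrors the shape of the reported error: a message plus statusCode 404.
      throw Object.assign(new Error("Coder API error: Not Found"), {
        statusCode: 404,
      });
    }
    return task;
  }

  async deleteTask(id: string): Promise<void> {
    this.tasks.delete(id);
  }
}

// Replays the three steps of the race and returns the status code that
// Run A observes on its next poll (undefined if the poll succeeds).
async function reproduceRace(): Promise<number | undefined> {
  const client = new FakeCoderClient();
  client.addTask({ id: "t1", name: "doc-check-1", status: "pending" });

  // Run A: finds the existing task via the list endpoint.
  const existing = (await client.listTasks())[0];

  // Run B: its cleanup step deletes the same task.
  await client.deleteTask(existing.id);

  // Run A: the next getTaskById poll now fails.
  try {
    await client.getTaskById(existing.id);
    return undefined;
  } catch (err) {
    return (err as { statusCode?: number }).statusCode;
  }
}
```

Calling reproduceRace() resolves to 404, which is the error that currently propagates out of waitForTaskActive unhandled.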
Suggested Fix
In waitForTaskActive, catch 404 errors from getTaskById and do one of the following:
- Treat it as a signal to re-create the task (fall through to the create path)
- Retry with a fresh task lookup via the list endpoint
- Return a specific error that the caller (action.ts) can handle to create a new task
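One way the third option might look is sketched below. It is not the action's actual implementation: the TaskClient interface, the Task fields (status, idle), the "active" status value, the WaitResult type, and the polling options are all assumptions made for illustration. Instead of throwing, the loop reports a vanished task as a distinct result so the caller can decide to create a new one.

```typescript
// Assumed task shape: "active + idle" is modelled as status plus an idle flag.
interface Task {
  id: string;
  status: string;
  idle: boolean;
}

// Assumed minimal client surface; only getTaskById is needed here.
interface TaskClient {
  getTaskById(id: string): Promise<Task>;
}

// Hypothetical result type: "missing" means the task was deleted mid-poll.
type WaitResult = { kind: "active"; task: Task } | { kind: "missing" };

// Duck-typed check, assuming API errors carry a numeric statusCode.
function isNotFound(err: unknown): boolean {
  return (
    typeof err === "object" &&
    err !== null &&
    (err as { statusCode?: unknown }).statusCode === 404
  );
}

const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

async function waitForTaskActive(
  client: TaskClient,
  taskId: string,
  opts: { intervalMs: number; maxAttempts: number } = {
    intervalMs: 5000,
    maxAttempts: 60,
  },
): Promise<WaitResult> {
  for (let attempt = 0; attempt < opts.maxAttempts; attempt++) {
    let task: Task;
    try {
      task = await client.getTaskById(taskId);
    } catch (err) {
      // A 404 during polling means another run deleted the task.
      if (isNotFound(err)) {
        return { kind: "missing" };
      }
      // Any other failure is still an error.
      throw err;
    }
    if (task.status === "active" && task.idle) {
      return { kind: "active", task };
    }
    await sleep(opts.intervalMs);
  }
  throw new Error(`Timed out waiting for task ${taskId} to become active`);
}
```

With this shape, the caller in action.ts could branch on kind === "missing" and fall through to its existing create path, while other API errors and timeouts still fail the run.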
Additionally, consider using GitHub Actions' concurrency groups to prevent overlapping runs for the same PR, which would avoid the race condition entirely.
Created on behalf of @johnstcn