Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
8 changes: 4 additions & 4 deletions .github/actions/issue-auto-implement/AGENTS.md
Original file line number Diff line number Diff line change
Expand Up @@ -37,14 +37,14 @@ If a reviewer's comment is ambiguous, assess might still return **implement** (o

- **Path:** `assess/src/index.ts` (TypeScript), run with `npx tsx src/index.ts` from the assess directory (no build).
- **Input:** Reads event from `GITHUB_EVENT_PATH`; optional context files from input.
- **Output:** JSON with `action` (`implement` | `request_info`), `comment_body` (if request_info), `verification_notes` (optional). Written to file or GITHUB_OUTPUT.
- **When triggered by PR review:** Include PR review body and review comments in the payload sent to Claude.
- **Output:** JSON with `action` (`implement` | `request_info`), `comment_body` (if request_info), `verification_notes` (optional), `review_feedback` (when trigger is PR review or comment on PR — exact text for implement to address). Written to file or GITHUB_OUTPUT.
- **When triggered by PR review:** Include PR review body and review comments in the payload sent to Claude. Assess also sets `review_feedback` from the review/comment so the implement step receives it.

## Implement script

- **Path:** `assess/src/implement.ts`, run with `npx tsx src/implement.ts` from the assess directory.
- **Env:** `ISSUE_NUMBER`, `GITHUB_REPOSITORY`, `GITHUB_TOKEN`, `AUTO_IMPLEMENT_ANTHROPIC_API_KEY` (required); `VERIFICATION_NOTES`, `GITHUB_WORKSPACE`, `CONTEXT_FILES`, `IMPLEMENT_COMMIT_MSG_FILE`, `PREVIOUS_VERIFY_OUTPUT` (optional).
- **Flow:** Fetches issue title/body from GitHub API, loads context files, then runs **Claude Code CLI** (`claude` on PATH) in the repo root with a prompt; the script passes `AUTO_IMPLEMENT_ANTHROPIC_API_KEY` to the CLI as `ANTHROPIC_API_KEY`. The CLI implements in-repo (Read/Edit/Bash). When Claude makes code changes it writes `.commit_msg`, `.pr_title`, `.pr_body`; when it makes no code changes it writes `.comment_body` (no-change rationale or request for clarification). The script ensures commit/PR meta files exist only when Claude did not write `.comment_body`. When `PREVIOUS_VERIFY_OUTPUT` is set (e.g. after a failed verify run), it is included in the prompt so the CLI can fix the implementation. In CI, the action installs the CLI (`npm install -g @anthropic-ai/claude-code`) before the implement step when the assess outcome is `implement`.
- **Env:** `ISSUE_NUMBER`, `GITHUB_REPOSITORY`, `GITHUB_TOKEN`, `AUTO_IMPLEMENT_ANTHROPIC_API_KEY` (required); `VERIFICATION_NOTES`, `REVIEW_FEEDBACK`, `GITHUB_WORKSPACE`, `CONTEXT_FILES`, `IMPLEMENT_COMMIT_MSG_FILE`, `PREVIOUS_VERIFY_OUTPUT` (optional).
- **Flow:** Fetches issue title/body from GitHub API, loads context files, then runs **Claude Code CLI** (`claude` on PATH) in the repo root with a prompt; the script passes `AUTO_IMPLEMENT_ANTHROPIC_API_KEY` to the CLI as `ANTHROPIC_API_KEY`. When `REVIEW_FEEDBACK` is set (PR review or comment iteration), the prompt includes a "Review feedback to address (you must implement this)" section so the CLI addresses the reviewer's ask (e.g. add tests). The CLI implements in-repo (Read/Edit/Bash). When Claude makes code changes it writes `.commit_msg`, `.pr_title`, `.pr_body`; when it makes no code changes it writes `.comment_body` (no-change rationale or request for clarification). The script ensures commit/PR meta files exist only when Claude did not write `.comment_body`. When `PREVIOUS_VERIFY_OUTPUT` is set (e.g. after a failed verify run), it is included in the prompt so the CLI can fix the implementation. In CI, the action installs the CLI (`npm install -g @anthropic-ai/claude-code`) before the implement step when the assess outcome is `implement`.

## Implement–verify loop

Expand Down
45 changes: 45 additions & 0 deletions .github/actions/issue-auto-implement/DIAGNOSIS-PR-261.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,45 @@
# Diagnosis: PR #261 — Bot replied "Verification passed" instead of adding tests

## What happened

- **PR #261** added `--local` to `login` and `ci` (auto-implement from issue #215).
- **Reviewer comment:** "Could we add some unit and acceptance test coverage for this change?"
- **Bot response:** "The --local flag has already been implemented... Verification passed." — no new tests were added.

## Root cause: review feedback never reaches the implement step

The pipeline has two steps that use Claude:

1. **Assess** (`assess/src/index.ts`) — decides `implement` vs `request_info`, outputs `verification_notes`.
2. **Implement** (`assess/src/implement.ts`) — runs Claude Code CLI in the repo to make changes.

**Assess** receives the full event payload. For a comment on the PR (`issue_comment` on the PR, or `pull_request_review`), the prompt includes:
- Issue (for a PR comment: the PR title and body),
- "All issue comments" (PR comments from the API),
- "Latest event comment" or "PR review (address this feedback):" with the reviewer's text.

So assess **does** see "Could we add some unit and acceptance test coverage for this change?" and correctly returns `action: implement`.

**Implement** only receives:
- `ISSUE_NUMBER` (215),
- Issue title/body **fetched from the GitHub API for issue #215** (the original issue about `--local`),
- `VERIFICATION_NOTES` (from assess: generic "run test suite" style notes),
- Context files, `PREVIOUS_VERIFY_OUTPUT`.

So implement **never** sees the PR comment or review body. The Claude Code CLI prompt is built from the **original issue** only. It has no instruction to "add unit and acceptance tests" — so it reasonably treats the issue as already done and writes `.comment_body` ("Verification passed") instead of making code changes.

## Why it wasn't "enough coverage"

The bot didn't decide there was "enough coverage." It never had the reviewer's ask in the implement prompt. So it didn't consider coverage at all; it only had the original issue text.

## Design gap

AGENTS.md says: *"assess with issue + review/comment content → implement ('address review feedback')"*. The **assess** step does get review/comment content, but that content is **not** passed through to **implement**. So "address review feedback" is not possible with the current wiring.

## Fix (implemented in this worktree)

1. **Assess** — When the trigger is PR review or a comment on a PR, derive the review/comment text from the payload and add it to the assessment JSON output as `review_feedback` (so the workflow can pass it on).
2. **Action** — Expose `review_feedback` from the assess step and pass it to the implement step as `REVIEW_FEEDBACK`.
3. **Implement** — Read `REVIEW_FEEDBACK`; when set, add a clear section to the Claude Code CLI prompt: **"Review feedback to address (you must implement this):"** so the CLI actually implements the reviewer's request (e.g. add unit and acceptance tests).

After this change, a comment like "Could we add some unit and acceptance test coverage for this change?" will be passed into the implement prompt, and the implement step will be instructed to address it, so the bot should add tests instead of replying with "Verification passed."
5 changes: 3 additions & 2 deletions .github/actions/issue-auto-implement/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -102,6 +102,7 @@ Scripts load a **local `.env`** file so you don't have to pass secrets on the co
| `GITHUB_WORKSPACE` | Optional | Repo root; default inferred from cwd when run from `assess/`. |
| `CONTEXT_FILES` | Optional | Comma-separated paths (relative to repo root) for Claude context. |
| `VERIFICATION_NOTES` | Optional (Implement) | Notes from assess step. |
| `REVIEW_FEEDBACK` | Optional (Implement) | When the run was triggered by a PR review or comment on a PR, the action passes the review/comment text so the implement step can address it (e.g. "add unit and acceptance tests"). |
| `PREVIOUS_VERIFY_OUTPUT` | Optional (Implement retries) | Previous verify failure output. |

### One-time setup: `.env`
Expand All @@ -118,7 +119,7 @@ cd .github/actions/issue-auto-implement/assess
npm run assess:fixture
```

With `.env` in place, no need to pass the key on the command line. Optional: set `GITHUB_TOKEN` and `GITHUB_REPOSITORY` to exercise redirect-to-PR and fetch-comments. Set `ASSESS_DEBUG=1` to log the prompt sent to Claude and the raw response to stderr. Other fixtures: `GITHUB_EVENT_PATH=./test/fixtures/issue-comment.json GITHUB_EVENT_NAME=issue_comment npx tsx src/index.ts`.
With `.env` in place, no need to pass the key on the command line. Optional: set `GITHUB_TOKEN` and `GITHUB_REPOSITORY` to exercise redirect-to-PR and fetch-comments. **Logging:** The script always logs Claude's raw response to stderr (the JSON decision and any surrounding text) and a one-line summary (e.g. `Assess: action=implement issue_number=215 review_feedback=52 chars`), so CI logs show what was decided and what the model returned. Set `ASSESS_DEBUG=1` to also log the full prompt (issue, comments, context files) when debugging. Other fixtures: `GITHUB_EVENT_PATH=./test/fixtures/issue-comment.json GITHUB_EVENT_NAME=issue_comment npx tsx src/index.ts`.

### Implement (issue → Claude Code CLI → files on disk)

Expand All @@ -129,7 +130,7 @@ cd .github/actions/issue-auto-implement/assess
npm run implement:issue
```

With `.env` set (e.g. `ISSUE_NUMBER`, `GITHUB_REPOSITORY`, `GITHUB_TOKEN`, `AUTO_IMPLEMENT_ANTHROPIC_API_KEY`), no need to pass them inline. Override any var on the command line if needed (e.g. `ISSUE_NUMBER=42 npm run implement:issue`). Then from the repo root inspect `git status` and the commit message at `.github/actions/issue-auto-implement/.commit_msg`. Optionally set `VERIFICATION_NOTES` and `CONTEXT_FILES`.
With `.env` set (e.g. `ISSUE_NUMBER`, `GITHUB_REPOSITORY`, `GITHUB_TOKEN`, `AUTO_IMPLEMENT_ANTHROPIC_API_KEY`), no need to pass them inline. Override any var on the command line if needed (e.g. `ISSUE_NUMBER=42 npm run implement:issue`). Then from the repo root inspect `git status` and the commit message at `.github/actions/issue-auto-implement/.commit_msg`. Optionally set `VERIFICATION_NOTES`, `REVIEW_FEEDBACK` (reviewer or PR comment text to address), and `CONTEXT_FILES`.

For implementation details and verification steps, see `AGENTS.md`.

Expand Down
4 changes: 4 additions & 0 deletions .github/actions/issue-auto-implement/action.yml
Original file line number Diff line number Diff line change
Expand Up @@ -132,6 +132,9 @@ runs:
echo "verification_notes<<NOTES_EOF" >> $GITHUB_OUTPUT
echo "$RESULT" | jq -r '.verification_notes // empty' >> $GITHUB_OUTPUT
echo "NOTES_EOF" >> $GITHUB_OUTPUT
echo "review_feedback<<REVIEW_EOF" >> $GITHUB_OUTPUT
echo "$RESULT" | jq -r '.review_feedback // empty' >> $GITHUB_OUTPUT
echo "REVIEW_EOF" >> $GITHUB_OUTPUT
- name: Handle redirect to PR
if: steps.assess.outputs.action == 'redirect_to_pr'
shell: bash
Expand Down Expand Up @@ -204,6 +207,7 @@ runs:
GITHUB_WORKSPACE: ${{ github.workspace }}
AUTO_IMPLEMENT_ANTHROPIC_API_KEY: ${{ inputs.anthropic_api_key }}
VERIFICATION_NOTES: ${{ steps.assess.outputs.verification_notes }}
REVIEW_FEEDBACK: ${{ steps.assess.outputs.review_feedback }}
CONTEXT_FILES: ${{ inputs.context_files }}
IMPLEMENT_COMMIT_MSG_FILE: ${{ github.workspace }}/.github/actions/issue-auto-implement/.commit_msg
MAX_RETRIES: ${{ inputs.max_implement_retries }}
Expand Down
23 changes: 21 additions & 2 deletions .github/actions/issue-auto-implement/assess/src/implement.ts
Original file line number Diff line number Diff line change
Expand Up @@ -61,7 +61,8 @@ function buildClaudeCliPrompt(
verificationNotes: string,
contextBlock: string,
previousVerifyOutput: string,
issueNumber: number
issueNumber: number,
reviewFeedback: string
): string {
const metaDir = ACTION_DIR;
const parts = [
Expand All @@ -85,6 +86,15 @@ function buildClaudeCliPrompt(
issueBody,
'',
];
if (reviewFeedback.trim()) {
parts.push(
'',
'--- Review feedback to address (you must implement this) ---',
reviewFeedback.trim(),
'--- End review feedback ---',
''
);
}
if (verificationNotes) {
parts.push('Verification (run these to confirm):', verificationNotes, '');
}
Expand Down Expand Up @@ -171,10 +181,19 @@ async function main(): Promise<void> {
if (!owner || !repoName) throw new Error('Invalid GITHUB_REPOSITORY');

const issueNum = parseInt(issueNumber, 10);
const reviewFeedback = process.env.REVIEW_FEEDBACK || '';
const { title, body } = await fetchIssue(owner, repoName, issueNum, token);
const contextBlock = loadContextFiles();

const prompt = buildClaudeCliPrompt(title, body, verificationNotes, contextBlock, previousVerifyOutput, issueNum);
const prompt = buildClaudeCliPrompt(
title,
body,
verificationNotes,
contextBlock,
previousVerifyOutput,
issueNum,
reviewFeedback
);
runClaudeCli(prompt);
ensureMetaFiles(issueNum);
}
Expand Down
41 changes: 34 additions & 7 deletions .github/actions/issue-auto-implement/assess/src/index.ts
Original file line number Diff line number Diff line change
Expand Up @@ -29,6 +29,8 @@ export type AssessmentOutput = {
comment_body?: string;
verification_notes?: string;
pr_url?: string;
/** When trigger is PR review or comment on PR: exact text for implement step to address. Passed as REVIEW_FEEDBACK. */
review_feedback?: string;
};

async function checkExistingPr(owner: string, repo: string, issueNumber: number): Promise<{ pr_url: string } | null> {
Expand Down Expand Up @@ -163,14 +165,23 @@ function buildAssessmentPrompt(
return parts.join('\n');
}

const DEBUG = process.env.ASSESS_DEBUG === '1' || process.env.ASSESS_DEBUG === 'true';
// Logging: we always log Claude's raw response (the decision record; usually one short JSON blob).
// ASSESS_DEBUG=1 additionally logs the full prompt (can be large: issue, comments, context files).
const LOG_PROMPT = process.env.ASSESS_DEBUG === '1' || process.env.ASSESS_DEBUG === 'true';

function logAssessSummary(result: AssessmentOutput): void {
const rf = result.review_feedback?.length ?? 0;
const parts = [`action=${result.action}`, `issue_number=${result.issue_number ?? '?'}`];
if (rf > 0) parts.push(`review_feedback=${rf} chars`);
process.stderr.write(`Assess: ${parts.join(' ')}\n`);
}

async function callClaude(prompt: string, client?: Anthropic): Promise<AssessmentOutput> {
const api = client ?? new Anthropic({ apiKey: ANTHROPIC_API_KEY });
if (!client && !ANTHROPIC_API_KEY) {
throw new Error('AUTO_IMPLEMENT_ANTHROPIC_API_KEY is not set');
}
if (DEBUG) {
if (LOG_PROMPT) {
process.stderr.write('--- ASSESS PROMPT (sent to Claude) ---\n');
process.stderr.write(prompt);
process.stderr.write('\n--- END PROMPT ---\n');
Expand All @@ -181,11 +192,9 @@ async function callClaude(prompt: string, client?: Anthropic): Promise<Assessmen
messages: [{ role: 'user', content: prompt }],
});
const text = response.content?.[0]?.type === 'text' ? response.content[0].text : '';
if (DEBUG) {
process.stderr.write('--- CLAUDE RAW RESPONSE ---\n');
process.stderr.write(text);
process.stderr.write('\n--- END RESPONSE ---\n');
}
process.stderr.write('--- Claude response ---\n');
process.stderr.write(text);
process.stderr.write('\n--- end response ---\n');
const jsonMatch = text.match(/\{[\s\S]*\}/);
if (!jsonMatch) {
throw new Error('Claude did not return valid JSON: ' + text.slice(0, 200));
Expand Down Expand Up @@ -258,6 +267,24 @@ export async function assess(
const prompt = buildAssessmentPrompt(payload, eventName, referenceIssue, contextBlock, issueComments);
const result = await callClaude(prompt, opts.anthropicClient);
result.issue_number = normalized.issueNumber;

// Populate review_feedback so implement step can address reviewer/comment (it only gets issue #N from API otherwise).
if (eventName === 'pull_request_review') {
const review = (payload as Record<string, unknown>).review as { body?: string } | undefined;
result.review_feedback = review?.body?.trim() ?? '';
} else if (eventName === 'pull_request_review_comment') {
const comment = (payload as Record<string, unknown>).comment as { body?: string } | undefined;
result.review_feedback = comment?.body?.trim() ?? '';
} else if (eventName === 'issue_comment') {
const p = payload as Record<string, unknown>;
const issue = p.issue as { pull_request?: unknown } | undefined;
if (issue?.pull_request) {
const comment = p.comment as { body?: string } | undefined;
result.review_feedback = comment?.body?.trim() ?? '';
}
}

logAssessSummary(result);
return result;
}

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -204,10 +204,15 @@ async function main(): Promise<void> {
stdio: 'inherit',
});

const reviewFeedback =
result.review_feedback ||
(EVENT_TYPE === 'issue_comment' ? COMMENT_BODY : '') ||
(EVENT_TYPE === 'pull_request_review' ? REVIEW_BODY : '');
const env = {
...process.env,
ISSUE_NUMBER: String(issueNumber),
VERIFICATION_NOTES: result.verification_notes || '',
REVIEW_FEEDBACK: reviewFeedback,
GITHUB_REPOSITORY: REPO,
GITHUB_TOKEN: TOKEN,
GITHUB_WORKSPACE: worktreePath,
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -60,4 +60,61 @@ describe('assess', () => {
expect(['implement', 'request_info', 'redirect_to_pr']).toContain(result.action);
if (result.action === 'request_info') expect(typeof result.comment_body).toBe('string');
});

it('sets review_feedback from PR review body when event is pull_request_review', async () => {
const mockClient = {
messages: {
create: vi.fn().mockResolvedValue({
content: [
{
type: 'text',
text: '{"action":"implement","verification_notes":"Run go test ./..."}',
},
],
}),
},
} as unknown as import('@anthropic-ai/sdk').Anthropic;

const payload = loadFixture('pull_request_review.json');
const result = await assess('pull_request_review', payload, {
referenceIssue: '192',
anthropicClient: mockClient,
});

expect(result.action).toBe('implement');
expect(result.review_feedback).toBe('Please add a test for the new behavior.');
});

it('sets review_feedback from comment when event is issue_comment on PR', async () => {
const mockClient = {
messages: {
create: vi.fn().mockResolvedValue({
content: [
{ type: 'text', text: '{"action":"implement","verification_notes":"Run tests."}' },
],
}),
},
} as unknown as import('@anthropic-ai/sdk').Anthropic;

const payload = {
action: 'created',
issue: {
number: 261,
title: 'PR title',
body: 'Closes #215',
pull_request: {}, // comment on PR
},
comment: { body: 'Could we add some unit and acceptance test coverage for this change?' },
repository: { full_name: 'hookdeck/hookdeck-cli' },
};
const result = await assess('issue_comment', payload, {
referenceIssue: '192',
anthropicClient: mockClient,
});

expect(result.action).toBe('implement');
expect(result.review_feedback).toBe(
'Could we add some unit and acceptance test coverage for this change?'
);
});
});
Loading