feat: complete token usage tracking across agent target, LLM judges, and code judge proxy #390

Merged: christso merged 3 commits into main from feat/387-token-usage-tracking on Feb 26, 2026

christso commented on Feb 26, 2026

Summary

  • Map AI SDK usage (inputTokens/outputTokens) to ProviderTokenUsage in mapResponse — fixes Azure, Anthropic, and Gemini providers
  • Add optional tokenUsage field to EvaluatorResult and EvaluationScore types
  • Capture token usage from LLM judge generateText() and provider.invoke() calls
  • Accumulate token usage across target proxy invocations with per-call reporting
  • Surface proxy tokenUsage in code evaluator results
  • Extend TargetInvokeResponse with per-call tokenUsage for code judge scripts
  • Pass tokenUsage through orchestrator EvaluationScore → EvaluatorResult mapping
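
The mapping and accumulation steps in the list above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the `{ input, output }` shape follows the inline type mentioned in the follow-up commit, while `SdkUsage`, `mapUsage`, and `accumulateUsage` are hypothetical names, and treating a missing field as 0 is an assumption.

```typescript
// Shared usage shape; the { input, output } fields follow the inline type
// named in the follow-up commit of this PR.
interface TokenUsage {
  input: number;
  output: number;
}

// Hypothetical stand-in for the usage object returned by the AI SDK.
// Either field may be missing when a provider does not report it.
interface SdkUsage {
  inputTokens?: number;
  outputTokens?: number;
}

// Map SDK usage to the internal shape. Returns undefined when the provider
// reported nothing, so callers can tell "no data" apart from "zero tokens".
function mapUsage(usage?: SdkUsage): TokenUsage | undefined {
  if (!usage) return undefined;
  if (usage.inputTokens === undefined && usage.outputTokens === undefined) {
    return undefined;
  }
  // Assumption: a single missing field is counted as 0.
  return { input: usage.inputTokens ?? 0, output: usage.outputTokens ?? 0 };
}

// Sum per-call usage across several invocations, skipping calls without data.
function accumulateUsage(
  calls: Array<TokenUsage | undefined>,
): TokenUsage | undefined {
  let total: TokenUsage | undefined;
  for (const call of calls) {
    if (!call) continue;
    total = {
      input: (total?.input ?? 0) + call.input,
      output: (total?.output ?? 0) + call.output,
    };
  }
  return total;
}
```

Keeping the result `undefined` when no call reported usage lets the optional `tokenUsage` field stay absent rather than showing a misleading zero.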

Test plan

  • Unit tests pass (985 tests, 0 failures)
  • TypeScript typecheck passes
  • e2e: bun agentv eval examples/features/basic/evals/dataset.eval.yaml --test-id feature-proposal-brainstorm — verify trace.token_usage in JSONL
  • e2e: Verify scores[].token_usage on LLM judge entries in JSONL
  • e2e: Run a code judge example with target proxy and verify token_usage on scores entry
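
The manual JSONL checks in the test plan could be automated with a small helper along these lines. This is a sketch under assumptions: only the `trace.token_usage` and `scores[].token_usage` paths come from the test plan, the inner shape of `token_usage` is not inspected, and `checkTokenUsage` and `RecordCheck` are hypothetical names.

```typescript
// Summary of one JSONL record (hypothetical helper type).
interface RecordCheck {
  line: number; // 1-based line number in the JSONL text
  hasTraceUsage: boolean; // trace.token_usage is present
  scoresWithUsage: number; // scores entries that carry token_usage
  scoresTotal: number; // all scores entries in the record
}

// Parse JSONL text and report where token_usage appears.
// Blank lines are skipped; a malformed line makes JSON.parse throw.
function checkTokenUsage(jsonl: string): RecordCheck[] {
  const results: RecordCheck[] = [];
  jsonl.split("\n").forEach((raw, index) => {
    const text = raw.trim();
    if (text === "") return;
    const record = JSON.parse(text) as {
      trace?: { token_usage?: unknown };
      scores?: Array<{ token_usage?: unknown }>;
    };
    const scores = Array.isArray(record.scores) ? record.scores : [];
    results.push({
      line: index + 1,
      hasTraceUsage: record.trace?.token_usage != null,
      scoresWithUsage: scores.filter((s) => s.token_usage != null).length,
      scoresTotal: scores.length,
    });
  });
  return results;
}
```

To use it, read the results file produced by the eval run into a string and pass it to `checkTokenUsage`; records with `hasTraceUsage` set to false, or LLM judge entries missing from `scoresWithUsage`, point to a gap in tracking.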

Closes #387

🤖 Generated with Claude Code

…and code judge proxy

- Map AI SDK usage (inputTokens/outputTokens) to ProviderTokenUsage in mapResponse
- Add optional tokenUsage field to EvaluatorResult and EvaluationScore types
- Capture token usage from LLM judge generateText() and provider.invoke() calls
- Accumulate token usage across target proxy invocations
- Surface proxy tokenUsage in code evaluator results
- Extend TargetInvokeResponse with per-call tokenUsage for code judge scripts
- Pass tokenUsage through orchestrator EvaluationScore → EvaluatorResult mapping

Closes #387

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

cloudflare-workers-and-pages bot commented Feb 26, 2026

Deploying agentv with Cloudflare Pages

Latest commit: eafcedc
Status: ✅  Deploy successful!
Preview URL: https://27741650.agentv.pages.dev
Branch Preview URL: https://feat-387-token-usage-trackin.agentv.pages.dev

christso and others added 2 commits February 26, 2026 11:41
…sage, add unit tests

- Replace 6 inline `{ input: number; output: number }` types with shared TokenUsage import
- Add tokenUsage to ChildEvaluatorResult for accurate total cost tracking
- Add 7 unit tests covering type contracts and proxy accumulation

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
christso merged commit 833a4e6 into main on Feb 26, 2026
1 check passed
christso deleted the feat/387-token-usage-tracking branch on February 26, 2026 12:04