eval: add optional setup script support for eval sessions by dgageot · Pull Request #1673 · docker/cagent

dgageot · 2026-02-10T07:26:10Z

Add a 'setup' property to EvalCriteria that allows defining a sh script to run inside the eval container before cagent exec. The script is written to a temp file, bind-mounted into the container, and executed as part of the entrypoint chain. This is useful for preparing the container environment (e.g., installing packages, initializing repos) before the agent runs.

Assisted-By: cagent

Add a 'setup' property to EvalCriteria that allows defining a sh script to run inside the eval container before cagent exec. The script is written to a temp file, bind-mounted into the container, and executed as part of the entrypoint chain. This is useful for preparing the container environment (e.g., installing packages, initializing repos) before the agent runs. Assisted-By: cagent

docker-agent

Review Summary

Reviewed the setup script support implementation. Found 2 medium-severity issues related to error handling and cleanup. Both are straightforward to fix and improve the robustness of the temporary file handling.

The core functionality is well-designed - the setup script is properly isolated in a temp file, mounted read-only into the container, and executed before the agent. The defer-based cleanup is correctly placed to execute after the container completes.

docker-agent · 2026-02-10T07:31:48Z

pkg/evaluation/eval.go

+	if setup != "" {
+		// Run setup script, then cagent exec with the original arguments.
+		args = append(args, "sh", "-c", "sh /setup.sh && exec /cagent exec --yolo --json \"$@\"", "--", "/configs/"+agentFile)
+	} else {


Error handling issue for setup script failures

When a container exits with an error, the code cannot distinguish between setup script failure and cagent execution failure. Both result in the same error path (lines 446-453), making debugging difficult. The error message "container failed" doesn't indicate whether the failure occurred in the setup script or the cagent exec command.

Consider:

Adding a marker/log line at the end of the setup script to indicate successful completion

Checking for this marker to provide clearer error messages

Or wrapping the setup script execution to capture its specific exit code

Example:

sh /setup.sh && echo '__SETUP_COMPLETE__' && exec /cagent exec --yolo --json "$@"

Then parse the output to determine if setup completed before reporting errors.

docker-agent · 2026-02-10T07:31:48Z

pkg/evaluation/eval.go

+
+		args = append(args,
+			"-v", setupFile+":/setup.sh:ro",
+			"--entrypoint", "/run.sh",


Missing error handling for file removal

The defer os.Remove(setupFile) silently ignores any errors during temporary file cleanup. While os.Remove() failures are uncommon on Unix-like systems, this is poor error handling practice. If the file cannot be removed due to permissions or file system issues, temporary files could accumulate on disk over repeated evaluations.

Consider logging cleanup errors:

defer func() { if err := os.Remove(setupFile); err != nil && !os.IsNotExist(err) { slog.Debug("failed to remove setup file", "path", setupFile, "error", err) } }()

This makes cleanup failures visible for debugging without failing the evaluation.

dgageot requested a review from a team as a code owner February 10, 2026 07:26

docker-agent bot reviewed Feb 10, 2026

View reviewed changes

trungutt approved these changes Feb 10, 2026

View reviewed changes

dgageot merged commit d7f1f6e into docker:main Feb 10, 2026
8 checks passed

BrewTestBot mentioned this pull request Feb 12, 2026

cagent 1.23.0 Homebrew/homebrew-core#267292

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

eval: add optional setup script support for eval sessions#1673

eval: add optional setup script support for eval sessions#1673
dgageot merged 1 commit intodocker:mainfrom
dgageot:evals-setup

dgageot commented Feb 10, 2026

Uh oh!

docker-agent bot left a comment

Uh oh!

docker-agent bot Feb 10, 2026

Uh oh!

docker-agent bot Feb 10, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

dgageot commented Feb 10, 2026

Uh oh!

docker-agent bot left a comment

Choose a reason for hiding this comment

Review Summary

Uh oh!

docker-agent bot Feb 10, 2026

Choose a reason for hiding this comment

Uh oh!

docker-agent bot Feb 10, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants