eval: add optional setup script support for eval sessions#1673
eval: add optional setup script support for eval sessions#1673dgageot merged 1 commit intodocker:mainfrom
Conversation
Add a 'setup' property to EvalCriteria that allows defining a sh script to run inside the eval container before cagent exec. The script is written to a temp file, bind-mounted into the container, and executed as part of the entrypoint chain. This is useful for preparing the container environment (e.g., installing packages, initializing repos) before the agent runs. Assisted-By: cagent
There was a problem hiding this comment.
Review Summary
Reviewed the setup script support implementation. Found 2 medium-severity issues related to error handling and cleanup. Both are straightforward to fix and improve the robustness of the temporary file handling.
The core functionality is well-designed - the setup script is properly isolated in a temp file, mounted read-only into the container, and executed before the agent. The defer-based cleanup is correctly placed to execute after the container completes.
| if setup != "" { | ||
| // Run setup script, then cagent exec with the original arguments. | ||
| args = append(args, "sh", "-c", "sh /setup.sh && exec /cagent exec --yolo --json \"$@\"", "--", "/configs/"+agentFile) | ||
| } else { |
There was a problem hiding this comment.
Error handling issue for setup script failures
When a container exits with an error, the code cannot distinguish between setup script failure and cagent execution failure. Both result in the same error path (lines 446-453), making debugging difficult. The error message "container failed" doesn't indicate whether the failure occurred in the setup script or the cagent exec command.
Consider:
- Adding a marker/log line at the end of the setup script to indicate successful completion
- Checking for this marker to provide clearer error messages
- Or wrapping the setup script execution to capture its specific exit code
Example:
sh /setup.sh && echo '__SETUP_COMPLETE__' && exec /cagent exec --yolo --json "$@"Then parse the output to determine if setup completed before reporting errors.
|
|
||
| args = append(args, | ||
| "-v", setupFile+":/setup.sh:ro", | ||
| "--entrypoint", "/run.sh", |
There was a problem hiding this comment.
Missing error handling for file removal
The defer os.Remove(setupFile) silently ignores any errors during temporary file cleanup. While os.Remove() failures are uncommon on Unix-like systems, this is poor error handling practice. If the file cannot be removed due to permissions or file system issues, temporary files could accumulate on disk over repeated evaluations.
Consider logging cleanup errors:
defer func() {
if err := os.Remove(setupFile); err != nil && !os.IsNotExist(err) {
slog.Debug("failed to remove setup file", "path", setupFile, "error", err)
}
}()This makes cleanup failures visible for debugging without failing the evaluation.
Add a 'setup' property to EvalCriteria that allows defining a sh script to run inside the eval container before cagent exec. The script is written to a temp file, bind-mounted into the container, and executed as part of the entrypoint chain. This is useful for preparing the container environment (e.g., installing packages, initializing repos) before the agent runs.
Assisted-By: cagent