Skip to content

Gemini session resume#810

Open
Harsh9485 wants to merge 10 commits intocoder:mainfrom
Harsh9485:gemini-session-resume
Open

Gemini session resume#810
Harsh9485 wants to merge 10 commits intocoder:mainfrom
Harsh9485:gemini-session-resume

Conversation

@Harsh9485
Copy link
Copy Markdown
Contributor

Description

Closes: #745

This PR adds the enable_state_persistence variable.

The session resume logic works as follows: when we run the Gemini CLI, it creates a folder with the same name as the directory where the CLI command runs. In the script, we use agentapi, which runs in the $GEMINI_START_DIRECTORY directory. So, I get the directory name and check whether it exists. If it does not exist, a new session is created.

Changes:

  • Added enable_state_persistence variable.
  • Improved AGS passing in start.sh.
  • Added --type gemini in agentapi.
  • Added logic for session resume using --resume.
  • Updated Readme.md file
  • added main.tftest.hcl file.

Type of Change

  • New module
  • New template
  • Bug fix
  • Feature/enhancement
  • Documentation
  • Other

Module Information

Path: registry/coder-labs/modules/gemini
New version: v3.1.0
Breaking change: [ ] Yes [x] No

Testing & Validation

  • Tests pass (bun test)
  • Code formatted (bun fmt)
  • Changes tested locally

Related Issues

#749
#745

@Harsh9485
Copy link
Copy Markdown
Contributor Author

@mafredri,
I can’t test this because the module requires Node.js, and the user must install it first to use the Gemini module.

If I add the Gemini module but my workspace does not have Node installed, the workspace crashes. If I first don’t add the module, install Node, and then add the module in the template, Coder creates a new workspace, and again Node is not available, so the workspace crashes.

In this PR: #374, the Node installation part was removed, and I don’t see any valid reason for that.

@matifali matifali requested a review from mafredri March 24, 2026 11:34
@DevelopmentCats
Copy link
Copy Markdown
Contributor

@mafredri, I can’t test this because the module requires Node.js, and the user must install it first to use the Gemini module.

If I add the Gemini module but my workspace does not have Node installed, the workspace crashes. If I first don’t add the module, install Node, and then add the module in the template, Coder creates a new workspace, and again Node is not available, so the workspace crashes.

In this PR: #374, the Node installation part was removed, and I don’t see any valid reason for that.

You should just be able to use an image with node installed, or use the pre_install_script variable to install node.js before gemini even goes to install?

Copy link
Copy Markdown
Member

@mafredri mafredri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The base64 encoding of env vars in the start script is a solid improvement, eliminating a class of shell injection via special characters in API keys and prompts. Adding --type gemini to the agentapi server command aligns with the codex, copilot, and cursor-cli modules.

A few things need attention before this can merge: 4 P1, 4 P2, 1 P3, 1 nit across 10 inline comments.

Hisoka: "The --resume branch silently eating task_prompt on restart, the undeclared breaking change to the default folder, and the new exit 1 killing Vertex AI ADC users. Each breaks an existing use case documented in the README."

Mafuuu: "The promise is broken at the default configuration. enable_state_persistence defaults to true with agentapi v0.10.0, a combination the README itself says won't work."

Also: all four README code examples still show version = "3.0.0" but the PR declares v3.1.0. The registry checklist requires updating version references.

🤖 This review was generated with the help of Coder Agents.

if [ -d "$GEMINI_START_DIRECTORY/.gemini/tmp/$SESSION_FOLDER_NAME/chats/" ]; then
printf "Existing Gemini chats detected. Starting Gemini CLI in interactive mode with existing chats.\n"
GEMINI_ARGS+=(--resume)
agentapi server --type gemini --term-width 67 --term-height 1190 -- \
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Task prompt is silently dropped when resuming an existing session. (Bisky P1, Hisoka P1, Mafuuu P1, Mafu-san P1, Meruem P1, Kurapika P1, Luffy P1)

When the chat directory exists, the script takes the resume branch and skips the GEMINI_TASK_PROMPT logic entirely (lines 83-88 are in the else branch). A user who configures task_prompt = "Fix the tests" and restarts a workspace gets their prompt silently ignored. The old code applied task_prompt unconditionally before resume logic existed.

This means the "Automated task execution" example from the README stops working after the first workspace restart. At minimum, log a warning when task_prompt is set but resume takes precedence. The codex module handles this correctly with a separate continue variable that gates the resume decision independently from the task prompt.

🤖

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reviewed the codex module and saw that it always passes the prompt without checking whether it is new or not. In a scenario where the prompt is the same, the module still sends it again, even though the agent has already processed it.

I have a suggestion: we can check whether the task_prompt has changed. If it has changed, then we send it with --resume.

In the issue, you mentioned it : On warm start (prior sessions exist): Gemini resumes the latest session. The task prompt is not re-sent.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to diff the prompt IMO. The claude-code module just doesn't re-send the prompt when resuming, and lets the user set continue = false if they want a fresh session. Simpler and more predictable than change detection.

else
printf "No API key provided (neither GEMINI_API_KEY nor GOOGLE_API_KEY)\n"
exit 1
fi
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 New exit 1 on missing API key breaks Vertex AI users with Application Default Credentials. (Hisoka P1, Mafuuu P1, Bisky P1)

The old code logged "No API key provided" and continued to agentapi server. The new code exits. Vertex AI with ADC authenticates through service accounts or metadata server without setting GEMINI_API_KEY or GOOGLE_API_KEY. The README documents Vertex AI as a supported use case with use_vertexai = true, but no explicit API key is shown as required in that example.

var.gemini_api_key defaults to "", which base64-decodes to empty, so [ -n "$GEMINI_API_KEY" ] is false. After this change, Vertex AI ADC users hit exit 1 instead of starting. The old behavior (warn and continue) was correct for this case. If the exit is intentional, ADC needs a third condition (e.g. checking if ADC is available or if use_vertexai is true).

🤖

description = "The folder to run Gemini in."
default = "/home/coder"
default = "/home/coder/project"
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Default folder changed from /home/coder to /home/coder/project. This is an undeclared breaking change. (Luffy P1, Hisoka P1, Meruem P1, Bisky P2, Mafuuu P2, Mafu-san P2, Kurapika P2)

Every existing user relying on the default gets a different working directory on upgrade. Worse, the session resume feature this PR adds would fail for exactly the users this change silently migrates: existing session state at /home/coder/.gemini/ won't be found at /home/coder/project/.gemini/.

The PR description marks "Breaking change: No" and uses a minor version bump (3.1.0). Per registry conventions, breaking changes require a major bump. If this change is intentional, it needs a major version bump and a migration note. If accidental, revert it.

🤖

variable "enable_state_persistence" {
type = bool
description = "Enable AgentAPI conversation state persistence across restarts."
default = true
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 enable_state_persistence defaults to true, but agentapi_version defaults to v0.10.0, which is below the >= v0.12.0 requirement stated in the README. (Mafuuu P1, Mafu-san P1, Meruem P1, Hisoka P2, Kurapika P2, Luffy P2)

A user who accepts all defaults gets enable_state_persistence = true with an agentapi binary that can't support it. The agentapi module gracefully skips persistence with a warning for old versions, but the default configuration enables a feature that cannot work. The codex module gets this right by defaulting agentapi_version to v0.12.1 when enabling state persistence.

Either bump the default agentapi_version to at least v0.12.0, or default enable_state_persistence to false.

🤖

if [ -n "$GEMINI_YOLO_MODE" ] && [ "$GEMINI_YOLO_MODE" = "true" ]; then
printf "YOLO mode enabled - will auto-approve all tool calls\n"
GEMINI_ARGS=(--yolo)
fi
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 GEMINI_ARGS is never initialized when YOLO mode is off (the common case). (Meruem P1, Mafu-san P1, Bisky P1, Hisoka P2, Mafuuu P2, Luffy P2)

When GEMINI_YOLO_MODE != "true", GEMINI_ARGS is never declared. Lines 78, 88, and 91 use += on an unset variable. This works because nounset is disabled at line 17, but it's correct by coincidence, not construction. The old code always initialized GEMINI_ARGS explicitly. If anyone re-enables nounset later in the script, this breaks.

Add GEMINI_ARGS=() before this conditional. The codex module initializes CODEX_ARGS=() unconditionally before any appends.

PS. Line 91 (GEMINI_ARGS+=()) is a no-op that does nothing. Remove it or replace it with the initialization that's actually needed up here.

🤖

bash -c "$(printf '%q ' gemini "${GEMINI_ARGS[@]}")"
SESSION_FOLDER_NAME=$(basename "${GEMINI_START_DIRECTORY}")
if [ -d "$GEMINI_START_DIRECTORY/.gemini/tmp/$SESSION_FOLDER_NAME/chats/" ]; then
printf "Existing Gemini chats detected. Starting Gemini CLI in interactive mode with existing chats.\n"
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 enable_state_persistence and --resume are presented as one feature but are completely independent. (Kurapika P1, Meruem P2, Mafu-san P2)

enable_state_persistence passes through to the agentapi module (main.tf:200) and controls AgentAPI conversation state. The --resume flag here fires whenever the chat directory exists, with no variable to disable it. A user who sets enable_state_persistence = false to opt out of all persistence still gets --resume forced on. There is no way to start a fresh Gemini session after chats have been created.

Either add a separate variable to control Gemini CLI resume behavior, or document clearly that enable_state_persistence only controls AgentAPI state and Gemini CLI resume is always active when prior chats exist.

Obs The session detection path (.gemini/tmp/$SESSION_FOLDER_NAME/chats/) is an undocumented Gemini CLI implementation detail. If a future CLI version changes its storage layout, resume silently stops working with no error, the script just falls through to starting a new session. Worth a comment documenting which CLI version this was verified against.

🤖

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should I use the enable_state_persistence variable, or create a new variable to control Gemini CLI resume behavior?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Separate variable. The claude-code module has continue (bool, default true) for CLI resume, independent of enable_state_persistence. They control different things and users should be able to toggle them independently.

## State Persistence

AgentAPI can save and restore its conversation state to disk across workspace restarts. This complements `continue` (which resumes the Gemini CLI session) by also preserving the AgentAPI-level context. Enabled by default, requires agentapi >= v0.12.0 (older versions skip it with a warning).

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 continue in backticks implies a command or flag, but it doesn't exist in this module. The script uses --resume. (Hisoka P2, Leorio Nit, Pen Botter Nit)

grep -n "continue" registry/coder-labs/modules/gemini/main.tf returns no results. This text appears to be copied from the codex module's README, which does have a continue variable. A reader searching for continue in this codebase will find nothing.

If this refers to the --resume flag, call it --resume. If it's a Gemini CLI concept, link or explain it.

Nit The ## State Persistence heading breaks the heading hierarchy. ### Using Vertex AI (Enterprise) was previously a child of ## Examples. Now it appears to be a child of ## State Persistence, which has nothing to do with Vertex AI. Move this section after the Vertex AI subsection, or place it before ## Troubleshooting. (Leorio Nit, Pen Botter Nit)

🤖

@@ -0,0 +1,82 @@
run "test_gemini_basic" {
command = plan
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P3 Every test assertion checks that a variable equals what it was just set to, or that a default equals its declared default. This proves Terraform can assign variables, not that the module works. (Bisky P1, Meruem P3, Hisoka Obs)

test_enable_state_persistence_default asserts var.enable_state_persistence == true, which is tautological. None of the tests verify that enable_state_persistence actually flows through to the agentapi module, that the start_script contains expected content, or that the base64 encoding works. The only slightly real assertion is test_gemini_with_api_key checking coder_env.gemini_api_key[0].value.

Plan-level assertions on module outputs or resource attributes would catch wiring errors. Having a test file is better than not, but these particular tests create a false sense of coverage.

PS. Missing newline at end of file (line 82).

🤖

GEMINI_ARGS+=(--prompt-interactive "$PROMPT")
else
printf "Starting Gemini CLI in interactive mode.\n"
GEMINI_ARGS+=()
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit Duplicate log message. Line 82 already prints "Starting Gemini CLI in interactive mode." This line prints it again. (Gon, Leorio)

Remove this line, or change line 82 to just say "No existing Gemini chats found." and leave the mode announcement to the inner branches.

🤖

mafredri

This comment was marked as duplicate.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

gemini: wire session resume on workspace restart

3 participants