job events, terminal command types, connect error codes#2855
Conversation
WalkthroughAdds telemetry and job-kind classification across backend and frontend: job controller now tracks JobKind, emits job lifecycle telemetry, and exposes running/connected job counters. SSH client/connection code introduces coded connection errors and a classifier. A CodedError utility is added. IPC/preload and renderer code track three new terminal-command categories (remote, WSL, durable) and surface counts to telemetry. OSC handling logic is extracted into a new osc-handlers frontend module implementing OSC 52, OSC 7, and OSC 16162 handlers. package.json version bumped. Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes 🚥 Pre-merge checks | ✅ 1 | ❌ 2❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (1 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@pkg/remote/sshclient.go`:
- Line 606: The format verb in the error returned from utilds.Errorf when
creating a knownhost key is incorrect (%e); update the call in sshclient.go (the
return using utilds.Errorf with ConnErrCode_HostKeyVerify) to use a proper error
verb — replace "%e" with "%v" (or "%w" if you want to wrap the original error
for Unwrap() support) so the error message prints correctly and preserves the
original error for inspection.
🧹 Nitpick comments (3)
pkg/jobcontroller/jobcontroller.go (2)
553-567:GetNumJobsRunningloads all jobs to count running ones.This fetches every job row from the DB and deserializes them just to count those with
Runningstatus. Since it's called on the hourly telemetry interval, it's acceptable for now, but consider using aSELECT COUNT(*)with aWHEREclause on the JSON status field if the job table grows.
989-1003: Termination message is written before the "already done" guard.Lines 992-998 write
[shell terminated]to the terminal, then lines 1000-1003 check if the job is already done and bail out. If the job is already done, the message is unnecessarily written. SincejobTerminationMessageWritten.TestAndSetmakes this idempotent (only writes once), it's not a bug, but the guard could be moved above the message write for clarity.♻️ Move the guard before the terminal message write
func remoteTerminateJobManager(ctx context.Context, job *waveobj.Job) error { log.Printf("[job:%s] terminating job manager", job.OID) + if job.JobManagerStatus == JobManagerStatus_Done { + log.Printf("[job:%s] job manager already marked as done, skipping termination", job.OID) + return nil + } + shouldWrite := jobTerminationMessageWritten.TestAndSet(job.OID, true, func(val bool, exists bool) bool { return !exists || !val }) if shouldWrite { resetTerminalState(ctx, job.AttachedBlockId) writeMutedMessageToTerminal(job.AttachedBlockId, "[shell terminated]") } - if job.JobManagerStatus == JobManagerStatus_Done { - log.Printf("[job:%s] job manager already marked as done, skipping termination", job.OID) - return nil - } - bareRpc := wshclient.GetBareRpcClient()frontend/app/view/term/termwrap.ts (1)
320-351: Wrapcase "C"declarations in a block to fix lint errors.Biome flags
noSwitchDeclarationsfor the fourconstdeclarations (lines 323–326). Whilebreakprevents actual fallthrough, the declarations are technically hoisted to the switch scope, which is a correctness hazard if cases are reordered in the future and violates the linter rule.Note: case
"A"(line 308) has a pre-existing instance of the same issue withconst marker.Proposed fix
case "C": + { rtInfo["shell:state"] = "running-command"; globalStore.set(termWrap.shellIntegrationStatusAtom, "running-command"); const connName = globalStore.get(getBlockMetaKeyAtom(blockId, "connection")) ?? ""; const isRemote = isSshConnName(connName); const isWsl = isWslConnName(connName); const isDurable = globalStore.get(getBlockTermDurableAtom(blockId)) ?? false; getApi().incrementTermCommands({ isRemote, isWsl, isDurable }); if (cmd.data.cmd64) { // ... existing code ... } else { rtInfo["shell:lastcmd"] = null; globalStore.set(termWrap.lastCommandAtom, null); } // also clear lastcmdexitcode (since we've now started a new command) rtInfo["shell:lastcmdexitcode"] = null; break; + }
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@frontend/app/view/term/osc-handlers.ts`:
- Around line 273-288: The switch case "A" leaks the const marker declaration
into the switch scope; wrap the entire case "A" body in braces to create a block
scope so marker is block-local. Specifically, enclose the code that sets
rtInfo["shell:state"], calls
globalStore.set(termWrap.shellIntegrationStatusAtom, "ready"), declares const
marker = terminal.registerMarker(0), pushes onto termWrap.promptMarkers, and the
marker.onDispose handler inside { ... } so marker (and any other local
declarations) cannot be accessed by other cases.
🧹 Nitpick comments (4)
pkg/remote/sshclient.go (2)
117-144: String-based error classification is inherently fragile but acceptable as a fallback.The priority order (coded error → typed error → string match) is sound. The string-contains checks on lines 131–142 could silently misclassify if upstream libraries change their error messages, but since this is only used for telemetry and these are last-resort fallbacks, the risk is low.
Consider adding a brief comment noting that the string matching is best-effort for untyped errors, so future maintainers don't assume it's exhaustive.
261-261:UserInputCancelErrorlacksUnwrap()— inner error is not reachable viaerrors.As.The chain is
CodedError→UserInputCancelError{Err: err}, butUserInputCancelErrordoesn't implementUnwrap(), so the originalerr(e.g., a context deadline/cancellation) is unreachable viaerrors.Is/errors.As. This is fine for telemetry classification (the code is found at theCodedErrorlevel), but if any caller ever needs to inspect the root cause (e.g., distinguish timeout from user cancel), it won't work.Optional: add Unwrap to UserInputCancelError
+func (uice UserInputCancelError) Unwrap() error { + return uice.Err +}frontend/app/view/term/termwrap.ts (1)
31-36: Clean extraction of OSC handlers into a dedicated module.The delegation to the external handler functions is well-structured. Each handler receives the appropriate subset of arguments. The consistent
returnof the handler result properly propagates the ownership signal to xterm's parser.One minor nit:
ShellIntegrationStatusis imported here but line 89 still uses the inline union type"ready" | "running-command" | nullforshellIntegrationStatusAtom. Consider using the imported type for consistency:- shellIntegrationStatusAtom: jotai.PrimitiveAtom<"ready" | "running-command" | null>; + shellIntegrationStatusAtom: jotai.PrimitiveAtom<ShellIntegrationStatus | null>;Same applies to line 124.
Also applies to: 164-173
frontend/app/view/term/osc-handlers.ts (1)
271-271: Unchecked type assertion on incoming OSC command.
commandStris an arbitrary string from user input, cast toOsc16162Commandwithout validation. This works today because the switch silently ignores unknown commands, but a type guard or explicit validation would be safer against future regressions (e.g., if someone adds a default case that assumes the type is valid).
| switch (cmd.command) { | ||
| case "A": | ||
| rtInfo["shell:state"] = "ready"; | ||
| globalStore.set(termWrap.shellIntegrationStatusAtom, "ready"); | ||
| const marker = terminal.registerMarker(0); | ||
| if (marker) { | ||
| termWrap.promptMarkers.push(marker); | ||
| // addTestMarkerDecoration(terminal, marker, termWrap); | ||
| marker.onDispose(() => { | ||
| const idx = termWrap.promptMarkers.indexOf(marker); | ||
| if (idx !== -1) { | ||
| termWrap.promptMarkers.splice(idx, 1); | ||
| } | ||
| }); | ||
| } | ||
| break; |
There was a problem hiding this comment.
Wrap case "A" body in a block to fix switch-clause declaration leak.
The const marker declaration at line 277 is in the switch scope rather than a block scope. Other cases can erroneously access this declaration, and linters (Biome) flag this as a correctness issue. Wrapping the case body in braces restricts the variable's scope.
Proposed fix
case "A":
- rtInfo["shell:state"] = "ready";
- globalStore.set(termWrap.shellIntegrationStatusAtom, "ready");
- const marker = terminal.registerMarker(0);
- if (marker) {
- termWrap.promptMarkers.push(marker);
- // addTestMarkerDecoration(terminal, marker, termWrap);
- marker.onDispose(() => {
- const idx = termWrap.promptMarkers.indexOf(marker);
- if (idx !== -1) {
- termWrap.promptMarkers.splice(idx, 1);
- }
- });
- }
- break;
+ rtInfo["shell:state"] = "ready";
+ globalStore.set(termWrap.shellIntegrationStatusAtom, "ready");
+ {
+ const marker = terminal.registerMarker(0);
+ if (marker) {
+ termWrap.promptMarkers.push(marker);
+ // addTestMarkerDecoration(terminal, marker, termWrap);
+ marker.onDispose(() => {
+ const idx = termWrap.promptMarkers.indexOf(marker);
+ if (idx !== -1) {
+ termWrap.promptMarkers.splice(idx, 1);
+ }
+ });
+ }
+ }
+ break;📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| switch (cmd.command) { | |
| case "A": | |
| rtInfo["shell:state"] = "ready"; | |
| globalStore.set(termWrap.shellIntegrationStatusAtom, "ready"); | |
| const marker = terminal.registerMarker(0); | |
| if (marker) { | |
| termWrap.promptMarkers.push(marker); | |
| // addTestMarkerDecoration(terminal, marker, termWrap); | |
| marker.onDispose(() => { | |
| const idx = termWrap.promptMarkers.indexOf(marker); | |
| if (idx !== -1) { | |
| termWrap.promptMarkers.splice(idx, 1); | |
| } | |
| }); | |
| } | |
| break; | |
| switch (cmd.command) { | |
| case "A": | |
| rtInfo["shell:state"] = "ready"; | |
| globalStore.set(termWrap.shellIntegrationStatusAtom, "ready"); | |
| { | |
| const marker = terminal.registerMarker(0); | |
| if (marker) { | |
| termWrap.promptMarkers.push(marker); | |
| // addTestMarkerDecoration(terminal, marker, termWrap); | |
| marker.onDispose(() => { | |
| const idx = termWrap.promptMarkers.indexOf(marker); | |
| if (idx !== -1) { | |
| termWrap.promptMarkers.splice(idx, 1); | |
| } | |
| }); | |
| } | |
| } | |
| break; |
🧰 Tools
🪛 Biome (2.3.14)
[error] 277-277: Other switch clauses can erroneously access this declaration.
Wrap the declaration in a block to restrict its access to the switch clause.
The declaration is defined in this switch clause:
Safe fix: Wrap the declaration in a block.
(lint/correctness/noSwitchDeclarations)
🤖 Prompt for AI Agents
In `@frontend/app/view/term/osc-handlers.ts` around lines 273 - 288, The switch
case "A" leaks the const marker declaration into the switch scope; wrap the
entire case "A" body in braces to create a block scope so marker is block-local.
Specifically, enclose the code that sets rtInfo["shell:state"], calls
globalStore.set(termWrap.shellIntegrationStatusAtom, "ready"), declares const
marker = terminal.registerMarker(0), pushes onto termWrap.promptMarkers, and the
marker.onDispose handler inside { ... } so marker (and any other local
declarations) cannot be accessed by other cases.
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@pkg/remote/sshclient.go`:
- Around line 45-62: ConnErrCode_UserTimeout is declared but never used; update
the timeout classification in ClassifyConnError to distinguish dial timeouts vs
user-input timeouts by returning ConnErrCode_UserTimeout for timeout errors that
originate from user interaction (e.g., context.DeadlineExceeded or timeouts from
prompt/read paths) and keep ConnErrCode_Dial for network dial timeouts, or if
you decide timeouts should not be distinguished remove the unused
ConnErrCode_UserTimeout constant; locate ClassifyConnError and the timeout
checks there and either change the returned constant to ConnErrCode_UserTimeout
for user-driven timeouts or delete the ConnErrCode_UserTimeout declaration.
🧹 Nitpick comments (3)
frontend/app/view/term/termwrap.ts (2)
164-173: Consider defining an interface for theTermWrapdependency passed to OSC handlers.Passing the full
TermWrapinstance tohandleOsc52CommandandhandleOsc16162Commandtightly couples the new module back to this class. A narrow interface (e.g., only the atoms and methods the handlers actually use) would make the dependency explicit and the handlers easier to test independently. Fine for now as an incremental extraction.
293-295: Unsafe type assertion on backend data.
rtInfo["shell:state"] as ShellIntegrationStatustrusts the backend to return only"ready"or"running-command". If the backend ever returns an unexpected value, it will silently propagate. A runtime guard (e.g., checking against allowed values) would be safer, though the risk is low here.pkg/remote/sshclient.go (1)
748-764: Consider classifyingssh.NewClientConnerrors too.Lines 751 and 758 correctly wrap dial errors with
ConnErrCode_Dial. However, line 764 returns the error fromssh.NewClientConnwithout a coded error. This is the SSH handshake/auth negotiation phase — errors here will fall through toClassifyConnError's string matching, which may or may not classify them accurately. If you're relying on the fallback classifier intentionally, this is fine; otherwise, wrapping with a more specific code here could improve classification accuracy.
| const ( | ||
| ConnErrCode_ConfigParse = "config-parse" | ||
| ConnErrCode_ConfigDefault = "config-default" | ||
| ConnErrCode_ProxyDepth = "proxy-depth" | ||
| ConnErrCode_ProxyParse = "proxy-parse" | ||
| ConnErrCode_SecretStore = "secret-error" | ||
| ConnErrCode_SecretNotFound = "secret-notfound" | ||
| ConnErrCode_KnownHostsNone = "knownhosts-none" | ||
| ConnErrCode_KnownHostsFmt = "knownhosts-format" | ||
| ConnErrCode_Dial = "dial-error" | ||
| ConnErrCode_HostKeyRevoked = "hostkey-revoked" | ||
| ConnErrCode_HostKeyChanged = "hostkey-changed" | ||
| ConnErrCode_HostKeyVerify = "hostkey-verify" | ||
| ConnErrCode_UserCancelled = "user-cancelled" | ||
| ConnErrCode_UserTimeout = "user-timeout" | ||
| ConnErrCode_AuthFailed = "auth-failed" | ||
| ConnErrCode_Unknown = "unknown" | ||
| ) |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
# Check all usages of ConnErrCode_UserTimeout in the codebase
rg -n 'ConnErrCode_UserTimeout' --type=goRepository: wavetermdev/waveterm
Length of output: 135
🏁 Script executed:
# Also check the sshclient.go file for timeout-related error handling
rg -n -B2 -A2 'timeout' pkg/remote/sshclient.go --type=go -iRepository: wavetermdev/waveterm
Length of output: 1724
🏁 Script executed:
# Let's look at the ClassifyConnError function specifically around line 144
sed -n '121,150p' pkg/remote/sshclient.goRepository: wavetermdev/waveterm
Length of output: 830
ConnErrCode_UserTimeout is defined but never used anywhere in the codebase.
The constant is declared at line 59 but has no assignment sites in this file or elsewhere. Meanwhile, ClassifyConnError (lines 144–145) classifies all timeout-related errors as ConnErrCode_Dial rather than ConnErrCode_UserTimeout. Either this constant should be removed if it's unused, or the timeout classification logic should be updated to distinguish between dial timeouts and user-input timeouts by using ConnErrCode_UserTimeout where appropriate.
🤖 Prompt for AI Agents
In `@pkg/remote/sshclient.go` around lines 45 - 62, ConnErrCode_UserTimeout is
declared but never used; update the timeout classification in ClassifyConnError
to distinguish dial timeouts vs user-input timeouts by returning
ConnErrCode_UserTimeout for timeout errors that originate from user interaction
(e.g., context.DeadlineExceeded or timeouts from prompt/read paths) and keep
ConnErrCode_Dial for network dial timeouts, or if you decide timeouts should not
be distinguished remove the unused ConnErrCode_UserTimeout constant; locate
ClassifyConnError and the timeout checks there and either change the returned
constant to ConnErrCode_UserTimeout for user-driven timeouts or delete the
ConnErrCode_UserTimeout declaration.
No description provided.