Skip to content

job events, terminal command types, connect error codes#2855

Merged
sawka merged 9 commits intomainfrom
sawka/job-events
Feb 11, 2026
Merged

job events, terminal command types, connect error codes#2855
sawka merged 9 commits intomainfrom
sawka/job-events

Conversation

@sawka
Copy link
Member

@sawka sawka commented Feb 11, 2026

No description provided.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Feb 11, 2026

Walkthrough

Adds telemetry and job-kind classification across backend and frontend: job controller now tracks JobKind, emits job lifecycle telemetry, and exposes running/connected job counters. SSH client/connection code introduces coded connection errors and a classifier. A CodedError utility is added. IPC/preload and renderer code track three new terminal-command categories (remote, WSL, durable) and surface counts to telemetry. OSC handling logic is extracted into a new osc-handlers frontend module implementing OSC 52, OSC 7, and OSC 16162 handlers. package.json version bumped.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

🚥 Pre-merge checks | ✅ 1 | ❌ 2
❌ Failed checks (1 warning, 1 inconclusive)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 24.14% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Description check ❓ Inconclusive No description was provided by the author, making it impossible to assess relevance to the changeset. Add a pull request description that explains the purpose and scope of these changes, even if brief.
✅ Passed checks (1 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately summarizes the three main categories of changes: job events (job lifecycle telemetry), terminal command types (remote/wsl/durable classification), and connect error codes (SSH connection error categorization).

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch sawka/job-events

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@pkg/remote/sshclient.go`:
- Line 606: The format verb in the error returned from utilds.Errorf when
creating a knownhost key is incorrect (%e); update the call in sshclient.go (the
return using utilds.Errorf with ConnErrCode_HostKeyVerify) to use a proper error
verb — replace "%e" with "%v" (or "%w" if you want to wrap the original error
for Unwrap() support) so the error message prints correctly and preserves the
original error for inspection.
🧹 Nitpick comments (3)
pkg/jobcontroller/jobcontroller.go (2)

553-567: GetNumJobsRunning loads all jobs to count running ones.

This fetches every job row from the DB and deserializes them just to count those with Running status. Since it's called on the hourly telemetry interval, it's acceptable for now, but consider using a SELECT COUNT(*) with a WHERE clause on the JSON status field if the job table grows.


989-1003: Termination message is written before the "already done" guard.

Lines 992-998 write [shell terminated] to the terminal, then lines 1000-1003 check if the job is already done and bail out. If the job is already done, the message is unnecessarily written. Since jobTerminationMessageWritten.TestAndSet makes this idempotent (only writes once), it's not a bug, but the guard could be moved above the message write for clarity.

♻️ Move the guard before the terminal message write
 func remoteTerminateJobManager(ctx context.Context, job *waveobj.Job) error {
 	log.Printf("[job:%s] terminating job manager", job.OID)
 
+	if job.JobManagerStatus == JobManagerStatus_Done {
+		log.Printf("[job:%s] job manager already marked as done, skipping termination", job.OID)
+		return nil
+	}
+
 	shouldWrite := jobTerminationMessageWritten.TestAndSet(job.OID, true, func(val bool, exists bool) bool {
 		return !exists || !val
 	})
 	if shouldWrite {
 		resetTerminalState(ctx, job.AttachedBlockId)
 		writeMutedMessageToTerminal(job.AttachedBlockId, "[shell terminated]")
 	}
 
-	if job.JobManagerStatus == JobManagerStatus_Done {
-		log.Printf("[job:%s] job manager already marked as done, skipping termination", job.OID)
-		return nil
-	}
-
 	bareRpc := wshclient.GetBareRpcClient()
frontend/app/view/term/termwrap.ts (1)

320-351: Wrap case "C" declarations in a block to fix lint errors.

Biome flags noSwitchDeclarations for the four const declarations (lines 323–326). While break prevents actual fallthrough, the declarations are technically hoisted to the switch scope, which is a correctness hazard if cases are reordered in the future and violates the linter rule.

Note: case "A" (line 308) has a pre-existing instance of the same issue with const marker.

Proposed fix
         case "C":
+        {
             rtInfo["shell:state"] = "running-command";
             globalStore.set(termWrap.shellIntegrationStatusAtom, "running-command");
             const connName = globalStore.get(getBlockMetaKeyAtom(blockId, "connection")) ?? "";
             const isRemote = isSshConnName(connName);
             const isWsl = isWslConnName(connName);
             const isDurable = globalStore.get(getBlockTermDurableAtom(blockId)) ?? false;
             getApi().incrementTermCommands({ isRemote, isWsl, isDurable });
             if (cmd.data.cmd64) {
                 // ... existing code ...
             } else {
                 rtInfo["shell:lastcmd"] = null;
                 globalStore.set(termWrap.lastCommandAtom, null);
             }
             // also clear lastcmdexitcode (since we've now started a new command)
             rtInfo["shell:lastcmdexitcode"] = null;
             break;
+        }

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@frontend/app/view/term/osc-handlers.ts`:
- Around line 273-288: The switch case "A" leaks the const marker declaration
into the switch scope; wrap the entire case "A" body in braces to create a block
scope so marker is block-local. Specifically, enclose the code that sets
rtInfo["shell:state"], calls
globalStore.set(termWrap.shellIntegrationStatusAtom, "ready"), declares const
marker = terminal.registerMarker(0), pushes onto termWrap.promptMarkers, and the
marker.onDispose handler inside { ... } so marker (and any other local
declarations) cannot be accessed by other cases.
🧹 Nitpick comments (4)
pkg/remote/sshclient.go (2)

117-144: String-based error classification is inherently fragile but acceptable as a fallback.

The priority order (coded error → typed error → string match) is sound. The string-contains checks on lines 131–142 could silently misclassify if upstream libraries change their error messages, but since this is only used for telemetry and these are last-resort fallbacks, the risk is low.

Consider adding a brief comment noting that the string matching is best-effort for untyped errors, so future maintainers don't assume it's exhaustive.


261-261: UserInputCancelError lacks Unwrap() — inner error is not reachable via errors.As.

The chain is CodedErrorUserInputCancelError{Err: err}, but UserInputCancelError doesn't implement Unwrap(), so the original err (e.g., a context deadline/cancellation) is unreachable via errors.Is/errors.As. This is fine for telemetry classification (the code is found at the CodedError level), but if any caller ever needs to inspect the root cause (e.g., distinguish timeout from user cancel), it won't work.

Optional: add Unwrap to UserInputCancelError
+func (uice UserInputCancelError) Unwrap() error {
+	return uice.Err
+}
frontend/app/view/term/termwrap.ts (1)

31-36: Clean extraction of OSC handlers into a dedicated module.

The delegation to the external handler functions is well-structured. Each handler receives the appropriate subset of arguments. The consistent return of the handler result properly propagates the ownership signal to xterm's parser.

One minor nit: ShellIntegrationStatus is imported here but line 89 still uses the inline union type "ready" | "running-command" | null for shellIntegrationStatusAtom. Consider using the imported type for consistency:

-    shellIntegrationStatusAtom: jotai.PrimitiveAtom<"ready" | "running-command" | null>;
+    shellIntegrationStatusAtom: jotai.PrimitiveAtom<ShellIntegrationStatus | null>;

Same applies to line 124.

Also applies to: 164-173

frontend/app/view/term/osc-handlers.ts (1)

271-271: Unchecked type assertion on incoming OSC command.

commandStr is an arbitrary string from user input, cast to Osc16162Command without validation. This works today because the switch silently ignores unknown commands, but a type guard or explicit validation would be safer against future regressions (e.g., if someone adds a default case that assumes the type is valid).

Comment on lines +273 to +288
switch (cmd.command) {
case "A":
rtInfo["shell:state"] = "ready";
globalStore.set(termWrap.shellIntegrationStatusAtom, "ready");
const marker = terminal.registerMarker(0);
if (marker) {
termWrap.promptMarkers.push(marker);
// addTestMarkerDecoration(terminal, marker, termWrap);
marker.onDispose(() => {
const idx = termWrap.promptMarkers.indexOf(marker);
if (idx !== -1) {
termWrap.promptMarkers.splice(idx, 1);
}
});
}
break;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Wrap case "A" body in a block to fix switch-clause declaration leak.

The const marker declaration at line 277 is in the switch scope rather than a block scope. Other cases can erroneously access this declaration, and linters (Biome) flag this as a correctness issue. Wrapping the case body in braces restricts the variable's scope.

Proposed fix
         case "A":
-            rtInfo["shell:state"] = "ready";
-            globalStore.set(termWrap.shellIntegrationStatusAtom, "ready");
-            const marker = terminal.registerMarker(0);
-            if (marker) {
-                termWrap.promptMarkers.push(marker);
-                // addTestMarkerDecoration(terminal, marker, termWrap);
-                marker.onDispose(() => {
-                    const idx = termWrap.promptMarkers.indexOf(marker);
-                    if (idx !== -1) {
-                        termWrap.promptMarkers.splice(idx, 1);
-                    }
-                });
-            }
-            break;
+            rtInfo["shell:state"] = "ready";
+            globalStore.set(termWrap.shellIntegrationStatusAtom, "ready");
+            {
+                const marker = terminal.registerMarker(0);
+                if (marker) {
+                    termWrap.promptMarkers.push(marker);
+                    // addTestMarkerDecoration(terminal, marker, termWrap);
+                    marker.onDispose(() => {
+                        const idx = termWrap.promptMarkers.indexOf(marker);
+                        if (idx !== -1) {
+                            termWrap.promptMarkers.splice(idx, 1);
+                        }
+                    });
+                }
+            }
+            break;
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
switch (cmd.command) {
case "A":
rtInfo["shell:state"] = "ready";
globalStore.set(termWrap.shellIntegrationStatusAtom, "ready");
const marker = terminal.registerMarker(0);
if (marker) {
termWrap.promptMarkers.push(marker);
// addTestMarkerDecoration(terminal, marker, termWrap);
marker.onDispose(() => {
const idx = termWrap.promptMarkers.indexOf(marker);
if (idx !== -1) {
termWrap.promptMarkers.splice(idx, 1);
}
});
}
break;
switch (cmd.command) {
case "A":
rtInfo["shell:state"] = "ready";
globalStore.set(termWrap.shellIntegrationStatusAtom, "ready");
{
const marker = terminal.registerMarker(0);
if (marker) {
termWrap.promptMarkers.push(marker);
// addTestMarkerDecoration(terminal, marker, termWrap);
marker.onDispose(() => {
const idx = termWrap.promptMarkers.indexOf(marker);
if (idx !== -1) {
termWrap.promptMarkers.splice(idx, 1);
}
});
}
}
break;
🧰 Tools
🪛 Biome (2.3.14)

[error] 277-277: Other switch clauses can erroneously access this declaration.
Wrap the declaration in a block to restrict its access to the switch clause.

The declaration is defined in this switch clause:

Safe fix: Wrap the declaration in a block.

(lint/correctness/noSwitchDeclarations)

🤖 Prompt for AI Agents
In `@frontend/app/view/term/osc-handlers.ts` around lines 273 - 288, The switch
case "A" leaks the const marker declaration into the switch scope; wrap the
entire case "A" body in braces to create a block scope so marker is block-local.
Specifically, enclose the code that sets rtInfo["shell:state"], calls
globalStore.set(termWrap.shellIntegrationStatusAtom, "ready"), declares const
marker = terminal.registerMarker(0), pushes onto termWrap.promptMarkers, and the
marker.onDispose handler inside { ... } so marker (and any other local
declarations) cannot be accessed by other cases.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@pkg/remote/sshclient.go`:
- Around line 45-62: ConnErrCode_UserTimeout is declared but never used; update
the timeout classification in ClassifyConnError to distinguish dial timeouts vs
user-input timeouts by returning ConnErrCode_UserTimeout for timeout errors that
originate from user interaction (e.g., context.DeadlineExceeded or timeouts from
prompt/read paths) and keep ConnErrCode_Dial for network dial timeouts, or if
you decide timeouts should not be distinguished remove the unused
ConnErrCode_UserTimeout constant; locate ClassifyConnError and the timeout
checks there and either change the returned constant to ConnErrCode_UserTimeout
for user-driven timeouts or delete the ConnErrCode_UserTimeout declaration.
🧹 Nitpick comments (3)
frontend/app/view/term/termwrap.ts (2)

164-173: Consider defining an interface for the TermWrap dependency passed to OSC handlers.

Passing the full TermWrap instance to handleOsc52Command and handleOsc16162Command tightly couples the new module back to this class. A narrow interface (e.g., only the atoms and methods the handlers actually use) would make the dependency explicit and the handlers easier to test independently. Fine for now as an incremental extraction.


293-295: Unsafe type assertion on backend data.

rtInfo["shell:state"] as ShellIntegrationStatus trusts the backend to return only "ready" or "running-command". If the backend ever returns an unexpected value, it will silently propagate. A runtime guard (e.g., checking against allowed values) would be safer, though the risk is low here.

pkg/remote/sshclient.go (1)

748-764: Consider classifying ssh.NewClientConn errors too.

Lines 751 and 758 correctly wrap dial errors with ConnErrCode_Dial. However, line 764 returns the error from ssh.NewClientConn without a coded error. This is the SSH handshake/auth negotiation phase — errors here will fall through to ClassifyConnError's string matching, which may or may not classify them accurately. If you're relying on the fallback classifier intentionally, this is fine; otherwise, wrapping with a more specific code here could improve classification accuracy.

Comment on lines +45 to +62
const (
ConnErrCode_ConfigParse = "config-parse"
ConnErrCode_ConfigDefault = "config-default"
ConnErrCode_ProxyDepth = "proxy-depth"
ConnErrCode_ProxyParse = "proxy-parse"
ConnErrCode_SecretStore = "secret-error"
ConnErrCode_SecretNotFound = "secret-notfound"
ConnErrCode_KnownHostsNone = "knownhosts-none"
ConnErrCode_KnownHostsFmt = "knownhosts-format"
ConnErrCode_Dial = "dial-error"
ConnErrCode_HostKeyRevoked = "hostkey-revoked"
ConnErrCode_HostKeyChanged = "hostkey-changed"
ConnErrCode_HostKeyVerify = "hostkey-verify"
ConnErrCode_UserCancelled = "user-cancelled"
ConnErrCode_UserTimeout = "user-timeout"
ConnErrCode_AuthFailed = "auth-failed"
ConnErrCode_Unknown = "unknown"
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

# Check all usages of ConnErrCode_UserTimeout in the codebase
rg -n 'ConnErrCode_UserTimeout' --type=go

Repository: wavetermdev/waveterm

Length of output: 135


🏁 Script executed:

# Also check the sshclient.go file for timeout-related error handling
rg -n -B2 -A2 'timeout' pkg/remote/sshclient.go --type=go -i

Repository: wavetermdev/waveterm

Length of output: 1724


🏁 Script executed:

# Let's look at the ClassifyConnError function specifically around line 144
sed -n '121,150p' pkg/remote/sshclient.go

Repository: wavetermdev/waveterm

Length of output: 830


ConnErrCode_UserTimeout is defined but never used anywhere in the codebase.

The constant is declared at line 59 but has no assignment sites in this file or elsewhere. Meanwhile, ClassifyConnError (lines 144–145) classifies all timeout-related errors as ConnErrCode_Dial rather than ConnErrCode_UserTimeout. Either this constant should be removed if it's unused, or the timeout classification logic should be updated to distinguish between dial timeouts and user-input timeouts by using ConnErrCode_UserTimeout where appropriate.

🤖 Prompt for AI Agents
In `@pkg/remote/sshclient.go` around lines 45 - 62, ConnErrCode_UserTimeout is
declared but never used; update the timeout classification in ClassifyConnError
to distinguish dial timeouts vs user-input timeouts by returning
ConnErrCode_UserTimeout for timeout errors that originate from user interaction
(e.g., context.DeadlineExceeded or timeouts from prompt/read paths) and keep
ConnErrCode_Dial for network dial timeouts, or if you decide timeouts should not
be distinguished remove the unused ConnErrCode_UserTimeout constant; locate
ClassifyConnError and the timeout checks there and either change the returned
constant to ConnErrCode_UserTimeout for user-driven timeouts or delete the
ConnErrCode_UserTimeout declaration.

@sawka sawka merged commit facefdd into main Feb 11, 2026
6 checks passed
@sawka sawka deleted the sawka/job-events branch February 11, 2026 04:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant