fix: prevent channel monitor migration from overwriting newer state#74
fix: prevent channel monitor migration from overwriting newer state#74
Conversation
During FS→KV store migration, the code now compares update_id before writing and skips if the KV store already has a newer monitor. This prevents stale migration data from clobbering current channel state on repeated restarts. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace incomplete testing docs with comprehensive coverage of unit tests, integration tests (with macOS ulimit note), and CLN/LND/VSS backend tests using correct RUSTFLAGS syntax. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
| # 0.7.0-rc.32 (Synonym Fork) | ||
| # 0.7.0-rc.33 (Synonym Fork) | ||
|
|
||
| ## Bug Fixes |
There was a problem hiding this comment.
CLAUDE.md compliance: New entry should be under ## Synonym Fork Additions, not ## Bug Fixes
Per AGENTS.md (the source for CLAUDE.md):
All Synonym fork additions go under ONE
## Synonym Fork Additionssubsection
New additions should be added at the TOP of the Synonym Fork Additions list
The new channel monitor migration fix entry belongs at the top of a ## Synonym Fork Additions subsection. If that subsection doesn't yet exist, it should be created. All Synonym-fork-specific changes are supposed to be collected there, not in generic ## Bug Fixes.
| e | ||
| ); | ||
| true | ||
| }, |
There was a problem hiding this comment.
Fail-open on deserialization error is unsafe for Lightning channel monitors
When the KV store returns data but ChannelMonitor::read() fails to deserialize it, this returns true and proceeds to overwrite the existing KV entry with the FS migration data. A deserialization failure does not mean the existing data is absent or invalid — it could mean:
- The KV store was written by a newer LDK version with a breaking format change (i.e., the KV data is newer)
- A partial write corrupted the header bytes
In either case, overwriting with FS migration data risks reverting the monitor to a stale state. For Lightning, using stale channel monitor state means the node may not have revocation keys for recent commitment transactions, enabling the counterparty to claim funds via a justice transaction.
The correct behavior is fail-closed: return Err(BuildError::ReadFailed) and refuse to proceed. This forces the operator to investigate rather than silently proceeding with potentially older state.
Err(e) => {
log_error!(
logger,
"Failed to deserialize existing monitor {}, refusing migration write to avoid overwriting potentially newer state: {:?}",
monitor_key,
e
);
return Err(BuildError::ReadFailed);
},| e | ||
| ); | ||
| true | ||
| }, |
There was a problem hiding this comment.
Fail-open on non-NotFound I/O errors is unsafe — especially for VSS backends
This arm triggers on any I/O error except NotFound (e.g., TimedOut, ConnectionReset, PermissionDenied, BrokenPipe). In all these cases the data may exist in the store — the read simply failed transiently. Returning true and overwriting with FS migration data can corrupt valid, current channel monitor state.
This is especially dangerous for VSS (Versioned Storage Service) backends where transient network errors are expected. A VSS node that has already fully migrated and is running normally could suffer a brief network hiccup during build() — triggering this path and rolling back all channel monitors to their FS-era state.
This is compounded by the fact that there is no one-shot migration sentinel written to mark migration as complete: if the caller passes migration data on every build() call (typical for apps that read FS on every startup), this fail-open path is exercised on every restart.
Recommend fail-closed here as well:
Err(e) => {
log_error!(
logger,
"Failed to read existing monitor {}, refusing migration write to avoid overwriting potentially newer state: {}",
monitor_key,
e
);
return Err(BuildError::ReadFailed);
},
Summary
update_idbefore writing, preventing stale data from overwriting current state on repeated migrations/restarts.ulimitnote, CLN/LND/VSS backend tests).0.7.0-rc.33and regenerated bindings.Test plan
cargo fmt --check— cleancargo build— builds successfullyRelease
🤖 Generated with Claude Code