Skip to content

fix: prevent channel monitor migration from overwriting newer state#74

Open
ovitrif wants to merge 4 commits intomainfrom
fix/channel-monitor-migration
Open

fix: prevent channel monitor migration from overwriting newer state#74
ovitrif wants to merge 4 commits intomainfrom
fix/channel-monitor-migration

Conversation

@ovitrif
Copy link
Collaborator

@ovitrif ovitrif commented Mar 9, 2026

Summary

  • Fixed channel monitor migration from FS store to KV store overwriting newer state. During migration, the code now checks if the KV store already has a channel monitor with a newer update_id before writing, preventing stale data from overwriting current state on repeated migrations/restarts.
  • Rewrote AGENTS.md testing documentation with correct syntax and setup requirements (unit tests, integration tests with macOS ulimit note, CLN/LND/VSS backend tests).
  • Bumped version to 0.7.0-rc.33 and regenerated bindings.

Test plan

  • cargo fmt --check — clean
  • cargo build — builds successfully
  • Bindings regenerated (Swift/Kotlin/Python)
  • xcframework checksum verified in Package.swift

Release

🤖 Generated with Claude Code

ovitrif and others added 4 commits March 9, 2026 22:22
During FS→KV store migration, the code now compares update_id before
writing and skips if the KV store already has a newer monitor. This
prevents stale migration data from clobbering current channel state
on repeated restarts.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace incomplete testing docs with comprehensive coverage of unit tests,
integration tests (with macOS ulimit note), and CLN/LND/VSS backend tests
using correct RUSTFLAGS syntax.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copy link

@claude claude bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

Found 3 issues: 1 CLAUDE.md compliance issue and 2 logic issues with the fail-open error handling in the migration code.

# 0.7.0-rc.32 (Synonym Fork)
# 0.7.0-rc.33 (Synonym Fork)

## Bug Fixes
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CLAUDE.md compliance: New entry should be under ## Synonym Fork Additions, not ## Bug Fixes

Per AGENTS.md (the source for CLAUDE.md):

All Synonym fork additions go under ONE ## Synonym Fork Additions subsection
New additions should be added at the TOP of the Synonym Fork Additions list

The new channel monitor migration fix entry belongs at the top of a ## Synonym Fork Additions subsection. If that subsection doesn't yet exist, it should be created. All Synonym-fork-specific changes are supposed to be collected there, not in generic ## Bug Fixes.

e
);
true
},
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fail-open on deserialization error is unsafe for Lightning channel monitors

When the KV store returns data but ChannelMonitor::read() fails to deserialize it, this returns true and proceeds to overwrite the existing KV entry with the FS migration data. A deserialization failure does not mean the existing data is absent or invalid — it could mean:

  • The KV store was written by a newer LDK version with a breaking format change (i.e., the KV data is newer)
  • A partial write corrupted the header bytes

In either case, overwriting with FS migration data risks reverting the monitor to a stale state. For Lightning, using stale channel monitor state means the node may not have revocation keys for recent commitment transactions, enabling the counterparty to claim funds via a justice transaction.

The correct behavior is fail-closed: return Err(BuildError::ReadFailed) and refuse to proceed. This forces the operator to investigate rather than silently proceeding with potentially older state.

Err(e) => {
    log_error!(
        logger,
        "Failed to deserialize existing monitor {}, refusing migration write to avoid overwriting potentially newer state: {:?}",
        monitor_key,
        e
    );
    return Err(BuildError::ReadFailed);
},

e
);
true
},
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fail-open on non-NotFound I/O errors is unsafe — especially for VSS backends

This arm triggers on any I/O error except NotFound (e.g., TimedOut, ConnectionReset, PermissionDenied, BrokenPipe). In all these cases the data may exist in the store — the read simply failed transiently. Returning true and overwriting with FS migration data can corrupt valid, current channel monitor state.

This is especially dangerous for VSS (Versioned Storage Service) backends where transient network errors are expected. A VSS node that has already fully migrated and is running normally could suffer a brief network hiccup during build() — triggering this path and rolling back all channel monitors to their FS-era state.

This is compounded by the fact that there is no one-shot migration sentinel written to mark migration as complete: if the caller passes migration data on every build() call (typical for apps that read FS on every startup), this fail-open path is exercised on every restart.

Recommend fail-closed here as well:

Err(e) => {
    log_error!(
        logger,
        "Failed to read existing monitor {}, refusing migration write to avoid overwriting potentially newer state: {}",
        monitor_key,
        e
    );
    return Err(BuildError::ReadFailed);
},

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant