refactor: store aggregated payloads in memory#181

Closed
MegaRedHand wants to merge 1 commit into main from improve-aggregate-payload-handling

Conversation

@MegaRedHand
Collaborator

No description provided.

@github-actions

github-actions bot commented Mar 3, 2026

🤖 Kimi Code Review

Review Summary

This PR migrates aggregated payload storage from RocksDB to in-memory HashMaps with Mutex protection. The change is well-motivated for performance but introduces several critical issues.

Critical Issues

  1. Race Condition in update_safe_target (lines 101-102)

    • The function collects keys from both maps into a HashSet, but between the two iterator calls the maps could be modified by another thread, violating the lock ordering invariant.
  2. Iterator Invalidation in produce_block_with_signatures (line 769)

    • Collects known_payloads into a Vec but then iterates over it while calling extract_latest_attestations which may lock the same map, creating potential deadlocks.
  3. Missing Error Handling in Mutex Operations

    • All Mutex::lock().expect("lock") calls will panic on a poisoned mutex. They should use proper error handling or recovery.
  4. Memory Growth Concern

    • No evidence of size limits or memory pressure handling for the in-memory maps. Could grow unbounded in long-running nodes.
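Issue 3 can be addressed without crashing the node. A minimal sketch (the map type here is a stand-in for the PR's actual AggregatedPayloadMap):

```rust
use std::collections::HashMap;
use std::sync::Mutex;

// Recover from a poisoned Mutex instead of panicking: for an in-memory
// fork-choice cache, reading possibly-stale data usually beats crashing.
fn known_payload_count(map: &Mutex<HashMap<u64, Vec<Vec<u8>>>>) -> usize {
    match map.lock() {
        Ok(guard) => guard.len(),
        // The poison error still carries the guard; take it and continue.
        Err(poisoned) => poisoned.into_inner().len(),
    }
}

fn main() {
    let map = Mutex::new(HashMap::from([(1u64, vec![vec![0u8]])]));
    assert_eq!(known_payload_count(&map), 1);
}
```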

Security & Consensus Issues

  1. Atomicity Violation in promote_new_aggregated_payloads (lines 947-954)

    • The promotion operation is not atomic. If the process crashes between draining new_ and extending known_, payloads could be lost, affecting fork choice weights.
  2. State Consistency

    • The in-memory maps are not persisted, creating potential inconsistency between disk state and fork choice weights after crashes.
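For reference, a drain-and-extend promotion of the kind described can be written so both locks are held for the whole operation (types are stand-ins). This keeps the two maps mutually consistent for other threads; it does not survive a process crash, since neither map is persisted, which is exactly the concern above:

```rust
use std::collections::HashMap;
use std::sync::Mutex;

type Map = HashMap<u64, Vec<Vec<u8>>>;

// Hold both locks (known first, then new) for the whole drain-and-extend,
// so no other thread can observe a payload in neither map.
fn promote(known: &Mutex<Map>, new: &Mutex<Map>) {
    let mut known_g = known.lock().expect("lock"); // lock order: known first,
    let mut new_g = new.lock().expect("lock");     // then new
    for (key, payloads) in new_g.drain() {
        known_g.entry(key).or_default().extend(payloads);
    }
}

fn main() {
    let known = Mutex::new(Map::from([(1, vec![vec![1u8]])]));
    let new = Mutex::new(Map::from([(1, vec![vec![2u8]]), (2, vec![vec![3u8]])]));
    promote(&known, &new);
    assert!(new.lock().unwrap().is_empty());
    assert_eq!(known.lock().unwrap()[&1].len(), 2);
}
```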

Code Quality Issues

  1. Inefficient Collection Patterns

    • Multiple functions collect iterators into Vecs only to immediately iterate again (e.g., iter_known_aggregated_payloads returns Vec instead of iterator).
  2. Test Boilerplate Duplication

    • Lines 1158-1325: Each test manually constructs the Store with the new fields. Should use a helper function.
  3. Documentation Inconsistency

    • CLAUDE.md mentions "lock ordering invariant" but the code doesn't enforce it consistently.
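A helper of roughly this shape would remove the test duplication; the struct below is a minimal stand-in with assumed field names, not the real Store:

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

// Hypothetical minimal Store with the two new in-memory fields.
struct Store {
    known_aggregated_payloads: Arc<Mutex<HashMap<u64, Vec<Vec<u8>>>>>,
    new_aggregated_payloads: Arc<Mutex<HashMap<u64, Vec<Vec<u8>>>>>,
}

// One helper so each test stops constructing the fields by hand.
fn empty_test_store() -> Store {
    Store {
        known_aggregated_payloads: Arc::new(Mutex::new(HashMap::new())),
        new_aggregated_payloads: Arc::new(Mutex::new(HashMap::new())),
    }
}

fn main() {
    let store = empty_test_store();
    assert!(store.known_aggregated_payloads.lock().unwrap().is_empty());
    assert!(store.new_aggregated_payloads.lock().unwrap().is_empty());
}
```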

Recommendations

  1. Fix race conditions by acquiring both locks in the correct order before any operations
  2. Add memory limits to the in-memory maps
  3. Implement proper error handling for poisoned mutexes
  4. Consider persistence for the aggregated payloads to maintain consistency
  5. Add metrics for map sizes to monitor memory usage
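Recommendation 1 amounts to taking both locks once, in the fixed order, and computing over the guards, so no other thread can mutate either map between the two reads. A sketch with stand-in types:

```rust
use std::collections::{HashMap, HashSet};
use std::sync::Mutex;

// Collect the union of keys while holding both locks (known before new),
// instead of reading the two maps in separate, interleavable lock scopes.
fn union_keys(
    known: &Mutex<HashMap<u64, Vec<Vec<u8>>>>,
    new: &Mutex<HashMap<u64, Vec<Vec<u8>>>>,
) -> HashSet<u64> {
    let known_g = known.lock().expect("lock"); // lock order: known first
    let new_g = new.lock().expect("lock");
    known_g.keys().chain(new_g.keys()).copied().collect()
}

fn main() {
    let known = Mutex::new(HashMap::from([(1u64, vec![])]));
    let new = Mutex::new(HashMap::from([(2u64, vec![])]));
    assert_eq!(union_keys(&known, &new), HashSet::from([1, 2]));
}
```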

Automated review by Kimi (Moonshot AI) · custom prompt

@github-actions

github-actions bot commented Mar 3, 2026

🤖 Codex Code Review

  1. High: DB upgrade will fail hard with existing data (no migration path).
    Removing the two aggregated-payload CFs from ALL_TABLES means open_cf_descriptors will reject DBs that still contain those CFs, so nodes cannot start after upgrading unless operators wipe state. This is an operational and availability regression for production nodes.
    References: tables.rs:30, rocksdb.rs:39, rocksdb.rs:45.
    Suggested change: keep legacy CFs open for one release (or list/drop unknown CFs on open), then migrate/drop safely.

  2. High: fork-choice attestation memory is now ephemeral, so restart drops latest messages.
    known_aggregated_payloads/new_aggregated_payloads are initialized empty and never rehydrated from disk. After restart, LMD-GHOST weights/safe-target and block production attestations are computed from an empty (or partially rebuilt) set until new gossip arrives. That can produce weaker/incorrect head choice behavior immediately post-restart.
    References: store.rs:148, store.rs:285, blockchain store.rs:101, blockchain store.rs:769.
    Suggested change: persist minimal latest-message state, or rebuild it deterministically from recent canonical blocks on startup before fork-choice use.

  3. Medium: increased RAM-DoS/perf risk from unbounded in-memory payload maps and full clones.
    Inserts append indefinitely until finalization pruning, and reads clone the full map/value vectors under mutex. Under high attestation volume or delayed finalization, this can cause memory pressure and contention in consensus-critical paths.
    References: store.rs:938, store.rs:947, store.rs:924, store.rs:517.
    Suggested change: enforce per-validator/per-slot bounds or dedup policy, and avoid full-map cloning for hot-path reads.
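A per-key bound could look like the sketch below; the constant and the drop-oldest eviction policy are assumptions for illustration, not from the PR:

```rust
use std::collections::HashMap;

// Hypothetical cap on payloads retained per signature key.
const MAX_PAYLOADS_PER_KEY: usize = 64;

// Insert while keeping each key's vector bounded, evicting the oldest entry.
fn insert_bounded(map: &mut HashMap<u64, Vec<Vec<u8>>>, key: u64, payload: Vec<u8>) {
    let entry = map.entry(key).or_default();
    if entry.len() == MAX_PAYLOADS_PER_KEY {
        entry.remove(0); // drop the oldest to stay within the bound
    }
    entry.push(payload);
}

fn main() {
    let mut map = HashMap::new();
    for i in 0..100u64 {
        insert_bounded(&mut map, 7, vec![i as u8]);
    }
    // Only the most recent MAX_PAYLOADS_PER_KEY payloads remain.
    assert_eq!(map[&7].len(), MAX_PAYLOADS_PER_KEY);
}
```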

I could not run cargo check in this environment because rustup failed to create temp files under /home/runner/.rustup/tmp (permission denied), so this review is static/diff-based.


Automated review by OpenAI Codex · custom prompt

@github-actions

github-actions bot commented Mar 3, 2026

🤖 Claude Code Review


PR 181 Review: refactor: store aggregated payloads in memory

Summary

The PR replaces two RocksDB column families (LatestNewAggregatedPayloads, LatestKnownAggregatedPayloads) with in-memory Arc<Mutex<HashMap<...>>> fields on Store. This eliminates SSZ encode/decode round-trips and significantly simplifies the promotion, pruning, and insert paths. The overall direction is sound.


Issues

Dead SSZ derives on StoredAggregatedPayload

crates/storage/src/types.rs:32

#[derive(Debug, Clone, ssz::Encode, ssz::Decode)]
pub struct StoredAggregatedPayload { ... }

After this PR, StoredAggregatedPayload is no longer serialized to or deserialized from bytes — it lives entirely in-memory. The ssz::Encode and ssz::Decode derives are now dead code. Depending on whether rustc / clippy emits unused_imports here, this may or may not warn, but it should be cleaned up.


Misleading lock-ordering comment in pruning

crates/storage/src/store.rs:401–403

// Lock ordering: known before new (field declaration order)
Self::prune_aggregated_payload_map(&self.known_aggregated_payloads, finalized.slot);
Self::prune_aggregated_payload_map(&self.new_aggregated_payloads, finalized.slot);

prune_aggregated_payload_map acquires and fully releases the Mutex before returning. These two calls never hold both locks simultaneously, so there is no ordering constraint for deadlock prevention here. The "lock ordering" comment is only meaningful in promote_new_aggregated_payloads where both locks are held at once. The comment here is misleading and can be safely removed or replaced with a simpler note.


Method names still use iter_ prefix but return Vec

crates/storage/src/store.rs:819–864

The doc comments were correctly updated ("Returns all..." instead of "Iterates over..."), but the method names iter_known_aggregated_payloads, iter_new_aggregated_payloads, iter_known_aggregated_payload_keys, iter_new_aggregated_payload_keys still imply lazy iteration. Callers now always allocate a Vec up front. Names like get_known_aggregated_payloads or collect_known_aggregated_payloads would be more accurate. This is a minor naming inconsistency, but it's notable since all call sites needed .into_iter() added to compensate.


Clone under Mutex lock with potentially large payloads

crates/storage/src/store.rs:924–931

fn iter_aggregated_payloads_from(
    map: &Mutex<AggregatedPayloadMap>,
) -> Vec<(SignatureKey, Vec<StoredAggregatedPayload>)> {
    map.lock()
        .expect("lock")
        .iter()
        .map(|(k, v)| (*k, v.clone()))
        .collect()  // <-- Mutex held for the entire clone
}

AggregatedSignatureProof contains proof_data: ByteList<U1048576> — up to 1 MiB per proof. The full clone of all known payloads happens while the Mutex is held, blocking all concurrent readers and writers for the duration. In the old RocksDB implementation this was similarly "blocking" via a read snapshot, but here it blocks other Mutex acquirers rather than I/O. For a small validator set this is likely fine, but it's worth noting.

A mitigation would be to release the lock before cloning (e.g., collect raw pointers/references, or change callers to work with the guard directly). For the current scope this is acceptable, but the concern scales with proof size × validator count.
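Another mitigation sketch (an assumption, not from the PR): keep payloads behind Arc so the per-entry clone taken under the lock bumps a refcount instead of copying up to 1 MiB of proof data.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

type Payload = Vec<u8>; // stand-in for StoredAggregatedPayload
type Map = HashMap<u64, Vec<Arc<Payload>>>;

// Snapshot under the lock, but cheaply: each clone is an Arc refcount bump,
// not a deep copy of the payload bytes.
fn snapshot(map: &Mutex<Map>) -> Vec<(u64, Vec<Arc<Payload>>)> {
    map.lock()
        .expect("lock")
        .iter()
        .map(|(k, v)| (*k, v.clone()))
        .collect()
}

fn main() {
    let big = Arc::new(vec![0u8; 1024]);
    let map = Mutex::new(Map::from([(1, vec![big.clone()])]));
    let snap = snapshot(&map);
    assert_eq!(snap.len(), 1);
    assert_eq!(Arc::strong_count(&big), 3); // original + map entry + snapshot
}
```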


&mut self is now superfluous for insert methods

crates/storage/src/store.rs:831–884

All insert_known_aggregated_payload, insert_new_aggregated_payload, and batch variants take &mut self, but the actual work is done via Arc<Mutex<...>> which provides interior mutability — &self would suffice. The &mut self was correct before (needed for the RocksDB write batch path) but is now overly restrictive. This is a carried-over API that's now misleading. Not a bug since Store is typically held behind &mut in the blockchain actor anyway, but worth noting for future callers.
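To illustrate the point with a minimal stand-in (not the real Store): Arc<Mutex<...>> provides interior mutability, so an insert method compiles and works with &self.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

struct Store {
    known: Arc<Mutex<HashMap<u64, Vec<Vec<u8>>>>>,
}

impl Store {
    // &self suffices: mutation happens through the Mutex, not the receiver.
    fn insert_known_aggregated_payload(&self, key: u64, payload: Vec<u8>) {
        self.known
            .lock()
            .expect("lock")
            .entry(key)
            .or_default()
            .push(payload);
    }
}

fn main() {
    let store = Store { known: Arc::new(Mutex::new(HashMap::new())) };
    store.insert_known_aggregated_payload(7, vec![1, 2, 3]); // &self, not &mut
    assert_eq!(store.known.lock().unwrap()[&7].len(), 1);
}
```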


What Works Well

  • promote_new_aggregated_payloads is a clean, obvious drain-and-extend — much better than the old read-deserialize-merge-serialize-delete-put cycle.
  • prune_aggregated_payload_map using retain on nested vecs is elegant and correct.
  • Lock ordering is consistently maintained: known before new in promote_new_aggregated_payloads (the only site that holds both simultaneously), and single-lock paths are all fine.
  • CLAUDE.md is kept accurate with the new ephemeral/in-memory framing and the RocksDB wipe note for operators upgrading from old databases.
  • The produce_block_with_signatures change (let known_payloads = store.iter_known_aggregated_payloads() removing the redundant : Vec<_>) is a nice cleanup.

Automated review by Claude (Anthropic) · custom prompt

@MegaRedHand MegaRedHand closed this Mar 4, 2026
@MegaRedHand MegaRedHand deleted the improve-aggregate-payload-handling branch March 4, 2026 17:47
