SOLR-18098: Fix replication failure for files with exact MB sizes#4205

Open
shubhamranjan wants to merge 1 commit into apache:main from shubhamranjan:main

@shubhamranjan

https://issues.apache.org/jira/browse/SOLR-18098

Description

Replication fails with EOFException when transferring files whose size is an exact multiple of PACKET_SZ (1 MB). For example, replicating a file that is exactly 1 MB, 2 MB, etc. causes the follower to crash.

Solution

The root cause is in IndexFetcher.FileFetcher.fetchPackets(). The replication packet protocol has three packet types:

  1. Data packet: int(size) + long(checksum) + byte[size]
  2. Zero-length data packet: int(0) + long(checksum) — sent when the last chunk fills exactly PACKET_SZ
  3. EOF marker: int(0) — no checksum follows

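The old code treated any packetSize == 0 as a loop-continue, so it never read the checksum that follows a zero-length data packet (type 2). Those 8 unread checksum bytes were then interpreted as packet-size fields, which yielded a garbage size and, when the reader tried to consume a packet of that size, an EOFException.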
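The misalignment can be reproduced outside Solr with a small, deliberately simplified model. This is a sketch under assumptions, not the actual Solr code: the class and method names are hypothetical, the packet size is shrunk to 4 bytes, Adler32 stands in for the checksum (the algorithm is not named above), and the sender loop is only modeled on the three packet types listed above.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.util.zip.Adler32;

// Hypothetical model of the packet protocol; not part of Solr.
class PacketMisalignmentSketch {

  /** Assumed sender: data packets (the last one short or empty), then the EOF marker. */
  static byte[] encode(byte[] data, int packetSz) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    DataOutputStream out = new DataOutputStream(bytes);
    Adler32 checksum = new Adler32(); // assumption: stand-in checksum algorithm
    int offset = 0;
    while (true) {
      int len = Math.min(packetSz, data.length - offset);
      checksum.reset();
      checksum.update(data, offset, len);
      out.writeInt(len);                  // int(size); 0 for a zero-length data packet
      out.writeLong(checksum.getValue()); // long(checksum), written even when size is 0
      out.write(data, offset, len);       // byte[size]
      offset += len;
      if (len != packetSz) {
        break;                            // a short or empty packet ends the data
      }
    }
    out.writeInt(0);                      // EOF marker: int(0), no checksum
    return bytes.toByteArray();
  }

  /** Simplified reader that repeats the bug: size == 0 always means "continue". */
  static String readSkippingZeroLengthChecksum(byte[] wire) throws IOException {
    DataInputStream in = new DataInputStream(new ByteArrayInputStream(wire));
    try {
      while (true) {
        int size = in.readInt();
        if (size == 0) {
          if (in.available() == 0) {
            return "clean EOF";           // nothing left after the EOF marker
          }
          continue;                       // bug: checksum of a zero-length packet stays unread
        }
        in.readLong();                    // checksum (not verified in this sketch)
        in.readFully(new byte[size]);     // payload
      }
    } catch (EOFException e) {
      return "EOFException";              // reader ran off the end of a misaligned stream
    }
  }
}
```

With a 4-byte payload and a packet size of 4, the encoded stream ends in int(0) + long(checksum) + int(0). The reader skips the first zero, then reads the two halves of the unread checksum as sizes. Adler32 of empty input is 1, so the second half yields a bogus size of 1, and the following checksum read runs past the end of the stream. A 3-byte payload has no zero-length packet and ends cleanly.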

The fix reorders fetchPackets() to:

  1. Detect the EOF marker (size=0 and fis.peek() == -1)
  2. Read the checksum for all data packets, including zero-length ones
  3. Skip zero-length data packets only after consuming their checksum
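The reordered read loop might look roughly like the following sketch. It is not the patched fetchPackets(): the class and method names are hypothetical, a PushbackInputStream emulates fis.peek(), Adler32 is an assumed checksum algorithm, and the payload is collected in memory rather than written to an index file.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.util.zip.Adler32;

// Hypothetical receiver illustrating the three reordered steps; not part of Solr.
class FixedPacketReaderSketch {

  /** Stand-in for fis.peek(): returns the next byte without consuming it, or -1 at end. */
  static int peek(PushbackInputStream in) throws IOException {
    int b = in.read();
    if (b != -1) {
      in.unread(b);
    }
    return b;
  }

  /** Reads packets until the EOF marker and returns the reassembled payload. */
  static byte[] readFile(InputStream raw) throws IOException {
    PushbackInputStream pin = new PushbackInputStream(raw);
    DataInputStream in = new DataInputStream(pin); // unbuffered, so peek(pin) stays in sync
    ByteArrayOutputStream file = new ByteArrayOutputStream();
    Adler32 checksum = new Adler32(); // assumption: stand-in checksum algorithm
    while (true) {
      int size = in.readInt();
      if (size < 0) {
        throw new IOException("negative packet size: " + size);
      }
      // Step 1: int(0) with nothing after it is the EOF marker.
      if (size == 0 && peek(pin) == -1) {
        return file.toByteArray();
      }
      // Step 2: every data packet carries a checksum, including zero-length ones.
      long expected = in.readLong();
      // Step 3: skip a zero-length data packet only after its checksum is consumed.
      if (size == 0) {
        continue;
      }
      byte[] buf = new byte[size];
      in.readFully(buf);
      checksum.reset();
      checksum.update(buf, 0, size);
      if (checksum.getValue() != expected) {
        throw new IOException("checksum mismatch");
      }
      file.write(buf, 0, size);
    }
  }
}
```

Given a stream of one full data packet, a zero-length data packet and the EOF marker, readFile returns the original payload. The zero-length packet's checksum is consumed before the loop continues, so the final int(0) is read at the correct offset.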

AI Disclosure: Claude (Anthropic) was used as an aid during diagnosis and development — specifically for analyzing the packet protocol interaction between DirectoryFileStream.write() and fetchPackets(), reasoning through the checksum read misalignment, and drafting test cases. All changes were reviewed, verified, and refined by a human (me) before submission.

Tests

Added IndexFetcherPacketProtocolTest with 18 unit tests that exercise the packet protocol between DirectoryFileStream (sender) and FileFetcher.fetchPackets (receiver) in isolation:

  • Exact multiples of PACKET_SZ: 1 MB, 2 MB, 3 MB, 63 MB
  • Non-multiples: empty, 1 byte, 100 bytes, 100 KB, 512 KB
  • Boundary cases: PACKET_SZ ± 1 byte, 1.5× PACKET_SZ, 2× PACKET_SZ ± 1
  • Error handling: checksum mismatch detection
  • Buffer resize: large multi-packet file (5 MB + 12345 bytes)
  • Successive transfers: multiple exact-size files in sequence

Run with: ./gradlew :solr:core:test --tests "IndexFetcherPacketProtocolTest"

Checklist

Please review the following and check all that apply:

  • I have reviewed the guidelines for How to Contribute and my code conforms to the standards described there to the best of my ability.
  • I have created a Jira issue and added the issue ID to my pull request title.
  • I have given Solr maintainers access to contribute to my PR branch. (optional but recommended, not available for branches on forks living under an organisation)
  • I have developed this patch against the main branch.
  • I have run ./gradlew check.
  • I have added tests for my changes.

github-actions bot added the tests label on Mar 10, 2026