SOLR-18098: Fix replication failure for files with exact MB sizes #4205
Open
shubhamranjan wants to merge 1 commit into apache:main from
https://issues.apache.org/jira/browse/SOLR-18098
Description
Replication fails with EOFException when transferring files whose size is an exact multiple of PACKET_SZ (1 MB). For example, replicating a file that is exactly 1 MB, 2 MB, etc. causes the follower to crash.

Solution
The root cause is in IndexFetcher.FileFetcher.fetchPackets(). The replication packet protocol has three packet types:

1. int(size) + long(checksum) + byte[size]
2. int(0) + long(checksum): a zero-length packet, sent when the last chunk fills exactly PACKET_SZ
3. int(0): no checksum follows

The old code treated any packetSize == 0 as a loop-continue, which skipped the checksum that follows in case 2. The 8 unread checksum bytes were then parsed as the start of the next packet. This yielded a garbage packet size and, eventually, an EOFException.

The fix reorders fetchPackets() to: … size=0 and fis.peek() == -1)

AI Disclosure: Claude (Anthropic) was used as an aid during diagnosis and development. Specifically, it helped analyze the packet protocol interaction between DirectoryFileStream.write() and fetchPackets(), reason through the checksum read misalignment, and draft test cases. All changes were reviewed, verified, and refined by a human (me) before submission.

Tests
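The sender/receiver interaction described under Solution can be reproduced without a Solr instance. Below is a minimal, self-contained Java model; it is not the code from this PR or from Solr. Several details are assumptions:

- The class and its methods (PacketProtocolSketch, encode, oldRead, fixedRead) are hypothetical.
- A 16-byte packet stands in for PACKET_SZ.
- Using java.util.zip.Adler32 as the packet checksum is an assumption.

encode frames a file using the three packet types listed above. oldRead skips every int(0), as the old loop-continue did. fixedRead treats int(0) as end of file only when nothing follows it, using a PushbackInputStream in place of fis.peek(); otherwise it reads the zero-length packet's checksum.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.PushbackInputStream;
import java.util.zip.Adler32;

// Hypothetical, simplified model of the replication packet framing; not Solr's code.
public class PacketProtocolSketch {
  // Sender side: [int size][long checksum][size bytes] per packet, then a bare int(0).
  static byte[] encode(byte[] file, int packetSize) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    DataOutputStream out = new DataOutputStream(bytes);
    int offset = 0;
    while (true) {
      int read = Math.min(packetSize, file.length - offset);
      Adler32 checksum = new Adler32(); // assumption: Adler32 is the packet checksum
      checksum.update(file, offset, read);
      out.writeInt(read); // 0 when the previous packet was exactly full
      out.writeLong(checksum.getValue()); // written even for a zero-length packet
      out.write(file, offset, read);
      offset += read;
      if (read != packetSize) {
        out.writeInt(0); // end marker: int(0), no checksum follows
        break;
      }
    }
    return bytes.toByteArray();
  }

  // Peeks one byte: true if the stream is exhausted, otherwise pushes the byte back.
  static boolean atEnd(PushbackInputStream in) throws IOException {
    int next = in.read();
    if (next != -1) {
      in.unread(next);
    }
    return next == -1;
  }

  // Reads [long checksum][size bytes] and verifies the checksum.
  static byte[] readPayload(DataInputStream in, int size) throws IOException {
    if (size < 0) {
      throw new IOException("negative packet size: " + size);
    }
    long expected = in.readLong();
    byte[] data = new byte[size];
    in.readFully(data);
    Adler32 checksum = new Adler32();
    checksum.update(data, 0, size);
    if (checksum.getValue() != expected) {
      throw new IOException("checksum mismatch");
    }
    return data;
  }

  // Old behavior as described: any size 0 is skipped, leaving its checksum unread.
  static byte[] oldRead(byte[] stream) throws IOException {
    PushbackInputStream raw = new PushbackInputStream(new ByteArrayInputStream(stream));
    DataInputStream in = new DataInputStream(raw);
    ByteArrayOutputStream file = new ByteArrayOutputStream();
    while (!atEnd(raw)) {
      int size = in.readInt();
      if (size <= 0) {
        continue; // misaligns the stream if a checksum follows this int(0)
      }
      file.write(readPayload(in, size));
    }
    return file.toByteArray();
  }

  // Fixed behavior: int(0) ends the file only when nothing follows it.
  static byte[] fixedRead(byte[] stream) throws IOException {
    PushbackInputStream raw = new PushbackInputStream(new ByteArrayInputStream(stream));
    DataInputStream in = new DataInputStream(raw);
    ByteArrayOutputStream file = new ByteArrayOutputStream();
    while (true) {
      int size = in.readInt();
      if (size == 0 && atEnd(raw)) {
        return file.toByteArray();
      }
      file.write(readPayload(in, size)); // also consumes a zero-length packet's checksum
    }
  }

  public static void main(String[] args) throws IOException {
    int packetSize = 16; // small stand-in for PACKET_SZ (1 MB)
    byte[] framed = encode(new byte[3 * packetSize], packetSize);
    System.out.println("fixed reader: " + fixedRead(framed).length + " bytes");
    try {
      oldRead(framed);
      System.out.println("old reader: succeeded");
    } catch (EOFException e) {
      System.out.println("old reader: EOFException");
    }
  }
}
```

In this model, a 48-byte file (three full packets) ends with int(0) + long(checksum) + int(0). The Adler32 checksum of empty input is 1. oldRead therefore parses the checksum's high four bytes as another zero size and its low four bytes as a packet size of 1. It then hits EOFException while reading that phantom packet's checksum. A 40-byte file decodes correctly with both readers, consistent with only exact multiples of PACKET_SZ failing.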
Added IndexFetcherPacketProtocolTest with 18 unit tests that exercise the packet protocol between DirectoryFileStream (sender) and FileFetcher.fetchPackets (receiver) in isolation: …

Run with:

./gradlew :solr:core:test --tests "IndexFetcherPacketProtocolTest"

Checklist
Please review the following and check all that apply:
- … main branch.
- … ./gradlew check.