[RNTuple] Implement I/O Performance Metrics: Sparseness, Randomness, and Transactions#20994
[RNTuple] Implement I/O Performance Metrics: Sparseness, Randomness, and Transactions#20994JasMehta08 wants to merge 1 commit intoroot-project:masterfrom
Conversation
jblomer
left a comment
There was a problem hiding this comment.
Thank you!
Before going into a detailed review, I see a few larger items that need to be addressed.
- The metrics should not be added directly as atomic integers to the RNTupleMetrics object. We have
RNTuplePerfCounterfor this. - Nice observation on the randomness metric. However, I would prefer doing this internally, e.g. excluding header and footer or having two metrics (with and without).
- I think it makes more sense to do the accounting in the
fReader; should save us to chase all the invocations ofReadBuffer.
|
@jblomer Thanks for the review. I have updated the PR to address your three main points: Counters: I replaced the Accounting: I moved all I/O accounting into Randomness Logic: As suggested, I handled the exclusion internally rather than via I added an internal The reader automatically switches to the "Analysis Phase" at the end of
|
|
While verifying the changes, I noticed a pre-existing issue with This results in 0 metrics during verification tests that snapshot the reader across a move. It likely doesn't affect production (where the reader owns the source), but it breaks the "Snapshot & Subtract" verification logic. I have not added a custom Move Constructor to fix this yet, as it felt like a separate architectural fix outside the scope of this metrics refactor. Let me know if you would prefer I include that fix here. |
|
I pushed a polish commit addressing 3 specific things that i noticed while reviewing:
|
|
I'm afraid this isn't going quite in the right direction. Take a look at |
fc79786 to
c92fc14
Compare
|
hey @jblomer, Thanks for the review, sorry about miss understanding the previous review. I moved the implementation to follow the please let me know if they are in the right directions, or should i look a bit deeper in the library. On a side note RNTuple seems very interesting to me, I'm interested in the low-level systems engineering side of this and learning about columnar data structures in general, could you let me know if there is anyway to deep dive and understand it even better (i did go through the root reference guide i could only find an introduction page. And i also went through the youtube video on the HSF youtube channel but it did not have a lot of details). please have a review and let me know if these changes are as desired. Thanks! |
c92fc14 to
5a8845d
Compare
|
Thank you! That looks better. One issue here is still that the concrete metrics are added to the As an introduction documentation, I recommend the write-ups in the doc folder of the RNTuple sources. Start with the |
|
I removed the concrete methods from RNTupleMetrics class. Please have a look and let me know if I should change anything. Note: I changed the
as having to write the whole name for randomness would be tedious. Both work identically. Once confirmed by you that everything has been implemented properly, I will add a tutorial and unit test (assuming that would be desired). Thanks! |
|
Thank you! I think we should only address metrics by their full name. I'm not convinced by the short name as it may result in name collisions. In any case, I think that particular functionality should go in a separate PR. |
|
Also, please rebase/squash your commits for a clean history. |
|
@jblomer Also other than that i will add a test for this as well as please let me know if i would have to change some documentation for it or not. I hope the other changes are good. I will ask for a review after making the changes suggested and also adding tests. Thanks! |
9e2507c to
ba15fe6
Compare
|
@jblomer Please let me know if i should change anything else. Thanks! |
| ROOT::Experimental::Detail::RNTupleAtomicCounter &fSumSkip; | ||
| ROOT::Experimental::Detail::RNTupleAtomicCounter &fTotalFileSize; | ||
| ROOT::Experimental::Detail::RNTupleCalcPerf &fRandomness; | ||
| ROOT::Experimental::Detail::RNTupleCalcPerf &fSparseness; |
There was a problem hiding this comment.
These counters only make sense for files, so should move to RPageSourceFile
| /// Flag to track if total file size has been set (set lazily on first page load when metrics are enabled) | ||
| bool fFileSizeSet = false; |
There was a problem hiding this comment.
I think we can use a negative file size as a flag and don't need the additional member.
tree/ntuple/src/RPageStorage.cxx
Outdated
| @@ -555,6 +542,30 @@ void ROOT::Internal::RPageSource::EnableDefaultMetrics(const std::string &prefix | |||
| } | |||
| } | |||
| return {false, -1.}; | |||
| }), | |||
| *fMetrics.MakeCounter<RNTupleAtomicCounter *>("sumSkip", "B", "cumulative seek distance"), | |||
There was a problem hiding this comment.
| *fMetrics.MakeCounter<RNTupleAtomicCounter *>("sumSkip", "B", "cumulative seek distance"), | |
| *fMetrics.MakeCounter<RNTupleAtomicCounter *>("szSkip", "B", "cumulative seek distance"), |
tree/ntuple/src/RPageStorage.cxx
Outdated
| @@ -555,6 +542,30 @@ void ROOT::Internal::RPageSource::EnableDefaultMetrics(const std::string &prefix | |||
| } | |||
| } | |||
| return {false, -1.}; | |||
| }), | |||
| *fMetrics.MakeCounter<RNTupleAtomicCounter *>("sumSkip", "B", "cumulative seek distance"), | |||
| *fMetrics.MakeCounter<RNTupleAtomicCounter *>("totalFileSize", "B", "total file size"), | |||
There was a problem hiding this comment.
| *fMetrics.MakeCounter<RNTupleAtomicCounter *>("totalFileSize", "B", "total file size"), | |
| *fMetrics.MakeCounter<RNTupleAtomicCounter *>("szFile", "B", "total file size"), |
| @@ -650,6 +646,10 @@ protected: | |||
| ROOT::Experimental::Detail::RNTupleCalcPerf &fBandwidthUnzip; | |||
| ROOT::Experimental::Detail::RNTupleCalcPerf &fFractionReadOverhead; | |||
| ROOT::Experimental::Detail::RNTupleCalcPerf &fCompressionRatio; | |||
| ROOT::Experimental::Detail::RNTupleAtomicCounter &fSumSkip; | |||
There was a problem hiding this comment.
| ROOT::Experimental::Detail::RNTupleAtomicCounter &fSumSkip; | |
| ROOT::Experimental::Detail::RNTupleAtomicCounter &fSzSkip; |
tree/ntuple/test/ntuple_metrics.cxx
Outdated
|
|
||
| TEST(Metrics, IOMetrics) | ||
| { | ||
| std::string rootFileName{"test_ntuple_io_metrics.root"}; |
There was a problem hiding this comment.
Not needed, you can ask the file guard for the path
tree/ntuple/src/RPageStorageFile.cxx
Outdated
| @@ -515,16 +505,23 @@ ROOT::Internal::RPageRef ROOT::Internal::RPageSourceFile::LoadPageImpl(ColumnHan | |||
| directReadBuffer = MakeUninitArray<unsigned char>(sealedPage.GetBufferSize()); | |||
| { | |||
| RNTupleAtomicTimer timer(fCounters->fTimeWallRead, fCounters->fTimeCpuRead); | |||
| fReader.ReadBuffer(directReadBuffer.get(), sealedPage.GetBufferSize(), | |||
| pageInfo.GetLocator().GetPosition<std::uint64_t>()); | |||
| auto offset = pageInfo.GetLocator().GetPosition<std::uint64_t>(); | |||
There was a problem hiding this comment.
| auto offset = pageInfo.GetLocator().GetPosition<std::uint64_t>(); | |
| const auto offset = pageInfo.GetLocator().GetPosition<std::uint64_t>(); |
tree/ntuple/src/RPageStorageFile.cxx
Outdated
| auto offset = pageInfo.GetLocator().GetPosition<std::uint64_t>(); | ||
| // Track seek distance | ||
| if (fLastOffset != 0) { | ||
| std::int64_t diff = static_cast<std::int64_t>(offset) - static_cast<std::int64_t>(fLastOffset); |
There was a problem hiding this comment.
| std::int64_t diff = static_cast<std::int64_t>(offset) - static_cast<std::int64_t>(fLastOffset); | |
| const std::int64_t diff = static_cast<std::int64_t>(offset) - static_cast<std::int64_t>(fLastOffset); |
tree/ntuple/src/RPageStorageFile.cxx
Outdated
| std::uint64_t distance = (diff >= 0) ? diff : -diff; | ||
| fCounters->fSumSkip.Add(distance); |
There was a problem hiding this comment.
| std::uint64_t distance = (diff >= 0) ? diff : -diff; | |
| fCounters->fSumSkip.Add(distance); | |
| fCounters->fSumSkip.Add(std::abs(diff)); |
tree/ntuple/src/RPageStorageFile.cxx
Outdated
| // Track seek distance based on page location even when using cluster pool | ||
| auto offset = pageInfo.GetLocator().GetPosition<std::uint64_t>(); | ||
| if (fLastOffset != 0) { | ||
| std::int64_t diff = static_cast<std::int64_t>(offset) - static_cast<std::int64_t>(fLastOffset); | ||
| std::uint64_t distance = (diff >= 0) ? diff : -diff; | ||
| fCounters->fSumSkip.Add(distance); | ||
| } | ||
| fLastOffset = offset + sealedPage.GetBufferSize(); |
There was a problem hiding this comment.
This is wrong, we need to track the actual read requests in LoadClusters()
a47c1e3 to
cf6fb09
Compare
cf6fb09 to
2e564f6
Compare
|
@jblomer Thank you for the review, I have made all the changes as suggested, I have used a member variable to store the file size Thank you! |
This Pull request:
improves the existing Performance Metrics in RNTuple.
Changes or fixes:
This PR addresses the issue that had been raised for the improvements of metrics. It implements,
Implementation Details:
These have been implemented through the introduction of transient, thread-safe members in RNTupleMetrics,
std::atomicstd::uint64_t fSumSkip: To track total seek distance.
std::atomicstd::uint64_t fExplicitBytesRead: To track payload bytes.
std::atomicstd::uint64_t fTransactions: To track I/O operations.
I have changed RPageStorageFile to update these counters during ReadBuffer, ReadV, and LoadClusters operations. The use of std::atomic ensures thread safety with negligible performance overhead on the I/O path.
Reset Functionality & Accuracy:
I added a Reset() method to RNTupleMetrics. This is needed for obtaining accurate Randomness metrics for the analysis loop.
Without Reset(): The metric is dominated by the initial file seek (Header → Footer → Header), resulting in a Randomness score > 2.0.
With Reset(): Users can clear the counters after initialization, isolating the steady-state performance of their event loop.
Verification:
Added a new unit test TEST(Metrics, IOMetrics) in tree/ntuple/test/ntuple_metrics.cxx to verify the logic and the Reset() behavior.
Also tested locally using this code
Click to view Manual Verification Code & Output
This PR fixes #20853