Conversation
Polar Signals Profiling ResultsLatest Run
Previous Runs (9)
Powered by Polar Signals Cloud |
Benchmarks: PolarSignals ProfilingVortex (geomean): 0.985x ➖ datafusion / vortex-file-compressed (0.985x ➖, 0↑ 0↓)
|
Benchmarks: TPC-H SF=1 on NVMEVerdict: No clear signal (environment too noisy confidence) datafusion / vortex-file-compressed (1.014x ➖, 0↑ 0↓)
datafusion / vortex-compact (1.012x ➖, 0↑ 0↓)
datafusion / parquet (0.985x ➖, 2↑ 0↓)
datafusion / arrow (1.004x ➖, 1↑ 0↓)
duckdb / vortex-file-compressed (1.006x ➖, 0↑ 0↓)
duckdb / vortex-compact (1.006x ➖, 0↑ 0↓)
duckdb / parquet (1.044x ➖, 0↑ 4↓)
duckdb / duckdb (1.002x ➖, 0↑ 1↓)
Full attributed analysis
|
Benchmarks: FineWeb NVMeVerdict: No clear signal (low confidence) datafusion / vortex-file-compressed (0.968x ➖, 1↑ 0↓)
datafusion / vortex-compact (1.025x ➖, 0↑ 1↓)
datafusion / parquet (0.992x ➖, 0↑ 0↓)
duckdb / vortex-file-compressed (1.048x ➖, 0↑ 1↓)
duckdb / vortex-compact (0.997x ➖, 0↑ 0↓)
duckdb / parquet (0.993x ➖, 0↑ 0↓)
Full attributed analysis
|
Benchmarks: TPC-DS SF=1 on NVMEVerdict: No clear signal (low confidence) datafusion / vortex-file-compressed (1.017x ➖, 0↑ 3↓)
datafusion / vortex-compact (0.992x ➖, 0↑ 0↓)
datafusion / parquet (1.041x ➖, 0↑ 10↓)
duckdb / vortex-file-compressed (0.997x ➖, 2↑ 2↓)
duckdb / vortex-compact (0.985x ➖, 4↑ 4↓)
duckdb / parquet (0.999x ➖, 0↑ 2↓)
duckdb / duckdb (0.996x ➖, 0↑ 0↓)
Full attributed analysis
|
Benchmarks: TPC-H SF=1 on S3Verdict: No clear signal (environment too noisy confidence) datafusion / vortex-file-compressed (0.993x ➖, 4↑ 5↓)
datafusion / vortex-compact (0.780x ➖, 9↑ 2↓)
datafusion / parquet (1.286x ➖, 2↑ 14↓)
duckdb / vortex-file-compressed (1.058x ➖, 0↑ 2↓)
duckdb / vortex-compact (1.031x ➖, 0↑ 1↓)
duckdb / parquet (1.068x ➖, 0↑ 1↓)
Full attributed analysis
|
Benchmarks: TPC-H SF=10 on NVMEVerdict: No clear signal (low confidence) datafusion / vortex-file-compressed (0.990x ➖, 0↑ 0↓)
datafusion / vortex-compact (0.992x ➖, 0↑ 0↓)
datafusion / parquet (0.999x ➖, 0↑ 0↓)
datafusion / arrow (0.985x ➖, 0↑ 0↓)
duckdb / vortex-file-compressed (0.999x ➖, 0↑ 0↓)
duckdb / vortex-compact (0.992x ➖, 0↑ 0↓)
duckdb / parquet (0.995x ➖, 0↑ 0↓)
duckdb / duckdb (1.003x ➖, 0↑ 0↓)
Full attributed analysis
|
Benchmarks: Random AccessVortex (geomean): 0.895x ✅ unknown / unknown (0.951x ➖, 6↑ 0↓)
|
Benchmarks: Statistical and Population GeneticsVerdict: No clear signal (low confidence) duckdb / vortex-file-compressed (1.041x ➖, 0↑ 3↓)
duckdb / vortex-compact (0.943x ➖, 1↑ 0↓)
duckdb / parquet (0.960x ➖, 0↑ 0↓)
Full attributed analysis
|
Benchmarks: TPC-H SF=10 on S3Verdict: No clear signal (environment too noisy confidence) datafusion / vortex-file-compressed (1.123x ➖, 0↑ 0↓)
datafusion / vortex-compact (1.103x ➖, 0↑ 4↓)
datafusion / parquet (1.170x ➖, 0↑ 6↓)
duckdb / vortex-file-compressed (1.109x ➖, 0↑ 2↓)
duckdb / vortex-compact (1.048x ➖, 0↑ 1↓)
duckdb / parquet (1.098x ➖, 0↑ 3↓)
Full attributed analysis
|
Benchmarks: Clickbench on NVMEVerdict: No clear signal (low confidence) datafusion / vortex-file-compressed (0.951x ➖, 5↑ 1↓)
datafusion / parquet (0.929x ➖, 9↑ 1↓)
duckdb / vortex-file-compressed (1.160x ❌, 2↑ 35↓)
duckdb / parquet (1.050x ➖, 0↑ 14↓)
duckdb / duckdb (1.059x ➖, 0↑ 11↓)
Full attributed analysis
|
Benchmarks: CompressionVortex (geomean): 1.100x ❌ unknown / unknown (1.090x ➖, 2↑ 20↓)
|
Benchmarks: FineWeb S3Verdict: No clear signal (low confidence) datafusion / vortex-file-compressed (0.979x ➖, 1↑ 1↓)
datafusion / vortex-compact (1.062x ➖, 1↑ 2↓)
datafusion / parquet (1.053x ➖, 0↑ 1↓)
duckdb / vortex-file-compressed (1.049x ➖, 0↑ 1↓)
duckdb / vortex-compact (1.035x ➖, 0↑ 0↓)
duckdb / parquet (1.029x ➖, 0↑ 0↓)
Full attributed analysis
|
bfb7f6c to
2f94e47
Compare
Merging this PR will degrade performance by 15.34%
Performance Changes
Comparing Footnotes
|
d45d3ee to
187e742
Compare
187e742 to
d25093e
Compare
1600b75 to
ff9e7bc
Compare
There was a problem hiding this comment.
I like the direction a lot.
But I think we should follow more closely the conventions we already have for plugins, i.e. a vtable for Scheme with all the associated machinery and associated type for options.
I'd also like to think about how we can define compression graphs, or "pipelines" for compressing arrays of certain types.
It might also be worth replacing stats with AggregateFnRef. We're going to do this in the main vortex-array crate soon, and we may as well start doing stats this way within the compressor if we're going to change it a lot.
|
If we want to do that, we should do that in a followup PR after we merge this, as the So we will go through the process of merging this and then we can look at extending |
8f63097 to
c7f3b92
Compare
|
@claude please review |
This comment was marked as resolved.
This comment was marked as resolved.
| #[derive(Debug, Clone)] | ||
| pub struct ExecutionCtx { |
|
@a10y Fixed some issues in the latest commit, the things that are actually still wrong are detailed in the (updated) PR description. Everything else that claude said is not an actual concern. |
36e78a7 to
54ca2e6
Compare
a10y
left a comment
There was a problem hiding this comment.
let's get this in and iterate
|
I am checking the compression size regressions locally. I'm not worrying as much about the throughput regressions because I sort of believe we should have a separate compressor that optimizes for that instead, and also the regressions are quite small. |
Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
54ca2e6 to
e4891d3
Compare
Summary: Extensible and Pluggable Compressor
Tracking Issue: #7216
You can see a lot of the details in the tracking issue.
This is a major step in supporting extension types as a first-class feature in Vortex. The entire compressor has been rewritten, see the tracking issue for full design details and motivation.
The new
vortex-compressorcrate extracts the encoding-agnostic compression framework fromvortex-btrblocks, inverting the dependency graph so that encoding crates can implement a singleSchemetrait and register themselves with the compressor. Additionally,vortex-btrblocksremains the "batteries-included" default compressor, and depends onvortex-compressor.The compression benchmark comment is here.
For reviewers: I would just look at the whole
vortex-compressorandvortex-btrblockscrates instead of the git diff since basically everything has changed.Changes
vortex-compressorcrate with unifiedSchemetraitvortex-btrblocksto depend onvortex-compressorvortex-btrblocks(re-exports)RunEndSchemeexclusion inrle.rsis broken and re-enable itTesting
Existing tests pass, so that's a good sign. I added a few new tests that check the newer parts of the compressor as well.