Skip to content

PatchedArray: basics and wiring#7002

Merged
a10y merged 39 commits intodevelopfrom
aduffy/patched-array
Mar 31, 2026
Merged

PatchedArray: basics and wiring#7002
a10y merged 39 commits intodevelopfrom
aduffy/patched-array

Conversation

@a10y
Copy link
Copy Markdown
Contributor

@a10y a10y commented Mar 17, 2026

Summary

This is the first PR in a series addressing the PatchedArray RFC: vortex-data/rfcs#27

This PR adds a new PatchedArray array variant, which is slated to only be used with FastLanes array types. The design is largely documented in the RFC, but briefly

  • Wraps another child array
  • Holds a buffer of offsets and patch values. They are both sorted by chunk/lane so there is efficient random as well as bulk access for CUDA

We are able to pushdown the following at reduce time:

  • Filters pushes through to the child, and we trim the patch indices to the range of chunks covered by the filter
  • Slicing that reduces the chunk range
  • Compare always pushes through to the child
  • Take pushes through to the inner then constructs a hashmap of the patches

There will be follow ups to add the wiring into CUDA and to update how BitPacked and ALP arrays are written.

Testing

There are unit tests for all of the reducers and kernels

@codspeed-hq
Copy link
Copy Markdown

codspeed-hq bot commented Mar 17, 2026

Merging this PR will degrade performance by 15.59%

⚡ 9 improved benchmarks
❌ 1 regressed benchmark
✅ 1096 untouched benchmarks
⏩ 1522 skipped benchmarks1

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Performance Changes

Mode Benchmark BASE HEAD Efficiency
Simulation chunked_bool_canonical_into[(1000, 10)] 71.6 µs 84.8 µs -15.59%
Simulation new_bp_prim_test_between[i16, 32768] 146 µs 132.2 µs +10.46%
Simulation new_bp_prim_test_between[i32, 16384] 117.5 µs 103.2 µs +13.8%
Simulation take_10k_contiguous 331.2 µs 272.6 µs +21.49%
Simulation new_bp_prim_test_between[i32, 32768] 179.4 µs 150.6 µs +19.14%
Simulation take_10k_random 270.4 µs 212.2 µs +27.44%
Simulation new_bp_prim_test_between[i64, 32768] 242.3 µs 183.7 µs +31.91%
Simulation new_bp_prim_test_between[i64, 16384] 149 µs 119.7 µs +24.51%
Simulation new_alp_prim_test_between[f64, 16384] 154.7 µs 133.1 µs +16.26%
Simulation old_alp_prim_test_between[f32, 32768] 448.7 µs 402.9 µs +11.38%

Comparing aduffy/patched-array (f3d9155) with develop (5ce08ed)

Open in CodSpeed

Footnotes

  1. 1522 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

@a10y a10y added the changelog/feature A new feature label Mar 18, 2026
@a10y a10y marked this pull request as ready for review March 18, 2026 21:21
@a10y a10y requested a review from joseph-isaacs March 18, 2026 21:21
@a10y a10y force-pushed the aduffy/patched-array branch from c3d7434 to 6c1d7aa Compare March 19, 2026 13:24
@a10y a10y force-pushed the aduffy/patched-array branch from 860cc64 to 042394b Compare March 24, 2026 19:17
@a10y a10y force-pushed the aduffy/patched-array branch 4 times, most recently from df8dbf5 to 862d814 Compare March 26, 2026 19:54
@a10y
Copy link
Copy Markdown
Contributor Author

a10y commented Mar 27, 2026

@claude review please

@claude

This comment was marked as resolved.

@a10y
Copy link
Copy Markdown
Contributor Author

a10y commented Mar 27, 2026

@claude i have addressed your main comments. please give another review.

@claude

This comment was marked as resolved.

@a10y a10y force-pushed the aduffy/patched-array branch from daf258f to 578d83d Compare March 27, 2026 14:35
@a10y
Copy link
Copy Markdown
Contributor Author

a10y commented Mar 27, 2026

@claude please review

@claude

This comment was marked as resolved.

@a10y
Copy link
Copy Markdown
Contributor Author

a10y commented Mar 27, 2026

@claude PR is ready for another review

@claude

This comment was marked as resolved.

@a10y

This comment was marked as resolved.

@a10y a10y force-pushed the aduffy/patched-array branch from 8648244 to a9ce27f Compare March 27, 2026 19:33
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shall we add a few more of these with different types? e.g. string?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Executing anything other than primitive would fail

/// Number of lanes the patch indices and values have been split into. Each of the `n_chunks`
/// of 1024 values is split into `n_lanes` lanes horizontally, each lane having 1024 / n_lanes
/// values that might be patched.
pub(super) n_lanes: usize,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this too

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This I think we need to store incase we change the default ptype -> lane mapping in the future

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this kind of change would require new arrays. The data layout would change

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As long as you have the n_lanes saved you can always access the data without issue

}

#[derive(Clone, prost::Message)]
pub struct PatchedMetadata {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if you can pack this into a u128?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hm, is there a reason to do that over struct? can protobuf even encode u128?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean using less and smaller fields

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed some fields and packed the rest into a single uint64 packed field

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry @a10y this was on me. I was thinking that we can have 4 fields that total u128 in size. I don't think we need to do this manual packing

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reverted this the fields are now separated

@a10y
Copy link
Copy Markdown
Contributor Author

a10y commented Mar 30, 2026

@claude review please

@claude

This comment was marked as resolved.

@a10y
Copy link
Copy Markdown
Contributor Author

a10y commented Mar 30, 2026

I'm pretty sure the CodSpeed results are just noise since the PatchedArray isn't wired into anything here

a10y added 20 commits March 31, 2026 12:18
Signed-off-by: Andrew Duffy <andrew@a10y.dev>
Signed-off-by: Andrew Duffy <andrew@a10y.dev>
Signed-off-by: Andrew Duffy <andrew@a10y.dev>
Signed-off-by: Andrew Duffy <andrew@a10y.dev>
Signed-off-by: Andrew Duffy <andrew@a10y.dev>
Signed-off-by: Andrew Duffy <andrew@a10y.dev>
Signed-off-by: Andrew Duffy <andrew@a10y.dev>
Signed-off-by: Andrew Duffy <andrew@a10y.dev>
Signed-off-by: Andrew Duffy <andrew@a10y.dev>
Signed-off-by: Andrew Duffy <andrew@a10y.dev>
Signed-off-by: Andrew Duffy <andrew@a10y.dev>
Signed-off-by: Andrew Duffy <andrew@a10y.dev>
Signed-off-by: Andrew Duffy <andrew@a10y.dev>
Signed-off-by: Andrew Duffy <andrew@a10y.dev>
Signed-off-by: Andrew Duffy <andrew@a10y.dev>
Signed-off-by: Andrew Duffy <andrew@a10y.dev>
Signed-off-by: Andrew Duffy <andrew@a10y.dev>
Signed-off-by: Andrew Duffy <andrew@a10y.dev>
Signed-off-by: Andrew Duffy <andrew@a10y.dev>
Signed-off-by: Andrew Duffy <andrew@a10y.dev>
@joseph-isaacs
Copy link
Copy Markdown
Contributor

Make sure when we wire this in we make it an unstable encoding

session.register(Delta);
session.register(FoR);
session.register(FSST);
session.register(Patched);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove this

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should keep the old patches for a while and have this as an unstable_encoding

@joseph-isaacs
Copy link
Copy Markdown
Contributor

You will also need to replace with children (with with_slots)

Signed-off-by: Andrew Duffy <andrew@a10y.dev>
@a10y a10y force-pushed the aduffy/patched-array branch from ad0eaa7 to f3d9155 Compare March 31, 2026 16:56
@a10y
Copy link
Copy Markdown
Contributor Author

a10y commented Mar 31, 2026

done

@a10y a10y merged commit faf98f2 into develop Mar 31, 2026
107 of 110 checks passed
@a10y a10y deleted the aduffy/patched-array branch March 31, 2026 18:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

changelog/feature A new feature

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants