`Count` and `Mean` aggregates by blaginin · Pull Request #7201 · vortex-data/vortex

blaginin · 2026-03-29T21:23:15Z

No description provided.

gatesn · 2026-03-30T09:05:25Z

vortex-array/src/aggregate_fn/fns/count/mod.rs

```diff
+/// Return the count of non-null elements in an array.
+///
+/// See [`Count`] for details.
+pub fn count(array: &ArrayRef, ctx: &mut ExecutionCtx) -> VortexResult<u64> {
```

If this is counting things in an array, should it return a usize?
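To make the trade-off concrete, here is a minimal sketch that models a nullable array as a slice of `Option<i32>`. This is an illustrative stand-in, not Vortex's `ArrayRef`, and `count_valid` / `total_valid` are hypothetical names. Counting within one array naturally yields `usize`, since that is what `Iterator::count` and slice lengths use. A fixed-width `u64` arguably matters only where counts from many arrays are combined or persisted, because the width of `usize` depends on the target.

```rust
/// Count non-null slots in one (hypothetical) nullable array.
/// `Iterator::count` returns `usize`, matching slice lengths.
fn count_valid(values: &[Option<i32>]) -> usize {
    values.iter().filter(|v| v.is_some()).count()
}

/// Combine per-array counts into a fixed-width `u64`, whose size does not
/// depend on the target's pointer width.
fn total_valid(chunks: &[&[Option<i32>]]) -> u64 {
    chunks
        .iter()
        .map(|chunk| u64::try_from(count_valid(chunk)).expect("count fits in u64"))
        .sum()
}
```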

gatesn · 2026-03-30T09:08:11Z

vortex-array/src/aggregate_fn/fns/count/mod.rs

```diff
+        false
+    }
+
+    fn accumulate(
```

We should be able to register a generic aggregate kernel that reduces count-non-null to `Array.validity().sum()`; that way we avoid decompressing all the data.
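A minimal sketch of why such a kernel avoids decompression, assuming an Arrow-style validity bitmap (one bit per element, packed LSB-first, bit set means non-null). The `Validity` struct and `count_non_null` function are hypothetical and not Vortex's API. The count is simply the popcount of the validity bits, so the values buffer, however it is encoded, is never read.

```rust
/// Hypothetical validity bitmap: bit `i` set means element `i` is non-null.
struct Validity {
    bits: Vec<u8>,
    len: usize,
}

impl Validity {
    /// Count set bits among the first `len` positions.
    fn true_count(&self) -> usize {
        let full_bytes = self.len / 8;
        let mut count: usize = self.bits[..full_bytes]
            .iter()
            .map(|b| b.count_ones() as usize)
            .sum();
        let rem = self.len % 8;
        if rem > 0 {
            // Mask off bits past `len` in the final partial byte.
            let mask = (1u8 << rem) - 1;
            count += (self.bits[full_bytes] & mask).count_ones() as usize;
        }
        count
    }
}

/// Hypothetical count-non-null kernel: it depends only on validity, so the
/// (possibly compressed) values are passed through untouched.
fn count_non_null<V: ?Sized>(_values: &V, validity: Option<&Validity>, len: usize) -> usize {
    match validity {
        Some(v) => v.true_count(),
        // No validity bitmap: the array is non-nullable, so every element counts.
        None => len,
    }
}
```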

gatesn · 2026-03-30T09:12:34Z

vortex-array/src/aggregate_fn/fns/mean/mod.rs

```diff
+}
+
+fn partial_struct_dtype() -> DType {
+    DType::Struct(
```

Can you put this entire dtype in a `static` `LazyLock` inside this function?
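A sketch of that suggestion using `std::sync::LazyLock` (stable since Rust 1.80). The `DType` enum below is a toy stand-in with only the variants needed here. Vortex's real `DType::Struct` takes different arguments, including nullability. The `sum`/`count` field names are assumed from `MeanPartial`. The value is built on the first call and every later call returns the same instance.

```rust
use std::sync::LazyLock;

/// Toy stand-in for Vortex's `DType`, for illustration only.
#[derive(Debug, PartialEq)]
enum DType {
    F64,
    U64,
    Struct(Vec<(&'static str, DType)>),
}

fn partial_struct_dtype() -> &'static DType {
    // Initialized once, on first access; later calls reuse it.
    static PARTIAL_DTYPE: LazyLock<DType> = LazyLock::new(|| {
        DType::Struct(vec![("sum", DType::F64), ("count", DType::U64)])
    });
    &PARTIAL_DTYPE
}
```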

joseph-isaacs

Shouldn't this be a decimal for decimal inputs?
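To illustrate the concern, here is a minimal sketch assuming decimals are stored as `i128` mantissas that share one scale (value = mantissa · 10^(-scale)). `DecimalColumn`, `decimal_sum`, and `f64_sum` are hypothetical. Summing the mantissas keeps the partial sum exact. Converting each value to `f64` first, as an `f64` `sum` field would require, introduces binary rounding error: ten copies of 0.1 do not add up to exactly 1.0.

```rust
/// Hypothetical decimal column: each value is `mantissa * 10^(-scale)`.
struct DecimalColumn {
    mantissas: Vec<i128>,
    scale: u32,
}

/// Exact partial sum: add integer mantissas; the scale is unchanged.
/// (Overflow handling is omitted in this sketch.)
fn decimal_sum(col: &DecimalColumn) -> i128 {
    col.mantissas.iter().sum()
}

/// Lossy partial sum: convert each value to f64 before adding.
fn f64_sum(col: &DecimalColumn) -> f64 {
    let divisor = 10f64.powi(col.scale as i32);
    col.mantissas.iter().map(|&m| m as f64 / divisor).sum()
}
```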

joseph-isaacs · 2026-03-30T09:34:05Z

vortex-array/src/aggregate_fn/fns/mean/mod.rs

```diff
+pub struct MeanPartial {
+    sum: f64,
+    count: u64,
+}
```

Won't this accumulate a pretty large error?

$$\mu_n = \frac{1}{n}\sum_{i=1}^{n} x_i \iff \mu_{n+1} = \mu_n + \frac{x_{n+1} - \mu_n}{n+1}$$

You only need to hold the partial mean and the count.
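The recurrence can be sketched as a partial that stores only the running mean and the count. `RunningMean` is a hypothetical name used for illustration, not the PR's `MeanPartial`.

```rust
/// Hypothetical running-mean partial: only the mean and the count are kept.
#[derive(Debug, Default, Clone, Copy, PartialEq)]
struct RunningMean {
    mean: f64,
    count: u64,
}

impl RunningMean {
    /// Apply mu_{n+1} = mu_n + (x_{n+1} - mu_n) / (n + 1).
    fn push(&mut self, x: f64) {
        self.count += 1;
        self.mean += (x - self.mean) / self.count as f64;
    }
}
```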

You can incrementally push the next value into this partial, but AFAIK you cannot combine two partials?

For combining, you do: $$\mu_{AB} = \frac{n_A \mu_A + n_B \mu_B}{n_A + n_B}$$
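A minimal sketch of that combine step, assuming each partial holds a mean and a count (the hypothetical `MeanState`). The case where both partials are empty is handled explicitly to avoid dividing by zero.

```rust
/// Hypothetical mean partial: a mean plus the number of values behind it.
#[derive(Debug, Clone, Copy, PartialEq)]
struct MeanState {
    mean: f64,
    count: u64,
}

/// mu_AB = (n_A * mu_A + n_B * mu_B) / (n_A + n_B)
fn combine(a: MeanState, b: MeanState) -> MeanState {
    let count = a.count + b.count;
    if count == 0 {
        // Both partials are empty; there is no mean to weight.
        return MeanState { mean: 0.0, count: 0 };
    }
    let mean = (a.count as f64 * a.mean + b.count as f64 * b.mean) / count as f64;
    MeanState { mean, count }
}
```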

I thought in that formula there's even more division and hence more rounding errors?

I think there are some smarter algorithms, but I have never seen a simple implementation cause problems in DataFusion.
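One well-known example of such an algorithm, though not necessarily the one meant here, is Neumaier's variant of Kahan compensated summation. The sketch below carries a second `f64` that collects the low-order bits lost by each addition. The function names are illustrative.

```rust
/// Plain left-to-right f64 summation.
fn naive_sum(values: &[f64]) -> f64 {
    values.iter().fold(0.0, |acc, &x| acc + x)
}

/// Neumaier compensated summation.
fn neumaier_sum(values: &[f64]) -> f64 {
    let mut sum = 0.0_f64;
    let mut compensation = 0.0_f64;
    for &x in values {
        let t = sum + x;
        if sum.abs() >= x.abs() {
            // Low-order bits of `x` were lost in `t`.
            compensation += (sum - t) + x;
        } else {
            // Low-order bits of `sum` were lost in `t`.
            compensation += (x - t) + sum;
        }
        sum = t;
    }
    sum + compensation
}
```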

100% agree. Let's do what DataFusion does.

DF's partial sum is of type `T`, whereas ours is always `f64`.

The error comes from combining numbers of very different magnitudes, not from the number of operations.
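A small demonstration of that point: with `f64`, the same three additions give different results depending on whether the small values meet the large one individually or are first added to each other. Near `1e16`, adjacent `f64` values are 2 apart, so `1e16 + 1.0` rounds back to `1e16`. `sum_in_order` is an illustrative helper.

```rust
/// Left-to-right f64 summation, like a simple accumulator.
fn sum_in_order(values: &[f64]) -> f64 {
    values.iter().fold(0.0, |acc, &x| acc + x)
}
```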

Since we are storing these on disk, we cannot "just change it later".

DF only supports AVG for floats, decimals, and durations, and it coerces all floats to `f64`.

Count and Mean aggregates

6bafabe

Signed-off-by: blaginin <dima@spiraldb.com> Co-authored-by: Claude <noreply@anthropic.com>

blaginin self-assigned this Mar 29, 2026

validity cleanup

ee67d80

Co-authored-by: Claude <noreply@anthropic.com> Signed-off-by: blaginin <dima@spiraldb.com>

gatesn reviewed Mar 30, 2026

gatesn added the changelog/feature A new feature label Mar 30, 2026

joseph-isaacs requested changes Mar 30, 2026

3 participants