Signed-off-by: blaginin <dima@spiraldb.com> Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Claude <noreply@anthropic.com> Signed-off-by: blaginin <dima@spiraldb.com>
/// Return the count of non-null elements in an array.
///
/// See [`Count`] for details.
pub fn count(array: &ArrayRef, ctx: &mut ExecutionCtx) -> VortexResult<u64> {
If this is counting things in an array, should it return a usize?
        false
    }

    fn accumulate(
We should be able to register a generic aggregate kernel to reduce count-non-null to be Array.validity().sum(), then we avoid decompressing all the data.
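A minimal sketch of the idea in this comment: the non-null count can be derived from validity alone, so the values never need to be decompressed. The `Validity` enum and `count_non_null` function are hypothetical stand-ins for illustration, not Vortex's actual types or kernel registration API, and the bitmap layout (one bit per element, least-significant bit first) is an assumption.

```rust
/// Hypothetical stand-in for an array's validity; not Vortex's actual type.
pub enum Validity {
    /// Every element is non-null.
    AllValid,
    /// Every element is null.
    AllInvalid,
    /// One bit per element, least-significant bit first; a set bit means non-null.
    /// Assumed to hold at least `len` bits.
    Bitmap(Vec<u8>),
}

/// Count non-null elements from validity alone, never touching the values.
pub fn count_non_null(validity: &Validity, len: usize) -> u64 {
    match validity {
        Validity::AllValid => len as u64,
        Validity::AllInvalid => 0,
        Validity::Bitmap(bytes) => {
            // Popcount every fully used byte.
            let full_bytes = len / 8;
            let mut count: u64 = bytes[..full_bytes]
                .iter()
                .map(|b| u64::from(b.count_ones()))
                .sum();
            let tail_bits = len % 8;
            if tail_bits > 0 {
                // Mask off padding bits beyond `len` in the final byte.
                let mask = (1u8 << tail_bits) - 1;
                count += u64::from((bytes[full_bytes] & mask).count_ones());
            }
            count
        }
    }
}
```

The constant-validity cases are answered in O(1), and the bitmap case costs one popcount per byte regardless of how the values are encoded.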
    }

    fn partial_struct_dtype() -> DType {
        DType::Struct(
Can you static LazyLock this entire dtype inside this function?
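For illustration, one way this suggestion could look, using `std::sync::LazyLock` (stable since Rust 1.80). The `DType` enum below is a simplified, hypothetical stand-in for the real type, and the `sum`/`count` field names are assumed from the `MeanPartial` struct rather than taken from the actual function body.

```rust
use std::sync::LazyLock;

/// Hypothetical, simplified dtype; the real `DType` in the PR is richer.
#[derive(Debug, Clone, PartialEq)]
pub enum DType {
    F64,
    U64,
    Struct(Vec<(&'static str, DType)>),
}

/// Return the partial-state dtype, building it only on the first call.
pub fn partial_struct_dtype() -> &'static DType {
    // The static lives inside the function, so nothing else can reach it.
    static DTYPE: LazyLock<DType> = LazyLock::new(|| {
        // Field names are assumed for this sketch.
        DType::Struct(vec![("sum", DType::F64), ("count", DType::U64)])
    });
    &*DTYPE
}
```

Returning `&'static DType` lets callers avoid rebuilding the struct dtype on every call; a caller that needs an owned value can still `clone()` it.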
pub struct MeanPartial {
    sum: f64,
    count: u64,
}
Won't this compute the mean with a pretty large error?
You only need to hold the partial mean and count.
You can incrementally push the next value into this partial, but you cannot combine two partials afaik?
For combining you do.
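For reference, a sketch of a partial that stores `(mean, count)` instead of `(sum, count)`, with both an incremental push and a count-weighted merge. `RunningMean` and its methods are hypothetical names for illustration; this is not the PR's `MeanPartial`, and nothing in the thread confirms it is the formula the commenters have in mind.

```rust
/// Hypothetical partial state holding a running mean instead of a running sum.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct RunningMean {
    pub mean: f64,
    pub count: u64,
}

impl RunningMean {
    /// The state before any value has been seen.
    pub fn empty() -> Self {
        Self { mean: 0.0, count: 0 }
    }

    /// Fold one value in: mean += (x - mean) / n.
    pub fn push(&mut self, x: f64) {
        self.count += 1;
        self.mean += (x - self.mean) / self.count as f64;
    }

    /// Combine two partials with a count-weighted update of the mean.
    pub fn merge(self, other: Self) -> Self {
        let count = self.count + other.count;
        if count == 0 {
            return Self::empty();
        }
        // Move `self.mean` toward `other.mean` in proportion to other's share.
        let weight = other.count as f64 / count as f64;
        Self {
            mean: self.mean + (other.mean - self.mean) * weight,
            count,
        }
    }
}
```

Both `push` and `merge` divide on every step, which is the extra division the thread worries about; the trade-off is that the stored value stays on the scale of the data rather than growing with the row count.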
I thought in that formula there's even more division and hence more rounding errors?
I think there are some smart algorithms, but I've never seen a simple implementation be a problem in DataFusion.
100% agree. Let's do what DataFusion does.
DF sum is T and ours is f64
Error comes from combining numbers of different scales, not from the number of operations.
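A small illustration of the scale effect: adding `1.0` to `1e16` in `f64` is absorbed entirely, because adjacent representable values near `1e16` are 2 apart. The sketch compares a plain sum with Neumaier's compensated summation, which is a swapped-in technique used here only to show the lost low-order bits can be recovered; it is not something the PR or DataFusion is stated to use. Function names are hypothetical.

```rust
/// Plain left-to-right f64 summation.
pub fn naive_sum(values: &[f64]) -> f64 {
    values.iter().fold(0.0, |acc, &x| acc + x)
}

/// Neumaier's compensated summation: tracks the rounding error of each add.
pub fn neumaier_sum(values: &[f64]) -> f64 {
    let mut sum = 0.0_f64;
    let mut comp = 0.0_f64; // running total of the bits lost to rounding
    for &x in values {
        let t = sum + x;
        if sum.abs() >= x.abs() {
            // Low-order bits of `x` were lost in `t`.
            comp += (sum - t) + x;
        } else {
            // Low-order bits of `sum` were lost in `t`.
            comp += (x - t) + sum;
        }
        sum = t;
    }
    sum + comp
}
```

For the input `[1e16, 1.0, 1.0, -1e16]` the exact sum is `2.0`; the plain sum returns `0.0` because each `1.0` is rounded away, while the compensated sum returns `2.0`. The plain version performs fewer operations and still has the larger error, which is the point made in the comment.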
Since we are storing these on disk we cannot "just change it later".
DF only supports AVG for floats, decimals, and durations. And coerces all floats to f64.