-
Notifications
You must be signed in to change notification settings - Fork 1.9k
fix: Add integer check for bitwise coercion #20241
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
alamb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @Acfboy -- this looks good to me. The only thing I think is needed is a slt test and it will be good to go
| } | ||
|
|
||
| if left_type == right_type { | ||
| if left_type == right_type && left_type.is_integer() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please add some .slt tests too? For example, the current error looks like this:
> select 1.2 & 4.4;
Error during planning: Data type Float64 not supported for binary operation 'bitwise_and' on dyn arrays
>There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added, thanks.
| } | ||
|
|
||
| if left_type == right_type { | ||
| if left_type == right_type && left_type.is_integer() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about dictionary encoded integers ? Should they be supported too ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the oversight. I agree that identical integer Dictionaries should pass.
However, one thing puzzles me:
- In the original implementation, while identical Dictionary types passed the check, Dictionary(Int8, Int32) & Dictionary(Int32, Int32) would error out immediately.
- In contrast, operations like Plus allow different Dictionary index types.
I suspect this is because Plus unpack the dictionary and coerces them to their internal value type, which triggers a cast back to that type during rewrite phase ((Dict(Int8, Int32) + (Dict(int32, int32)) --cast--> (Int32 + Int32)). While bitwise operations might aim for direct execution between Dictionaries to preserve performance.
But we could still coerce their internal value types ((Dict(Int8, Int32) & (Dict(int32, int32)) --cast--> (Dict(Int32, Int32) & (Dict(int32, int32)). I'm curious why this wasn't implemented before? Is it another oversight that I can fix it in another pr?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let is_integer_dictionary =
matches!(left_type, Dictionary(_, value_type) if value_type.is_integer());
if left_type == right_type && (left_type.is_integer() || is_integer_dictionary) {
return Some(left_type.clone());
}I've added checks for Dictionary cases, the behavior remains consistent with original version.
|
|
||
| Ok(()) | ||
| } | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Maybe add a test case for Decimals too, since they are also numeric types.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tests for Decimals and Dictionary are added, thanks.
3bab6c8 to
5fb2647
Compare
…on and add Decimal/Dictionary test cases
5fb2647 to
cdcd8db
Compare
Which issue does this PR close?
N/A
Rationale for this change
In the original codebase, bitwise_coercion was implemented as follows:
This causes any identical types—such as floats—to pass the check during the logical planning stage. The error only surfaces much later when the arrow kernel attempts execution. This appears to be a minor oversight by the original author.
What changes are included in this PR?
Are these changes tested?
Yes, a new unit test is added, and all existing tests passed.
Are there any user-facing changes?
No.