Gene.bordegaray/2026/02/dyn filter partition indexed #20246
Conversation
NGA-TRAN left a comment
The PR makes sense. My main comments:
- You may want to add documentation about pruning and filtering and why we need the partition index, if it is not already documented somewhere.
- I think there is a typo in the test data.
- I am unclear whether recursively searching for RepartitionExec is the best strategy and whether it will introduce new bugs. Some examples and an explanation would help.
- I would ask someone who knows dynamic filtering well to review this. Adrian and Lia? Maybe they can help explain the recursive walk, too.
datafusion/sqllogictest/test_files/preserve_file_partitioning.slt
/// Determines whether partition-index routing should be used instead of CASE hash routing.
///
/// Partition-index routing is enabled when:
/// 1. The join is in `Partitioned` mode
I wonder whether it would be useful to add a comment here saying that we do not have a problem with CollectLeft when there are many partitions on the probe side. That is because there is only one hash table, and it is used for pruning and filtering across all partitions of the probe side.
I forgot to add this, will do 👀
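For context, here is a minimal sketch of what such a note could look like on the routing check. `PartitionMode` is DataFusion's real enum, but the function name and the reduced signature are assumptions; the PR's actual check also inspects the plan shape.

```rust
use datafusion::physical_plan::joins::PartitionMode;

/// Sketch only: determines whether partition-index routing should be used
/// instead of CASE hash routing (the real check also looks at the probe side).
///
/// Note on `CollectLeft`: it does not need partition-index routing, because the
/// build side produces a single hash table whose dynamic filter is applied to
/// every probe-side partition, so there is nothing to route per partition.
fn use_partition_index_routing(mode: &PartitionMode) -> bool {
    matches!(mode, PartitionMode::Partitioned)
}
```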
        }
    }
    false
}
I wonder if this is the right walk. Should we only check whether there is a RepartitionExec directly before the join? Could we introduce bugs here? Maybe drawing some examples on paper will help you confirm whether this is correct.
No, I don't believe so, because there can be many operators between the join and the DataSourceExec. All that matters is whether there is some RepartitionExec in between (in which case we want to use CASE) or not.
I agree there can be many operators between the join and the DataSourceExec, but is it always the case that any RepartitionExec we find is there for the join? What happens if that repartition is for a group-by rather than the join?
I do not know the details of how dynamic filtering is implemented, but this recursive walk worries me. Is there a way to identify that the RepartitionExec is for the join, e.g. that it repartitions on the join key?
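For readers following the thread, here is a minimal sketch of the kind of recursive walk under discussion. This is not the PR's actual code; the function name is an assumption and the import paths may differ between DataFusion versions.

```rust
use std::sync::Arc;
use datafusion::physical_plan::repartition::RepartitionExec;
use datafusion::physical_plan::ExecutionPlan;

/// Hypothetical sketch: returns true if any RepartitionExec appears anywhere
/// below `plan`. Finding one means the original file partitioning is no longer
/// preserved, so the join would fall back to CASE hash routing. The reviewer's
/// concern is that this walk cannot tell a repartition inserted for the join
/// apart from one inserted for, say, an aggregation.
fn contains_repartition(plan: &Arc<dyn ExecutionPlan>) -> bool {
    if plan.as_any().downcast_ref::<RepartitionExec>().is_some() {
        return true;
    }
    plan.children().iter().any(|child| contains_repartition(child))
}
```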
I wonder if there is a simpler way to know whether we are preserving file partitioning. If we are, I'd say we should store this optimizer decision in the HashJoinExec node instead of recursing through the plan, similar to how we store the PartitionMode in HashJoinExec to make decisions during execution. wdyt?
I can look into this, but at first glance I think this is a great suggestion.
I added the logic in enforce_distribution.rs, but it is a bit more involved than the decision for a partitioned hash join. I have added documentation explaining the cases and the method used.
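For illustration, a rough sketch of the "store the decision on the node" idea discussed above. The names below are hypothetical and are not DataFusion's actual API.

```rust
/// Hypothetical sketch, in the spirit of how `PartitionMode` is stored on
/// `HashJoinExec`: the physical optimizer (e.g. in enforce_distribution.rs)
/// records whether the probe side preserves the original file partitioning,
/// and execution reads that flag instead of re-walking the plan.
struct HashJoinRoutingInfo {
    preserves_file_partitioning: bool,
}

impl HashJoinRoutingInfo {
    fn use_partition_index_routing(&self) -> bool {
        self.preserves_file_partitioning
    }
}
```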
// Probe side: each partition has matching and non-matching rows.
// Partition 0: ("aa","ba",10.0) matches p0, ("zz","zz",20.0) does not match p0
// Partition 1: ("zz","zz",30.0) matches p1, ("aa","ba",40.0) does not match p1
For your tests, I think this data is OK. However, it is not clear in the context of a partitioned hash join. You may want data that clearly defines the partitions for both the build and probe sides, and to make the build side smaller and more scattered, so you can see data from the probe side being filtered.
Which issue does this PR close?
Rationale for this change
What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?