Add additional sqllogictest coverage for scalar and correlated subqueries #20248
base: main
Pull request overview
Adds a new SQLLogicTest file to expand DataFusion’s result-correctness coverage for scalar and correlated subqueries, complementing the existing subquery.slt, which is largely focused on EXPLAIN/decorrelation plan shapes.
Changes:
- Added subquery_additional.slt with new SQLLogicTest cases covering scalar, correlated, nested, and NULL-handling subquery patterns.
- Included setup/cleanup tables and a broad mix of expression contexts (HAVING, BETWEEN, CASE, CTEs, UNION, etc.) to validate execution behavior.
# Correlated subquery used in WHERE with EXISTS
query II
SELECT * FROM test WHERE EXISTS (SELECT a FROM test ts WHERE ts.a = test.a AND b > 21) ORDER BY a
Copilot
AI
Feb 9, 2026
Inside the EXISTS subquery, b > 21 is unqualified and therefore resolves to the inner table (ts.b), not the outer row. This happens to be equivalent here due to ts.a = test.a, but it's easy to misread as an outer reference and becomes brittle if the subquery changes. Qualify it explicitly (e.g. ts.b > 21) to make the correlation intent unambiguous.
Suggested change:
- SELECT * FROM test WHERE EXISTS (SELECT a FROM test ts WHERE ts.a = test.a AND b > 21) ORDER BY a
+ SELECT * FROM test WHERE EXISTS (SELECT a FROM test ts WHERE ts.a = test.a AND ts.b > 21) ORDER BY a
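The scoping rule behind this comment can be checked with a small sketch. It is only an illustration under stated assumptions: it runs on SQLite through Python's standard `sqlite3` module rather than on DataFusion, and the tables `outer_t` and `inner_t` and their rows are hypothetical, chosen so that the inner and outer `b` values differ. An unqualified `b` inside the subquery binds to the innermost table that has such a column.

```python
import sqlite3

# Hypothetical tables: both have a column named b, with different values,
# so we can see which one an unqualified reference picks up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE outer_t (a INTEGER, b INTEGER);
    CREATE TABLE inner_t (a INTEGER, b INTEGER);
    INSERT INTO outer_t VALUES (1, 100);
    INSERT INTO inner_t VALUES (1, 5);
""")

# Unqualified b resolves to the subquery's own table (inner_t.b = 5).
unqualified = conn.execute(
    "SELECT o.a FROM outer_t o "
    "WHERE EXISTS (SELECT 1 FROM inner_t i WHERE i.a = o.a AND b > 21)"
).fetchall()

# Explicitly referencing the outer row changes the result (outer_t.b = 100).
qualified_outer = conn.execute(
    "SELECT o.a FROM outer_t o "
    "WHERE EXISTS (SELECT 1 FROM inner_t i WHERE i.a = o.a AND o.b > 21)"
).fetchall()

print(unqualified)      # []
print(qualified_outer)  # [(1,)]
```

In the test under review both sides come from the same table joined on a, so the two readings coincide; qualifying the column simply removes the ambiguity for readers.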
# NOT IN with NULL present in subquery result
# When the subquery contains NULL, NOT IN should return NULL for non-matching rows
Copilot
AI
Feb 9, 2026
The comment claims this test covers NOT IN behavior when the subquery result contains NULL, but the subquery SELECT i FROM integers WHERE i = 1 can never return NULL (the = predicate filters it out). As written, this is just a normal NOT IN (1) test and doesn't exercise the intended NULL edge case. Either adjust the subquery to actually include a NULL (and update expected results accordingly), or update/remove the comment so it matches what is being tested.
Suggested change:
- # NOT IN with NULL present in subquery result
- # When the subquery contains NULL, NOT IN should return NULL for non-matching rows
+ # NOT IN where the subquery result does not contain NULL
+ # The subquery filters out NULL, so this behaves like NOT IN (1) over non-NULL rows
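The difference between the two cases can be sketched as follows. This is an illustration, not the test from the PR: it uses SQLite via Python's standard `sqlite3` module instead of DataFusion, the contents of `integers` (1, 2, 3, NULL) are assumed, and the `OR i IS NULL` variant is a hypothetical way of forcing a NULL into the subquery result. SQLite reports boolean results as 0 and 1, and SQL NULL as `None`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE integers (i INTEGER)")
# Assumed table contents, including one NULL row.
conn.executemany("INSERT INTO integers VALUES (?)", [(1,), (2,), (3,), (None,)])

def not_in_results(subquery):
    """Map each value of i to the result of `i NOT IN (subquery)`."""
    rows = conn.execute(
        f"SELECT i, i NOT IN ({subquery}) FROM integers"
    ).fetchall()
    return dict(rows)

# The equality filter can never yield NULL, so this is plain NOT IN (1).
without_null = not_in_results("SELECT i FROM integers WHERE i = 1")

# Here the subquery result is {1, NULL}; non-matching rows become NULL.
with_null = not_in_results("SELECT i FROM integers WHERE i = 1 OR i IS NULL")

print(without_null)  # {1: 0, 2: 1, 3: 1, None: None}
print(with_null)     # {1: 0, 2: None, 3: None, None: None}
```

Only the second form exercises the NULL edge case the original comment describes: once the subquery result contains NULL, every non-matching row evaluates to NULL rather than true, so a WHERE filter on it would return no rows.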
Add subquery_additional.slt with ~59 test cases across 22 categories to complement the existing subquery.slt. Tests are adapted from DuckDB's scalar subquery test suite to improve result-correctness coverage for subquery patterns not exercised by the plan-focused tests in subquery.slt.
Categories covered:
- Basic and nested scalar subqueries (uncorrelated)
- Correlated scalar subqueries with aggregates
- Nested correlated subqueries (2-4 levels deep)
- EXISTS / NOT EXISTS (correlated and uncorrelated)
- IN / NOT IN with correlated filters
- Subqueries with DISTINCT, COALESCE, CASE WHEN
- Subqueries in HAVING, BETWEEN, expression contexts
- Scalar subqueries with CTEs
- Subqueries with ORDER BY / LIMIT
- UNION / UNION ALL inside subqueries
- NULL handling edge cases
- Subqueries with window functions
- Multiple scalar subqueries in SELECT
- Correlated subqueries with GROUP BY in outer query
No engine, planner, optimizer, or execution changes. Test-only change.
Closes apache#20240
1c8e0ee to 285a5c6
What does this PR do?
This PR adds a new SQLLogicTest file, subquery_additional.slt, to expand test coverage for subquery evaluation in DataFusion.
The existing subquery.slt file is primarily focused on EXPLAIN plan verification (decorrelation and physical plan shapes). While that coverage is valuable, it provides limited direct validation of result correctness for many common scalar and correlated subquery patterns. This PR complements the existing tests by exercising subquery execution behavior directly.
Which issue does this PR close?
Closes #20240
What changes are included?
- Added subquery_additional.slt with ~59 test cases across 22 subquery categories
Covered patterns include:
- EXISTS / NOT EXISTS, IN / NOT IN (correlated and uncorrelated)
- DISTINCT, COALESCE, CASE WHEN, and CTEs
- HAVING, BETWEEN, and general expression contexts
- UNION / UNION ALL inside subqueries
- Multiple scalar subqueries in SELECT
DataFusion-specific adjustments
Expected outputs were aligned with DataFusion behavior where it intentionally differs from DuckDB, including:
- NULL ordering (nulls_max default: NULLs sort last in ASC)
- Int64, Float64
No engine, planner, optimizer, or execution code has been modified. This PR is strictly a test coverage addition.
Are these changes tested?
This PR is the test. It can be run with:
cargo test --package datafusion-sqllogictest --test sqllogictests -- subquery_additional