@Tushar7012
Contributor

What does this PR do?

This PR adds a new SQLLogicTest file, subquery_additional.slt, to expand test coverage for subquery evaluation in DataFusion.

The existing subquery.slt file is primarily focused on EXPLAIN plan verification (decorrelation and physical plan shapes). While that coverage is valuable, it provides limited direct validation of result correctness for many common scalar and correlated subquery patterns. This PR complements the existing tests by exercising subquery execution behavior directly.


Which issue does this PR close?

Closes #20240


What changes are included?

  • Adds subquery_additional.slt with ~59 test cases across 22 subquery categories
  • Test cases are adapted from DuckDB’s scalar subquery test suite and conservatively adjusted to align with DataFusion semantics

Covered patterns include:

  • Basic and nested scalar subqueries (uncorrelated)
  • Correlated scalar subqueries with aggregates
  • Nested correlated subqueries (2–4 levels deep)
  • EXISTS / NOT EXISTS, IN / NOT IN (correlated and uncorrelated)
  • Subqueries with DISTINCT, COALESCE, CASE WHEN, and CTEs
  • Subqueries in HAVING, BETWEEN, and general expression contexts
  • UNION / UNION ALL inside subqueries
  • NULL-handling edge cases
  • Subqueries inside window function contexts
  • Multiple scalar subqueries in SELECT

DataFusion-specific adjustments

Expected outputs were aligned with DataFusion behavior where it intentionally differs from DuckDB, including:

  • NULL sort ordering (nulls_max default: NULLs sort last in ASC)
  • Type code mappings (e.g. Int64, Float64)
  • Derived table aliasing requiring exact column count matches
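
To make the NULL ordering adjustment concrete, here is a small plain-Python sketch of what "nulls_max" means: NULL is treated as larger than every other value, so it lands last in ascending order (and, by the same rule, first in descending order). The helper name `sort_nulls_max` and the sample values are hypothetical and only illustrate the ordering; this is not DataFusion code.

```python
def sort_nulls_max(values, descending=False):
    """Sort values while treating None (SQL NULL) as the largest value."""
    # The key is a pair: non-NULL values get (False, v), NULLs get (True, 0),
    # so every NULL compares greater than every non-NULL value.
    return sorted(
        values,
        key=lambda v: (v is None, 0 if v is None else v),
        reverse=descending,
    )


sample = [3, None, 1, 2]  # hypothetical column values
ascending = sort_nulls_max(sample)
descending = sort_nulls_max(sample, descending=True)

print(ascending)   # [1, 2, 3, None]
print(descending)  # [None, 3, 2, 1]
```

Expected outputs copied from an engine that sorts NULLs first in ASC would list the NULL row at the top, which is why such results had to be reordered.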

No engine, planner, optimizer, or execution code has been modified. This PR is strictly a test coverage addition.


Are these changes tested?

This PR is the test. It can be run with:

cargo test --package datafusion-sqllogictest --test sqllogictests -- subquery_additional

Copilot AI review requested due to automatic review settings February 9, 2026 21:27
@github-actions github-actions bot added the sqllogictest SQL Logic Tests (.slt) label Feb 9, 2026
Contributor

Copilot AI left a comment

Pull request overview

Adds a new SQLLogicTest file to expand DataFusion’s result-correctness coverage for scalar and correlated subqueries, complementing the existing subquery.slt which is largely focused on EXPLAIN/decorrelation plan shapes.

Changes:

  • Added subquery_additional.slt with new SQLLogicTest cases covering scalar, correlated, nested, and NULL-handling subquery patterns.
  • Included setup/cleanup tables and a broad mix of expression contexts (HAVING, BETWEEN, CASE, CTEs, UNION, etc.) to validate execution behavior.


# Correlated subquery used in WHERE with EXISTS
query II
SELECT * FROM test WHERE EXISTS (SELECT a FROM test ts WHERE ts.a = test.a AND b > 21) ORDER BY a

Copilot AI Feb 9, 2026

Inside the EXISTS subquery, b > 21 is unqualified and therefore resolves to the inner table (ts.b), not the outer row. This happens to be equivalent here due to ts.a = test.a, but it's easy to misread as an outer reference and becomes brittle if the subquery changes. Qualify it explicitly (e.g. ts.b > 21) to make the correlation intent unambiguous.

Suggested change
- SELECT * FROM test WHERE EXISTS (SELECT a FROM test ts WHERE ts.a = test.a AND b > 21) ORDER BY a
+ SELECT * FROM test WHERE EXISTS (SELECT a FROM test ts WHERE ts.a = test.a AND ts.b > 21) ORDER BY a
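
The name-resolution point in this comment can be checked with a quick sketch. It uses SQLite through Python's `sqlite3` module rather than DataFusion, and the table contents are made up for illustration (including a duplicated `a` value, which the PR's actual test data may not have). Under those assumptions, the unqualified `b` behaves like `ts.b`, while an explicit outer reference `test.b` gives a different result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (a INTEGER, b INTEGER)")
# Hypothetical rows; a = 1 appears twice so inner and outer b can differ.
conn.executemany("INSERT INTO test VALUES (?, ?)", [(1, 10), (1, 30), (2, 5)])


def run(predicate):
    """Run the EXISTS query from the review with a given predicate on b."""
    sql = (
        "SELECT * FROM test WHERE EXISTS ("
        "SELECT a FROM test ts WHERE ts.a = test.a AND " + predicate +
        ") ORDER BY a, b"
    )
    return conn.execute(sql).fetchall()


unqualified = run("b > 21")        # resolves to the innermost scope (ts)
inner = run("ts.b > 21")           # explicit inner reference
outer = run("test.b > 21")         # explicit outer (correlated) reference

print(unqualified)  # [(1, 10), (1, 30)]
print(inner)        # [(1, 10), (1, 30)]
print(outer)        # [(1, 30)]
```

When `a` is unique, the inner and outer rows coincide and all three forms return the same rows, which is why the unqualified version happens to pass; qualifying the column keeps the intent clear if the data or the subquery changes.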

Comment on lines +559 to +572
# NOT IN with NULL present in subquery result
# When the subquery contains NULL, NOT IN should return NULL for non-matching rows

Copilot AI Feb 9, 2026

The comment claims this test covers NOT IN behavior when the subquery result contains NULL, but the subquery SELECT i FROM integers WHERE i = 1 can never return NULL (the = predicate filters it out). As written, this is just a normal NOT IN (1) test and doesn't exercise the intended NULL edge case. Either adjust the subquery to actually include a NULL (and update expected results accordingly), or update/remove the comment so it matches what is being tested.

Suggested change
- # NOT IN with NULL present in subquery result
- # When the subquery contains NULL, NOT IN should return NULL for non-matching rows
+ # NOT IN where the subquery result does not contain NULL
+ # The subquery filters out NULL, so this behaves like NOT IN (1) over non-NULL rows
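
The difference between the two cases can be seen in a short sketch. It runs on SQLite via Python's `sqlite3` module, not on DataFusion, and assumes an `integers` table holding 1, 2, 3, and NULL (the actual setup in the .slt file may differ). The second and third queries add `OR i IS NULL` to the subquery, which is one hypothetical way to make it really return a NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE integers (i INTEGER)")
# Assumed contents: three values plus one NULL.
conn.executemany("INSERT INTO integers VALUES (?)", [(1,), (2,), (3,), (None,)])

# Subquery as written in the test: i = 1 can never yield NULL.
no_null = conn.execute(
    "SELECT i FROM integers "
    "WHERE i NOT IN (SELECT i FROM integers WHERE i = 1) ORDER BY i"
).fetchall()

# Subquery that does contain NULL: NOT IN is never TRUE, so no row passes.
with_null = conn.execute(
    "SELECT i FROM integers "
    "WHERE i NOT IN (SELECT i FROM integers WHERE i = 1 OR i IS NULL) ORDER BY i"
).fetchall()

# Same predicate projected as a value: FALSE for the match, NULL otherwise.
as_expr = dict(conn.execute(
    "SELECT i, i NOT IN (SELECT i FROM integers WHERE i = 1 OR i IS NULL) "
    "FROM integers"
).fetchall())

print(no_null)    # [(2,), (3,)]
print(with_null)  # []
print(as_expr)    # {1: 0, 2: None, 3: None, None: None}
```

Only the variants with a NULL in the subquery exercise the three-valued-logic edge case that the original comment describes.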

Add subquery_additional.slt with ~59 test cases across 22 categories
to complement the existing subquery.slt. Tests are adapted from
DuckDB's scalar subquery test suite to improve result-correctness
coverage for subquery patterns not exercised by the plan-focused
tests in subquery.slt.

Categories covered:
- Basic and nested scalar subqueries (uncorrelated)
- Correlated scalar subqueries with aggregates
- Nested correlated subqueries (2-4 levels deep)
- EXISTS / NOT EXISTS (correlated and uncorrelated)
- IN / NOT IN with correlated filters
- Subqueries with DISTINCT, COALESCE, CASE WHEN
- Subqueries in HAVING, BETWEEN, expression contexts
- Scalar subqueries with CTEs
- Subqueries with ORDER BY / LIMIT
- UNION / UNION ALL inside subqueries
- NULL handling edge cases
- Subqueries with window functions
- Multiple scalar subqueries in SELECT
- Correlated subqueries with GROUP BY in outer query

No engine, planner, optimizer, or execution changes.
Test-only change.

Closes apache#20240
@Tushar7012 Tushar7012 force-pushed the add-subquery-slt-tests branch from 1c8e0ee to 285a5c6 Compare February 10, 2026 06:19
Development

Successfully merging this pull request may close these issues.

Add additional subquery tests to slt tests