Allow multiline input in batch mode on the standard input#1605

Merged
rolandwalker merged 1 commit into main from RW/allow-multiline-batch-input
Feb 23, 2026

Conversation

@rolandwalker
Contributor

Description

Tokenize each line of input with sqlglot, and dispatch the (possibly multi-statement) query if the last token is a semicolon. If the last token is not a semicolon, accumulate the line towards the next dispatch.

We don't handle the case where the input script itself changes the delimiter.

Accumulation is capped at 5000 lines: if no line ending in a semicolon has been found by then, we assume that something is wrong with the input and exit.
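
A minimal sketch of the accumulate-and-dispatch loop described above, not the code from this PR. It substitutes a naive `rstrip().endswith(";")` check for sqlglot tokenization, so a semicolon inside a string literal or comment would fool it. The function names, the `ValueError`, and the choice to dispatch leftover input at EOF are assumptions made for illustration.

```python
from typing import Callable, Iterable, Iterator

MAX_PENDING_LINES = 5000  # the limit stated in the description


def naive_ends_statement(text: str) -> bool:
    """Stand-in for the tokenizer check (assumption): does text end with ';'?"""
    return text.rstrip().endswith(";")


def batch_statements(
    lines: Iterable[str],
    ends_statement: Callable[[str], bool] = naive_ends_statement,
    max_pending: int = MAX_PENDING_LINES,
) -> Iterator[str]:
    """Yield one chunk of SQL each time the accumulated input ends a statement."""
    pending: list[str] = []
    for line in lines:
        pending.append(line)
        if ends_statement("".join(pending)):
            # Terminator found: hand the accumulated lines to the caller.
            yield "".join(pending)
            pending = []
        elif len(pending) >= max_pending:
            # Too many lines without a terminator: treat the input as broken.
            raise ValueError(f"no statement terminator in {max_pending} lines")
    # Assumption: non-blank input left over at EOF is dispatched as-is.
    if any(item.strip() for item in pending):
        yield "".join(pending)
```

Called as `batch_statements(sys.stdin)`, each yielded chunk may hold one statement spread over several lines, or several statements that happened to share a line.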

The regression was caused (intentionally) by #1450, but we can still improve on the current behavior.

Fixes #1593.

Checklist

  • I added this contribution to the changelog.md file.
  • I added my name to the AUTHORS file (or it's already there).
  • To lint and format the code, I ran
    uv run ruff check && uv run ruff format && uv run mypy --install-types .

@rolandwalker rolandwalker self-assigned this Feb 21, 2026
@github-actions

Findings

  1. Throttle regression: In dispatch_batch_statements, throttle only applies when batch_counter > 1, so the pause won’t happen between the first and second statements (it starts on the third). Previously it applied from the second statement onward. Consider batch_counter > 0 or incrementing before dispatch. mycli/main.py:2235-2237
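
The off-by-one can be illustrated with a small simulation. This is a hypothetical model rather than the code in mycli/main.py: it assumes the counter is incremented after each dispatch, and `pause_points` is a name invented for this sketch.

```python
def pause_points(n_statements: int, threshold: int) -> list[int]:
    """Return the 1-based numbers of the statements that are preceded by a pause."""
    paused_before: list[int] = []
    batch_counter = 0
    for number in range(1, n_statements + 1):
        # The throttle check under discussion: `batch_counter > threshold`.
        if batch_counter > threshold:
            paused_before.append(number)
        # ... the statement would be dispatched here ...
        batch_counter += 1  # assumption: incremented after dispatch
    return paused_before


print(pause_points(4, threshold=1))  # condition `batch_counter > 1`
print(pause_points(4, threshold=0))  # condition `batch_counter > 0`
```

Under that assumption, `> 1` first pauses before the third statement, while `> 0` pauses before the second, which matches the previous behavior described in the finding.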

Tests / Edge Cases

  1. Add a batch-mode test where the final statement has no trailing semicolon (e.g., select 1;\nselect 2) to confirm EOF dispatch behavior.
  2. Consider a test or explicit handling for a TokenError at EOF (unterminated quote, etc.) so malformed input doesn’t get silently dispatched; either surface a clearer error or re-tokenize and fail.

If you want, I can suggest a minimal patch for the throttle condition.

@rolandwalker rolandwalker force-pushed the RW/allow-multiline-batch-input branch from 46f530a to 4504bff Compare February 21, 2026 15:16
@rolandwalker rolandwalker added the codex request Codex review for an updated PR label Feb 21, 2026
@github-actions

Findings

  1. Header/no-header handling regresses when a statement ends with a semicolon followed by a comment. Dispatch only triggers if the last token is ; (mycli/main.py:2261-2266), so SELECT 1; -- comment will not dispatch and will be concatenated with subsequent statements. That means later statements can be executed in the same batch with the header-on formatter (batch_counter still 0), producing extra headers and incorrect batch output. This is more than a formatting nit because batch consumers often parse the output. Consider dispatching based on the last non-comment/non-whitespace token instead of the last token, or explicitly strip trailing comments before checking for ;.
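
One way to picture the suggested check is a small hand-rolled scanner that remembers the last character that is neither whitespace nor part of a comment. This sketch uses only the standard library and is not the sqlglot-based check in the PR. The function name is hypothetical, and the comment forms it recognizes (`-- `, `#`, `/* */`) are an assumption about MySQL-style input.

```python
def ends_with_semicolon(sql: str) -> bool:
    """True if the last non-whitespace, non-comment character is ';'."""
    last = ""   # last significant character seen so far
    quote = ""  # the open quote character, or "" when outside a literal
    i, n = 0, len(sql)
    while i < n:
        ch = sql[i]
        if quote:
            if ch == "\\" and quote != "`":
                i += 2  # skip the backslash-escaped character
                continue
            if ch == quote:
                quote = ""  # a doubled quote simply reopens on the next char
            last = ch
            i += 1
            continue
        if ch in "'\"`":
            quote = ch
            last = ch
        elif ch == "#" or (
            sql.startswith("--", i) and (i + 2 == n or sql[i + 2].isspace())
        ):
            # Line comment: skip to the end of the line.
            end = sql.find("\n", i)
            i = n if end == -1 else end
            continue
        elif sql.startswith("/*", i):
            # Block comment: skip past the closing marker.
            end = sql.find("*/", i + 2)
            if end == -1:
                return False  # unterminated block comment, keep accumulating
            i = end + 2
            continue
        elif not ch.isspace():
            last = ch
        i += 1
    return last == ";" and not quote
```

With this check, `select 1; -- comment` counts as terminated, while `select 1 -- ;` and `select ';'` do not.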

Missing Tests

  1. Add a batch-mode test where a statement ends with ; followed by a comment, e.g. select count(*) from test; -- comment\nselect * from test limit 1; to verify the second statement runs with no header (test/test_main.py).
  2. Add a batch-mode test for EOF without a trailing semicolon (multi-line statement that finishes at EOF) to lock in expected behavior and prevent regressions.

If you want, I can propose a concrete token check using sqlglot.tokens.TokenType to ignore COMMENT and WHITESPACE.

@rolandwalker rolandwalker removed the codex request Codex review for an updated PR label Feb 21, 2026
Contributor

@scottnemes scottnemes left a comment

Verified that the basic use case works. The maximum line count seems like a good candidate for a user-configurable option, defaulting to 5000, since most people won't change it anyway.

@rolandwalker rolandwalker force-pushed the RW/allow-multiline-batch-input branch from 4504bff to 749e5d5 Compare February 23, 2026 09:46
@rolandwalker
Contributor Author

Let's punt on the configuration option until someone asks for it, or until we have a well-thought-out [batch] section.

@rolandwalker rolandwalker merged commit 5a22207 into main Feb 23, 2026
8 checks passed
@rolandwalker rolandwalker deleted the RW/allow-multiline-batch-input branch February 23, 2026 09:52
Successfully merging this pull request may close these issues.

multi_line from stdin no longer works
