
RUBY-3786 Retry inside txns on overload errors#2999

Merged
comandeo-mongo merged 3 commits into mongodb:master from
comandeo-mongo:3786-transaction-state-preserved
Mar 17, 2026


@comandeo-mongo
Contributor

Implements DRIVERS-3411 (RUBY-3786): retry reads and writes inside transactions
on overload errors (RetryableError + SystemOverloadedError labels).
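
As a minimal sketch of the qualifying check, assuming the "+" above means an error must carry both labels, and assuming errors expose a `label?` predicate. `FakeError` and `OverloadCheck.retryable_overload?` are hypothetical names for illustration, not the driver's implementation.

```ruby
# Hypothetical stand-in for a server error that carries error labels.
FakeError = Struct.new(:labels) do
  def label?(name)
    labels.include?(name)
  end
end

module OverloadCheck
  OVERLOAD_LABELS = %w[RetryableError SystemOverloadedError].freeze

  # Assumption: an error qualifies for in-transaction overload retry only
  # when it carries BOTH labels; either label alone does not qualify.
  def self.retryable_overload?(error)
    OVERLOAD_LABELS.all? { |label| error.label?(label) }
  end
end

p OverloadCheck.retryable_overload?(FakeError.new(%w[RetryableError SystemOverloadedError])) # prints true
p OverloadCheck.retryable_overload?(FakeError.new(%w[RetryableError]))                       # prints false
```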

Key changes:

  • Allow in-transaction reads/writes to fall through to overload retry logic
  • Preserve startTransaction: true on retries of the first command via revert_to_starting_transaction!
  • Skip w: majority write concern upgrade on commitTransaction when all failures were overload-only
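
The second bullet is needed because, as the review below explains, the session's state advances while the first command is being built, so a naive retry would omit startTransaction: true. This sketch models only that behavior; `SketchSession` and `build_command` are hypothetical, and only the name `revert_to_starting_transaction!` comes from this PR.

```ruby
# Hypothetical, simplified model of a session's transaction state. Only the
# two states relevant to this change are modeled.
class SketchSession
  STARTING_TRANSACTION_STATE = :starting_transaction
  TRANSACTION_IN_PROGRESS_STATE = :transaction_in_progress

  attr_reader :state

  def initialize
    @state = STARTING_TRANSACTION_STATE
  end

  # Building a command attaches startTransaction: true only while the session
  # is starting, then moves the session to "in progress". The state change
  # happens even if the command later fails.
  def build_command(cmd)
    cmd = cmd.dup
    cmd[:startTransaction] = true if @state == STARTING_TRANSACTION_STATE
    @state = TRANSACTION_IN_PROGRESS_STATE
    cmd
  end

  # Restore the starting state so a retried first command carries
  # startTransaction: true again.
  def revert_to_starting_transaction!
    @state = STARTING_TRANSACTION_STATE
  end
end

session = SketchSession.new
first_attempt = session.build_command({ insert: 'coll' }) # has startTransaction: true
naive_retry = session.build_command({ insert: 'coll' })   # startTransaction missing
session.revert_to_starting_transaction!
fixed_retry = session.build_command({ insert: 'coll' })   # startTransaction: true again
```

In the PR the write worker reverts only when retrying the first write of a transaction; later commands never carried startTransaction, so they need no revert.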


Copilot AI left a comment


Pull request overview

Adds support for retrying reads/writes inside transactions when the server returns overload-related labels (RetryableError + SystemOverloadedError), and updates unified transaction spec tests to cover the new behavior.

Changes:

  • Allow overload retry logic to run for in-transaction reads/writes (instead of immediately raising).
  • Preserve startTransaction: true on retries of the first transactional write by reverting session state before retry.
  • Track “overload-only” retry sequences to avoid upgrading commitTransaction write concern to w: majority when all failures were overload-only.
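
A sketch of the overload-only decision for commitTransaction retries, assuming a retried commit is otherwise upgraded to w: majority. `RetryTracker` and `commit_retry_write_concern` are hypothetical; in the PR the flag travels on the operation context as `overload_only_retry?`, and the real upgrade may set more options than `w`.

```ruby
# Hypothetical tracker: did every failure in this retry sequence carry the
# overload labels?
class RetryTracker
  def initialize
    @failures = 0
    @non_overload_failures = 0
  end

  def record_failure(overload:)
    @failures += 1
    @non_overload_failures += 1 unless overload
  end

  def overload_only_retry?
    @failures.positive? && @non_overload_failures.zero?
  end
end

# Pick the write concern for a retried commitTransaction. Simplified: only the
# w option is shown for the upgrade.
def commit_retry_write_concern(original_wc, tracker)
  return original_wc if tracker.overload_only_retry?

  (original_wc || {}).merge(w: 'majority')
end

tracker = RetryTracker.new
tracker.record_failure(overload: true)
commit_retry_write_concern({ w: 1 }, tracker) # keeps { w: 1 }, no upgrade
tracker.record_failure(overload: false)
commit_retry_write_concern({ w: 1 }, tracker) # upgraded to { w: 'majority' }
```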

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 2 comments.

Summary per file:

  • spec/spec_tests/data/transactions_unified/backpressure-retryable-writes.yml: New unified spec coverage for retrying writes in txns on overload labels.
  • spec/spec_tests/data/transactions_unified/backpressure-retryable-reads.yml: New unified spec coverage for retrying reads in txns on overload labels.
  • spec/spec_tests/data/transactions_unified/backpressure-retryable-commit.yml: New unified spec coverage for overload retries on commitTransaction.
  • spec/spec_tests/data/transactions_unified/backpressure-retryable-abort.yml: New unified spec coverage for overload retries on abortTransaction.
  • lib/mongo/session.rb: Skip w: majority upgrade on commit retries when retries were overload-only; add revert_to_starting_transaction!.
  • lib/mongo/retryable/write_worker.rb: Plumb overload-only retry flag and revert session state to preserve startTransaction: true for first-op write retries.
  • lib/mongo/retryable/read_worker.rb: Allow overload retries in transactions for reads.
  • lib/mongo/operation/context.rb: Add overload_only_retry? flag accessor on operation context.

Comments suppressed due to low confidence (1)

lib/mongo/retryable/read_worker.rb:217

  • Overload retries for reads inside transactions now proceed past this guard, but the overload retry path does not restore STARTING_TRANSACTION_STATE when the failing read is the first command in a transaction. Because Session#update_state! runs during message build, the session becomes TRANSACTION_IN_PROGRESS_STATE after the first attempt, and a retry may omit startTransaction: true, breaking the transaction. Capture whether the session was starting before the first attempt and call session.revert_to_starting_transaction! before performing an overload retry (similar to WriteWorker). Adding a unified test where the first operation is a find that fails with overload labels would prevent regressions.
      def modern_read_with_retry(session, server_selector, context, &block)
        server = select_server(
          cluster,
          server_selector,
          session,
          timeout: context&.remaining_timeout_sec
        )
        result = yield server
        retry_policy.record_success(is_retry: false)
        result
      rescue *retryable_exceptions, Error::OperationFailure::Family, Auth::Unauthorized, Error::PoolError => e
        e.add_notes('modern retry', 'attempt 1')
        raise e if session.in_transaction? && !retryable_overload_error?(e)

        if retryable_overload_error?(e)
          overload_read_retry(e, session, server_selector, context, server, error_count: 1, &block)
        else
          raise e if !is_retryable_exception?(e) && !e.write_retryable?
          retry_read(e, session, server_selector, context: context, failed_server: server, &block)
        end
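
A sketch of the fix Copilot suggests for the read path: record whether the session was starting a transaction before the first attempt, and revert to that state before each overload retry. `FakeReadSession`, `OverloadError`, and `read_with_overload_retry` are hypothetical; the `starting_transaction?` / `revert_to_starting_transaction!` session interface is assumed, and any backoff between attempts is omitted.

```ruby
# Hypothetical error raised when the server reports overload.
class OverloadError < StandardError; end

# Hypothetical session: building a command consumes the "starting" state,
# mimicking the state change that happens during message build.
class FakeReadSession
  def initialize
    @starting = true
  end

  def starting_transaction?
    @starting
  end

  def build_find
    cmd = { find: 'coll' }
    cmd[:startTransaction] = true if @starting
    @starting = false
    cmd
  end

  def revert_to_starting_transaction!
    @starting = true
  end
end

# Capture the starting state BEFORE the first attempt; on an overload error,
# revert to it before retrying so the retried read still starts the
# transaction. Re-raises once max_attempts is reached.
def read_with_overload_retry(session, max_attempts: 3)
  was_starting = session.starting_transaction?
  attempt = 0
  begin
    attempt += 1
    yield attempt
  rescue OverloadError
    raise if attempt >= max_attempts

    session.revert_to_starting_transaction! if was_starting
    retry
  end
end

session = FakeReadSession.new
sent = []
result = read_with_overload_retry(session) do |attempt|
  sent << session.build_find
  raise OverloadError if attempt == 1 # first attempt hits overload

  :ok
end
# result is :ok, and both commands in sent carry startTransaction: true
```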


Comment on lines +122 to +136
- commandStartedEvent:
    command:
      abortTransaction:
        $$exists: false
      lsid:
        $$sessionLsid: *session0
      txnNumber:
        $numberLong: "1"
      startTransaction:
        $$exists: false
      autocommit: false
      writeConcern:
        $$exists: false
    commandName: commitTransaction
    databaseName: admin
Comment on lines +129 to +143
- commandStartedEvent:
    command:
      abortTransaction:
        $$exists: false
      lsid:
        $$sessionLsid: *session0
      txnNumber:
        $numberLong: "1"
      startTransaction:
        $$exists: false
      autocommit: false
      writeConcern:
        $$exists: false
    commandName: commitTransaction
    databaseName: admin
jamis previously approved these changes Mar 16, 2026
@comandeo-mongo comandeo-mongo requested a review from jamis March 17, 2026 14:46
@comandeo-mongo comandeo-mongo merged commit f8b7ed5 into mongodb:master Mar 17, 2026
168 of 171 checks passed
@comandeo-mongo comandeo-mongo deleted the 3786-transaction-state-preserved branch March 17, 2026 16:42

Labels

None yet

Projects

None yet


3 participants