feat: add database.dbname setting for PostgreSQL connections #1426

Open
kushalbakshi wants to merge 5 commits into datajoint:master from kushalbakshi:master
@kushalbakshi kushalbakshi commented Apr 6, 2026

Summary

  • Add database.dbname config option (env var: DJ_DBNAME) to specify which PostgreSQL database to connect to
  • Extract _build_connect_kwargs() helper to eliminate duplicated connection parameter construction
  • Add dbname keyword argument to Connection.__init__() for programmatic use
  • Bump version to 2.2.1

Motivation

The PostgreSQL adapter's connect() method already accepts a dbname keyword argument and defaults to "postgres" when not provided. However, there was no way to pass this value through the config or Connection layer — the kwargs passed to adapter.connect() were hardcoded in Connection.connect().

This means DataJoint could only connect to a PostgreSQL database named postgres. Any PostgreSQL deployment where the primary database has a different name (e.g., organizational naming conventions, multi-tenant setups, or hosted PostgreSQL services that use a non-default database name) required monkey-patching to work around the limitation.

Changes

settings.py

  • Add dbname: str | None field to DatabaseSettings with DJ_DBNAME env var alias
  • Defaults to None (preserving existing behavior — adapter defaults to "postgres")
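
As a rough illustration of the settings behavior described above, the sketch below models an optional dbname field populated from DJ_DBNAME. DatabaseSettingsSketch and from_env are hypothetical names made up for this example; they are not the real DatabaseSettings class, and treating an empty variable as "not set" is an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class DatabaseSettingsSketch:
    """Hypothetical stand-in for DatabaseSettings (illustration only)."""

    host: str = "localhost"
    port: int = 5432
    # None means "not configured": the adapter keeps its own default.
    dbname: Optional[str] = None

    @classmethod
    def from_env(cls, env: Optional[Mapping[str, str]] = None) -> "DatabaseSettingsSketch":
        """Build settings from an environment mapping (defaults to os.environ)."""
        env = os.environ if env is None else env
        # Assumption: a missing or empty DJ_DBNAME counts as unset.
        return cls(dbname=env.get("DJ_DBNAME") or None)


settings = DatabaseSettingsSketch.from_env({"DJ_DBNAME": "my_database"})
print(settings.dbname)
```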

connection.py

  • Add dbname keyword-only argument to Connection.__init__() — explicit arg overrides config
  • Read database.dbname from config when no explicit arg is provided
  • Store dbname in conn_info dict (participates in __eq__ comparison)
  • Extract _build_connect_kwargs() to eliminate duplicated parameter construction between the primary connect path and the SSL fallback path
  • Conditionally pass dbname to adapter.connect() only when set (non-None)
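
The precedence and conditional-pass rules listed above could look roughly like the following. This is a minimal sketch, not the code in connection.py: resolve_dbname and build_connect_kwargs_sketch are hypothetical names, the config is modeled as a plain dict, and the exact set of keys the real helper forwards is assumed.

```python
from typing import Any, Dict, Optional


def resolve_dbname(explicit: Optional[str], config: Dict[str, Any]) -> Optional[str]:
    """An explicit argument wins; otherwise fall back to the config value."""
    if explicit is not None:
        return explicit
    return config.get("database.dbname")


def build_connect_kwargs_sketch(conn_info: Dict[str, Any]) -> Dict[str, Any]:
    """Build the kwargs once so the primary and SSL-fallback paths can share them."""
    kwargs = {
        "host": conn_info["host"],
        "port": conn_info["port"],
        "user": conn_info["user"],
        "password": conn_info["password"],
    }
    # Only forward dbname when it is set, so the adapter default still applies.
    if conn_info.get("dbname") is not None:
        kwargs["dbname"] = conn_info["dbname"]
    return kwargs


conn_info = {
    "host": "my-postgres-host.example.com",
    "port": 5432,
    "user": "pipeline_user",
    "password": "...",
    "dbname": resolve_dbname(None, {"database.dbname": "my_database"}),
}
print(build_connect_kwargs_sketch(conn_info))
```

Leaving the key out entirely, rather than passing dbname=None, is what keeps the unset case identical to the previous behavior.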

version.py

  • Bump to 2.2.1

tests/unit/test_settings.py

  • 5 new tests in TestDbnameConfiguration:
    • Default is None
    • DJ_DBNAME env var
    • Config file loading
    • Dict-style access
    • Override context manager
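
For readers unfamiliar with the override pattern the last test exercises, here is a small sketch of a context manager that sets a value temporarily and restores it on exit. It works on a plain dict and uses a hypothetical override_sketch function; the real config object and its override API may differ.

```python
from contextlib import contextmanager

_MISSING = object()  # sentinel for keys that did not exist before the override


@contextmanager
def override_sketch(config, changes):
    """Temporarily apply `changes` to `config`, restoring old values on exit."""
    previous = {key: config.get(key, _MISSING) for key in changes}
    config.update(changes)
    try:
        yield config
    finally:
        for key, old in previous.items():
            if old is _MISSING:
                config.pop(key, None)
            else:
                config[key] = old


config = {"database.dbname": None}
with override_sketch(config, {"database.dbname": "test_db"}):
    inside = config["database.dbname"]
after = config["database.dbname"]
print(inside, after)
```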

Configuration

{
    "database": {
        "host": "my-postgres-host.example.com",
        "port": 5432,
        "user": "pipeline_user",
        "password": "...",
        "backend": "postgresql",
        "dbname": "my_database"
    }
}

Or via environment variable:

export DJ_DBNAME=my_database

Or programmatically:

conn = dj.Connection("host", "user", "pass", 5432, dbname="my_database", backend="postgresql")

When dbname is not set, behavior is unchanged — the PostgreSQL adapter defaults to "postgres".

Test plan

  • All 248 unit tests pass
  • 5 new tests cover settings, env var, config file, dict access, and context manager override

kushalbakshi and others added 3 commits April 6, 2026 10:48
Add database.dbname config option (env: DJ_DBNAME) to specify which
PostgreSQL database to connect to. Defaults to 'postgres' if not set
(existing behavior preserved).

Required where the primary database has a non-default name.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…nit__, add tests

- Extract duplicated connect kwargs construction into _build_connect_kwargs()
- Add dbname as explicit keyword argument to Connection.__init__() for
  programmatic use (explicit arg overrides config value)
- Add 5 unit tests for dbname settings (default, env var, config file,
  dict access, override context manager)
- Bump version to 2.2.1

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
  # label_prs.yaml(prep), release.yaml(bump), post_release.yaml(edit)
  # manually set this version will be eventually overwritten by the above actions
- __version__ = "2.2.0"
+ __version__ = "2.2.1"
Member

This is done automatically as part of the PyPI publishing action.

@dimitri-yatsenko dimitri-yatsenko left a comment

Good change overall — the _build_connect_kwargs() refactor is clean and the config plumbing is solid. A few observations:

database_prefix and dbname serve the same purpose

Both settings isolate groups of schemas:

  • database_prefix (existing) — namespaces schema names on MySQL, which has a flat namespace
  • dbname (this PR) — selects a different PostgreSQL database, which is a native isolation boundary

They're two backends' answers to the same question. Worth considering whether these should be unified:

  • One config setting: database_prefix (already exists)
  • MySQL behavior: prepend prefix to schema names (as today)
  • PostgreSQL behavior: use the prefix as the dbname parameter — each prefix maps to a separate PostgreSQL database

This way a single dj.config["database.database_prefix"] = "lab_a_" would isolate schemas on both backends — via naming on MySQL, via separate databases on PostgreSQL.
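
To make the proposed unification concrete, the sketch below maps one prefix value to backend-specific behavior. apply_isolation_sketch is a hypothetical helper, not part of DataJoint, and whether the PostgreSQL database name would use the prefix verbatim (including a trailing underscore) is an open assumption.

```python
from typing import Optional, Tuple


def apply_isolation_sketch(
    backend: str, prefix: str, schema_name: str
) -> Tuple[Optional[str], str]:
    """Return (dbname, effective_schema_name) for the given backend."""
    if backend == "mysql":
        # Flat namespace: isolate by prepending the prefix to the schema name.
        return None, prefix + schema_name
    if backend == "postgresql":
        # Native isolation: the prefix selects a separate database instead.
        return (prefix or None), schema_name
    raise ValueError(f"unknown backend: {backend!r}")


print(apply_isolation_sketch("mysql", "lab_a_", "ephys"))
print(apply_isolation_sketch("postgresql", "lab_a_", "ephys"))
```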

If they should remain separate (there are cases where you'd want both a dbname and a prefix on PostgreSQL), the docs should clarify when to use which.

Note: database_prefix is currently defined in settings but never referenced anywhere else in the codebase — it's a config slot users read manually. This PR is a good opportunity to think about the relationship between the two.

Minor items

  • __repr__ doesn't show dbname — the format string only uses user, host, port. Worth including dbname when set, so users can tell which database they're connected to.
  • MySQL silently ignores dbname — the MySQL adapter accepts **kwargs so it'll receive and discard dbname. The config description says "for PostgreSQL connections" — should it warn if set with MySQL backend?

@dimitri-yatsenko dimitri-yatsenko left a comment

Follow-up from offline discussion:

Rename dbname to name

The setting should be database.name (env var DJ_DATABASE_NAME) to avoid stutter with database.database and stay consistent with the section's naming style:

dj.config["database.name"] = "my_lab"
# or
# DJ_DATABASE_NAME=my_lab

The Connection.__init__ kwarg and adapter parameter should also use database_name rather than dbname.

Deprecate database_prefix

database_prefix was a workaround for MySQL's flat schema namespace. With PostgreSQL's native database isolation, it's no longer needed going forward:

  • 2.2: non-empty database_prefix emits a deprecation warning
  • 2.3: non-empty database_prefix raises an error

This PR should introduce database.name as the forward-looking setting and add the deprecation warning for database_prefix.
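
A deprecation check along these lines might be written as follows. This is only a sketch using Python's standard warnings module: check_database_prefix_sketch is a hypothetical name, and the message text and the place where the check would run are assumptions.

```python
import warnings


def check_database_prefix_sketch(prefix: str) -> None:
    """Emit a DeprecationWarning when a non-empty database_prefix is configured."""
    if prefix:
        warnings.warn(
            "database_prefix is deprecated; use database.name instead",
            DeprecationWarning,
            stacklevel=2,  # point the warning at the caller, not this helper
        )


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    check_database_prefix_sketch("")        # empty prefix: no warning
    check_database_prefix_sketch("lab_a_")  # non-empty prefix: one warning
print(len(caught))
```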

@dimitri-yatsenko dimitri-yatsenko left a comment

Based on our discussion, rename database.dbname to database.name.

@dimitri-yatsenko dimitri-yatsenko left a comment

Version bump should be removed from this PR

The version.py change to 2.2.1 should be reverted. The release workflow handles version bumps automatically:

  1. draft_release.yaml — creates a draft release via release-drafter
  2. post_draft_release_published.yaml — when the draft is published, it extracts the version from the release name, updates version.py, builds, publishes to PyPI, and creates a PR back to master with the bump

If PRs bump version.py themselves, it creates conflicts: multiple PRs may target different patch versions, or a different version may be chosen at release time. Leave version.py at 2.2.0 and let the release workflow own the version number.

@dimitri-yatsenko dimitri-yatsenko left a comment

MySQL should warn/error when database_name is set

Currently if a user sets dj.config["database.name"] with the MySQL backend, the value gets passed as a kwarg to the MySQL adapter's connect() method, absorbed by **kwargs, and silently ignored. No error, no warning — the user thinks they're connecting to a specific database but they're not.

The PR should handle this — either:

  1. Emit a warning in Connection.__init__ if database_name is set with the MySQL backend
  2. Have the MySQL adapter explicitly check for and reject the dbname kwarg
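
Both options can be sketched briefly. The functions below are hypothetical stand-ins (no real connection is opened, and neither name exists in DataJoint); the warning category, message, and exception type are assumptions.

```python
import warnings
from typing import Optional


def warn_if_name_ignored_sketch(backend: str, database_name: Optional[str]) -> bool:
    """Option 1: warn at the Connection layer. Returns True if a warning was issued."""
    if backend == "mysql" and database_name is not None:
        warnings.warn(
            "database.name is set but is ignored by the MySQL backend",
            UserWarning,
            stacklevel=2,
        )
        return True
    return False


def mysql_connect_sketch(host: str, user: str, password: str, **kwargs) -> dict:
    """Option 2: the adapter rejects dbname instead of absorbing it via **kwargs."""
    if "dbname" in kwargs:
        raise TypeError("the MySQL adapter does not accept a dbname argument")
    # A real adapter would open a connection here; this sketch echoes the parameters.
    return {"host": host, "user": user, "password": password, **kwargs}
```

Option 1 keeps existing MySQL configurations working while surfacing the mismatch; option 2 fails fast, which is stricter but could break setups that share one config across backends.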

kushalbakshi and others added 2 commits April 7, 2026 09:30
…database_prefix

Review feedback from PR datajoint#1426:

1. Rename setting to database.name (env: DJ_DATABASE_NAME) to match
   section naming style and avoid stutter. Connection kwarg is
   database_name. Adapter still receives dbname (psycopg2's parameter).

2. Deprecate database_prefix — emit DeprecationWarning when non-empty.
   Will be removed in DataJoint 2.3. database.name is the replacement.

3. Revert version.py to 2.2.0 — release workflow owns version bumps.

4. Warn when database.name is set with MySQL backend (MySQL does not
   support database selection via this parameter).

5. Include database name in Connection.__repr__ and log message when set.
   Format: user@host:port/database_name

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
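
The user@host:port/database_name format from item 5 could be produced by something like the sketch below. format_connection_sketch is a hypothetical helper for illustration and is not the actual Connection.__repr__, whose surrounding text is not shown in this conversation.

```python
from typing import Optional


def format_connection_sketch(
    user: str, host: str, port: int, database_name: Optional[str] = None
) -> str:
    """Render user@host:port, appending /database_name only when one is set."""
    base = f"{user}@{host}:{port}"
    return f"{base}/{database_name}" if database_name else base


print(format_connection_sketch("pipeline_user", "db.example.com", 5432, "my_lab"))
print(format_connection_sketch("pipeline_user", "db.example.com", 5432))
```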