Add stubtest to CI for type annotation validation by kkollsga · Pull Request #11124 · pydata/xarray

kkollsga · 2026-02-02T07:57:59Z

Towards more sophisticated type checking with numpy #11086
Tests added
User visible changes (including notable bug fixes) are documented in whats-new.rst
New functions/methods are listed in api.rst (N/A - no new public API)

Summary

This PR integrates mypy's stubtest tool into xarray's CI pipeline to validate that type annotations match runtime behavior.

What's included

CI Integration:

New stubtest job in .github/workflows/ci-additional.yaml
Runs on every PR with continue-on-error: true (Phase 1: informational)
Tests core modules: dataarray, dataset, variable

Infrastructure:

_stubtest/allowlist.txt - Comprehensive allowlist for TYPE_CHECKING imports and intentional stub/runtime differences
_stubtest/run_stubtest.py - Runner script with reporting
ci_tests/ - Pytest-based tests for stubtest compliance and type regressions

Code cleanup:

Removed 14 obsolete # type: ignore comments revealed by scipy-stubs and pandas-stubs
Added 6 targeted type ignores for legitimate typing limitations
Net improvement: -8 type ignores

Phased rollout

The stubtest job is currently non-blocking (continue-on-error: true). Once maintainers are confident it's stable, it can be made required by changing to continue-on-error: false.

Verification

Stubtest errors reduced from 1,084 → 0 with the allowlist.

VeckoTheGecko · 2026-02-02T22:58:05Z

Hi @kkollsga , I'm curious about this issue so am happy to dig into this PR a bit - but before I do that I would like to know to what extent AI has been used on this PR.

It seems like a heavy PR and from a glance (on mobile) I dont see the connections to the original work done in Scipy. An AI disclosure statement would be really helpful for me to know how its been used 😊

(Note I am not a maintainer, and Xarray - AFAICT - doesn't have LLM disclosure as a contributing policy. This is a request from myself)

kkollsga · 2026-02-03T06:07:54Z

This PR was developed with assistance from Claude Code. AI usage is disclosed via the Co-authored-by: Claude <noreply@anthropic.com> trailer in the main commit, following xarray's CLAUDE.md guidelines.

The original issue references stubtest tooling from NumPy's fellowship work, with a cross-reference to #10273 (scipy-stubs compliance). The allowlist was built specifically for xarray's patterns by iteratively running stubtest on core modules. This PR also cleans up some obsolete type ignores that scipy-stubs made redundant, partially addressing #10273.

The categorized allowlist (TYPE_CHECKING imports, dynamic methods, protocol differences) is designed to be maintainable as xarray evolves.

kkollsga · 2026-02-03T12:29:09Z

Earlier iterations ran into issues with pixi lock validation when modifying pyproject.toml, and the initial allowlist (~600 lines) had overly broad patterns causing conflicts.

The fix was to keep this PR as pure infrastructure with no source changes, and rebuild the allowlist from scratch using efficient grouped regex patterns. This brought it down to 170 lines with 0 stubtest errors and 0 unused patterns.

Future work: Expand coverage to additional modules (backends, computation, indexes), then transition from continue-on-error: true to a required check once stable.

Add stubtest to CI for validating type stubs against runtime behavior. This helps catch type annotation regressions early. - Add stubtest job to ci-additional.yaml (non-blocking with continue-on-error) - Create allowlist for known acceptable differences (TYPE_CHECKING imports, etc.) - Tests core modules: dataarray, dataset, variable Refs pydata#11086 Co-Authored-By: Claude <noreply@anthropic.com>

headtr1ck

I see a lot of "weird" false positives in the allowlist.

Maybe that is a limitation of not using actual stub files but in-lining the typing info (stubtest was designed to check stub files vs implementation files).

But I still think this is a nice addition that we can improve upon. Thanks!

headtr1ck · 2026-02-20T09:32:11Z

.github/workflows/ci-additional.yaml

+
+      - name: Install type stubs
+        run: |
+          pixi run -e ${{env.PIXI_ENV}} -- python -m mypy --install-types --non-interactive xarray/ || true


That feels a bit wacky. I always wanted to add the stubs dependencies to the environments directly for better transparency.
But since the mypy job does this as well, I don't see a better way as well.
Let's keep this for the future.

headtr1ck · 2026-02-20T09:33:10Z

.github/workflows/ci-additional.yaml

+      - name: Run stubtest (core modules)
+        run: |
+          pixi run -e ${{env.PIXI_ENV}} -- python -m mypy.stubtest \
+            xarray.core.dataarray \


Any reason why not run against full xarray?
Does it produce too many errors?

The earlier version ran against more modules, but some of the source code fixes needed to pass stubtest caused mypy to fail — the two tools have opposing perspectives on the same annotations. Rather than debugging those conflicts across the whole codebase in one PR, I scoped it down to the three core modules as pure infrastructure (CI job + allowlist, no source changes) to avoid that tension entirely. Running against full xarray produces ~6,600 errors, which would need a much larger allowlist. Happy to expand coverage incrementally in follow-ups.

headtr1ck · 2026-02-20T09:37:00Z

_stubtest/allowlist.txt

@@ -0,0 +1,170 @@
+# Stubtest Allowlist for xarray


I see a lot of stuff here that is typing only. I am actually not sure what the best way forward is here.

Thanks for the review! You're exactly right — stubtest was designed for separate .pyi stub files where type-check-only symbols simply don't appear in the stubs. With inline annotations, every TYPE_CHECKING import gets flagged because it exists in the source but not at runtime. That accounts for ~60% of the allowlist.

The earlier version of this PR tried to address some of these by modifying source code (removing obsolete type: ignore comments, adjusting signatures), but some of those fixes caused mypy to fail — the two tools pull annotations in opposite directions. stubtest wants them to match runtime, mypy wants internal consistency, and in xarray's architecture with heavy TYPE_CHECKING blocks those goals conflict. I scoped it down to pure infrastructure (zero source changes) so the tools can't interfere with each other, which is why the allowlist is larger than ideal.

That said, each allowlist entry documents an intentional decision, and the categorized regex patterns should make it straightforward to chip away at entries in follow-up PRs as the typing infrastructure evolves.

headtr1ck · 2026-02-20T11:43:03Z

Now that I think about it: does it even make sense to use stubtest if we only use inline type hints? Isn't mypy covering these things already (I have to admit that I am not familiar with stubtest)?

kkollsga · 2026-02-20T12:10:28Z

Good question — mypy and stubtest check fundamentally different properties, even with inline annotations.

mypy checks internal consistency: "are the annotations self-consistent?" It never imports the module.
stubtest checks runtime accuracy: "do the annotations match what actually exists at runtime?" It imports the module and inspects every signature.

A concrete xarray example:
to_netcdf has 4 overloads + 1 implementation on both DataArray and Dataset — that's 10 copies of essentially the same signature. When a new parameter like auto_complex gets added, all 10 need updating. If one overload gets missed, mypy won't flag it — each overload is independently valid with or without the parameter. stubtest catches this immediately because the runtime signature doesn't match the overload.

Same applies to parameter renames, positional-only/keyword-only changes, final/abstractmethod drift, and methods that get removed but whose annotations stick around. These are all things mypy is structurally blind to.
The tradeoff with inline annotations is that TYPE_CHECKING imports create noise (~60% of the allowlist), since stubtest flags types that exist for the checker but not at runtime. The allowlist handles this cleanly.

Once this lands, it's possible to incrementally expand module coverage, start resolving straightforward allowlist entries, and use the baseline to catch regressions as the codebase evolves. Plenty of room to iterate from here.

dcherian · 2026-02-20T15:27:34Z

Thanks for the review @headtr1ck ! and thanks to @kkollsga for the contribuition

headtr1ck · 2026-02-20T17:07:13Z

I'm not (yet) familiar with pixi, not sure why the tests fail on setup.

kkollsga · 2026-02-20T18:13:20Z

The stubtest job was failing because the cache setup step was out of date. When I merged the latest main into this branch, the other CI jobs had been refactored to use a new caching approach (PR #11096), but the new stubtest job wasn't updated since there was no merge conflict. I'll fix and push the update.

Co-Authored-By: Claude <noreply@anthropic.com>

github-actions bot added topic-backends Automation Github bots, testing workflows, release automation io topic-NamedArray Lightweight version of Variable labels Feb 2, 2026

kkollsga mentioned this pull request Feb 2, 2026

more sophisticated type checking with numpy #11086

Open

kkollsga force-pushed the add-stubtest-ci branch from 08ca82f to 11bab2c Compare February 2, 2026 21:12

VeckoTheGecko mentioned this pull request Feb 2, 2026

DOC: Policy on LLM usage #11126

Open

kkollsga force-pushed the add-stubtest-ci branch from 51e903b to c36e571 Compare February 3, 2026 10:04

kkollsga force-pushed the add-stubtest-ci branch from 70abc7a to 04105e0 Compare February 3, 2026 12:32

kkollsga force-pushed the add-stubtest-ci branch from 04105e0 to 293b7e2 Compare February 3, 2026 19:11

Merge branch 'main' into add-stubtest-ci

0b7df6d

headtr1ck approved these changes Feb 20, 2026

View reviewed changes

dcherian added 2 commits February 20, 2026 08:26

Merge branch 'main' into add-stubtest-ci

d25b867

fix merge

9d802ce

dcherian enabled auto-merge (squash) February 20, 2026 15:27

Fix stubtest CI cache restore step after pydata#11096 refactor

9c9a4ab

Co-Authored-By: Claude <noreply@anthropic.com>

auto-merge was automatically disabled February 20, 2026 18:13
Head branch was pushed to by a user without write access

Merge branch 'main' into add-stubtest-ci

3aab2c1

dcherian merged commit 8f6605e into pydata:main Feb 20, 2026
41 checks passed

kkollsga deleted the add-stubtest-ci branch February 20, 2026 18:58

Uh oh!

Comments

Conversation

kkollsga commented Feb 2, 2026 • edited by headtr1ck Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

What's included

Phased rollout

Verification

Uh oh!

VeckoTheGecko commented Feb 2, 2026

Uh oh!

kkollsga commented Feb 3, 2026

Uh oh!

kkollsga commented Feb 3, 2026

Uh oh!

headtr1ck left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

headtr1ck Feb 20, 2026

Choose a reason for hiding this comment

Uh oh!

headtr1ck Feb 20, 2026

Choose a reason for hiding this comment

Uh oh!

kkollsga Feb 20, 2026

Choose a reason for hiding this comment

Uh oh!

headtr1ck Feb 20, 2026

Choose a reason for hiding this comment

Uh oh!

kkollsga Feb 20, 2026

Choose a reason for hiding this comment

Uh oh!

headtr1ck commented Feb 20, 2026

Uh oh!

kkollsga commented Feb 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

dcherian commented Feb 20, 2026

Uh oh!

headtr1ck commented Feb 20, 2026

Uh oh!

kkollsga commented Feb 20, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

kkollsga commented Feb 2, 2026 •

edited by headtr1ck

Loading

headtr1ck left a comment •

edited

Loading

kkollsga commented Feb 20, 2026 •

edited

Loading