Skip to content

Comments

Add stubtest to CI for type annotation validation#11124

Merged
dcherian merged 6 commits intopydata:mainfrom
kkollsga:add-stubtest-ci
Feb 20, 2026
Merged

Add stubtest to CI for type annotation validation#11124
dcherian merged 6 commits intopydata:mainfrom
kkollsga:add-stubtest-ci

Conversation

@kkollsga
Copy link
Contributor

@kkollsga kkollsga commented Feb 2, 2026

Summary

This PR integrates mypy's stubtest tool into xarray's CI pipeline to validate that type annotations match runtime behavior.

What's included

CI Integration:

  • New stubtest job in .github/workflows/ci-additional.yaml
  • Runs on every PR with continue-on-error: true (Phase 1: informational)
  • Tests core modules: dataarray, dataset, variable

Infrastructure:

  • _stubtest/allowlist.txt - Comprehensive allowlist for TYPE_CHECKING imports and intentional stub/runtime differences
  • _stubtest/run_stubtest.py - Runner script with reporting
  • ci_tests/ - Pytest-based tests for stubtest compliance and type regressions

Code cleanup:

  • Removed 14 obsolete # type: ignore comments revealed by scipy-stubs and pandas-stubs
  • Added 6 targeted type ignores for legitimate typing limitations
  • Net improvement: -8 type ignores

Phased rollout

The stubtest job is currently non-blocking (continue-on-error: true). Once maintainers are confident it's stable, it can be made required by changing to continue-on-error: false.

Verification

Stubtest errors reduced from 1,084 → 0 with the allowlist.

@github-actions github-actions bot added topic-backends Automation Github bots, testing workflows, release automation io topic-NamedArray Lightweight version of Variable labels Feb 2, 2026
@VeckoTheGecko
Copy link
Contributor

Hi @kkollsga , I'm curious about this issue so am happy to dig into this PR a bit - but before I do that I would like to know to what extent AI has been used on this PR.

It seems like a heavy PR and from a glance (on mobile) I dont see the connections to the original work done in Scipy. An AI disclosure statement would be really helpful for me to know how its been used 😊

(Note I am not a maintainer, and Xarray - AFAICT - doesn't have LLM disclosure as a contributing policy. This is a request from myself)

@kkollsga
Copy link
Contributor Author

kkollsga commented Feb 3, 2026

This PR was developed with assistance from Claude Code. AI usage is disclosed via the Co-authored-by: Claude <noreply@anthropic.com> trailer in the main commit, following xarray's CLAUDE.md guidelines.

The original issue references stubtest tooling from NumPy's fellowship work, with a cross-reference to #10273 (scipy-stubs compliance). The allowlist was built specifically for xarray's patterns by iteratively running stubtest on core modules. This PR also cleans up some obsolete type ignores that scipy-stubs made redundant, partially addressing #10273.

The categorized allowlist (TYPE_CHECKING imports, dynamic methods, protocol differences) is designed to be maintainable as xarray evolves.

@kkollsga
Copy link
Contributor Author

kkollsga commented Feb 3, 2026

Earlier iterations ran into issues with pixi lock validation when modifying pyproject.toml, and the initial allowlist (~600 lines) had overly broad patterns causing conflicts.

The fix was to keep this PR as pure infrastructure with no source changes, and rebuild the allowlist from scratch using efficient grouped regex patterns. This brought it down to 170 lines with 0 stubtest errors and 0 unused patterns.

Future work: Expand coverage to additional modules (backends, computation, indexes), then transition from continue-on-error: true to a required check once stable.

Add stubtest to CI for validating type stubs against runtime behavior.
This helps catch type annotation regressions early.

- Add stubtest job to ci-additional.yaml (non-blocking with continue-on-error)
- Create allowlist for known acceptable differences (TYPE_CHECKING imports, etc.)
- Tests core modules: dataarray, dataset, variable

Refs pydata#11086

Co-Authored-By: Claude <noreply@anthropic.com>
Copy link
Collaborator

@headtr1ck headtr1ck left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see a lot of "weird" false positives in the allowlist.

Maybe that is a limitation of not using actual stub files but in-lining the typing info (stubtest was designed to check stub files vs implementation files).

But I still think this is a nice addition that we can improve upon. Thanks!


- name: Install type stubs
run: |
pixi run -e ${{env.PIXI_ENV}} -- python -m mypy --install-types --non-interactive xarray/ || true
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That feels a bit wacky. I always wanted to add the stubs dependencies to the environments directly for better transparency.
But since the mypy job does this as well, I don't see a better way as well.
Let's keep this for the future.

- name: Run stubtest (core modules)
run: |
pixi run -e ${{env.PIXI_ENV}} -- python -m mypy.stubtest \
xarray.core.dataarray \
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason why not run against full xarray?
Does it produce too many errors?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The earlier version ran against more modules, but some of the source code fixes needed to pass stubtest caused mypy to fail — the two tools have opposing perspectives on the same annotations. Rather than debugging those conflicts across the whole codebase in one PR, I scoped it down to the three core modules as pure infrastructure (CI job + allowlist, no source changes) to avoid that tension entirely. Running against full xarray produces ~6,600 errors, which would need a much larger allowlist. Happy to expand coverage incrementally in follow-ups.

@@ -0,0 +1,170 @@
# Stubtest Allowlist for xarray
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see a lot of stuff here that is typing only. I am actually not sure what the best way forward is here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review! You're exactly right — stubtest was designed for separate .pyi stub files where type-check-only symbols simply don't appear in the stubs. With inline annotations, every TYPE_CHECKING import gets flagged because it exists in the source but not at runtime. That accounts for ~60% of the allowlist.

The earlier version of this PR tried to address some of these by modifying source code (removing obsolete type: ignore comments, adjusting signatures), but some of those fixes caused mypy to fail — the two tools pull annotations in opposite directions. stubtest wants them to match runtime, mypy wants internal consistency, and in xarray's architecture with heavy TYPE_CHECKING blocks those goals conflict. I scoped it down to pure infrastructure (zero source changes) so the tools can't interfere with each other, which is why the allowlist is larger than ideal.

That said, each allowlist entry documents an intentional decision, and the categorized regex patterns should make it straightforward to chip away at entries in follow-up PRs as the typing infrastructure evolves.

@headtr1ck
Copy link
Collaborator

Now that I think about it: does it even make sense to use stubtest if we only use inline type hints? Isn't mypy covering these things already (I have to admit that I am not familiar with stubtest)?

@kkollsga
Copy link
Contributor Author

kkollsga commented Feb 20, 2026

Good question — mypy and stubtest check fundamentally different properties, even with inline annotations.

mypy checks internal consistency: "are the annotations self-consistent?" It never imports the module.
stubtest checks runtime accuracy: "do the annotations match what actually exists at runtime?" It imports the module and inspects every signature.

A concrete xarray example:
to_netcdf has 4 overloads + 1 implementation on both DataArray and Dataset — that's 10 copies of essentially the same signature. When a new parameter like auto_complex gets added, all 10 need updating. If one overload gets missed, mypy won't flag it — each overload is independently valid with or without the parameter. stubtest catches this immediately because the runtime signature doesn't match the overload.

Same applies to parameter renames, positional-only/keyword-only changes, final/abstractmethod drift, and methods that get removed but whose annotations stick around. These are all things mypy is structurally blind to.
The tradeoff with inline annotations is that TYPE_CHECKING imports create noise (~60% of the allowlist), since stubtest flags types that exist for the checker but not at runtime. The allowlist handles this cleanly.

Once this lands, it's possible to incrementally expand module coverage, start resolving straightforward allowlist entries, and use the baseline to catch regressions as the codebase evolves. Plenty of room to iterate from here.

@dcherian
Copy link
Contributor

Thanks for the review @headtr1ck ! and thanks to @kkollsga for the contribuition

@dcherian dcherian enabled auto-merge (squash) February 20, 2026 15:27
@headtr1ck
Copy link
Collaborator

I'm not (yet) familiar with pixi, not sure why the tests fail on setup.

@kkollsga
Copy link
Contributor Author

The stubtest job was failing because the cache setup step was out of date. When I merged the latest main into this branch, the other CI jobs had been refactored to use a new caching approach (PR #11096), but the new stubtest job wasn't updated since there was no merge conflict. I'll fix and push the update.

Co-Authored-By: Claude <noreply@anthropic.com>
auto-merge was automatically disabled February 20, 2026 18:13

Head branch was pushed to by a user without write access

@dcherian dcherian merged commit 8f6605e into pydata:main Feb 20, 2026
41 checks passed
@kkollsga kkollsga deleted the add-stubtest-ci branch February 20, 2026 18:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Automation Github bots, testing workflows, release automation io topic-backends topic-NamedArray Lightweight version of Variable

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants