
Bump chardet from 5.2.0 to 7.1.0 #1649

Closed
dependabot[bot] wants to merge 1 commit into main from dependabot/pip/chardet-7.1.0

dependabot[bot] commented on behalf of GitHub on Mar 12, 2026

Bumps chardet from 5.2.0 to 7.1.0.

Release notes

Sourced from chardet's releases.

chardet 7.1.0

Features

  • Added PEP 263 encoding declaration detection — # -*- coding: ... -*- and # coding=... declarations on lines 1–2 of Python source files are now recognized with confidence 0.95 (#249)
  • Added chardet.universaldetector backward-compatibility stub so that from chardet.universaldetector import UniversalDetector works with a deprecation warning (#341)
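
The PEP 263 rule above can be approximated in a few lines. This is a minimal sketch, not chardet's implementation: it assumes the regular expression given in the Python language reference for encoding declarations, applies the same "line 2 only counts if line 1 is blank or a comment" rule that the standard library's tokenize.detect_encoding() uses, and reports the 0.95 confidence quoted in the note. The function name pep263_declaration is hypothetical.

```python
from __future__ import annotations

import codecs
import re

# Encoding-declaration regex from PEP 263 / the Python language reference.
_CODING_RE = re.compile(rb"^[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)")
# Line 1 must be empty or comment-only for line 2 to be considered.
_BLANK_RE = re.compile(rb"^[ \t\f]*(?:[#\r\n]|$)")

# Confidence value quoted in the 7.1.0 release note.
PEP263_CONFIDENCE = 0.95


def pep263_declaration(data: bytes) -> tuple[str, float] | None:
    """Return (Python codec name, confidence) if lines 1-2 declare an encoding."""
    lines = data.splitlines(keepends=True)[:2]
    for i, line in enumerate(lines):
        match = _CODING_RE.match(line)
        if match:
            try:
                name = codecs.lookup(match.group(1).decode("ascii")).name
            except LookupError:
                return None  # declared name is not a codec Python knows
            return name, PEP263_CONFIDENCE
        if i == 0 and not _BLANK_RE.match(line):
            break  # line 1 is code, so line 2 cannot carry a declaration
    return None
```

Resolving the declared name through codecs.lookup() means a declaration such as latin-1 comes back as Python's canonical iso8859-1, and an unknown name yields None instead of a guess.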

Fixes

  • Fixed false UTF-7 detection of ASCII text containing ++ or +word patterns (#332)
  • Fixed 0.5s startup cost on first detect() call — model norms are now computed during loading instead of lazily iterating 21M entries (#333)
  • Fixed undocumented encoding name changes between chardet 5.x and 7.0 — detect() now returns chardet 5.x-compatible names by default (#338)
  • Improved ISO-2022-JP family detection — recognizes ESC sequences for ISO-2022-JP-2004 (JIS X 0213) and ISO-2022-JP-EXT (JIS X 0201 Kana)
  • Fixed silent truncation of corrupt model data (iter_unpack yielded fewer tuples instead of raising)
  • Fixed incorrect date in LICENSE

Performance

  • 5.5x faster first-detect time (~0.42s → ~0.075s) by computing model norms as a side-product of load_models()
  • ~40% faster model parsing via struct.iter_unpack for bulk entry extraction (eliminates ~305K individual unpack calls)
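
Together with the silent-truncation fix under Fixes, the two performance items describe one loader pattern: parse all records with a single struct.iter_unpack pass, accumulate each model's norm in that same loop instead of on the first detect() call, and check the record count so a short file raises. The sketch assumes an invented binary layout (a uint32 count, then records of two bytes plus a uint16 weight) and an L2 norm; chardet's real model format and norm definition are not given in these notes, and load_model and build_model are hypothetical names.

```python
from __future__ import annotations

import math
import struct

# Hypothetical layout, NOT chardet's real format: little-endian uint32 count,
# then `count` records of (first byte, second byte, uint16 weight).
_HEADER = struct.Struct("<I")
_ENTRY = struct.Struct("<BBH")


def build_model(entries: list[tuple[int, int, int]]) -> bytes:
    """Serialize entries in the hypothetical layout (for the example only)."""
    return _HEADER.pack(len(entries)) + b"".join(_ENTRY.pack(*e) for e in entries)


def load_model(blob: bytes) -> tuple[dict[tuple[int, int], int], float]:
    """Parse a bigram model and compute its L2 norm in the same pass."""
    (count,) = _HEADER.unpack_from(blob, 0)
    expected = count * _ENTRY.size
    body = blob[_HEADER.size:_HEADER.size + expected]
    # Slicing a short blob silently yields fewer bytes; iter_unpack only
    # rejects a partial trailing record, not whole missing records.
    if len(body) != expected:
        raise ValueError(
            f"model truncated: expected {count} entries, "
            f"got {len(body) // _ENTRY.size}"
        )
    weights: dict[tuple[int, int], int] = {}
    sum_sq = 0
    # One iter_unpack over the whole body instead of one unpack per entry.
    for first, second, weight in _ENTRY.iter_unpack(body):
        weights[(first, second)] = weight
        sum_sq += weight * weight  # norm accumulated while loading
    return weights, math.sqrt(sum_sq)
```

Because the norm is computed while records are already being visited, the first detection no longer has to walk every entry again.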

New API parameters

  • Added compat_names parameter (default True) to detect(), detect_all(), and UniversalDetector — set to False to get raw Python codec names instead of chardet 5.x/6.x compatible display names
  • Added prefer_superset parameter (default False) — remaps legacy ISO/subset encodings to their modern Windows/CP superset equivalents (e.g., ASCII → Windows-1252, ISO-8859-1 → Windows-1252). This will default to True in the next major version (8.0).
  • Deprecated should_rename_legacy in favor of prefer_superset — a deprecation warning is emitted when used
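
The prefer_superset remapping amounts to a table lookup applied after detection. The sketch below seeds the table with only the two pairs quoted above (ASCII and ISO-8859-1 to Windows-1252); both the table contents and the helper name apply_prefer_superset are illustrative assumptions, not chardet's code.

```python
from __future__ import annotations

# Illustrative subset -> superset pairs, limited to the release-note examples.
_SUPERSETS = {
    "ascii": "Windows-1252",
    "iso-8859-1": "Windows-1252",
}


def apply_prefer_superset(encoding: str | None, prefer_superset: bool = False) -> str | None:
    """Swap a legacy/subset encoding name for its superset when requested."""
    if not prefer_superset or encoding is None:
        return encoding  # default (False): report the detected name unchanged
    return _SUPERSETS.get(encoding.lower(), encoding)


# For ASCII, and for ISO-8859-1 text outside the 0x80-0x9F control range,
# the superset decodes the same bytes to the same characters.
sample = "café".encode("iso-8859-1")
assert sample.decode("iso-8859-1") == sample.decode("cp1252")
```

Remapping to the superset is the more forgiving choice for decoding real-world files, which is presumably why the note says it will become the default in 8.0.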

Improvements

  • Switched internal canonical encoding names to Python codec names (e.g., "utf-8" instead of "UTF-8"), with compat_names controlling the public output format
  • Added lookup_encoding() to registry for case-insensitive resolution of arbitrary encoding name input to canonical names
  • Achieved 100% line coverage across all source modules (+31 tests)
  • Updated benchmark numbers: 98.2% encoding accuracy, 95.2% language accuracy on 2,510 test files
  • Pinned test-data cloning to chardet release version tags for reproducible builds
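
The switch to Python codec names and the registry's case-insensitive lookup can be approximated with the standard library: codecs.lookup() already resolves aliases and casing to Python's canonical codec names, and a small table can map those back to 5.x-style display names when compat_names is true. This is a sketch under assumptions: resolve_encoding and to_display_name are hypothetical stand-ins (not chardet's lookup_encoding()), and the display table holds only two entries.

```python
from __future__ import annotations

import codecs

# Illustrative canonical -> 5.x-style display names (two entries only).
_COMPAT_DISPLAY = {
    "cp1252": "Windows-1252",
    "iso8859-1": "ISO-8859-1",
}


def resolve_encoding(name: str) -> str | None:
    """Map any alias or casing of an encoding to Python's canonical codec name."""
    try:
        return codecs.lookup(name).name
    except LookupError:
        return None  # not a codec Python knows about


def to_display_name(name: str, compat_names: bool = True) -> str | None:
    """Return a 5.x-style name when compat_names is True, else the codec name."""
    canonical = resolve_encoding(name)
    if canonical is None or not compat_names:
        return canonical
    return _COMPAT_DISPLAY.get(canonical, canonical)
```

Keeping one canonical spelling internally and translating only at the API boundary means comparisons inside the detector never depend on how a caller spelled the encoding.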

Full changelog: https://chardet.readthedocs.io/en/latest/changelog.html

7.0.1

Fixes

  • Fixed false UTF-7 detection of SHA-1 git hashes (#324, fixing #323) — requirements files with VCS pins (e.g., +4bafdea3...) were misdetected as UTF-7, breaking tools like tox
  • Fixed _SINGLE_LANG_MAP missing aliases for single-language encoding lookup (e.g., big5big5hkscs)
  • Fixed PyPy TypeError in UTF-7 codec handling

Improvements

  • Retrained bigram models — 24 previously failing test cases now pass
  • Updated language equivalences for mutual intelligibility (Slovak/Czech, East Slavic + Bulgarian, Malay/Indonesian, Scandinavian languages)
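
Language equivalences can be modeled as groups of languages that count as interchangeable when a detected language is compared with the expected one. The sketch assumes that usage, uses plain English language names, and fills in group members (which languages make up "East Slavic" and "Scandinavian") from general knowledge rather than from chardet's source; languages_match is a hypothetical helper.

```python
from __future__ import annotations

# Mutual-intelligibility groups named in the 7.0.1 note. Exact membership is
# an assumption made for this sketch, not copied from chardet.
_EQUIVALENT_GROUPS = [
    frozenset({"slovak", "czech"}),
    frozenset({"russian", "ukrainian", "belarusian", "bulgarian"}),
    frozenset({"malay", "indonesian"}),
    frozenset({"danish", "norwegian", "swedish"}),
]


def languages_match(expected: str, detected: str) -> bool:
    """True if the languages are equal or fall in the same equivalence group."""
    a, b = expected.lower(), detected.lower()
    if a == b:
        return True
    return any(a in group and b in group for group in _EQUIVALENT_GROUPS)
```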

New Contributors

... (truncated)

Changelog

Sourced from chardet's changelog.

7.1.0 (2026-03-11)

Features:

  • Added PEP 263 encoding declaration detection: # -*- coding: ... -*- and # coding=... declarations on lines 1–2 of Python source files are now recognized with confidence 0.95 (Dan Blanchard <https://github.com/dan-blanchard>, #249 <https://github.com/chardet/chardet/issues/249>)
  • Added chardet.universaldetector backward-compatibility stub so that from chardet.universaldetector import UniversalDetector works with a deprecation warning (Dan Blanchard <https://github.com/dan-blanchard>, #341 <https://github.com/chardet/chardet/issues/341>)

Fixes:

  • Fixed false UTF-7 detection of ASCII text containing ++ or +word patterns (Dan Blanchard <https://github.com/dan-blanchard>, #332 <https://github.com/chardet/chardet/issues/332>, #335 <https://github.com/chardet/chardet/pull/335>)
  • Fixed 0.5s startup cost on first detect() call; model norms are now computed during loading instead of lazily iterating 21M entries (Dan Blanchard <https://github.com/dan-blanchard>, #333 <https://github.com/chardet/chardet/issues/333>, #336 <https://github.com/chardet/chardet/pull/336>)
  • Fixed undocumented encoding name changes between chardet 5.x and 7.0: detect() now returns chardet 5.x-compatible names by default (Dan Blanchard <https://github.com/dan-blanchard>, #338 <https://github.com/chardet/chardet/pull/338>)
  • Improved ISO-2022-JP family detection: recognizes ESC sequences for ISO-2022-JP-2004 (JIS X 0213) and ISO-2022-JP-EXT (JIS X 0201 Kana) (Dan Blanchard <https://github.com/dan-blanchard>)
  • Fixed silent truncation of corrupt model data (iter_unpack yielded fewer tuples instead of raising) (Dan Blanchard <https://github.com/dan-blanchard>)
  • Fixed incorrect date in LICENSE (Dan Blanchard <https://github.com/dan-blanchard>)

Performance:

  • 5.5x faster first-detect time (~0.42s → ~0.075s) by computing model norms as a side-product of load_models() (Dan Blanchard <https://github.com/dan-blanchard>)
  • ~40% faster model parsing via struct.iter_unpack for bulk entry extraction (eliminates ~305K individual unpack calls) (Dan Blanchard <https://github.com/dan-blanchard>)

... (truncated)

Commits
  • f170eb4 perf: add early-exit check in PEP 263 detection for non-Python data
  • 81dd662 refactor: use pathlib.Path instead of str for filesystem paths in scripts
  • bf3ea5b test: achieve 100% test coverage
  • ce5e991 fix: adjust benchmark speedup threshold for pure Python vs mypyc
  • bfc8659 docs: update thread scaling table with GIL vs free-threaded benchmarks
  • feff427 Remove plans that got thrown in other directory
  • f854da5 fix: add --threads validation and docstring updates in compare_detectors.py
  • 8029f87 fix: only include threads in timing cache keys, not memory cache keys
  • cb3c71d feat: add --threads passthrough to compare_detectors.py
  • d168ef0 feat: add --threads option to benchmark_time.py for concurrent detection
  • Additional commits viewable in compare view

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

  • @dependabot rebase will rebase this PR
  • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
  • @dependabot show <dependency name> ignore conditions will show all of the ignore conditions of the specified dependency
  • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

Bumps [chardet](https://github.com/chardet/chardet) from 5.2.0 to 7.1.0.
- [Release notes](https://github.com/chardet/chardet/releases)
- [Changelog](https://github.com/chardet/chardet/blob/main/docs/changelog.rst)
- [Commits](chardet/chardet@5.2.0...7.1.0)

---
updated-dependencies:
- dependency-name: chardet
  dependency-version: 7.1.0
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>

dependabot[bot] added the dependencies (Pull requests that update a dependency file) and python (Pull requests that update python code) labels on Mar 12, 2026
codecov[bot] commented on Mar 12, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 54.72%. Comparing base (b98e44b) to head (579fcc6).
✅ All tests successful. No failed tests found.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1649      +/-   ##
==========================================
- Coverage   54.74%   54.72%   -0.02%     
==========================================
  Files         335      335              
  Lines       27400    27400              
==========================================
- Hits        15000    14995       -5     
- Misses      12400    12405       +5     
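
The project percentages in the diff follow from hits divided by lines, assuming Codecov is using its default precision of two decimals with rounding down. Rounding down is what makes the head read 54.72% rather than 54.73%, since 14995 / 27400 is about 54.7263%. A quick check with Python's decimal module (coverage_pct is a hypothetical helper):

```python
from decimal import ROUND_DOWN, Decimal


def coverage_pct(hits: int, lines: int) -> Decimal:
    """Coverage as a percentage, truncated (not rounded) to two decimals."""
    pct = Decimal(hits) * 100 / Decimal(lines)
    return pct.quantize(Decimal("0.01"), rounding=ROUND_DOWN)


base = coverage_pct(15000, 27400)  # main
head = coverage_pct(14995, 27400)  # this PR
print(base, head, head - base)  # 54.74 54.72 -0.02
```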
| Flag | Coverage Δ |
|---|---|
| functionaltests | 0.00% <ø> (ø) |
| unittests | 54.72% <ø> (-0.02%) ⬇️ |

Flags with carried forward coverage won't be shown.


dependabot[bot] commented on behalf of GitHub on Mar 18, 2026

Superseded by #1653.

dependabot[bot] closed this on Mar 18, 2026
dependabot[bot] deleted the dependabot/pip/chardet-7.1.0 branch on Mar 18, 2026 at 13:44