gh-68451: Fix unittest discovery to support Unicode module names by RoryGlenn · Pull Request #144853 · python/cpython

RoryGlenn · 2026-02-15T22:24:18Z

gh-68451: Fix unittest discovery to support Unicode module names

Summary

unittest test discovery previously validated module names with a regex whose first-character class was ASCII-only ([_a-z]\w*\.py$), which rejected test files whose names start with a non-ASCII Unicode letter (e.g., 測試.py). This PR replaces the regex with str.isidentifier(), which accepts all valid Python identifiers per PEP 3131, including names such as café.py.
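To make the behavioural difference concrete, here is a minimal sketch that puts the two checks side by side. The names `OLD_PATTERN`, `old_check`, and `new_check` are hypothetical and exist only for this illustration; the pattern is the one quoted above, compiled case-insensitively as the review further down describes. Because `\w` is Unicode-aware for `str` patterns, only the first character is restricted to ASCII in the old check, so a name like café.py passes both checks while 測試.py passes only the new one.

```python
import os
import re

# Assumption: the old pattern as quoted in the summary, compiled with
# re.IGNORECASE (the review comment states the old regex was case-insensitive).
OLD_PATTERN = re.compile(r'[_a-z]\w*\.py$', re.IGNORECASE)


def old_check(filename):
    """Regex-based check: the first character must be ASCII or underscore."""
    return OLD_PATTERN.match(filename) is not None


def new_check(filename):
    """Identifier-based check, mirroring the approach described in this PR."""
    root, ext = os.path.splitext(filename)
    return ext == '.py' and root.isidentifier()


if __name__ == '__main__':
    for name in ['test_foo.py', 'café.py', '測試.py', 'my-test.py', '1test.py']:
        print(f'{name!r}: old={old_check(name)} new={new_check(name)}')
```

Both checks reject names containing hyphens or starting with a digit; they differ on names whose first character is a non-ASCII letter.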

Changes

`Lib/unittest/loader.py`

Removed VALID_MODULE_NAME regex and the unused import re
Added _valid_module_name() function that uses os.path.splitext() + str.isidentifier() to validate module filenames
Added directory name validation in _find_test_path() — directories with invalid identifier names (e.g., containing hyphens) are now properly skipped during package discovery
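The directory check can be pictured with a small standalone sketch. `discoverable_subdirs` is a hypothetical helper written for this note, not the code in `_find_test_path()`; it only shows the idea of skipping directories whose names could never be imported as packages.

```python
import os


def discoverable_subdirs(start_dir):
    """Hypothetical helper: yield subdirectory names that are valid identifiers.

    Note that str.isidentifier() does not reject keywords such as 'class';
    this sketch does not handle that case either.
    """
    for name in sorted(os.listdir(start_dir)):
        full_path = os.path.join(start_dir, name)
        if not os.path.isdir(full_path):
            continue  # plain files are handled by the module-name check
        if not name.isidentifier():
            continue  # e.g. 'bad-pkg' cannot be imported as a package
        yield name
```

For a directory containing `good_pkg/`, `bad-pkg/`, and `測試/`, this sketch yields `good_pkg` and `測試` and skips `bad-pkg`.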

`Lib/test/test_unittest/test_discovery.py`

test_valid_module_name: Tests the new _valid_module_name() function with ASCII, Unicode, and invalid names
test_find_tests_with_unicode_modules: Integration test verifying Unicode-named modules are discovered alongside ASCII ones
test_find_test_path_rejects_invalid_dir_name: Tests that directories with invalid identifier names are skipped

Prior work

This issue has two stale PRs (#1338 from 2017, #13149 from 2019) that were never merged. This PR incorporates reviewer feedback from those PRs:

Uses str.isidentifier() instead of regex (per @vstinner, @ezio-melotti)
Removes Python 2 compatibility code (per @vstinner)
Validates directory names too, not just file names (per @rbtcollins)
Uses self.addCleanup(setattr, ...) pattern (per @ezio-melotti)
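For readers unfamiliar with the `self.addCleanup(setattr, ...)` pattern, the following is a minimal sketch of how it is typically used to patch a module attribute for the duration of one test. The test class and the fake directory listing are hypothetical and are not taken from this PR's test suite.

```python
import os
import unittest


class PatchListdirExample(unittest.TestCase):
    def test_listdir_is_patched_and_restored(self):
        original_listdir = os.listdir
        # Register the restore before patching, so it runs even if the
        # test body raises.
        self.addCleanup(setattr, os, 'listdir', original_listdir)
        os.listdir = lambda path: ['test_ascii.py', '測試.py']

        self.assertEqual(os.listdir('/anywhere'),
                         ['test_ascii.py', '測試.py'])
```

Compared with a try/finally block, the cleanup is registered in one line and runs after the test regardless of its outcome.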

Testing

All 1,095 unittest tests pass.

Issue: unittest cannot load module whose name starts with Unicode #68451

Replace the VALID_MODULE_NAME regex, whose first-character class is ASCII-only, with str.isidentifier() to support test modules whose names start with non-ASCII Unicode letters (e.g., 測試.py). Also add a directory name validation check so that directories with invalid identifier names (e.g., containing hyphens) are skipped during package discovery.

python-cla-bot · 2026-02-15T22:24:23Z

All commit authors signed the Contributor License Agreement.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 3ac73104fa


chatgpt-codex-connector · 2026-02-15T22:28:04Z

Lib/unittest/loader.py

+    # gh-68451: use str.isidentifier() to support Unicode module names,
+    # rather than a restrictive ASCII-only regex.
+    root, ext = os.path.splitext(path)
+    return ext == '.py' and root.isidentifier()


Preserve case-insensitive .py matching in module filter

This new check makes extension validation case-sensitive (ext == '.py'), so files like test_example.PY are now silently skipped even when discovery patterns match them (notably on Windows/case-insensitive filesystems where fnmatch('test*.py') can match test_example.PY). The previous regex was case-insensitive, so this change regresses discovery for existing suites that use uppercase/mixed-case .py extensions.
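One possible way to address this concern, shown only as a sketch and not necessarily what the PR adopted, is to normalise the extension before comparing it. The function name `valid_module_name_ci` is hypothetical.

```python
import os


def valid_module_name_ci(path):
    """Sketch: accept '.py' in any letter case, plus an identifier root."""
    root, ext = os.path.splitext(path)
    # Lower-casing only the extension keeps the identifier check unchanged.
    return ext.lower() == '.py' and root.isidentifier()
```

With this variant, `test_example.PY` and `test_example.Py` are accepted, while names with other extensions or non-identifier roots are still rejected.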


bedevere-app bot mentioned this pull request Feb 15, 2026

unittest cannot load module whose name starts with Unicode #68451

Open

bedevere-app bot added the awaiting review label Feb 15, 2026

chatgpt-codex-connector bot reviewed Feb 15, 2026

RoryGlenn wants to merge 1 commit into python:main from RoryGlenn:gh-68451-unittest-unicode-discovery
