gh-105636: Add re.Pattern.compile_template() #135992

Open
serhiy-storchaka wants to merge 5 commits into python:main from serhiy-storchaka:re-compile-template3

Conversation

@serhiy-storchaka (Member) commented Jun 26, 2025

cben added a commit to cben/cpython that referenced this pull request Feb 11, 2026
- `flags` are only relevant when `pattern` is a string (followup to python#119960).
- Extended "beans and spam" example to demonstrate both string & re.compile
  flags usage, `\1` templating, and moved it close to start.
- Discuss all how-we-match parameters before what-we-do-with-matches.
  TODO: Is important info close enough to start?

- Explain callback before backslash notation because it's shorter but also
  to promote it. IMHO, people fear it as a "last-resort escape hatch"
  while it's actually *simpler* than backslashes (python#128138 is one example).
  TODO: Will this order make sense after python#135992 ?

- Consolidated `repl` notation from two far-away paragraphs to one place.
  - Starting from `\1` and `\g` which are the whole purpose of dealing with backslashes!
  - Briefly mention `\octal` wart, 99 limit and `\g<100>` avoiding them.
  - Draw attention to `\\` for getting a literal backslash.
  - Clarify that *most* escapes are supported but `\x\u\U\N` aren't.
  - Move "Unknown escapes of ASCII letters" *after* listing all the known ones.
  - Added a note promoting raw string notation for `repl` too.
cben added a commit to cben/cpython that referenced this pull request Feb 11, 2026
- `flags` are only relevant when `pattern` is a string (followup to python#119960).
- Moved simplest "beans and spam" example from under "repl is a function"
  (which it wasn't!) close to start, extended to demonstrate
  both string & re.compile flags usage, and `\1` templating.
- Discuss all how-we-match parameters before what-we-do-with-matches.
  TODO: Is important info close enough to start?

- Explain callback before backslash notation because it's shorter but also
  to promote it. IMHO, people fear it as a "last-resort escape hatch"
  while it's actually *simpler* than backslashes (python#128138 is one example).
  TODO: Will this order make sense after python#135992 ?

- Consolidated `repl` notation from two far-away paragraphs to one place.
  - Starting from `\1` and `\g` which are the whole purpose of dealing with backslashes!
  - Briefly mention `\octal` wart, 99 limit and `\g<100>` avoiding them.
  - Draw attention to `\\` for getting a literal backslash.
  - Clarify that *most* escapes are supported but `\x\u\U\N` aren't.
  - Move "Unknown escapes of ASCII letters" *after* listing all the known ones.
  - Added a note promoting raw string notation for `repl` too.
cben added a commit to cben/cpython that referenced this pull request Feb 12, 2026

@cben cben left a comment

I like this. It fleshes out the concept of template strings being a notation in their own right.

Q: Why is compile_template a method on Pattern vs. being global? IIUC it allows validation that backreferences are valid. There are really two separate questions here:

  1. Is it a compile_template-time error to refer to groups missing in pattern? [trying re._compile_template on 3.14, YES] EDIT: ah right, that was a major motivation in #105636 to do this.
  2. Is it a sub()/expand()/call-time error to use a template prepared for a different RE (which has relevant groups)?
    [No strong opinion, but this being stdlib, I guess you either enforce strictness for now, or have to assume somebody will come to rely on it (even if undocumented).]
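The strictness asked about in question 1 can already be observed through the public API: a replacement string passed to re.sub() is checked against the pattern's groups. A minimal sketch follows; the helper name `template_error` is made up for illustration, and both `re.error` and `IndexError` are caught on the assumption that the exception type may differ between numeric and named references.

```python
import re

def template_error(pattern, repl, string):
    """Return the exception raised by re.sub(), or None if it succeeds.

    Hypothetical helper, used only for this illustration.
    """
    try:
        re.sub(pattern, repl, string)
    except (re.error, IndexError) as exc:
        return exc
    return None

# The pattern defines exactly one group, so \1 is accepted ...
ok = template_error(r'(a)', r'[\1]', 'banana')
# ... while a reference to group 2 or to an unknown group name is rejected.
bad_number = template_error(r'(a)', r'\2', 'banana')
bad_name = template_error(r'(a)', r'\g<missing>', 'banana')
```

This only shows the existing string-based behaviour; whether a precompiled template used with a different pattern should fail is the open part of question 2.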

Terminology: now that we have t'template strings', is introducing yet another Template confusing?
I think until now the word "template" has been in the docs but not in API yet(?)
If the functions are named sub()/subn(), perhaps Substitution is a sane candidate?

  • Crazier Q: could the re module work with t-strings directly?! re.sub(r'RE (1) (?P<name>2)', t'{1}st {name}d', s) or something.
    Not really, because {those} are evaluated immediately, right?

... 'def myfunc():')
'static PyObject*\npy_myfunc(void)\n{'

If *repl* is a function, it is called for every non-overlapping occurrence of
Consider s/is a function/is callable/.

P.S. What happens if it's both? Currently on 3.14.2:

>>> import re
>>> class CallMeMaybe(str):
...     def __call__(self, match): return 'call back'
...
>>> re.sub('@', CallMeMaybe('a string'), 'Did @')
'Did call back'

Dunno if docs should commit either way, but maybe worth a test case 🤷
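A rough sketch of what such a test case could look like, assuming the behaviour shown above (the callable wins over the string value) is the one to pin down. The class and method names are hypothetical and not taken from the CPython test suite.

```python
import re
import unittest

class CallableStrReplTest(unittest.TestCase):
    # Hypothetical test: a str subclass that is also callable is used as repl.
    def test_callable_str_subclass_repl(self):
        class CallMeMaybe(str):
            def __call__(self, match):
                return 'call back'

        repl = CallMeMaybe('a string')
        # The callable takes precedence over the string contents.
        self.assertEqual(re.sub('@', repl, 'Did @'), 'Did call back')
```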

Member Author

This is not directly related to this PR. Functions (callables) always were supported.

self.assertEqual(re.sub(p, t, b'xyzt'), b'[y-x][t-z]')
self.assertEqual(p.sub(t, b'xyzt'), b'[y-x][t-z]')

def test_group_refs_emplty_literals(self):

Suggested change
def test_group_refs_emplty_literals(self):
def test_group_refs_empty_literals(self):

Comment on lines 381 to 382
if isinstance(repl, Template):
return repr

Will re.Match.expand() now also accept a Template?
Before this PR afaict resulting object is neither callable NOR accepted by re.Match.expand():

# py 3.14
>>> import re
>>> p = re.compile('1(?P<a>.)3')
>>> t = re._compile_template(p, r'\g<a>')
>>> m = re.match(p, '123')
>>> m.expand(t)
...
TypeError: decoding to str: need a bytes-like object, _sre.SRE_Template found

and I don't see _sre_SRE_Match_expand_impl() changed, but it does call compile first so perhaps this addition makes that work?

Consider documenting and adding a test.
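For comparison, this is what re.Match.expand() does today with a plain replacement string, using the same pattern as above. It is only a baseline sketch of the documented string behaviour and says nothing about how a compiled Template would be handled.

```python
import re

p = re.compile('1(?P<a>.)3')
m = p.match('123')

# expand() substitutes group references in a template string,
# the same notation that sub() accepts for repl.
named = m.expand(r'\g<a>')    # reference by group name
numbered = m.expand(r'[\1]')  # reference by group number
```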

Member Author

Yes, it is accepted, although I was not sure whether this is a useful feature. There was also a typo here, which was not caught by the existing tests. So I think it is better to remove this.

@@ -1247,7 +1248,9 @@ pattern_subx(_sremodulestate* module_state,
if (PyCallable_Check(ptemplate)) {
/* sub/subn takes either a function or a template */

"a template" now sounds ambiguous, making this comment less helpful. Maybe:

Suggested change
/* sub/subn takes either a function or a template */
/* sub/subn takes either a function/Template or a string */

cben added a commit to cben/cpython that referenced this pull request Feb 16, 2026
@serhiy-storchaka
Member Author

  • could re module work with t-strings directly?!

AFAIK, all names referred to in the t-string should be defined in the current scope.

@serhiy-storchaka
Member Author

Why is compile_template a method on Pattern vs. being global?

Because _parser.parse_template() requires the pattern object. It needs it in order to translate named group references into references by group number. Resolving them dynamically in sub() would be much slower. We could add a global re function, but it would require passing a compiled pattern as an argument.
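The dependence on the pattern can be illustrated with the public Pattern.groupindex mapping, which records the number of each named group. This is only a sketch of the name-to-number relationship described above, not the parser's actual code; the pattern and group names are made up for the example.

```python
import re

p = re.compile(r'(?P<first>\w+) (?P<second>\w+)')

# The compiled pattern knows which number each group name stands for.
name_to_number = dict(p.groupindex)

# A named reference and the corresponding numeric reference
# therefore produce the same substitution.
by_name = p.sub(r'\g<second> \g<first>', 'hello world')
by_number = p.sub(r'\2 \1', 'hello world')
```

Without a pattern, a template such as r'\g<second>' has no group number to resolve to, which is why the translation has to happen either at compile time with the pattern at hand or later on every call.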

On the positive side, we could make the compiled pattern argument optional. If it is omitted, we would resolve names dynamically. But this would make the code more complex and the default case slower. I am not sure there is a need for such a feature.

Terminology: now that we have t'template strings', is introducing yet another Template confusing?

There is also the much older string.Template. "Template" is a general term that is used in many contexts. We could use "compiled replacement string" or "compiled replacement object" instead; would that be better?
