gh-105636: Add re.Pattern.compile_template() #135992
serhiy-storchaka wants to merge 5 commits into python:main
- `flags` are only relevant when `pattern` is a string (followup to python#119960).
- Moved simplest "beans and spam" example from under "repl is a function" (which it wasn't!) close to start, extended to demonstrate both string & re.compile flags usage, and `\1` templating.
- Discuss all how-we-match parameters before what-we-do-with-matches. TODO: Is important info close enough to start?
- Explain callback before backslash notation because it's shorter but also to promote it. IMHO, people fear it as a "last-resort escape hatch" while it's actually *simpler* than backslashes (python#128138 is one example). TODO: Will this order make sense after python#135992 ?
- Consolidated `repl` notation from two far-away paragraphs to one place.
  - Starting from `\1` and `\g` which are the whole purpose of dealing with backslashes!
  - Briefly mention `\octal` wart, 99 limit and `\g<100>` avoiding them.
  - Draw attention to `\\` for getting a literal backslash.
  - Clarify that *most* escapes are supported but `\x\u\U\N` aren't.
  - Move "Unknown escapes of ASCII letters" *after* listing all the known ones.
- Added a note promoting raw string notation for `repl` too.
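For readers comparing the two `repl` styles discussed in this commit message, here is a minimal sketch using only the existing `re.sub()` API. The sample strings and variable names are illustrative assumptions, not taken from the docs change itself.

```python
import re

text = 'hello world'
pattern = r'(\w+) (\w+)'

# Backslash notation: \1 and \2 refer to the captured groups.
swapped_backslash = re.sub(pattern, r'\2 \1', text)

# Callback notation: the function receives the Match object directly.
swapped_callback = re.sub(pattern, lambda m: f'{m[2]} {m[1]}', text)

# \g<1> avoids ambiguity when a digit follows the group reference;
# r'\10' would be read as a reference to group 10 instead.
followed_by_digit = re.sub(r'(a)', r'\g<1>0', 'a')

# In a replacement template, \\ yields one literal backslash.
literal_backslash = re.sub(r'/', r'\\', 'a/b')

print(swapped_backslash, swapped_callback, followed_by_digit, literal_backslash)
```

Both swap variants produce the same result; the callback form needs no escape processing at all, which is the point the commit message makes about it being the simpler option.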
I like this. It fleshes out the concept of template strings being a notation in their own right.
Q: Why is compile_template a method on Pattern vs. being global? IIUC it allows validation that backreferences are valid. There are really two separate questions here:
- Is it a compile_template-time error to refer to groups missing in pattern? [trying re._compile_template on 3.14, YES] EDIT: ah right, that was a major motivation in #105636 to do this.
- Is it a sub()/expand()/call-time error to use a template prepared for a different RE (which has the relevant groups)?
[No strong opinion, but this being stdlib, I guess you either enforce strictness for now, or have to assume somebody will come to rely on it (even if undocumented).]
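As a point of comparison for the validation question, the sketch below shows how the existing public `re.sub()` already rejects a string replacement that refers to a group the pattern does not define. The helper name `template_is_rejected` is hypothetical, and nothing here uses the proposed `compile_template()` method.

```python
import re


def template_is_rejected(pattern, repl, string):
    """Return True if re.sub() refuses the replacement template."""
    try:
        re.sub(pattern, repl, string)
    except (re.error, IndexError):
        # A bad numeric reference raises re.error;
        # an unknown group name raises IndexError.
        return True
    return False


# Group 2 does not exist in a pattern with a single group.
bad_number = template_is_rejected(r'(a)', r'\2', 'a')

# The pattern defines group "x", not "y".
bad_name = template_is_rejected(r'(?P<x>a)', r'\g<y>', 'a')

# A valid reference is accepted.
good = template_is_rejected(r'(a)', r'\1', 'a')

print(bad_number, bad_name, good)
```

A template compiled against a specific pattern could surface the same errors once, up front, instead of at each substitution call.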
Terminology: now that we have t'template strings', is introducing yet another Template confusing?
I think until now the word "template" has been in the docs but not in API yet(?)
If the functions are named sub()/subn(), perhaps Substitution is a sane candidate?
- Crazier Q: could re module work with t-strings directly?!
re.sub(r'RE (1) (?P<name>2)', t'{1}st {name}d', s) or something.
Not really cause {those} are evaluated immediately, right?
... 'def myfunc():')
'static PyObject*\npy_myfunc(void)\n{'

If *repl* is a function, it is called for every non-overlapping occurrence of
Consider s/is a function/is callable/.
P.S. What happens if it's both? Currently on 3.14.2:
>>> class CallMeMaybe(str):
... def __call__(self, match): return 'call back'
...
>>> re.sub('@', CallMeMaybe('a string'), 'Did @')
'Did call back'
Dunno if docs should commit either way, but maybe worth a test case 🤷
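A possible shape for such a test case, as a sketch only: the class and method names are assumptions, and the expected value simply pins the behaviour shown in the session above rather than anything the docs promise.

```python
import re
import unittest


class CallMeMaybe(str):
    """A str subclass that is also callable, as in the session above."""

    def __call__(self, match):
        return 'call back'


class CallableStrReplTest(unittest.TestCase):
    def test_callable_str_subclass_is_called(self):
        repl = CallMeMaybe('a string')
        # The callable check wins over the string interpretation.
        self.assertEqual(re.sub('@', repl, 'Did @'), 'Did call back')
        self.assertEqual(re.subn('@', repl, 'Did @'), ('Did call back', 1))
```

Saved to a file, this can be run with `python -m unittest`.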
This is not directly related to this PR. Functions (callables) always were supported.
Lib/test/test_re.py
Outdated
        self.assertEqual(re.sub(p, t, b'xyzt'), b'[y-x][t-z]')
        self.assertEqual(p.sub(t, b'xyzt'), b'[y-x][t-z]')

    def test_group_refs_emplty_literals(self):
Suggested change:
-    def test_group_refs_emplty_literals(self):
+    def test_group_refs_empty_literals(self):
Lib/re/__init__.py
Outdated
    if isinstance(repl, Template):
        return repr
Will re.Match.expand() now also accept a Template?
Before this PR, AFAICT, the resulting object is neither callable nor accepted by re.Match.expand():
# py 3.14
>>> p = re.compile('1(?P<a>.)3')
>>> t = re._compile_template(p, r'\g<a>')
>>> m = re.match(p, '123')
>>> m.expand(t)
...
TypeError: decoding to str: need a bytes-like object, _sre.SRE_Template found
And I don't see _sre_SRE_Match_expand_impl() changed, but it does call compile first so perhaps this addition makes that work?
Consider documenting and adding a test.
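For reference, this is what `re.Match.expand()` does today with plain string templates, using the same pattern as in the session above. It is only a baseline sketch: whether `expand()` should also take a compiled template object is the open question here, so the example does not touch that case.

```python
import re

p = re.compile('1(?P<a>.)3')
m = p.match('123')

# Named and numbered references both resolve to the captured '2'.
by_name = m.expand(r'\g<a>')
by_number = m.expand(r'[\1]')

print(by_name, by_number)
```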
Yes, it accepts it, although I was not sure whether this is a useful feature. And there was a typo here, which was not caught by existing tests. So, I think it is better to remove this.
Modules/_sre/sre.c
Outdated
@@ -1247,7 +1248,9 @@ pattern_subx(_sremodulestate* module_state,
    if (PyCallable_Check(ptemplate)) {
        /* sub/subn takes either a function or a template */
"a template" now sounds ambiguous, making this comment less helpful. Maybe:
-        /* sub/subn takes either a function or a template */
+        /* sub/subn takes either a function/Template or a string */
AFAIK, all names referred to in the t-string should be defined in the current scope.
Because On the positive side, we can make the compiled pattern argument optional. If it is omitted, we will resolve names dynamically. But this will make the code more complex and the default case slower. I am not sure there is a need for such a feature.
There is also much older
📚 Documentation preview 📚: https://cpython-previews--135992.org.readthedocs.build/