This issue tracker has been migrated to GitHub and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.
Created on 2014-09-08 12:22 by serhiy.storchaka, last changed 2022-04-11 14:58 by admin. This issue is now closed.
| Files | |
|---|---|
| File name | Uploaded |
| re_errors_regex.patch | serhiy.storchaka, 2014-11-10 16:50 |
| re_errors.patch | serhiy.storchaka, 2015-02-07 23:34 |
| re_errors_diff.txt | serhiy.storchaka, 2015-02-07 23:37 |
| re_errors_2.patch | serhiy.storchaka, 2015-02-10 10:29 |
| regex_errors.diff | serhiy.storchaka, 2015-02-18 18:24 |
| re_errors_3.patch | serhiy.storchaka, 2015-02-24 21:56 |
| regex_errors2.diff | serhiy.storchaka, 2015-02-24 22:04 |
| Messages (27) | |||
|---|---|---|---|
| msg226575 | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2014-09-08 12:22 | |
In some cases standard re module and third-party regex modules raise exceptions with different error messages.
1. re.match(re.compile('.'), 'A', re.I)
re: Cannot process flags argument with a compiled pattern
regex: can't process flags argument with a compiled pattern
2. re.compile('(?P<foo_123')
re: unterminated name
regex: missing >
3. re.compile('(?P<foo_123>a)(?P=foo_123')
re: unterminated name
regex: missing )
4. regex.sub('(?P<a>x)', r'\g<a', 'xx')
re: unterminated group name
regex: missing >
5. re.sub('(?P<a>x)', r'\g<', 'xx')
re: unterminated group name
regex: bad group name
6. re.sub('(?P<a>x)', r'\g<a a>', 'xx')
re: bad character in group name
regex: bad group name
7. re.sub('(?P<a>x)', r'\g<-1>', 'xx')
re: negative group number
regex: bad group name
8. re.compile('(?P<foo_123>a)(?P=!)')
re: bad character in backref group name '!'
regex: bad group name
9. re.sub('(?P<a>x)', r'\g', 'xx')
re: missing group name
regex: missing <
10. re.compile('a\\')
re.sub('x', '\\', 'x')
re: bogus escape (end of line)
regex: bad escape
11. re.compile(r'\1')
re: bogus escape: '\1'
regex: unknown group
12. re.compile('[a-')
re: unexpected end of regular expression
regex: bad set
13. re.sub(b'.', 'b', b'c')
re: expected bytes, bytearray, or an object with the buffer interface, str found
regex: expected bytes instance, str found
14. re.compile(r'\w', re.UNICODE | re.ASCII)
re: ASCII and UNICODE flags are incompatible
regex: ASCII, LOCALE and UNICODE flags are mutually incompatible
15. re.compile('(abc')
re: unbalanced parenthesis
regex: missing )
16. re.compile('abc)')
re: unbalanced parenthesis
regex: trailing characters in pattern
17. re.compile(r'((.)\1+)')
re: cannot refer to open group
regex: can't refer to an open group
It looks as though in one case the re messages are better, and in other cases the regex messages are better. In any case, it would be good to unify the error messages in both modules.
|
|||
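The comparison above can be reproduced against whichever `re` version is installed. This is a minimal sketch: the helper name `error_message` is an assumption made for illustration, and the wording it prints depends on the Python version, so it will not necessarily match the 2014 messages quoted in the list.

```python
import re

def error_message(func, *args):
    """Return the message of the exception raised by func(*args), or None."""
    try:
        func(*args)
    except (re.error, TypeError, ValueError) as exc:
        return str(exc)
    return None

# A few of the cases from the list above (2, 15, 16, 12 and 4).
cases = [
    (re.compile, '(?P<foo_123'),
    (re.compile, '(abc'),
    (re.compile, 'abc)'),
    (re.compile, '[a-'),
    (re.sub, '(?P<a>x)', r'\g<a', 'xx'),
]

for func, *args in cases:
    print(func.__name__, args, '->', error_message(func, *args))
```

Swapping `import re` for `import regex as re` (if the third-party module is installed) should give the other column of the comparison.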
| msg226576 | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2014-09-08 12:32 | |
18. re.compile(r'.???')
re: multiple repeat
regex: nothing to repeat at position 3 |
|||
| msg226599 | Author: Steven D'Aprano (steven.daprano) * (Python committer) | Date: 2014-09-08 19:30 | |
I'm dubious about this issue. It suggests that the wording of the exceptions is part of the API of the two modules. If the idea is just to copy the best error messages from one module to the other, then I guess there is no harm. But if the idea is to guarantee to keep the two modules' messages in sync, then I think it is unnecessary and harmful. |
|||
| msg226635 | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2014-09-09 14:15 | |
Yes, the idea of this issue is to enhance the re module (and the regex module, if Matthew wishes) by picking the best error messages (or writing new ones). |
|||
| msg226641 | Author: Matthew Barnett (mrabarnett) * (Python triager) | Date: 2014-09-09 16:08 | |
> re: Cannot process flags argument with a compiled pattern
> regex: can't process flags argument with a compiled pattern
Error messages usually start with a lowercase letter, and I think that all the other ones in the re module do.
By the way, which is preferred, "cannot" or "can't"? The regex module always uses "can't", but re module uses "cannot" except for "TypeError: can't use a bytes pattern on a string-like object", I think.
Also, you said that one of the re module's messages was better, but didn't say which! Did you mean this one?
> re: expected bytes, bytearray, or an object with the buffer interface, str found
> regex: expected bytes instance, str found |
|||
| msg226790 | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2014-09-11 17:16 | |
> By the way, which is preferred, "cannot" or "can't"? The regex module always
> uses "can't", but re module uses "cannot" except for "TypeError: can't use
> a bytes pattern on a string-like object", I think.
It's an interesting question. Grepping the CPython sources gives these results:
Cannot 210
cannot 865
Can't 216
can't 796
Lowercase beats uppercase by about 4:1, and the short and long forms are about equally common. I leave the decision to English speakers.
> Also, you said that one of the re module's messages was better, but didn't
> say which! Did you mean this one?
> > re: expected bytes, bytearray, or an object with the buffer interface,
> > str found
> > regex: expected bytes instance, str found
Neither is good. The re variant is too verbose, but it is more correct.
Maybe 6, 7, 8, 10, 11, 16, 18 are better in re. |
|||
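A count like the one quoted above could be approximated in Python rather than with grep. This is only a sketch under stated assumptions: the exact grep invocation used in the message is not given, the function names `count_variants` and `count_in_tree` are hypothetical, and the set of file suffixes scanned is an arbitrary choice, so the totals will differ from the figures in the thread.

```python
import re
from collections import Counter
from pathlib import Path

# The four spellings compared in the discussion.
VARIANTS = re.compile(r"\b(Cannot|cannot|Can't|can't)\b")

def count_variants(text):
    """Count occurrences of each spelling in a string."""
    return Counter(VARIANTS.findall(text))

def count_in_tree(root, suffixes=(".c", ".h", ".py")):
    """Sum the counts over all files under root with the given suffixes."""
    total = Counter()
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in suffixes:
            total += count_variants(path.read_text(errors="ignore"))
    return total

sample = "Cannot open; cannot read; can't write; can't stop"
print(count_variants(sample))
```

Calling `count_in_tree` on a local CPython checkout (the path is whatever the reader has) gives per-spelling totals comparable in spirit to the grep results.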
| msg226847 | Author: Terry J. Reedy (terry.reedy) * (Python committer) | Date: 2014-09-12 22:07 | |
I prefer "cannot" for error messages. "Can't" is an informal version of "cannot", used in speech, dialog representing speech, and 'informal' writing. It looks wrong to me in this context. |
|||
| msg227065 | Author: Mark Lawrence (BreamoreBoy) * | Date: 2014-09-18 21:08 | |
How can anything that's in the stdlib be unified with something that's not in the stdlib and currently has no prospects of getting in the stdlib? |
|||
| msg227084 | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2014-09-19 08:03 | |
The regex module is a potential candidate for replacement of the re module. |
|||
| msg227119 | Author: Mark Lawrence (BreamoreBoy) * | Date: 2014-09-19 20:41 | |
The key word is "potential". I do not believe that any changes should be made to the re module until such time as there is a fully approved PEP for the regex module and that work has actually started on getting it into the stdlib. Surely backward compatibility also comes into this? |
|||
| msg227130 | Author: Terry J. Reedy (terry.reedy) * (Python committer) | Date: 2014-09-19 23:03 | |
Steven and Mark are correct that a tracker patch cannot change a 3rd party module. On the other hand, we are free to improve error messages in new versions. And we are free to borrow ideas from 3rd party modules. I changed the title accordingly.
(Back compatibility comes into play in not making message enhancements in bugfix releases even though message details are not part of the documented API. People who write code that depends on those details, and doctests need not so depend, should expect to revise for new versions. I expect that some of our re tests would need to be changed.)
Re and regex are a bit special in that regex is the only re replacement (that I know of) and is (almost) a drop-in replacement. So some people *are*, on their own, replacing re with regex by installing regex (easy with pip) and adding 'import regex as re' at the top of their code.
Serhiy suggested either picking the best or writing a new one. I think a new one combining both would be best in many of the cases. As a user, I like "name missing terminal '>'" for #2 (is there an adjective for a name in this context?) and for #4, "group name missing terminal '>'". (Note that we usually quote literals, as in #8.) For #12, I would like a parallel construction "set expression missing terminal ']'" if that is possible. But the currently vague re message "unexpected end of regular expression" might be raised at a point where the specific information is lost and only the general version is correct.
As for #14, either UNICODE and LOCALE *are* compatible (for re) or this is buggy.
>>> import re
>>> re.compile(r'\w', re.UNICODE | re.LOCALE)
re.compile('\\w', re.LOCALE|re.UNICODE)
|
|||
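The 'import regex as re' substitution mentioned above is often written with a fallback so the code still runs where the third-party module is not installed. A minimal sketch, assuming only that `regex` (if present) accepts the same basic calls as `re`:

```python
try:
    import regex as re  # third-party module, installed separately (e.g. with pip)
except ImportError:
    import re  # standard library fallback

pattern = re.compile(r'\w+')
words = pattern.findall('drop-in replacement')
print(words)
```

Code written this way sees different error message wording depending on which module was imported, which is one reason not to match on message text.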
| msg227132 | Author: Steven D'Aprano (steven.daprano) * (Python committer) | Date: 2014-09-19 23:56 | |
On Fri, Sep 19, 2014 at 08:41:57PM +0000, Mark Lawrence wrote:
> I do not believe that any changes should be made to the re module
> until such time as there is a fully approved PEP [....]
Why is this so controversial? We're not talking about functional changes to the re module, we're talking about improving error messages.
Firstly, the actual wording of error messages is not part of the API and is subject to change without notice.
Secondly, nobody is talking about keeping the two modules synchronised on an on-going basis. This is just to improve the re error messages using regex as inspiration. |
|||
| msg228599 | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2014-10-05 17:44 | |
> As for #14, either UNICODE and LOCALE *are* compatible (for re) or this is buggy.
This is buggy (issue22407). |
|||
| msg230464 | Author: Ezio Melotti (ezio.melotti) * (Python committer) | Date: 2014-11-01 22:13 | |
+1 on the idea. |
|||
| msg230481 | Author: Raymond Hettinger (rhettinger) * (Python committer) | Date: 2014-11-02 07:34 | |
+1 |
|||
| msg230965 | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2014-11-10 16:50 | |
Here is a patch which makes re error messages match regex. It doesn't look to me that all these changes are enhancements. |
|||
| msg231057 | Author: Terry J. Reedy (terry.reedy) * (Python committer) | Date: 2014-11-12 00:46 | |
I already said we should either stick with what we have if better (and gave examples, including sticking with 'cannot') or possibly combine the best of both if we can improve on both. 13 should use 'bytes-like' (already changed?). There is no review button. |
|||
| msg235532 | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2015-02-07 23:34 | |
Here is a patch which unifies and improves re error messages. Added tests for all parsing errors. Now the error message always points to the start of the affected component, i.e. to the start of the bad escape, group name or unterminated subpattern. |
|||
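On Python 3.5 and later, the position described above is exposed on the exception object through the `msg` and `pos` attributes of `re.error`. The sketch below assumes such a version; the helper name `describe_error` is hypothetical.

```python
import re

def describe_error(pattern):
    """Return (message, position) for an invalid pattern, or None if it compiles."""
    try:
        re.compile(pattern)
    except re.error as exc:
        return exc.msg, exc.pos
    return None

# The reported position is the start of the unterminated subpattern,
# not the end of the string where the parser gave up.
print(describe_error('(abc'))

# For a trailing backslash, the position is that of the backslash itself.
print(describe_error('a\\'))
```

Reading `exc.pos` is more robust than parsing the message text, since the wording is not part of the documented API.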
| msg235534 | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2015-02-07 23:37 | |
re_errors_diff.txt contains differences for all tested error messages. |
|||
| msg235678 | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2015-02-10 10:29 | |
Updated patch addresses Ezio's comments. |
|||
| msg236188 | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2015-02-18 18:24 | |
Here is a patch for regex which makes some of its error messages the same as those in re with re_errors_2.patch applied. You could apply it to regex if the new error messages look better than the old ones. Otherwise we could change the re error messages to match regex, or discuss better variants. |
|||
| msg236201 | Author: Matthew Barnett (mrabarnett) * (Python triager) | Date: 2015-02-18 23:24 | |
Some error messages use the indefinite article:
"expected a bytes-like object, %.200s found"
"cannot use a bytes pattern on a string-like object"
"cannot use a string pattern on a bytes-like object"
but others don't:
"expected string instance, %.200s found"
"expected str instance, %.200s found"
Messages tend to be abbreviated, so I think that it would be better to just omit the article.
I don't think that the error message "bad repeat interval" is an improvement (Why is it "bad"? What is an "interval"?). I think that saying that the min is greater than the max is clearer. |
|||
| msg236257 | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2015-02-20 08:59 | |
> Messages tend to be abbreviated, so I think that it would be better to just
> omit the article.
I agree, but this came from standard error messages, which are not consistent. I opened a thread on Python-Dev. "expected a bytes-like object" and "expected str instance" are standard error messages raised in bytes.join and str.join, not in re. We could change them though.
> I don't think that the error message "bad repeat interval" is an improvement
> (Why is it "bad"? What is an "interval"?). I think that saying that the min
> is greater than the max is clearer.
Agreed. I'll change this in re.
Which message is better in case of overflow: "the repetition number is too large" (in re) or "repeat count too big" (in regex)? |
|||
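The min > max case discussed above can be triggered with reversed repeat bounds. This sketch only prints whatever message the installed `re` produces; the exact wording is deliberately not hard-coded, since it varies by version and is not part of the documented API.

```python
import re

message = None
try:
    # Lower bound 2 is greater than upper bound 1, so compilation fails.
    re.compile(r'x{2,1}')
except re.error as exc:
    message = str(exc)

print(message)
```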
| msg236549 | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2015-02-24 21:56 | |
Updated patch borrows the error message about min > max from regex. |
|||
| msg236551 | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2015-02-24 22:04 | |
The updated regex patch no longer changes the TypeError messages or the "bad repeat interval" error. |
|||
| msg236954 | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2015-03-01 11:04 | |
Could anyone please make a review? This patch is a prerequisite of other patches. |
|||
| msg239279 | Author: Roundup Robot (python-dev) (Python triager) | Date: 2015-03-25 19:04 | |
New changeset 068365acbe73 by Serhiy Storchaka in branch 'default':
Issue #22364: Improved some re error messages using regex for hints.
https://hg.python.org/cpython/rev/068365acbe73 |
|||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022-04-11 14:58:07 | admin | set | github: 66560 |
| 2015-03-25 19:05:44 | serhiy.storchaka | set | status: open -> closed resolution: fixed stage: patch review -> resolved |
| 2015-03-25 19:04:39 | python-dev | set | nosy: + python-dev messages: + msg239279 |
| 2015-03-02 08:04:39 | serhiy.storchaka | link | issue433028 dependencies |
| 2015-03-01 11:04:50 | serhiy.storchaka | set | messages: + msg236954 |
| 2015-02-24 22:04:55 | serhiy.storchaka | set | files: + regex_errors2.diff messages: + msg236551 |
| 2015-02-24 21:56:26 | serhiy.storchaka | set | files: + re_errors_3.patch messages: + msg236549 |
| 2015-02-20 08:59:21 | serhiy.storchaka | set | messages: + msg236257 |
| 2015-02-18 23:24:47 | mrabarnett | set | messages: + msg236201 |
| 2015-02-18 18:24:20 | serhiy.storchaka | set | files: + regex_errors.diff messages: + msg236188 |
| 2015-02-10 10:29:59 | serhiy.storchaka | set | files: + re_errors_2.patch messages: + msg235678 |
| 2015-02-07 23:37:38 | serhiy.storchaka | set | files: + re_errors_diff.txt messages: + msg235534 |
| 2015-02-07 23:34:38 | serhiy.storchaka | set | files: + re_errors.patch messages: + msg235532 stage: needs patch -> patch review |
| 2014-11-12 00:46:21 | terry.reedy | set | messages: + msg231057 |
| 2014-11-10 16:50:24 | serhiy.storchaka | set | files: + re_errors_regex.patch keywords: + patch messages: + msg230965 |
| 2014-11-02 15:07:52 | serhiy.storchaka | set | dependencies: + Add additional attributes to re.error, Other mentions of the buffer protocol |
| 2014-11-02 07:34:55 | rhettinger | set | nosy: + rhettinger messages: + msg230481 |
| 2014-11-01 22:13:22 | ezio.melotti | set | messages: + msg230464 stage: needs patch |
| 2014-10-05 17:44:35 | serhiy.storchaka | set | messages: + msg228599 title: Unify error messages of re and regex -> Improve some re error messages using regex for hints |
| 2014-09-19 23:56:03 | steven.daprano | set | messages: + msg227132 title: Improve some re error messages using regex for hints -> Unify error messages of re and regex |
| 2014-09-19 23:03:38 | terry.reedy | set | messages: + msg227130 title: Unify error messages of re and regex -> Improve some re error messages using regex for hints |
| 2014-09-19 20:41:57 | BreamoreBoy | set | messages: + msg227119 |
| 2014-09-19 08:03:28 | serhiy.storchaka | set | messages: + msg227084 |
| 2014-09-18 21:08:20 | BreamoreBoy | set | nosy: + BreamoreBoy messages: + msg227065 |
| 2014-09-12 22:07:15 | terry.reedy | set | nosy: + terry.reedy messages: + msg226847 |
| 2014-09-11 17:16:40 | serhiy.storchaka | set | messages: + msg226790 |
| 2014-09-09 16:08:09 | mrabarnett | set | messages: + msg226641 |
| 2014-09-09 14:15:41 | serhiy.storchaka | set | assignee: serhiy.storchaka messages: + msg226635 |
| 2014-09-08 19:30:18 | steven.daprano | set | nosy: + steven.daprano messages: + msg226599 |
| 2014-09-08 12:32:22 | serhiy.storchaka | set | messages: + msg226576 |
| 2014-09-08 12:22:56 | serhiy.storchaka | create | |