
This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

classification
Title: backreference to named group does not work
Type: behavior
Stage: resolved
Components: Regular Expressions
Versions: Python 2.7
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To: docs@python
Nosy List: amaury.forgeotdarc, asvetlov, docs@python, ezio.melotti, georg.brandl, mrabarnett, python-dev, steve.newcomb, terry.reedy
Priority: normal
Keywords:

Created on 2012-09-17 13:33 by steve.newcomb, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name Uploaded Description
patch steve.newcomb, 2012-09-18 18:46
patch steve.newcomb, 2012-09-18 19:44
Messages (12)
msg170605 - Author: Steve Newcomb (steve.newcomb) * Date: 2012-09-17 13:33
The '\\g<startquote>' in the below does not work:
>>> repr( re.compile( '\\<\\!ENTITY[ \\011\\012\\015]+\\%[ \\011\\012\\015]*(?P<entityName>[A-Za-z][A-Za-z0-9\\.\\-\\_\\:]*)[ \\011\\012\\015]*(?P<startquote>[\\042\\047])(?P<entityText>.+?)\\g<startquote>[ \\011\\012\\015]*\\>', re.IGNORECASE | re.DOTALL).search( '<!ENTITY % m.mixedContent "( #PCDATA | i | b)">'))
'None'
In the following, the '\\g<startquote>' has been replaced by '\\2'. It works.
>>> repr( re.compile( '\\<\\!ENTITY[ \\011\\012\\015]+\\%[ \\011\\012\\015]*(?P<entityName>[A-Za-z][A-Za-z0-9\\.\\-\\_\\:]*)[ \\011\\012\\015]*(?P<startquote>[\\042\\047])(?P<entityText>.+?)\\2[ \\011\\012\\015]*\\>', re.IGNORECASE | re.DOTALL).search( '<!ENTITY % m.mixedContent "( #PCDATA | i | b)">'))
'<_sre.SRE_Match object at 0x7f77503d1918>'
Either this feature is broken or the re module documentation is somehow misleading me.
(Yes, I know there is an XML error in the above. That's because it's SGML.)
msg170610 - Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2012-09-17 14:13
\g is meant to be used in re.sub(), in the replacement text (see the docs); in the search pattern, (?P=startquote) can be used to refer to a named group.
The docs of "(?P<name>...)" look clear to me.
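To make the distinction above concrete, here is a minimal sketch. The simplified pattern and the sample strings are illustrative assumptions, not the reporter's full expression: (?P=name) is the backreference inside a pattern, while \g<name> is only understood in the replacement text passed to re.sub().

```python
import re

# Inside the pattern, a named group is back-referenced with (?P=name).
pattern = re.compile(r"(?P<startquote>[\"'])(?P<entityText>.+?)(?P=startquote)")

match = pattern.search('<!ENTITY % m.mixedContent "( #PCDATA | i | b)">')
entity_text = match.group("entityText")

# In the replacement text of re.sub(), the same group is referenced with \g<name>.
swapped = pattern.sub(r"[\g<entityText>]", 'say "hi"')
```

Here entity_text holds the text between the matching quotes, and swapped shows the quoted text re-emitted inside square brackets.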
msg170630 - Author: Steve Newcomb (steve.newcomb) * Date: 2012-09-18 02:01
I have re-read the documentation on re.sub(). Even now, now that I understand that the \g<groupname> syntax applies to the repl argument only, I cannot see how the documentation can be understood that way. The paragraph in which the explanation of the \g<groupname> syntax appears does not mention the repl argument by name, and neither does the preceding paragraph. 
The paragraph before the preceding paragraph is about the pattern argument, not the repl argument, and it consists entirely of the words, "The pattern may be a string or an RE object." 
So I don't see how the explanation of the \g<groupname> syntax can be understood as applying only to the repl argument, even though you have now informed me that that's the case (which is helpful to know -- thanks!). Indeed, the paragraph that explains the \g<groupname> syntax *still* appears to me to be discussing the pattern argument. And it even mentions the (?P<name>...) syntax, which can only appear in a pattern, not in a repl, in the very same sentence as the \g<groupname> syntax, even though those two syntactic features appear in *different* expression languages, and no single expression language has both of them.
So there is no clear indication that it is discussing two different expression languages. Indeed, another syntactic feature, \groupnumber, also discussed in the same paragraph, *is* found in both expression languages, so it's even more confusing to a person who knows that both (?P<groupname>...) and \groupnumber appear in the pattern expression language. There is nothing in the documentation that would inform a person (such as myself) that the \g<groupname> syntax is not also part of the pattern expression language, just as the other two features are.
(And why isn't \g<groupname> part of the pattern language, anyway, or at least some way to refer to a match made in a previous *named* group? It would be very convenient to be able to do that, particularly when using a dynamically-created regexp to parse strings delimited with a choice of delimiters that must match at both ends.)
In other words, this documentation could be beneficially improved.
msg170636 - Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2012-09-18 07:42
> And why isn't \g<groupname> part of the pattern language, anyway, or at
> least some way to refer to a match made in a previous *named* group?
But this way exists: (?P=startquote) is what you want. To me \g is an exception, and frankly I did not know about it before this bug report.
I agree that the following sentence could be better structured:
"""
For example, if the pattern is (?P<id>[a-zA-Z_]\w*), the group can be referenced by its name in arguments to methods of match objects, such as m.group('id') or m.end('id'), and also by name in the regular expression itself (using (?P=id)) and replacement text given to .sub() (using \g<id>).
"""
It probably needs to be split into several pieces; contributions are welcome.
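For readers who find the quoted sentence dense, the three contexts it names can be sketched separately. The input strings below are made up for illustration; only the (?P<id>[a-zA-Z_]\w*) pattern comes from the documentation text.

```python
import re

# 1. Through the match object: pass the group name to group(), end(), etc.
m = re.search(r"(?P<id>[a-zA-Z_]\w*)", "  spam_1 = 2")
name = m.group("id")
end = m.end("id")

# 2. Inside the regular expression itself: (?P=id) must match the same text again.
doubled = re.search(r"(?P<id>[a-zA-Z_]\w*) (?P=id)", "x foo foo y")

# 3. In the replacement text given to sub(): \g<id> inserts the group's text.
renamed = re.sub(r"(?P<id>[a-zA-Z_]\w*)", r"<\g<id>>", "a b")
```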
msg170657 - Author: Matthew Barnett (mrabarnett) * (Python triager) Date: 2012-09-18 17:07
There needed to be a way of referring to named groups in the replacement template. The existing form \groupnumber clearly wouldn't work. Other regex implementations, such as Perl, do have \g and also \k (for named groups).
In my implementation I added support for \g in regex strings.
msg170662 - Author: Steve Newcomb (steve.newcomb) * Date: 2012-09-18 18:46
> But this way exists: (?P=startquote) is what you want.
I know how I missed it: I searched for "backref" in the documentation. I did not find it in the discussion of the pattern language, because that word does not appear where (?P=name) is discussed.
> contributions are welcome.
See attached brief patch for the documentation. It changes the example, adds a table of the three processing contexts in which named groups can be referenced, and accounts for users who, like me, may search for "backref". (I tested everything. I think it's correct.)
Thanks again for the advice, Amaury.
msg170666 - Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2012-09-18 19:21
Thanks for the patch! The new formulation looks much better, but I'll let a native speaker have another check.
Some comments: I preferred the previous example "<id>" because it's not obvious what \042\047 is. And a bullet list would be less heavyweight IMO.
(Also please use "diff -u"; without context, the patch cannot be applied automatically)
msg170670 - Author: Steve Newcomb (steve.newcomb) * Date: 2012-09-18 19:44
> I preferred the previous example "<id>" because it's not obvious what \042\047 is.
Yeah, but the example I wrote has an in-pattern backreference and a real reason to use one.
In the attached patch, I have changed [\042\047] to [\'\"]. That's certainly clearer for everyone who has not memorized the ASCII table in octal! (Oops.)
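As a rough illustration of the two points above (a sketch only; the group name quote and the sample strings are assumptions, not taken from the attached patch): the octal escapes \042 and \047 are simply the double and single quote characters, and an in-pattern backreference is what forces the closing quote to be the same as the opening one.

```python
import re

# \042 and \047 are the octal codes for the double and single quote characters.
double_quote = "\042"
single_quote = "\047"

# The re module also accepts the octal form inside a character class.
octal_hit = re.fullmatch(r"[\042\047]", "'")

# The backreference (?P=quote) requires the same quote character at both ends.
quoted = re.compile(r"(?P<quote>['\"]).*?(?P=quote)")
ok = quoted.search("say 'hello' now")
mismatch = quoted.search("say 'hello\" now")
```

With mixed delimiters the search returns None, which is the "real reason to use one" mentioned above.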
> And a bullet list would be less heavyweight IMO.
Well... I rejected that choice because there would be no clarifying columnar distinction between contexts and syntaxes. Personally, I think the table is clearer. It makes it easier for users to find what they need to know.
>(Also please use "diff -u"; without context, the patch cannot be applied automatically)
Oops. Attached.
msg170945 - Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2012-09-22 01:10
I read it as a 'native speaker' and it looks fine to me. Table is clear, but I will let doc stylist decide.
msg199061 - Author: Roundup Robot (python-dev) (Python triager) Date: 2013-10-06 10:08
New changeset bee2736296c5 by Georg Brandl in branch '2.7':
Closes #15956: improve documentation of named groups and how to reference them.
http://hg.python.org/cpython/rev/bee2736296c5 
msg199062 - Author: Roundup Robot (python-dev) (Python triager) Date: 2013-10-06 10:08
New changeset f765a29309d1 by Georg Brandl in branch '3.3':
Closes #15956: improve documentation of named groups and how to reference them.
http://hg.python.org/cpython/rev/f765a29309d1 
msg199063 - Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2013-10-06 10:10
Thanks for the patch. I made a few changes, such as explaining what the example pattern does.
History
Date User Action Args
2022-04-11 14:57:36  admin  set  github: 60160
2013-10-06 10:10:21  georg.brandl  set  nosy: + georg.brandl; messages: + msg199063
2013-10-06 10:08:26  python-dev  set  messages: + msg199062
2013-10-06 10:08:15  python-dev  set  status: open -> closed; nosy: + python-dev; messages: + msg199061; resolution: fixed; stage: patch review -> resolved
2012-09-22 01:10:51  terry.reedy  set  nosy: + terry.reedy; messages: + msg170945
2012-09-18 21:42:12  asvetlov  set  nosy: + asvetlov
2012-09-18 21:35:14  amaury.forgeotdarc  set  assignee: docs@python; nosy: + docs@python; stage: resolved -> patch review
2012-09-18 19:44:14  steve.newcomb  set  files: + patch; messages: + msg170670
2012-09-18 19:21:12  amaury.forgeotdarc  set  messages: + msg170666
2012-09-18 18:46:03  steve.newcomb  set  files: + patch; messages: + msg170662
2012-09-18 17:07:58  mrabarnett  set  messages: + msg170657
2012-09-18 07:42:46  amaury.forgeotdarc  set  messages: + msg170636
2012-09-18 02:01:14  steve.newcomb  set  status: closed -> open; resolution: not a bug -> (no value); messages: + msg170630
2012-09-17 14:13:40  amaury.forgeotdarc  set  status: open -> closed; nosy: + amaury.forgeotdarc; messages: + msg170610; resolution: not a bug; stage: resolved
2012-09-17 13:33:23  steve.newcomb  create
