[Python-Dev] Omission in re.sub?

MRAB python at mrabarnett.plus.com
Sun Dec 11 23:36:32 CET 2011


On 11/12/2011 21:04, Guido van Rossum wrote:
> On Sun, Dec 11, 2011 at 12:47 PM, MRAB<python at mrabarnett.plus.com> wrote:
>> On 11/12/2011 20:27, Guido van Rossum wrote:
>>>
>>> On Sun, Dec 11, 2011 at 12:12 PM, MRAB<python at mrabarnett.plus.com>
>>> wrote:
>>>>
>>>> I've just come across an omission in re.sub which I hadn't noticed
>>>> before.
>>>>
>>>> In re.sub the replacement string can contain escape sequences, for
>>>> example:
>>>>
>>>> >>> repr(re.sub(r"x", r"\n", "axb"))
>>>> "'a\\nb'"
>>>>
>>>> However:
>>>>
>>>> >>> repr(re.sub(r"x", r"\x0A", "axb"))
>>>> "'a\\\\x0Ab'"
>>>>
>>>> Yes, it doesn't recognise "\xNN".
>>>>
>>>> Is there a reason for this?
>>>>
>>>> The regex module does the same, but is there any objection to me
>>>> fixing it in the regex module? (I'm thinking about compatibility
>>>> with re here.)
>>>
>>>
>>> As long as there's a way to place a single backslash in the output
>>> this seems fine to me, though I'm not sure it's important. Of course
>>> it will likely break some test... the test will then have to be
>>> fixed.
>>>
>>> I can't remember why we did this -- is there a full list of all the
>>> escapes that re.sub() interprets somewhere? I thought it was pretty
>>> limited. Maybe it's the related list of escapes that are supported
>>> in regular expressions?
>>>
>> The documentation says: """That is, \n is converted to a single newline
>> character, \r is converted to a linefeed, and so forth."""
>>
>> All of the other escape sequences work as expected, except for \uNNNN
>> and \UNNNNNNNN which aren't supported at all in re.
>>
>> I should probably also add \N{...} to the list for completeness.
>>
> I guess the current rule is that any escapes referring to characters
> by a numeric value are not supported; this probably made some kind of
> sense because \1 etc. are backreferences. But since we're discouraging
> octal escapes anyway I think it's fine to improve over this.
>
A pattern can contain them, even octal escapes (must be 3 digits).
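
A minimal sketch of the pattern/replacement asymmetry discussed in this
thread, using only the standard re module. The variable names are
illustrative and not from the thread. What r"\x0A" does in a replacement
template is an assumption that depends on the Python version: the 2011
behaviour quoted above left it as literal text, while later releases may
raise re.error instead, so the sketch catches that case rather than
asserting a result.

```python
import re

# Pattern side: numeric escapes are understood by the pattern parser.
assert re.sub(r"\x0A", "-", "a\nb") == "a-b"   # hex escape in a pattern
assert re.sub(r"\012", "-", "a\nb") == "a-b"   # three-digit octal escape

# Replacement side: named escapes and backreferences are processed.
newline = re.sub(r"x", r"\n", "axb")        # \n becomes a real newline
backref = re.sub(r"(x)", r"\1\1", "axb")    # \1 is a backreference
backslash = re.sub(r"x", r"\\", "axb")      # \\ yields a single backslash

# A callable replacement bypasses template processing entirely, so the
# character can be produced by an ordinary (non-raw) string literal.
via_callable = re.sub(r"x", lambda m: "\x0A", "axb")

# \xNN in a replacement template: version-dependent, so no result is assumed.
try:
    hex_in_template = re.sub(r"x", r"\x0A", "axb")
except re.error as exc:
    hex_in_template = "re.error: %s" % exc

print(repr(newline), repr(backref), repr(backslash), repr(via_callable))
print(repr(hex_in_template))
```

The callable form is a workaround that should behave the same whether or
not the template syntax recognises "\xNN".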

