Created on 2008-04-03 04:19 by jmillikin, last changed 2022-04-11 14:56 by admin. This issue is now closed.
| Files | | |
|---|---|---|
| File name | Uploaded | Description |
| py3k_raw_strings_unicode_escapes.patch | benjamin.peterson, 2008-04-05 15:35 | |
| py3k_raw_strings_unicode_escapes2.patch | benjamin.peterson, 2008-04-05 16:49 | |
| py3k_raw_strings_unicode_escapes3.patch | benjamin.peterson, 2008-04-05 18:52 | |
| Messages (24) | |||
|---|---|---|---|
| msg64890 | Author: John Millikin (jmillikin) | Date: 2008-04-03 04:19 | |
According to <http://docs.python.org/dev/3.0/reference/lexical_analysis.html#id9>, raw strings with \u and \U escape sequences should have these sequences parsed as usual. However, they are currently escaped.

>>> r'\u0020'
'\\u0020'

Expected:

>>> r'\u0020'
' ' |
|||
| msg64896 | Author: Benjamin Peterson (benjamin.peterson) * (Python committer) | Date: 2008-04-03 12:54 | |
You use the "ur" string mode.

>>> print ur"\u0020"
" " |
|||
| msg64897 | Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) | Date: 2008-04-03 13:15 | |
No, it's about Python 3.0. I confirm the problem, and propose a patch:

--- Python/ast.c.original 2008-04-03 15:12:15.548389400 +0200
+++ Python/ast.c 2008-04-03 15:12:28.359475800 +0200
@@ -3232,7 +3232,7 @@
             return NULL;
         }
     }
-    if (!*bytesmode && !rawmode) {
+    if (!*bytesmode) {
         return decode_unicode(s, len, rawmode, encoding);
     }
     if (*bytesmode) { |
|||
| msg64898 | Author: Benjamin Peterson (benjamin.peterson) * (Python committer) | Date: 2008-04-03 13:22 | |
Thanks for noticing, Amaury, and your patch works for me. |
|||
| msg64900 | Author: Benjamin Peterson (benjamin.peterson) * (Python committer) | Date: 2008-04-03 16:27 | |
Fixed in r62128. |
|||
| msg64978 | Author: Benjamin Peterson (benjamin.peterson) * (Python committer) | Date: 2008-04-05 14:52 | |
Sorry, Guido said this is not allowed: http://mail.python.org/pipermail/python-3000/2008-April/012952.html. I reverted it in r62165. |
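For reference, the behaviour that remained after the revert can be checked in any Python 3 interpreter. This is only a minimal sketch; the variable names are illustrative and not taken from the patch.

```python
# In Python 3 the r prefix leaves \u and \U sequences untouched.
raw = r'\u0020'     # six characters: backslash, 'u', '0', '0', '2', '0'
cooked = '\u0020'   # one character: U+0020 (space)

print(len(raw), repr(raw))
print(len(cooked), repr(cooked))
```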
|||
| msg64982 | Author: Guido van Rossum (gvanrossum) * (Python committer) | Date: 2008-04-05 15:26 | |
The docs still need to be updated! An entry in what's new in 3.0 should also be added. |
|||
| msg64984 | Author: Benjamin Peterson (benjamin.peterson) * (Python committer) | Date: 2008-04-05 15:35 | |
How's this? |
|||
| msg64985 | Author: Guido van Rossum (gvanrossum) * (Python committer) | Date: 2008-04-05 16:29 | |
Instead of "ignored" (which might be read ambiguously) how about "not treated specially"? You also still need to add some words to whatsnew. |
|||
| msg64986 | Author: Benjamin Peterson (benjamin.peterson) * (Python committer) | Date: 2008-04-05 16:49 | |
"not treated specially" it is! |
|||
| msg64990 | Author: Georg Brandl (georg.brandl) * (Python committer) | Date: 2008-04-05 17:03 | |
The segment "use different rules for interpreting backslash escape sequences." should be killed entirely, and the whole rule told here. Also, a few paragraphs later there are more references to raw strings, e.g. "When an ``'r'`` or ``'R'`` prefix is used in a string literal," which need to be fixed too. |
|||
| msg64997 | Author: Benjamin Peterson (benjamin.peterson) * (Python committer) | Date: 2008-04-05 18:52 | |
I made the requested improvements and mentioned it in NEWS. Is it worth mentioning in the tutorial as well, since it covers Unicode strings and raw strings? |
|||
| msg65009 | Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) | Date: 2008-04-05 21:46 | |
What about the "raw-unicode-escape" codec? Can we leave it different from raw string literals? |
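The difference in question can be seen by comparing a raw literal with the two escape codecs in a Python 3 interpreter. This is a hedged sketch with illustrative variable names, not code from the patch.

```python
# A raw literal decodes nothing; the two codecs decode different subsets.
literal = r'\u0020\n'                                 # stays as written: 8 characters
decoded = rb'\u0020\n'.decode('raw-unicode-escape')   # only \u and \U are decoded
full = rb'\u0020\n'.decode('unicode-escape')          # every escape is decoded

print(repr(literal))
print(repr(decoded))
print(repr(full))
```

So the codec still honours \u escapes even though raw string literals no longer do.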
|||
| msg65083 | Author: Guido van Rossum (gvanrossum) * (Python committer) | Date: 2008-04-07 17:54 | |
To be honest, I don't know what the uses are for that codec. |
|||
| msg65085 | Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) | Date: 2008-04-07 18:02 | |
pickle still uses it when protocol=0 (and cPickle as well, but in trunk/ only of course) |
|||
| msg65211 | Author: Marc-Andre Lemburg (lemburg) * (Python committer) | Date: 2008-04-08 20:03 | |
You can't change the codec - it's being used in other places as well, e.g. for use cases where you need to have an 8-bit encoded readable version of a Unicode object (which happens to be Latin-1 + Unicode escapes for all non-Latin-1 characters, due to Unicode being a superset of Latin-1). Adding a new codec would be fine, though I don't know how this would map raw Unicode strings with non-Latin-1 characters in them to an 8-bit string. Perhaps this is not needed at all in Py3k. |
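The "Latin-1 + Unicode escapes" shape described here can be observed by encoding a short string with both codecs in Python 3. This is a small sketch; the sample text and names are chosen only for illustration.

```python
text = 'caf\u00e9 \u20ac'   # 'café €': one Latin-1 character, one non-Latin-1 character

# raw-unicode-escape keeps Latin-1 characters as single bytes and
# escapes only what does not fit into 8 bits.
raw_encoded = text.encode('raw-unicode-escape')

# unicode-escape escapes every non-ASCII character, so it is longer here.
esc_encoded = text.encode('unicode-escape')

print(raw_encoded)
print(esc_encoded)
```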
|||
| msg65212 | Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) | Date: 2008-04-08 20:10 | |
Isn't "unicode-escape" enough for this purpose? |
|||
| msg65223 | Author: Marc-Andre Lemburg (lemburg) * (Python committer) | Date: 2008-04-08 23:16 | |
What do you mean by "enough"? The "raw-unicode-escape" codec is used in Python 2.x to convert literal strings of the form ur"" to Unicode objects. It's a variant of the "unicode-escape" codec. The codec is also being used in cPickle, pickle, variants of pickle, Python code generators, etc. It serves its purpose, just like "unicode-escape" and all the other codecs in Python. |
|||
| msg65225 | Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) | Date: 2008-04-08 23:55 | |
I mean: now that raw strings cannot represent all Unicode code points (or more precisely, they need the file encoding to do so), is there a use case for "raw-unicode-escape" that cannot be filled by the "unicode-escape" codec? Note that pickle does not use "raw-unicode-escape" as is: it replaces backslashes with \u005c. This has the nice effect that pickled strings can also be decoded by "unicode-escape". That's why I propose to completely remove raw-unicode-escape, and use unicode-escape instead. |
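The backslash replacement can be observed in a protocol 0 pickle under Python 3. The sketch below assumes the CPython protocol 0 stream layout for a lone string (a `V` opcode, the escaped payload, then a newline); the slicing is illustrative and not part of any public API.

```python
import pickle

text = 'a\\b\u20ac'                    # 'a', backslash, 'b', euro sign
data = pickle.dumps(text, protocol=0)

# Assumption: the stream starts with the V opcode and the payload
# runs up to the first newline.
payload = data[1:data.index(b'\n')]

print(data)
print(payload.decode('raw-unicode-escape'))
print(payload.decode('unicode-escape'))
```

Because the backslash is written as \u005c, both codecs recover the same string from the payload.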
|||
| msg65234 | Author: Marc-Andre Lemburg (lemburg) * (Python committer) | Date: 2008-04-09 10:03 | |
While that's true for cPickle, it is not for pickle. The pickle protocol itself is defined in terms of the "raw-unicode-escape" codec (see pickle.py). Besides, you cannot assume that the Python interpreter itself is the only use case for these codecs. The "raw-unicode-escape" codec is well suited to other purposes where you need a compact way of encoding Unicode, especially if your strings are mostly Latin-1 and only include non-UCS2 code points every now and then. That's also the reason why pickle uses it. |
|||
| msg65502 | Author: Neal Norwitz (nnorwitz) * (Python committer) | Date: 2008-04-15 06:09 | |
What is the status of this bug? AFAICT, the code is now correct. Have the doc changes been applied? The resolution on this report should be updated too. It's currently rejected. |
|||
| msg65512 | Author: Benjamin Peterson (benjamin.peterson) * (Python committer) | Date: 2008-04-15 11:51 | |
It's rejected because the OP wanted unicode escapes to be applied in unicode strings, and I haven't applied the docs because nobody has told me I should. |
|||
| msg65930 | Author: Georg Brandl (georg.brandl) * (Python committer) | Date: 2008-04-28 19:57 | |
Please apply the patch, but rename "Unicode escapes" to "\u and \U escapes" first. |
|||
| msg65934 | Author: Benjamin Peterson (benjamin.peterson) * (Python committer) | Date: 2008-04-28 21:05 | |
Fixed in r62568. |
|||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022-04-11 14:56:32 | admin | set | github: 46793 |
| 2008-04-28 21:05:55 | benjamin.peterson | set | status: open -> closed messages: + msg65934 |
| 2008-04-28 19:57:35 | georg.brandl | set | assignee: georg.brandl -> benjamin.peterson resolution: rejected -> fixed messages: + msg65930 |
| 2008-04-15 11:51:25 | benjamin.peterson | set | messages: + msg65512 |
| 2008-04-15 06:09:35 | nnorwitz | set | nosy: + nnorwitz messages: + msg65502 |
| 2008-04-09 10:03:25 | lemburg | set | messages: + msg65234 |
| 2008-04-08 23:55:38 | amaury.forgeotdarc | set | messages: + msg65225 |
| 2008-04-08 23:16:44 | lemburg | set | messages: + msg65223 |
| 2008-04-08 20:10:29 | amaury.forgeotdarc | set | messages: + msg65212 |
| 2008-04-08 20:03:27 | lemburg | set | messages: + msg65211 |
| 2008-04-08 20:01:17 | lemburg | set | messages: - msg65189 |
| 2008-04-08 16:45:37 | lemburg | set | nosy: + lemburg messages: + msg65189 |
| 2008-04-07 18:02:54 | amaury.forgeotdarc | set | messages: + msg65085 |
| 2008-04-07 17:54:10 | gvanrossum | set | messages: + msg65083 |
| 2008-04-05 21:46:19 | amaury.forgeotdarc | set | messages: + msg65009 |
| 2008-04-05 18:52:21 | benjamin.peterson | set | files: + py3k_raw_strings_unicode_escapes3.patch messages: + msg64997 |
| 2008-04-05 17:03:01 | georg.brandl | set | messages: + msg64990 |
| 2008-04-05 16:49:25 | benjamin.peterson | set | files: + py3k_raw_strings_unicode_escapes2.patch messages: + msg64986 |
| 2008-04-05 16:29:15 | gvanrossum | set | messages: + msg64985 |
| 2008-04-05 15:35:19 | benjamin.peterson | set | files: + py3k_raw_strings_unicode_escapes.patch keywords: + patch messages: + msg64984 |
| 2008-04-05 15:26:00 | gvanrossum | set | status: closed -> open assignee: georg.brandl messages: + msg64982 components: + Documentation, - Unicode nosy: + georg.brandl, gvanrossum |
| 2008-04-05 14:52:07 | benjamin.peterson | set | resolution: fixed -> rejected messages: + msg64978 |
| 2008-04-03 16:27:40 | benjamin.peterson | set | status: open -> closed resolution: fixed messages: + msg64900 |
| 2008-04-03 13:22:52 | benjamin.peterson | set | priority: critical messages: + msg64898 |
| 2008-04-03 13:15:00 | amaury.forgeotdarc | set | status: closed -> open resolution: not a bug -> (no value) messages: + msg64897 nosy: + amaury.forgeotdarc |
| 2008-04-03 12:54:47 | benjamin.peterson | set | status: open -> closed resolution: not a bug messages: + msg64896 nosy: + benjamin.peterson |
| 2008-04-03 04:19:03 | jmillikin | create | |