This issue tracker has been migrated to GitHub and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.
Created on 2009-01-29 00:29 by vstinner, last changed 2022-04-11 14:56 by admin. This issue is now closed.
| Files | | |
|---|---|---|
| File name | Uploaded | Description |
| 2to3_write.patch | vstinner, 2009-01-29 00:37 | |
| output_encoding.patch | abbeyj, 2009-07-30 01:29 | |
| Messages (7) | |||
|---|---|---|---|
| msg80733 | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2009-01-29 00:29 | |
If Python output is redirected to a pipe, the sys.stdout encoding is ASCII. So "2to3 script.py|cat" will write the patch in ASCII. If the script contains a non-ASCII character, 2to3 fails with:

...
  File ".../lib2to3/refactor.py", line 238, in refactor_file
    self.processed_file(str(tree)[:-1], filename, write=write)
  File ".../lib2to3/refactor.py", line 342, in processed_file
    self.print_output(diff_texts(old_text, new_text, filename))
  File ".../main.py", line 48, in print_output
    print(line)
UnicodeEncodeError: 'ascii' codec can't encode character '\xfb' in position 11: ordinal not in range(128)

Should we consider the input file and stdout as binary files?

Workaround: modify the files in place (-w option) but don't write the patch to stdout (no such option yet).

A project may contain scripts in ASCII, Latin-1 and UTF-8 (e.g. Python source code ;-)).
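The failure can be reproduced without 2to3: a text stream that encodes as ASCII cannot represent 'û' (U+00FB, the '\xfb' in the traceback). The sketch below simulates such a piped stdout with `io.TextIOWrapper` over `io.BytesIO` instead of touching the real `sys.stdout`, and shows one possible answer to the "binary files" question: encode the line explicitly and write the bytes to the underlying binary buffer. The helpers `make_ascii_stdout` and `write_line` are hypothetical, and Latin-1 as the fallback encoding is an assumption; this is not the 2to3 code.

```python
import io

def make_ascii_stdout():
    """Simulate a piped sys.stdout whose text layer encodes as ASCII."""
    # newline="\n" keeps line endings identical on every platform.
    return io.TextIOWrapper(io.BytesIO(), encoding="ascii", newline="\n")

def write_line(stream, line, fallback_encoding):
    """Write one diff line; if the stream's encoding cannot represent it,
    encode it explicitly and write the bytes to the binary buffer instead."""
    try:
        stream.write(line + "\n")
    except UnicodeEncodeError:
        stream.flush()  # push any pending text ahead of the raw bytes
        stream.buffer.write((line + "\n").encode(fallback_encoding))

# A plain text write reproduces the error from the traceback.
try:
    make_ascii_stdout().write("+print('d\xfbj\xe0')\n")
    plain_write_failed = False
except UnicodeEncodeError:
    plain_write_failed = True

# Writing bytes directly avoids it (Latin-1 is an assumed input encoding).
out = make_ascii_stdout()
write_line(out, " unchanged line", fallback_encoding="latin-1")
write_line(out, "+print('d\xfbj\xe0')", fallback_encoding="latin-1")
out.flush()
result = out.buffer.getvalue()
print(plain_write_failed, result)
```

The flush before the raw write matters: `TextIOWrapper` keeps pending text, and without flushing it first, the bytes written to the buffer could end up ahead of earlier lines.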
| msg80734 | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2009-01-29 00:37 | |
Example of a workaround: don't write the patch if the -w option is used. I don't need the patch if I chose to modify the files in place.
| msg91077 | Author: James Abbatiello (abbeyj) | Date: 2009-07-30 01:29 | |
The --no-diffs option was recently added, which looks like a good workaround.

Here's an attempt at a solution: if sys.stdout has an encoding set, use that, just as is done now. If there is no encoding (implying "ascii"), use the encoding of the input file.
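Falling back to "the encoding of the input file" presupposes that this encoding is known. Assuming Python 3's standard library, `tokenize.detect_encoding()` implements the PEP 263 coding-cookie and BOM rules. This thread does not say how the attached patch obtains the encoding; `input_file_encoding` below is a hypothetical helper, shown only as a sketch.

```python
import io
import tokenize

def input_file_encoding(source_bytes):
    """Return the source encoding of a Python file given as bytes.

    tokenize.detect_encoding() checks for a UTF-8 BOM or a PEP 263 coding
    cookie in the first two lines. Without either it returns 'utf-8', the
    Python 3 default, although Python 2 sources defaulted to ASCII.
    """
    encoding, _consumed_lines = tokenize.detect_encoding(io.BytesIO(source_bytes).readline)
    return encoding

latin1_script = "# -*- coding: latin-1 -*-\nprint('d\xfbj\xe0')\n".encode("latin-1")
plain_script = b"print('hello')\n"

print(input_file_encoding(latin1_script))  # iso-8859-1 (normalized name)
print(input_file_encoding(plain_script))   # utf-8
```

Note the normalization: a cookie spelled `latin-1` is reported as `iso-8859-1`, which is still a valid codec name for encoding the output.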
| msg91130 | Author: Matthew Barnett (mrabarnett) * (Python triager) | Date: 2009-07-31 12:33 | |
I'd like to suggest that the output could/should be encoded in UTF-8.
| msg91136 | Author: James Abbatiello (abbeyj) | Date: 2009-07-31 18:02 | |
In what case(s) do you propose that the output be encoded in UTF-8?

If output is to a terminal and that terminal is set to Latin-1 or cp437 or whatever, then outputting UTF-8 will only show garbage characters to the user.

If output is to a file, then using the encoding of the input file makes the most sense to me. Assume you have a simple program encoded in Latin-1 that prints out a string with some non-ASCII characters. The patch is printed in UTF-8 and redirected to a file. The patch program has no idea what encodings are used; it just compares the bytes in the original to the bytes in the patch file. These won't match since the encodings are different, and the patch will fail.

If the output is to a pipe, then I'm not sure what the right thing is. It may be intended for display on the screen with something like `less`, or it may not. I don't think there's a good solution for this.

So, following the above logic, the patch attached here does the following:

1) If output is to a terminal (sys.stdout.encoding is set), use that encoding for output.
2) Otherwise, if an encoding was determined for the input file, use that encoding for output.
3) If all else fails, use 'ascii' encoding. If the input contained non-ASCII characters and no encoding has been determined for the input, this will raise an exception. I think this can only happen when reading the input file from stdin; that case may need a way to detect the encoding of stdin.
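The three rules above amount to a short fallback chain. The sketch below restates them as a pure function; the names `choose_output_encoding` and `encode_diff_line` are hypothetical, an unset `sys.stdout.encoding` is modeled as `None`, and this is neither the code in output_encoding.patch nor the committed fix.

```python
def choose_output_encoding(stdout_encoding, input_encoding):
    """Pick an encoding for printing the diff, following the three rules
    listed in msg91136. Either argument may be None when unknown."""
    # 1) Output is a terminal and reports an encoding: respect it.
    if stdout_encoding:
        return stdout_encoding
    # 2) Otherwise reuse the encoding detected for the input file, so a
    #    redirected patch has the same bytes as the file it modifies.
    if input_encoding:
        return input_encoding
    # 3) Last resort; non-ASCII text will raise UnicodeEncodeError here.
    return "ascii"

def encode_diff_line(line, stdout_encoding, input_encoding):
    """Encode one diff line with the chosen encoding."""
    return line.encode(choose_output_encoding(stdout_encoding, input_encoding))

print(choose_output_encoding("cp437", "iso-8859-1"))  # cp437
print(choose_output_encoding(None, "iso-8859-1"))     # iso-8859-1
print(choose_output_encoding(None, None))             # ascii
```

Keeping the choice separate from the actual writing makes the stdin case from rule 3 easy to spot: it is the only path that reaches `"ascii"` with possibly non-ASCII text.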
| msg91140 | Author: Matthew Barnett (mrabarnett) * (Python triager) | Date: 2009-07-31 18:37 | |
I was thinking that if you're converting a Python 2.x script to Python 3.x using 2to3, then also encoding the new script in UTF-8 might be a good idea.
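The UTF-8 suggestion and the objection in msg91136 meet at the byte level: a non-ASCII character has different bytes in Latin-1 and in UTF-8, so a script re-encoded as UTF-8 (or a diff printed in UTF-8) no longer matches the bytes of a Latin-1 original. A minimal sketch; `reencode_as_utf8` is a hypothetical helper, and a real conversion would also have to update or remove any coding cookie that still declares Latin-1, which this sketch does not do.

```python
def reencode_as_utf8(source_bytes, source_encoding):
    """Decode a converted script from its original encoding and re-encode it
    as UTF-8. A PEP 263 coding cookie, if present, is left untouched."""
    return source_bytes.decode(source_encoding).encode("utf-8")

line = "print('d\xfbj\xe0')"  # contains U+00FB and U+00E0
latin1_bytes = line.encode("latin-1")
utf8_bytes = reencode_as_utf8(latin1_bytes, "latin-1")

print(len(latin1_bytes), len(utf8_bytes))  # 13 15
print(latin1_bytes == utf8_bytes)          # False
```

Each of the two accented characters takes one byte in Latin-1 and two in UTF-8, which is why a byte-oriented tool like patch sees the lines as different.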
| msg96546 | Author: Benjamin Peterson (benjamin.peterson) * (Python committer) | Date: 2009-12-18 02:49 | |
Fixed in r76871.
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022-04-11 14:56:44 | admin | set | github: 49343 |
| 2009-12-18 02:49:52 | benjamin.peterson | set | status: open -> closed; nosy: + benjamin.peterson, collinwinter; messages: + msg96546; resolution: fixed |
| 2009-07-31 18:37:14 | mrabarnett | set | messages: + msg91140 |
| 2009-07-31 18:02:48 | abbeyj | set | messages: + msg91136 |
| 2009-07-31 12:33:29 | mrabarnett | set | nosy: + mrabarnett; messages: + msg91130 |
| 2009-07-30 01:29:29 | abbeyj | set | files: + output_encoding.patch; nosy: + abbeyj; messages: + msg91077 |
| 2009-01-29 00:37:46 | vstinner | set | files: + 2to3_write.patch; keywords: + patch; messages: + msg80734 |
| 2009-01-29 00:29:57 | vstinner | create | |