Created on 2012-01-05 20:12 by Atle.Pedersen, last changed 2022-04-11 14:57 by admin. This issue is now closed.
| Messages (4) | |||
|---|---|---|---|
| msg150684 - (view) | Author: Atle Pedersen (Atle.Pedersen) | Date: 2012-01-05 20:12 | |
I've made a short program to traverse a file tree and print file names.
import os
for root, dirs, files in os.walk(path):
    for f in files:
        hexa = ' '.join(["%02X" % ord(x) for x in f])
        print('file is', hexa, f)
This fails with the following file:
file is 67 72 DCE5 6B 61 6C 6C 65 6E 2E 6A 70 67 2E 68 74 6D 6C Traceback (most recent call last):
File "/home/atle/bin/findpictures.py", line 16, in <module>
print('file is',hexa,f)
UnicodeEncodeError: 'utf-8' codec can't encode character '\udce5' in position 2: surrogates not allowed
I don't really understand the issue, but this works with Python 2, and fails using 3.1.4 (gentoo: dev-lang/python-3.1.4-r3)
Same code using Python 2.7.2 gives:
('file is', '67 72 E5 6B 61 6C 6C 65 6E 2E 6A 70 67 2E 68 74 6D 6C', 'gr\xe5kallen.jpg.html')
|
|||
| msg150685 - (view) | Author: Ezio Melotti (ezio.melotti) * (Python committer) | Date: 2012-01-05 20:23 | |
On Python 3, os.walk() uses the surrogateescape error handler. If the filename is in e.g. iso-8859-* and the filesystem encoding is UTF-8, decoding '\xe5' will then result in '\udce5', and '\udce5' can't then be printed because it's a lone surrogate. See also http://docs.python.org/dev/library/os.html#file-names-command-line-arguments-and-environment-variables |
|||
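The decoding step described above can be reproduced without touching the filesystem. This is a minimal sketch, assuming the file name on disk was the Latin-1 byte string shown in the Python 2 output and that the filesystem encoding is UTF-8; the variable names `raw` and `name` are illustrative only.

```python
# Bytes of the file name as Python 2 reported them (Latin-1 encoded 'å' is 0xE5).
raw = b"gr\xe5kallen.jpg.html"

# Decoding as UTF-8 with surrogateescape maps the undecodable byte 0xE5
# to the lone surrogate U+DCE5 instead of raising an error.
name = raw.decode("utf-8", "surrogateescape")
print(ascii(name))

# Strict UTF-8 encoding rejects the lone surrogate; this is the error print() hit.
try:
    name.encode("utf-8")
except UnicodeEncodeError as exc:
    print("cannot encode:", exc.reason)

# Encoding with the same error handler restores the original bytes,
# which is why opening the file by this name still works.
roundtrip = name.encode("utf-8", "surrogateescape")
print(roundtrip == raw)
```

The round trip at the end is the point of PEP 383: the name is not printable, but no information about the original bytes is lost.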
| msg150686 - (view) | Author: Antoine Pitrou (pitrou) * (Python committer) | Date: 2012-01-05 20:23 | |
The file tree contains a file which has an undecodable character in its name. The name ends up mangled as specified in PEP 383. Printing such filenames is not directly supported (since they have invalid characters in them), but you can work around it in several ways, for example by escaping all non-ASCII chars: `print(ascii(f))`. (Note that opening the file will still work fine; only outputting the filename without special care will fail.) Python 2 is different since it doesn't attempt to decode filenames at all; it just treats them as opaque bytes. |
|||
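A few of the possible workarounds can be sketched as follows. The helper `printable_name` is hypothetical (not part of the standard library), and the Latin-1 recovery at the end is only a guess at the original encoding, based on the `\xe5` byte in the Python 2 output.

```python
def printable_name(name):
    """Hypothetical helper: escape only the lone surrogates in `name`.

    Valid characters are kept as they are; each surrogate produced by
    surrogateescape becomes a literal backslash escape such as \\udce5.
    """
    return name.encode("utf-8", "backslashreplace").decode("utf-8")


# The mangled name from the report, as os.walk() would return it.
name = "gr\udce5kallen.jpg.html"

# Workaround 1: escape every non-ASCII character, as suggested in the thread.
print(ascii(name))

# Workaround 2: escape only the undecodable parts.
print(printable_name(name))

# Workaround 3 (assumes the name was really Latin-1): recover the raw bytes
# and decode them with the encoding the file name was written in.
recovered = name.encode("utf-8", "surrogateescape").decode("latin-1")
print(ascii(recovered))
```

None of these change the name used to open the file; they only affect how it is displayed.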
| msg150910 - (view) | Author: Atle Pedersen (Atle.Pedersen) | Date: 2012-01-08 21:41 | |
Just wanted to say thanks for the very fast and informative response. I respect your decision to close the bug as invalid. But my five cents is that it still feels like a bug, something that shouldn't happen, especially since it's part of a very basic function and very unpredictable for inexperienced Python programmers. I do understand your headache. I've had my share of character set issues in my time. But thanks again for the quick reply and the suggested workarounds, which will work well for me and my situation. |
|||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022-04-11 14:57:25 | admin | set | github: 57926 |
| 2012-01-08 21:41:17 | Atle.Pedersen | set | messages: + msg150910 |
| 2012-01-05 20:23:42 | pitrou | set | nosy: + pitrou messages: + msg150686 |
| 2012-01-05 20:23:12 | ezio.melotti | set | status: open -> closed resolution: not a bug messages: + msg150685 stage: resolved |
| 2012-01-05 20:12:52 | Atle.Pedersen | create | |