Issue 22862: os.walk fails on undecodable filenames


This issue has been migrated to GitHub: https://github.com/python/cpython/issues/67051

classification

Title: os.walk fails on undecodable filenames
Type: behavior
Stage: resolved
Components: Library (Lib), Unicode
Versions: Python 2.7

process

Dependencies:
Superseder:
Status: closed
Resolution: wont fix
Assigned To:
Nosy List: ezio.melotti, fhoech, vstinner
Priority: normal
Keywords:

Created on 2014-11-13 13:16 by fhoech, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Messages (7)
msg231110	Author: Florian Höch (fhoech) *	Date: 2014-11-13 13:16
If 'top' is a unicode directory name, os.listdir can still return non-unicode (byte string) filenames when they cannot be decoded. The Python 2.x standard library version of os.walk does not handle this case, so join(top, name) fails on such filenames with a UnicodeDecodeError.
msg231111	Author: STINNER Victor (vstinner) * (Python committer)	Date: 2014-11-13 13:23
What is your OS?
msg231112	Author: Florian Höch (fhoech) *	Date: 2014-11-13 13:30
This problem only affects Linux as far as I know (in my case I'm using Fedora 21 Beta).
msg231115	Author: STINNER Victor (vstinner) * (Python committer)	Date: 2014-11-13 14:40
Your problem has two solutions.
1) Upgrade to Python 3, which handles your use case correctly (thanks to PEP 383 and its surrogateescape error handler).
2) Process filenames only as bytes, and encode/decode manually (so you can decide how to handle undecodable filenames).
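The two options above can be sketched roughly as follows on Python 3. This is only an illustration, not code from the thread: the helper name `split_decodable`, the sample filename, and the choice of UTF-8 as the filesystem encoding are all assumptions.

```python
def split_decodable(raw_names, encoding="utf-8"):
    """Partition byte filenames into (decoded text, undecodable bytes).

    Hypothetical helper illustrating option 2: the caller stays in bytes
    and decides what happens to names that do not decode.
    """
    decoded, undecodable = [], []
    for raw in raw_names:
        try:
            decoded.append(raw.decode(encoding))  # strict decoding raises on bad bytes
        except UnicodeDecodeError:
            undecodable.append(raw)
    return decoded, undecodable


# Sample name (assumption): Latin-1 encoded, so the byte 0xE9 is invalid UTF-8.
raw = b"caf\xe9.txt"

# Option 1: the surrogateescape handler from PEP 383 maps each undecodable
# byte to a lone surrogate, so the original bytes can be recovered exactly.
text = raw.decode("utf-8", "surrogateescape")
roundtrip = text.encode("utf-8", "surrogateescape")

# Option 2: keep bytes and handle the undecodable names explicitly.
good, bad = split_decodable([b"ok.txt", raw])
```

With option 1 the name survives a round trip unchanged; with option 2 the application chooses whether to skip, log, or rename the entries collected in `bad`.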
msg231117	Author: Florian Höch (fhoech) *	Date: 2014-11-13 14:50
1) Is unfortunately not yet possible for me; some libraries I require are not yet available for Python 3 (but in the long run, this would be my preferred solution).
2) Would necessitate too many changes in a carefully crafted, unicode-only application.
I think I'll just override os.listdir and filter out filenames that are not decodable, or override os.walk and do something equivalent.
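A minimal sketch of the filtering workaround described above, assuming that an undecodable name shows up as a byte string among otherwise unicode results (the Python 2 behaviour reported in this issue). The names `listdir_decodable` and `TEXT_TYPE` are hypothetical, and the `listdir` parameter exists only so the function can be exercised with a stand-in. On Python 3, os.listdir called with a text path returns only text names, so the filter removes nothing there.

```python
import os

TEXT_TYPE = type(u"")  # unicode on Python 2, str on Python 3


def listdir_decodable(path, listdir=os.listdir):
    """Return only the directory entries that came back as text.

    Entries returned as byte strings (names the interpreter could not
    decode) are silently dropped.
    """
    return [name for name in listdir(path) if isinstance(name, TEXT_TYPE)]
```

Dropping names silently means those files are never visited; an application that must account for every file would need to log or otherwise handle the skipped entries.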
msg231118	Author: STINNER Victor (vstinner) * (Python committer)	Date: 2014-11-13 14:57
> 1) Is not yet possible for me unfortunately, some libraries I require are not yet available for Python 3 (but in the long run, this would be my preferred solution)
I'm curious, which libraries?
Oh, I forgot to say that it's not possible to fix this issue in Python 2. Backporting PEP 383 to Python 2 requires deep changes in the Unicode machinery, starting with the UTF-8 codec. Currently, the UTF-8 encoder encodes surrogates, which violates the Unicode standard and makes it impossible to use this codec with the surrogateescape error handler.
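The point about the UTF-8 encoder can be illustrated on Python 3, where the strict codec rejects lone surrogates. In this sketch the `surrogatepass` handler stands in for the permissive encoding the message attributes to Python 2; treating the two as equivalent is an assumption made for illustration only.

```python
# A lone surrogate, as produced by surrogateescape for the undecodable byte 0xE9.
lone = "\udce9"

# Python 3's strict UTF-8 encoder refuses lone surrogates.
try:
    lone.encode("utf-8")
    strict_rejects = False
except UnicodeEncodeError:
    strict_rejects = True

# surrogatepass encodes the surrogate as if it were an ordinary code point
# (assumed here to approximate the Python 2 encoder described above).
as_surrogate_bytes = lone.encode("utf-8", "surrogatepass")

# surrogateescape instead restores the single original byte.
escaped = lone.encode("utf-8", "surrogateescape")
```

Because the strict codec leaves lone surrogates unused, surrogateescape can claim them to carry raw bytes; a codec that already encodes surrogates as regular code points leaves no such unambiguous space.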
msg231120	Author: Florian Höch (fhoech) *	Date: 2014-11-13 15:16
> I'm curious, which libraries?
wxPython and wexpect (wexpect I could probably port myself, so the problem is mainly with wx).
> Oh, I forgot to say that it's not possible to fix this issue in Python 2. Backporting the PEP 383 in Python 2 requires deep changes in the Unicode machinery, starting by the UTF-8 codec.
Ok, that's understandable of course.

History
Date	User	Action	Args
2022-04-11 14:58:10	admin	set	github: 67051
2014-11-13 15:16:27	fhoech	set	messages: + msg231120
2014-11-13 15:15:54	r.david.murray	set	status: open -> closed, resolution: wont fix, stage: resolved
2014-11-13 14:57:11	vstinner	set	messages: + msg231118
2014-11-13 14:50:07	fhoech	set	messages: + msg231117
2014-11-13 14:40:44	vstinner	set	messages: + msg231115
2014-11-13 13:30:54	fhoech	set	messages: + msg231112
2014-11-13 13:23:11	vstinner	set	nosy: + ezio.melotti, vstinner, messages: + msg231111, components: + Unicode
2014-11-13 13:16:20	fhoech	create
