Issue 3187: os.listdir can return byte strings

➜

This issue tracker has been migrated to GitHub , and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/47437

classification

Title:	os.listdir can return byte strings
Type:	behavior	Stage:
Components:	Documentation	Versions:	Python 3.0

process

Dependencies:	Superseder:
Status:	closed	Resolution:	accepted
Assigned To:	georg.brandl	Nosy List:	HWJ, amaury.forgeotdarc, bboissin, benjamin.peterson, djc, dlitz, draghuram, georg.brandl, gvanrossum, loewis, pitrou, vstinner, zegreek
Priority:	critical	Keywords:	patch

Created on 2008年06月24日 10:28 by HWJ, last changed 2022年04月11日 14:56 by admin. This issue is now closed.

Files
File name	Uploaded	Description	Edit
posix_path_bytes.patch	vstinner, 2008年08月22日 15:19	Patch posixpath.join() to support bytes
io_byte_filename.patch	vstinner, 2008年08月22日 15:33	open() allows bytes filename
fnmatch_bytes.patch	vstinner, 2008年08月22日 16:53	Patch fnmatch.filter() to accept bytes filenames
glob1_bytes.patch	vstinner, 2008年08月22日 16:54	Fix glob.glob() to accept invalid directory name
listdir_encoding_warning.patch	benjamin.peterson, 2008年09月21日 23:48
warn_at_the_end.patch	benjamin.peterson, 2008年09月21日 23:48
raise_decoding_errors.patch	benjamin.peterson, 2008年09月23日 23:05
force_unicode.patch	benjamin.peterson, 2008年09月27日 00:48
getcwd_bytes.patch	vstinner, 2008年09月27日 14:50	getcwd() returns bytes if unicode conversion fails
merge_os_getcwd_getcwdu.patch	vstinner, 2008年09月29日 12:51	Remove os.getcwdu(); os.getcwd(bytes=True) returns bytes
os_getcwdb.patch	vstinner, 2008年09月29日 15:58	Fix getcwd() (use PyUnicode_Decode) and create getcwdb()->bytes
python3_bytes_filename.patch	vstinner, 2008年09月30日 01:08	Patch for an initial support of bytes filename in Python3
setfsenc.diff	loewis, 2008年09月30日 15:21
python3_bytes_filename-3.patch	vstinner, 2008年10月02日 17:10	Patch for an initial support of bytes filename in Python3 (version 3)
win32-bytes-filenames.patch	amaury.forgeotdarc, 2008年10月03日 01:45
macpath.patch	amaury.forgeotdarc, 2008年10月03日 21:21
library_os_doc.patch	vstinner, 2008年10月06日 23:34

Messages (79)
msg68674 - (view)	Author: Helmut Jarausch (HWJ)	Date: 2008年06月24日 10:28
The script below produces 1664 lines of output before it bails out with Traceback (most recent call last): File "WalkBug.py", line 5, in <module> for Dir, SubDirs, Files in os.walk('/home/jarausch') : File "/usr/local/lib/python3.0/os.py", line 278, in walk for x in walk(path, topdown, onerror, followlinks): File "/usr/local/lib/python3.0/os.py", line 268, in walk if isdir(join(top, name)): File "/usr/local/lib/python3.0/posixpath.py", line 64, in join if b.startswith('/'): TypeError: expected an object with the buffer interface ========================= file WalkBug.py: #!/usr/local/bin/python3.0 import os for Dir, SubDirs, Files in os.walk('/home/jarausch') : print("processing {0:d} files in {1}".format(len(Files),Dir))
msg68679 - (view)	Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer)	Date: 2008年06月24日 11:45
Could you tell us what this 1665th line should be? Maybe the 1665th directory has something special (a filename with spaces or non-ascii chars...) Can you try with an older version of python?
msg68684 - (view)	Author: Benjamin Peterson (benjamin.peterson) * (Python committer)	Date: 2008年06月24日 13:02
It's failing because he's giving a string to bytes.startswith when it requires a byte string or such.
msg68685 - (view)	Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer)	Date: 2008年06月24日 13:12
"he's giving a string"... the user simply called os.walk, which accepts strings AFAIK. We should discover what produced this bytestring. Does listdir() returns a mixed list of strings and bytes?
msg68686 - (view)	Author: Benjamin Peterson (benjamin.peterson) * (Python committer)	Date: 2008年06月24日 13:21
It seems the conversion to unicode strings (PyUnicode vs PyBytes) was not complete in os.listdir. See the attached patch.
msg68688 - (view)	Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer)	Date: 2008年06月24日 13:37
The original problem seems to come from some Unix platform, but this patch only handles two cases: - on win32, when the argument is a bytestring. - on OS/2. And in both cases, the default (utf-8) conversion seems wrong. Something like cp1252 (the ANSI code page for Western Windows) would be more sensible. In the posix part of the function, there is the comment (2003年03月04日): /* fall back to the original byte string, as discussed in patch #683592 */ btw, I find the penultimate message of this other thread very pleasant, in the py3k context... I suppose the conclusions would not be the same today.
msg68689 - (view)	Author: Helmut Jarausch (HWJ)	Date: 2008年06月24日 14:09
>> Could you tell us what this 1665th line should be? >> Maybe the 1665th directory has something special (a filename with >> >> spaces or non-ascii chars...) Yes, the next directory contains a filename with an iso-latin1 but non- ascii character >> Can you try with an older version of python? No problems - runs every night here The patch (applied to SVN GMT 13:30) does NOT help.
msg70943 - (view)	Author: Antoine Pitrou (pitrou) * (Python committer)	Date: 2008年08月09日 17:35
Hmm, I suppose that while the filename is latin1-encoded, Py_FileSystemDefaultEncoding is "utf-8" and therefore os.listdir fails decoding the filename and falls back on returning a byte string. It was acceptable in Python 2.x but is a very annoying problem in py3k now that unicode and bytes objects can't be mixed together anymore. I'm bumping this to critical, although there is probably no clean solution.
msg70953 - (view)	Author: Benjamin Peterson (benjamin.peterson) * (Python committer)	Date: 2008年08月10日 00:17
Let's make this a release blocker for RCs.
msg71525 - (view)	Author: Antoine Pitrou (pitrou) * (Python committer)	Date: 2008年08月20日 09:37
See #3616 for a consequence of this.
msg71612 - (view)	Author: STINNER Victor (vstinner) * (Python committer)	Date: 2008年08月21日 08:20
If the filename can not be encoded correctly in the system charset, it's not really a problem. The goal is to be able to use open(), shutil.copyfile(), os.unlink(), etc. with the given filename. orig = filename from the kernel (bytes) filename = filename from listdir() (str) dest = filename to the kernel (bytes) The goal is to get orig == dest. In my program Hachoir, to workaround this problem I store the original filename (bytes) and convert it to unicode with characters replacements (eg. replace invalid byte sequence by "?"). So the bytes string is used for open(), unlink(), ... and the unicode string is displayed to stdout for the user. IMHO, the best solution is to create such class: class Filename: def __init__(self, orig): self.as_bytes = orig self.as_str = myformat(orig) def __str__(self): return self.as_str def __bytes__(self): return self.as_bytes New problems: I guess that functions operating on filenames (os.path.*) will have to support this new type (Filename class).
msg71615 - (view)	Author: Antoine Pitrou (pitrou) * (Python committer)	Date: 2008年08月21日 08:51
Selon STINNER Victor <report@bugs.python.org>: > IMHO, the best solution is to create such class: > > class Filename: > def __init__(self, orig): > self.as_bytes = orig > self.as_str = myformat(orig) > def __str__(self): > return self.as_str > def __bytes__(self): > return self.as_bytes I agree that logically it's the right solution. It's also the most invasive. If that class is made a subclass of str, however, existing code shouldn't break more than it currently does.
msg71624 - (view)	Author: STINNER Victor (vstinner) * (Python committer)	Date: 2008年08月21日 10:59
I wrote a Filename class. I tries different methods: * no parent class "class Filename: ..." -> I don't know how to make bytes(filename) works!? But it's the best option to avoid strange bugs (mix bytes/str, remember Python 2.x...) * str parent class "class Filename(str): ..." -> doesn't work because os functions uses the fake unicode filename before testing the bytes (real) filename * bytes parent class "class Filename(bytes): ..." -> that's the current implementation The idea is to encode str -> bytes (and not bytes -> str because we want to avoid problems with such conversions). So I reimplemented most bytes methods: __addr__, __raddr__, __contains__, startswith, endswith and index. index method has no start/end arguments since the behaviour would be different than a real unicode string :-/ I added an example of fixed os.listdir(): create Filename() object if we get bytes. Should we always create Filename objects? I don't think so.
msg71629 - (view)	Author: Antoine Pitrou (pitrou) * (Python committer)	Date: 2008年08月21日 12:55
> * bytes parent class "class Filename(bytes): ..." -> that's the > current implementation I don't think that makes sense (especially under Windows which has Unicode file APIs). os.listdir() and friends should really return str or str-like objects, not bytes-like objects with an additional __str__ method. > * str parent class "class Filename(str): ..." -> doesn't work because > os functions uses the fake unicode filename before testing the bytes > (real) filename Well, of course, if we create a filename type, then all os functions must be adapted to accept it rather than assume str. All this is highly speculative of course, and if we really follow this course (i.e. create a filename type) it should probably be postponed to 3.1: too many changes with far-reaching consequences.
msg71647 - (view)	Author: STINNER Victor (vstinner) * (Python committer)	Date: 2008年08月21日 14:59
Le Thursday 21 August 2008 14:55:43 Antoine Pitrou, vous avez écrit : > > * bytes parent class "class Filename(bytes): ..." -> that's the > > current implementation > > I don't think that makes sense (especially under Windows which has Unicode > file APIs). os.listdir() and friends should really return str or str-like > objects, not bytes-like objects with an additional __str__ method. In we use "class Filename(str): ...", we have to ensure that all operations takes care of the charset because the unicode version is invalid and not be used to access to the file system. Dummy example: Filename()+"/" should not return str but raise an error or create a new filename. > Well, of course, if we create a filename type, then all os functions must > be adapted to accept it rather than assume str. If Filename has no parent class but is convertible to bytes(), os functions requires no change and so we can fix it before final 3.0 ;-)
msg71648 - (view)	Author: Antoine Pitrou (pitrou) * (Python committer)	Date: 2008年08月21日 15:09
> If Filename has no parent class but is convertible to bytes(), os > functions requires no change and so we can fix it before final 3.0 ;-) This sounds highly optimistic. Also, I think it's wrong to introduce a string-like class with implicit conversion both to bytes and to str, while we have taken all measures to make sure that bytes/str exchangeability doesn't exist any more in py3k.
msg71655 - (view)	Author: Guido van Rossum (gvanrossum) * (Python committer)	Date: 2008年08月21日 16:17
The proper work-around is for the app to pass bytes into os.listdir(); then it will return bytes. It would be nice if open() etc. accepted bytes (as well as strings of course), at least on Unix, but not absolutely necessary -- the app could also just know the right encoding. I see two reasonable alternatives for what os.listdir() should return when the input is a string and one of the filenames can't be decoded: either omit it from the output list; or use errors='replace' in the encoding. Failing the entire os.listdir() call is not acceptable, and neither is returning a mixture of str and bytes instances.
msg71680 - (view)	Author: STINNER Victor (vstinner) * (Python committer)	Date: 2008年08月21日 20:55
Le Thursday 21 August 2008 18:17:47 Guido van Rossum, vous avez écrit : > The proper work-around is for the app to pass bytes into os.listdir(); > then it will return bytes. In my case, I just would like to remove a directory with shutil.rmtree(). I don't know if it contains bytes or characters filenames :-) > It would be nice if open() etc. accepted > bytes (as well as strings of course), at least on Unix, but not > absolutely necessary -- the app could also just know the right encoding. An invalid filename has no charset. It's just a "raw" byte string. So open(), unlink(), etc. have to accept byte string. Maybe not in the Python version with in low level (C version)? > I see two reasonable alternatives for what os.listdir() should return > when the input is a string and one of the filenames can't be decoded: > either omit it from the output list; It's not a good option: rmtree() will fails because the directory in not empty :-/ > or use errors='replace' in the encoding. It will also fails because filenames will be invalid (valid unicode string but non existent file names :-/). > Failing the entire os.listdir() call is not acceptable, and > neither is returning a mixture of str and bytes instances. Ok, I have another suggestion: - by default, listdir() only returns str and raise an error (TypeError?) on invalid filename - add an optional argument (a callback), eg. "fallback_encoder", to catch such errors (similar to "onerror" from shutils.rmtree()) Example of new listdir implementation (pseudo-code): charset = sys.getfilesystemcharset() dirobj = opendir(path) try: for bytesname in readdir(dirobj): try: name = str(bytesname, charset) exept UnicodeDecodeError: name = fallback_encoder(bytesname) yield name finally: closedir(dirobj) The default fallback_encoder: def fallback_encoder(name): raise Keep raw bytes string: def fallback_encoder(name): return name Create my custom filename object: class Filename: ... def fallback_encoder(name): return Filename(name) If a callback is overkill, we can just add an option, eg. "keep_invalid_filename=True", to ask listdir() to keep bytes string if the conversion to unicode fails. In any case, open(), unlink(), etc. have to accept byte string to be accept to read, copy, remove invalid filenames. In a perfect world, all filenames would be valid UTF-8 strings, but in the real world (think to Matrix :-)), we have to support such strange cases...
msg71699 - (view)	Author: Guido van Rossum (gvanrossum) * (Python committer)	Date: 2008年08月21日 23:31
So shutil should be fixed to pass a bytes value to os.listdir(). But then os.remove() should be fixed to accept bytes as well. This is the crux I believe: on Unix at least, syscall wrappers should accept bytes for filenames. And this would then have to be extended to things like the functions in os.path, and we'd need bytes versions of os.sep and os.altsep... This sounds like a good project for 3.1. I do not accept an os.listdir() that raises an error because one filename cannot be decoded. It sounds like using errors='replace' is also wrong -- so the only solution is for os.listdir() to skip files it cannot decode. While this doesn't help for rmtree(), it is better than errors='replace' for code that descends into the tree looking for files matching a pattern or other property. So I propose this as a patch for 3.0. The callback variant is too complex; you could write it yourself by using os.listdir() with a bytes argument. This also applies to proposals like passing optional encoding and errors arguments to os.listdir().
msg71700 - (view)	Author: Benjamin Peterson (benjamin.peterson) * (Python committer)	Date: 2008年08月21日 23:35
On Thu, Aug 21, 2008 at 6:31 PM, Guido van Rossum <report@bugs.python.org> wrote: > > Guido van Rossum <guido@python.org> added the comment: > > So shutil should be fixed to pass a bytes value to os.listdir(). But > then os.remove() should be fixed to accept bytes as well. This is the > crux I believe: on Unix at least, syscall wrappers should accept bytes > for filenames. And this would then have to be extended to things like > the functions in os.path, and we'd need bytes versions of os.sep and > os.altsep... This sounds like a good project for 3.1. > > I do not accept an os.listdir() that raises an error because one > filename cannot be decoded. It sounds like using errors='replace' is > also wrong -- so the only solution is for os.listdir() to skip files it > cannot decode. While this doesn't help for rmtree(), it is better than > errors='replace' for code that descends into the tree looking for files > matching a pattern or other property. So I propose this as a patch for 3.0. As much as this maybe the right idea, I don't like the idea of silently losing the contents of a directory. That's asking for difficult to discover bugs. Could Python emit a warning in this case? > > The callback variant is too complex; you could write it yourself by > using os.listdir() with a bytes argument. This also applies to > proposals like passing optional encoding and errors arguments to > os.listdir(). > > _______________________________________ > Python tracker <report@bugs.python.org> > <http://bugs.python.org/issue3187> > _______________________________________ >
msg71705 - (view)	Author: Guido van Rossum (gvanrossum) * (Python committer)	Date: 2008年08月21日 23:59
>> I do not accept an os.listdir() that raises an error because one >> filename cannot be decoded. It sounds like using errors='replace' is >> also wrong -- so the only solution is for os.listdir() to skip files it >> cannot decode. While this doesn't help for rmtree(), it is better than >> errors='replace' for code that descends into the tree looking for files >> matching a pattern or other property. So I propose this as a patch for 3.0. > > As much as this maybe the right idea, I don't like the idea of > silently losing the contents of a directory. That's asking for > difficult to discover bugs. Well, the other approaches also cause difficult to discover bugs (the original bug report here was an example :-). > Could Python emit a warning in this case? This may be the best compromise yet. It would have to use the warnings module so that you could disable it.
msg71748 - (view)	Author: STINNER Victor (vstinner) * (Python committer)	Date: 2008年08月22日 14:21
I implemented the "invalid filename" class feature: - by default, os.listdir() raise an error (UnicodeDecodeError) on invalid filename. The previous behaviour was to return bytes object instead of str. - if invalid_filename=True: create an InvalidFilename class instance InvalidFilename is not a bytes string, it's not a str string, it's a new class. It has three attributes: - bytes: the real filename - charset: charset (type str) - str: fake filename (type str) used by __str__() method My patch also fixes os.path.join() to accept InvalidFilename: if at last one argument is an InvalidFilename, use InvalidFilename.join() (class method). os.listdir() and os.unlink() are patched to accept InvalidFilename. unlink always accept InvalidFilename whereas listdir() only produces InvalidFilename is os.listdir(path, invalid_filename=True) is used. I added an optional argument "invalid_filename" to shutil.rmtree(), default value is True. To sum up, visible changes: - os.listdir() raise an error on invalid filename instead of return a mixed list of str and bytes - shutil.rmtree() manipulate str and InvalidFilename instead of str and bytes
msg71749 - (view)	Author: Guido van Rossum (gvanrossum) * (Python committer)	Date: 2008年08月22日 14:53
I'm not interested in the InvalidFilename class; it's an API complification that might seem right for your situation but will hinder most other people. However I am interested in a patch that makes os.unlink() (and as many other functions as you can think of) accept bytes. You'll have to think what encoding to use on Windows though, since (AFAIK) the Windows filesystem APIs do use Unicode.
msg71751 - (view)	Author: STINNER Victor (vstinner) * (Python committer)	Date: 2008年08月22日 15:19
@gvanrossum: os.unlink() and os.lstat() already accept byte filenames (but open() doesn't). Ok, here is very small patch for posixpath.join() to accept bytes strings. This patch is enough to fix my initial problem (#3616).
msg71752 - (view)	Author: STINNER Victor (vstinner) * (Python committer)	Date: 2008年08月22日 15:33
My last patch (posix_join_bytes.patch) is also enough to fix the initial reported problem: error in posixpath.join() called by os.walk(). I tried os.walk() on a directory with invalid filenames and invalid directory name and it works well. So the last bug is open() which disallow opening a file with an invalid name. So here is another patch for that.
msg71756 - (view)	Author: STINNER Victor (vstinner) * (Python committer)	Date: 2008年08月22日 16:53
To continue in the "accept bytes filenames" way, a new patch for fnmatch.filter(). Use sys.getfilesystemencoding() to convert the bytes. The patch contains a new unit test.
msg71757 - (view)	Author: STINNER Victor (vstinner) * (Python committer)	Date: 2008年08月22日 16:54
Patch glob.glob() to accept directory with invalid filename (invalid in the filesystem charset): just ignore bytes => str conversion error.
msg71769 - (view)	Author: Guido van Rossum (gvanrossum) * (Python committer)	Date: 2008年08月22日 19:05
See http://codereview.appspot.com/3055 for a code review of Victor's latest patches.
msg71991 - (view)	Author: Dwayne Litzenberger (dlitz)	Date: 2008年08月26日 18:15
I think Guido already understands this, but I haven't seen it stated very clearly here: Different systems use different "things" to identify files. On Linux/ext3, all filenames are octet strings (i.e. bytes), and only the following caveats apply: - a filename/pathname cannot contain the zero-octet (b"\x00"). - a filename/pathname cannot be empty. - a filename cannot contain the slash (b"/"); In a pathname, the slash is used to separate filenames. - the filenames b"." and b".." have special meanings; They cannot be created, deleted, or renamed. All filenames that meet these criteria are valid, and calling them "invalid" amounts to plugging one's ears and shouting "LA LA LA" while imagining Unicode having pre-dated Unix. It is sometimes convenient to imagine filenames on Linux/ext3 as sequences of Unicode code points (where the encoding is specified by LC_CTYPE---it's not necessarily UTF-8), but other times (e.g. in backup tools that need to be robust in the face of mischievous users) it is an unnecessary abstraction that introduces bugs. On Windows/NTFS, the situation is entirely different: Filenames are actually sequences of Unicode code points, and if you pretend they are octet strings, Windows will happily invent phantom filenames for you that will show up in the output of os.listdir(), but that will return "File not found" if you try to open them for reading (if you open them for writing, you risk clobbering other files that happens to have the same names). To avoid bugs, it should be possible to work exclusively with filenames in the platform's native representation. It was possible in Python 2 (though you had to be very careful). Ideally, Python 3 would recognize and enforce the difference instead of trying to guess the translations; "Explicit is better than implicit" and all that.
msg72495 - (view)	Author: Baptiste Carvello (zegreek)	Date: 2008年09月04日 12:03
If, as I understand, it is the application's job to call listdir with bytes or unicode depending on the platform, it might be useful to have a function in the os module telling whether the filesystem is bytes of unicode-native. That way, the idiom for doing something to all files in a given directory would read: >>> directory="my_direcory" ... if not os.is_filesystem_unicode(): ... directory=directory.encode(sys.stdin.encoding) ... for name in os.listdir(directory): ... f=open(name) ... # do something ... f.close() and it would work on all platforms, even if one of the filenames is not in the locale's normal encoding. If this idiom is correct, it could also be useful to include it in the module documentation so that application authors know what they are supposed to do.
msg73362 - (view)	Author: Helmut Jarausch (HWJ)	Date: 2008年09月18日 07:39
Hi, is this assumed to be fixed in 3.0rc1 ? with SVN 66506 (3.0rc1+) for dirname, subdirs, files in os.walk(bytes(Top,'iso-8859-1')) : still gives an error here: for dirname, subdirs, files in os.walk(bytes(Top,'iso-8859-1')) : File "/usr/local/lib/python3.0/os.py", line 268, in walk if isdir(join(top, name)): File "/usr/local/lib/python3.0/posixpath.py", line 64, in join if b.startswith('/'): TypeError: expected an object with the buffer interface
msg73534 - (view)	Author: Benjamin Peterson (benjamin.peterson) * (Python committer)	Date: 2008年09月21日 21:44
Here's a potential patch for listdir. It emits a UnicodeWarning (or should that be a BytesWarning?) and skips the file when decoding fails. What would be the best way to test this?
msg73535 - (view)	Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer)	Date: 2008年09月21日 22:05
I did not test the patch, but I have some remarks about it: - %r does not seem to be handled by PyUnicode_FromFormat; %R maybe? - In this case, PyObject_Repr(v) is not necessary - and this will avoid a reference leak. - Does the warning warn multiple times? IIRC the default behaviour is to warn once.
msg73540 - (view)	Author: Benjamin Peterson (benjamin.peterson) * (Python committer)	Date: 2008年09月21日 23:48
Here's two more patches. One is like the old one with Amaury's comments observed. The other simply notes if there were decoding problems and warns once at the end of the listdir call. Making a warning happen more than once is tricky because it requires messing with the warnings filter. This of course takes away some of the user's control which is one of the main reasons for using the Python warning system in the first place. (I almost wish we could write another listdir that returned the names it could decode and a list of those it couldn't.)
msg73678 - (view)	Author: Benjamin Peterson (benjamin.peterson) * (Python committer)	Date: 2008年09月23日 23:05
Here's another patch. It simply propagates the UnicodeDecodeErrors. I like this because it avoids silent ignoring problem, and people can get bytes if they want by passing in a bytes path.
msg73680 - (view)	Author: Guido van Rossum (gvanrossum) * (Python committer)	Date: 2008年09月24日 00:01
Hmm... much of the os.path machinery (and os.walk) probably doesn't work with bytes, and neither do fnmatch.py and glob.py, I expect. Plus io.open() refuses bytes for the filename, even though _fileio accepts them. The latter should be fixed regardless, and one of the attachments here has a fix IIRC. Gotta run, sorry.
msg73688 - (view)	Author: STINNER Victor (vstinner) * (Python committer)	Date: 2008年09月24日 00:41
Guido compiled my patches here: http://codereview.appspot.com/3055 My patches allows bytes for fnmatch.filter(), glob.glob1(), os.path.join() and open().
msg73909 - (view)	Author: Benjamin Peterson (benjamin.peterson) * (Python committer)	Date: 2008年09月27日 00:47
Ok. Here's another possibility. It adds another optional parameter to listdir. If False, bytes strings can be returned. Otherwise, the UnicodeDecodeError is reraised.
msg73910 - (view)	Author: Guido van Rossum (gvanrossum) * (Python committer)	Date: 2008年09月27日 01:15
On Fri, Sep 26, 2008 at 5:47 PM, Benjamin Peterson <report@bugs.python.org> wrote: > Ok. Here's another possibility. It adds another optional parameter to > listdir. If False, bytes strings can be returned. Otherwise, the > UnicodeDecodeError is reraised. I don't see the advantage over the existing rule bytes in -> bytes out...
msg73911 - (view)	Author: Benjamin Peterson (benjamin.peterson) * (Python committer)	Date: 2008年09月27日 01:23
Does that mean that the right thing to do is raise decoding errors when unicode is given and fix the path modules so they can use bytes?
msg73925 - (view)	Author: STINNER Victor (vstinner) * (Python committer)	Date: 2008年09月27日 14:50
getcwd() fails with "NOT FOUNT" (not found?) if the current directory filename can't be converted to unicode (str type). Here is a patch to fallback to bytes if creation of the unicode failed.
msg73926 - (view)	Author: Dwayne Litzenberger (dlitz)	Date: 2008年09月27日 15:12
On Sat, Sep 27, 2008 at 01:15:46AM +0000, Guido van Rossum wrote: > I don't see the advantage over the existing rule bytes in -> bytes out... Guido, I figure I should say something since I have some experience in this area. I wrote some automatic backup software in Python 2 earlier this year. It had to work on ext3/Linux (where filenames are natively octet-strings) and on NTFS/Win32 (where filenames are natively unicode-strings). I had to be ridiculously careful to always use unicode paths on Win32, and to always use str paths on Linux, because otherwise Python would do the conversion automatically---poorly. It was particularly bad on Win32, where if you used os.listdir() with a non-unicode path (Python 2.x str object) in a directory that contained non-ascii filenames, Windows would invent filenames that looked similar but couldn't actually be found when using open(). So, naive (Python 2) code like this would break: for filename in os.listdir("."): f = open(filename, "rb") # ... On Linux, it was bad too, since if you used unicode paths, the filenames actually opened would depend on your LANG or LC_CTYPE or LC_ALL environment variables, and those could vary from one system to another, or even from one invocation of the program to another. The simple fact of the matter is that pathnames on Linux are _not_ Unicode, and pathnames on Windows are _not_ octet strings. They're fundamentally incompatible types that can only be reconciled when you make assumptions (e.g. specifying a character encoding) that allow you to convert from one to the other. Ideally, io.open(), os.listdir(), os.path.*, etc. would accept _only_ pathnames in their native format, and it would be the job of a wrapper to provide a portable-but-less-robust interface on top of that. Perhaps the built-in functions would use the wrapper (with reasonable defaults), but the native-only interface should be there for module-writers who want robust pathname handling.
msg73992 - (view)	Author: Martin v. Löwis (loewis) * (Python committer)	Date: 2008年09月28日 21:31
I'd like to propose yet another approach: make sure that conversion according to the file system encoding always succeeds. If an unconvertable byte is detected, map it into some private-use character. To reduce the chance of conflict with other people's private-use characters, we can use some of the plane 15 private-use characters, e.g. map byte 0xPQ to U+F30PQ (in two-byte Unicode mode, this would result in a surrogate pair). This would make all file names accessible to all text processing (including glob and friends); UI display would typically either report an encoding error, or arrange for some replacement glyph to be shown. There are certain variations of the approach possible, in case there is objection to a specific detail.
msg73999 - (view)	Author: Dwayne Litzenberger (dlitz)	Date: 2008年09月29日 00:54
Martin, Consider this scenario. On ext3/Linux, assume that UTF-8 is specified in the system locale. What would happen if you have two files, named b"\xf3\xb3\x83\x80\x00" and b"\xc0\x00"? Under your proposal, the first file would decode successfully as "\U000f30c0\x00", and the second file would decode unsuccessfully, so it would be mapped to "\U000f30c0\x00"---the same thing! Under your proposal, you could end up with multiple files having the same filename (from Python's perspective). Python shouldn't break if somebody deliberately created some weird filenames. Your proposal would make it impossible to write a robust remote backup tool in Python 3. Pathnames on ext3/Linux are not Unicode. Blindly pretending they're Unicode is a leaky abstraction at best, and a security hole at worst.
msg74000 - (view)	Author: Guido van Rossum (gvanrossum) * (Python committer)	Date: 2008年09月29日 02:51
You can call it a leaky abstraction all you want, but most people think of filenames as text strings most of the time, and we need to somehow support this, at least for users who agree . I agree we also need to support bytes strings (at least on Unix) in order to support backup routines, and support for bytes in -> bytes out in os.listdir() is meant for this. The open() function should also support a pure bytes filename (and almost does so -- _fileio does, but io.py doesn't yet). os.getcwd() is a weird case and will probably need to be given a flag to make it return bytes (I don't like that style of API much, but the alternative is perhaps worse -- os.getcwd_bytes()). Conclusion: I support patches that make the I/O library work with either bytes or strings. (It's OK if the bytes don't actually work on Windows, where the native type is apparently strings -- though it has a bytes API too, doesn't it?)
msg74006 - (view)	Author: Martin v. Löwis (loewis) * (Python committer)	Date: 2008年09月29日 04:45
> Consider this scenario. On ext3/Linux, assume that UTF-8 is specified > in the system locale. What would happen if you have two files, named > b"\xf3\xb3\x83\x80\x00" and b"\xc0\x00"? Under your proposal, the first > file would decode successfully as "\U000f30c0\x00", and the second file > would decode unsuccessfully, so it would be mapped to > "\U000f30c0\x00"---the same thing! Correct. > Under your proposal, you could end up with multiple files having the > same filename (from Python's perspective). Python shouldn't break if > somebody deliberately created some weird filenames. I'm not so sure about that. Practicality beats purity. > Your proposal would > make it impossible to write a robust remote backup tool in Python 3. There could be an option to set the file system encoding via an API to some known safe value, such as Latin-1, or ASCII. If you set the file system encoding to Latin-1, this escaping would never happen; if you set it to ASCII, it would happen uniformly for all non-ASCII bytes. The robust backup tool would have to know to set this option on POSIX systems. > Pathnames on ext3/Linux are not Unicode. Blindly pretending they're > Unicode is a leaky abstraction at best, and a security hole at worst. I think most Linux users would disagree, and claim that file names are indeed character strings (which is synonym to "being Unicode"). It is technically true that it's possible to create file names which are not text, but that's really a bug, not a feature - Unix and POSIX were never intended to work this way. Also, in the overwhelming majority of Python applications, consistent support for practically-existing systems matters more than robustness against malicious users.
msg74007 - (view)	Author: Martin v. Löwis (loewis) * (Python committer)	Date: 2008年09月29日 04:47
> I agree we also need to > support bytes strings (at least on Unix) in order to support backup > routines How about letting such applications set the file system encoding to Latin-1?
msg74008 - (view)	Author: Martin v. Löwis (loewis) * (Python committer)	Date: 2008年09月29日 05:11
James Knight points out that UTF-8b can be used to give unambiguous round-tripping of characters in a UTF-8 locale. So I would like to amend my previous proposal: - for a non-UTF-8 encoding, use private-use characters for roundtripping - if the locale's charset is UTF-8, use UTF-8b as the file system encoding.
msg74027 - (view)	Author: STINNER Victor (vstinner) * (Python committer)	Date: 2008年09月29日 12:51
About os.getcwd(), another solution is merge_os_getcwd_getcwdu.patch: os.getcwd() always return unicode string and raise an error on unicode decode error. Wheras os.getcwd(bytes=True) always return bytes. The old function os.getcwdu() is removed since os.getcwd() already return unicode string. Note: current version of os.getcwd() uses the wrong encoding to conversion bytes to unicode: it uses PyUnicode_FromString() instead of PyUnicode_Decode(..., Py_FileSystemDefaultEncoding, "strict") (as does getcwdu()).
msg74032 - (view)	Author: STINNER Victor (vstinner) * (Python committer)	Date: 2008年09月29日 15:58
As Steven Bethard proposed, here is a new version of my getcwd() patch: instead of adding a keyword argument "bytes", I created a function getcwdb(): * os.getcwd() -> unicode * os.getcwdb() -> bytes In Python2 it was: * os.getcwd() -> str (bytes) * os.getcwdu() -> unicode
msg74059 - (view)	Author: STINNER Victor (vstinner) * (Python committer)	Date: 2008年09月30日 01:08
Patch python3_bytes_filename.patch: - open() support bytes - listdir(unicode) -> only unicode, skip invalid filenames (as asked by Guido) - remove os.getcwdu() - create os.getcwdb() -> bytes - glob.glob() support bytes - fnmatch.filter() support bytes - posixpath.join() and posixpath.split() support bytes Mixing bytes and str is invalid. Examples raising a TypeError: - posixpath.join(b'x', 'y') - fnmatch.filter([b'x', 'y'], '') - fnmatch.filter([b'x', b'y'], '') - glob.glob1('.', b'') - glob.glob1(b'.', '') TODO: - review this patch :-) - support non-ASCII bytes in fnmatch.filter() - fix other functions, eg. posixpath.isabs() and fnmatch.fnmatchcase() - fix functions written in C: grep FileSystemDefaultEncoding - make sure that mixing bytes and str is rejected
msg74080 - (view)	Author: Martin v. Löwis (loewis) * (Python committer)	Date: 2008年09月30日 15:21
Here is a patch that solves the issue in a different way: it introduces sys.setfilesystemencoding. If applications invoke sys.setfilesystemencoding("iso-8859-1"), all file names can be successfully converted into a character string.
msg74083 - (view)	Author: Guido van Rossum (gvanrossum) * (Python committer)	Date: 2008年09月30日 17:04
On Tue, Sep 30, 2008 at 8:21 AM, Martin v. Löwis <report@bugs.python.org> wrote: > Martin v. Löwis <martin@v.loewis.de> added the comment: > Here is a patch that solves the issue in a different way: it introduces > sys.setfilesystemencoding. If applications invoke > sys.setfilesystemencoding("iso-8859-1"), all file names can be > successfully converted into a character string. I'm not opposed to this going in as well, but I don't think it's the right approach, as it can cause severe cases of mojibake (which you have strongly opposed in the past). It's quite orthogonal to Victor's patch IMO.
msg74101 - (view)	Author: STINNER Victor (vstinner) * (Python committer)	Date: 2008年09月30日 22:02
As I wrote, python3_bytes_filename.patch was just an initial support for bytes filename. So as asked by Guido, here is a new version of my patch. Changes: - for all functions, support bytes as well as bytearray - os.readlink(unicode) -> unicode and raise an error if unicode conversion fails. Note: os.readlink(bytes)->bytes was already working. - many changes in posixpath to fix all functions: add many "if isinstance(...):" and repeat sep / curdir / parent / ... in bytes - current version of test_posixpath contains a duplicate to test_splitdrive() and test_normcase() calls normcase() twice which is wrong (fixed in my patch) - i used copy/paste + conversion to bytes to test posixpath with bytes arguments - i added some checks in posixpath tests to reject mixing bytes + str - fix quoting style - factorize pattern compilation in fnmatch - fnmatch.fnmatchcase() supports bytes - fix test_unicode_file: replace getcwdu() by getcwd(), and sometimes getcwd() by getcwdb() Open issues: - pwd.getpwnam() and grp.getgrpnam() should accept bytes, and then expanduser() should use pwd with bytes. Now expanduser() supposes that an username is an ASCII string and the user directory can be converted using getfilesystemencoding() - expandvars() doesn't support non-ASCII variable value: that's new problem. os.environ key should be str or bytes? And the value: str or bytes? It str is choosen, what is the charset to convert str to bytes?
msg74173 - (view)	Author: Guido van Rossum (gvanrossum) * (Python committer)	Date: 2008年10月02日 16:43
Martin, can you check in your changes to add sys.setfilesystemencoding()? I will check in Victor's changes (with some edits). Together this means that the various suggested higher-level solutions (like returning path-like objects, or some kind of roudtripping almost-but-not-quite-utf-8 encoding) can be implemented in pure Python.
msg74186 - (view)	Author: Guido van Rossum (gvanrossum) * (Python committer)	Date: 2008年10月02日 18:57
I have committed a somewhat modified version of Victor's latest patch as r66743. Assigning to MvL now for the sys.setfilesystemencoding() commit. Then you can hand it over to the doc folks; something derived from http://wiki.python.org/moin/Python3UnicodeDecodeError should probably be added to the Doc tree.
msg74192 - (view)	Author: Martin v. Löwis (loewis) * (Python committer)	Date: 2008年10月02日 19:35
> Martin, can you check in your changes to add sys.setfilesystemencoding()? Will do tomorrow.
msg74222 - (view)	Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer)	Date: 2008年10月03日 01:45
Here is a patch for Windows: The failing tests on buildbots now pass (test_fnmatch test_posixpath test_unicode_file) test_ntpath also runs functions with bytes. I suppose macpath.py is broken as well.
msg74236 - (view)	Author: STINNER Victor (vstinner) * (Python committer)	Date: 2008年10月03日 10:38
Le Friday 03 October 2008 03:45:44 Amaury Forgeot d'Arc, vous avez écrit : > Here is a patch for Windows: (...) > test_ntpath also runs functions with bytes. Which charset is used when you use bytes filename? I read somewhere that it's the "current codepage". How can the user get this codepage in Python? I ask this to complete my document: http://wiki.python.org/moin/Python3UnicodeDecodeError Don't hesitate to edit directly the document, which may also be moved to Python3 Doc/ directory. You should also support bytearray() in ntpath: isinstance(path, (bytes, bytearray)) The unit tests might use pure unicode on Windows and bytes on Linux, especially getcwd() vs getcwdb(). I don't have Windows nor Mac to test bytes filenames on these systems.
msg74237 - (view)	Author: Martin v. Löwis (loewis) * (Python committer)	Date: 2008年10月03日 10:43
> Which charset is used when you use bytes filename? It's the "ANSI" code page, which is a system-wide admin-modifiable indirection to some real code page (changing it requires a reboot). In the API, it's referred to as CP_ACP. It's also related to the "multi-byte" API, which has caused Mark Hammond to call the codec invoking it "mbcs" (IOW, "mbcs" is always the codec name for the file system encoding). The specific code page that CP_ACP denotes can be found with locale.getpreferredencoding(). Using that codec name (which might be e.g. "cp1252") is different from using "mbcs", as that goes through a regular (table-driven) Python codec. In particular, the Python codec will report errors, whereas the "mbcs" codec will find replacement characters.
msg74240 - (view)	Author: Antoine Pitrou (pitrou) * (Python committer)	Date: 2008年10月03日 11:31
> You should also support bytearray() in ntpath: > isinstance(path, (bytes, bytearray)) The most generic way of allowing all bytes-alike objects is to write: path = bytes(path) It raises a TypeError if `path` can't export a read-only buffer of contiguous bytes; also, it is a no-op if `path` is already a bytes object, so very cheap in the common case.
msg74241 - (view)	Author: STINNER Victor (vstinner) * (Python committer)	Date: 2008年10月03日 11:43
> The most generic way of allowing all bytes-alike objects is to write: > path = bytes(path) If you use that, any unicode may fails and the function will always return unicode. The goal is to get: func(bytes)->bytes func(bytearray)->bytes (or maybe bytearray, it doesn't matter) func(unicode)->unicode
msg74242 - (view)	Author: Antoine Pitrou (pitrou) * (Python committer)	Date: 2008年10月03日 11:52
Le vendredi 03 octobre 2008 à 11:43 +0000, STINNER Victor a écrit : > STINNER Victor <victor.stinner@haypocalc.com> added the comment: > > > The most generic way of allowing all bytes-alike objects is to write: > > path = bytes(path) > > If you use that, any unicode may fails and the function will always return > unicode. The goal is to get: > func(bytes)->bytes > func(bytearray)->bytes (or maybe bytearray, it doesn't matter) > func(unicode)->unicode Then make it: path = path if isinstance(path, str) else bytes(path)
msg74246 - (view)	Author: STINNER Victor (vstinner) * (Python committer)	Date: 2008年10月03日 12:23
path=path is useless most of the code (unicode path), this code is faster if both cases (bytes or unicode)! if not isinstance(path, str): path = bytes(path) * a if b else c: unicode=0.756730079651; bytes=1.93071103096 * if test: path=...: unicode=0.681571006775; bytes=1.88843798637
msg74255 - (view)	Author: Martin v. Löwis (loewis) * (Python committer)	Date: 2008年10月03日 16:11
I've committed sys.setfilesystemencoding as r66769. Declaring it as a documentation issue now. Not sure whether it should remain a release blocker; IMO, the documentation can still be produced after the release.
msg74256 - (view)	Author: Guido van Rossum (gvanrossum) * (Python committer)	Date: 2008年10月03日 16:34
Reducing priority to critical, it's just docs and tweaks from here. You should also support bytearray() in ntpath: > isinstance(path, (bytes, bytearray)) No, you shouldn't. I changed my mind on this several times and in the end figured it's good enough to just support bytes and str instances. Amaury: I've reviewed your patch and ran test_ntpath.py on a Linux box. I get this traceback: ====================================================================== ERROR: test_relpath (__main__.TestNtpath) ---------------------------------------------------------------------- Traceback (most recent call last): File "Lib/test/test_ntpath.py", line 188, in test_relpath tester('ntpath.relpath("a")', 'a') File "Lib/test/test_ntpath.py", line 22, in tester gotResult = eval(fn) File "<string>", line 1, in <module> File "/usr/local/google/home/guido/python/py3k/Lib/ntpath.py", line 530, in relpath start_list = abspath(start).split(sep) File "/usr/local/google/home/guido/python/py3k/Lib/ntpath.py", line 499, in abspath path = join(os.getcwd(), path) File "/usr/local/google/home/guido/python/py3k/Lib/ntpath.py", line 137, in join if b[:1] in seps: TypeError: 'in <string>' requires string as left operand, not bytes ---------------------------------------------------------------------- The fix is to change the fallback abspath to this code: def abspath(path): """Return the absolute version of a path.""" if not isabs(path): if isinstance(path, bytes): cwd = os.getcwdb() else: cwd = os.getcwd() path = join(cwd, path) return normpath(path) Once you fix that please check it in!
msg74257 - (view)	Author: Guido van Rossum (gvanrossum) * (Python committer)	Date: 2008年10月03日 16:35
Assigning to Amaury for Windows fix first.
msg74266 - (view)	Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer)	Date: 2008年10月03日 18:40
Thanks for testing the non-Windows part of ntpath. Committed patch in r66777. Leaving the issue open: macpath.py should certainly be modified.
msg74267 - (view)	Author: Guido van Rossum (gvanrossum) * (Python committer)	Date: 2008年10月03日 18:51
Sorry Amaury, but there's another issue. test_ntpath now fails when run with -bb: ====================================================================== ERROR: test_expandvars (__main__.TestNtpath) ---------------------------------------------------------------------- Traceback (most recent call last): File "Lib/test/test_ntpath.py", line 151, in test_expandvars tester('ntpath.expandvars("$foo bar")', "bar bar") File "Lib/test/test_ntpath.py", line 10, in tester gotResult = eval(fn) File "<string>", line 1, in <module> File "/usr/local/google/home/guido/python/py3k/Lib/ntpath.py", line 344, in expandvars if c in ('\'', b'\''): # no expansion within single quotes BytesWarning: Comparison between bytes and string ====================================================================== ERROR: test_normpath (__main__.TestNtpath) ---------------------------------------------------------------------- Traceback (most recent call last): File "Lib/test/test_ntpath.py", line 120, in test_normpath tester("ntpath.normpath('A//////././//.//B')", r'A\B') File "Lib/test/test_ntpath.py", line 10, in tester gotResult = eval(fn) File "<string>", line 1, in <module> File "/usr/local/google/home/guido/python/py3k/Lib/ntpath.py", line 465, in normpath if comps[i] in ('.', '', b'.', b''): BytesWarning: Comparison between bytes and string ====================================================================== ERROR: test_relpath (__main__.TestNtpath) ---------------------------------------------------------------------- Traceback (most recent call last): File "Lib/test/test_ntpath.py", line 188, in test_relpath tester('ntpath.relpath("a")', 'a') File "Lib/test/test_ntpath.py", line 10, in tester gotResult = eval(fn) File "<string>", line 1, in <module> File "/usr/local/google/home/guido/python/py3k/Lib/ntpath.py", line 534, in relpath start_list = abspath(start).split(sep) File "/usr/local/google/home/guido/python/py3k/Lib/ntpath.py", line 504, in abspath return normpath(path) File "/usr/local/google/home/guido/python/py3k/Lib/ntpath.py", line 465, in normpath if comps[i] in ('.', '', b'.', b''): BytesWarning: Comparison between bytes and string
msg74268 - (view)	Author: Guido van Rossum (gvanrossum) * (Python committer)	Date: 2008年10月03日 18:54
FWIW, I don't see a need to change macpath.py -- it's only used for MacOS 9 and the occasional legacy app. OSX uses posixpath.py.
msg74270 - (view)	Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer)	Date: 2008年10月03日 20:35
Committed r66779: test_ntpath now passes with the -bb option. It seems that the Windows buildbots do not set -bb.
msg74271 - (view)	Author: Guido van Rossum (gvanrossum) * (Python committer)	Date: 2008年10月03日 20:55
Thanks Amaury! On to Georg for doc tweaks. Summary: - all the os.path functions now work on bytes as well, on all platforms - only on Unix (but not OSX) do we recommend using bytes - os.getcwdu() no longer exists - os.getcwdb() returns bytes - os.listdir(<str>) skips undecodable entries (previously it returned a mixture of str and bytes instances) - open() accepts bytes as filename Stuff that didn't change but that you might want to mention: - all the syscalls in os support bytes args; readlink() and listdir() return bytes if the arg is bytes - getcwd() may raise UnicodeDecodeError Martin already documented sys.setfilesystemencoding().
msg74275 - (view)	Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer)	Date: 2008年10月03日 21:21
I have a patch for macpath.py nonetheless. Tested on Windows (of course ;-) but all functions are pure text manipulation, except realpath(). It was much easier than ntpath.py. I also added tests for three functions which were not exercised at all.
msg74276 - (view)	Author: Benjamin Peterson (benjamin.peterson) * (Python committer)	Date: 2008年10月03日 21:50
Amaury, you're patch looks good.
msg74277 - (view)	Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer)	Date: 2008年10月03日 21:57
Committed macpath.py in r66781.
msg74409 - (view)	Author: STINNER Victor (vstinner) * (Python committer)	Date: 2008年10月06日 23:03
Would it possible to close this issue since os.listdir() is fixed and many other related functions (posix, posixpath, ntpath, macpath, etc.) are also fixed? I propose to open new issues for new bugs since this issue becomes a little big long :) Eg. see new issues #4035 and #4036!
msg74412 - (view)	Author: Martin v. Löwis (loewis) * (Python committer)	Date: 2008年10月06日 23:13
> Would it possible to close this issue since os.listdir() is fixed and > many other related functions (posix, posixpath, ntpath, macpath, etc.) > are also fixed? IIUC, these fixes are still not complete: they lack documentation changes. Of course, it would have been better if the original patches already contained the necessary documentation and test suite changes. See msg74271 for what Guido considers the lacking documentation; you may find that other aspects also need documentation. As for test cases: it seems that those got waived, in the hurry.
msg74414 - (view)	Author: STINNER Victor (vstinner) * (Python committer)	Date: 2008年10月06日 23:34
Le Tuesday 07 October 2008 01:13:22 Martin v. Löwis, vous avez écrit : > IIUC, these fixes are still not complete: they lack documentation > changes. (...) Of course, it would have been better if the original patches > already contained the necessary documentation and test suite changes. Most (or all) patches include new tests about bytes. Here is a patch for os.rst documentation about listdir(), getcwdb() and readlink(). > See msg74271 for what Guido considers the lacking documentation; > you may find that other aspects also need documentation. I wrote a long document about bytes for filenames but not only. I'm still waiting for some contributors or reviewers: http://wiki.python.org/moin/Python3UnicodeDecodeError > As for test cases: it seems that those got waived, in the hurry. Can you be more precise? Which tests have to be improved/rewritten?
msg74426 - (view)	Author: Martin v. Löwis (loewis) * (Python committer)	Date: 2008年10月07日 07:12
> Most (or all) patches include new tests about bytes. Here is a patch for > os.rst documentation about listdir(), getcwdb() and readlink(). Thanks! Committed as r66829. I've added additional documentation in r66830, which should complete Guido's list of things to be documented. So the issue can be closed now. >> See msg74271 for what Guido considers the lacking documentation; >> you may find that other aspects also need documentation. > > I wrote a long document about bytes for filenames but not only. I'm still > waiting for some contributors or reviewers: > http://wiki.python.org/moin/Python3UnicodeDecodeError We should discuss that on python-dev, of course - the question is whether additional documentation patches are needed in response to this specific change. >> As for test cases: it seems that those got waived, in the hurry. > > Can you be more precise? Which tests have to be improved/rewritten? I was probably looking at the wrong patches (such as getcwd_bytes.patch, merge_os_getcwd_getcwdu.patch, etc); I now see that the final patch did have tests. I recommend that patches that get superseded by other patches are removed from the issue. The won't be deleted; it's still possible to navigate to them through the History at the bottom of the issue.

History
Date	User	Action	Args
2022年04月11日 14:56:35	admin	set	github: 47437
2008年10月07日 07:12:42	loewis	set	status: open -> closed
2008年10月07日 07:12:22	loewis	set	messages: + msg74426
2008年10月06日 23:34:57	vstinner	set	files: + library_os_doc.patch messages: + msg74414
2008年10月06日 23:13:21	loewis	set	messages: + msg74412
2008年10月06日 23:03:18	vstinner	set	messages: + msg74409
2008年10月03日 21:57:37	amaury.forgeotdarc	set	messages: + msg74277
2008年10月03日 21:50:09	benjamin.peterson	set	messages: + msg74276
2008年10月03日 21:21:24	amaury.forgeotdarc	set	files: + macpath.patch messages: + msg74275
2008年10月03日 20:55:27	gvanrossum	set	assignee: amaury.forgeotdarc -> georg.brandl messages: + msg74271
2008年10月03日 20:35:15	amaury.forgeotdarc	set	messages: + msg74270
2008年10月03日 18:54:35	gvanrossum	set	resolution: accepted messages: + msg74268
2008年10月03日 18:52:02	gvanrossum	set	assignee: amaury.forgeotdarc messages: + msg74267
2008年10月03日 18:40:33	amaury.forgeotdarc	set	assignee: amaury.forgeotdarc -> (no value) messages: + msg74266
2008年10月03日 16:35:33	gvanrossum	set	assignee: georg.brandl -> amaury.forgeotdarc messages: + msg74257
2008年10月03日 16:34:57	gvanrossum	set	priority: release blocker -> critical messages: + msg74256
2008年10月03日 16:11:02	loewis	set	assignee: loewis -> georg.brandl messages: + msg74255 components: + Documentation, - Library (Lib) nosy: + georg.brandl
2008年10月03日 12:23:29	vstinner	set	messages: + msg74246
2008年10月03日 11:52:56	pitrou	set	messages: + msg74242
2008年10月03日 11:43:52	vstinner	set	messages: + msg74241
2008年10月03日 11:31:37	pitrou	set	messages: + msg74240
2008年10月03日 10:43:51	loewis	set	messages: + msg74237
2008年10月03日 10:38:10	vstinner	set	messages: + msg74236
2008年10月03日 01:45:43	amaury.forgeotdarc	set	files: + win32-bytes-filenames.patch messages: + msg74222
2008年10月02日 21:04:26	bboissin	set	nosy: + bboissin
2008年10月02日 20:54:44	djc	set	nosy: + djc
2008年10月02日 19:35:20	loewis	set	messages: + msg74192
2008年10月02日 18:57:45	gvanrossum	set	assignee: gvanrossum -> loewis messages: + msg74186
2008年10月02日 17:10:54	vstinner	set	files: - python3_bytes_filename-2.patch
2008年10月02日 17:10:49	vstinner	set	files: + python3_bytes_filename-3.patch
2008年10月02日 16:43:48	gvanrossum	set	assignee: gvanrossum messages: + msg74173
2008年10月02日 12:56:28	barry	set	priority: deferred blocker -> release blocker
2008年09月30日 22:02:30	vstinner	set	files: + python3_bytes_filename-2.patch messages: + msg74101
2008年09月30日 17:04:27	gvanrossum	set	messages: + msg74083
2008年09月30日 15:21:16	loewis	set	files: + setfsenc.diff messages: + msg74080
2008年09月30日 01:08:11	vstinner	set	files: + python3_bytes_filename.patch messages: + msg74059
2008年09月29日 15:58:46	vstinner	set	files: + os_getcwdb.patch messages: + msg74032
2008年09月29日 14:45:27	draghuram	set	nosy: + draghuram
2008年09月29日 12:51:32	vstinner	set	files: + merge_os_getcwd_getcwdu.patch messages: + msg74027
2008年09月29日 05:11:59	loewis	set	messages: + msg74008
2008年09月29日 04:47:17	loewis	set	messages: + msg74007
2008年09月29日 04:45:30	loewis	set	messages: + msg74006
2008年09月29日 02:51:47	gvanrossum	set	messages: + msg74000
2008年09月29日 00:54:36	dlitz	set	messages: + msg73999
2008年09月28日 21:31:29	loewis	set	nosy: + loewis messages: + msg73992
2008年09月27日 15:12:05	dlitz	set	messages: + msg73926
2008年09月27日 14:50:29	vstinner	set	files: + getcwd_bytes.patch messages: + msg73925
2008年09月27日 11:49:42	vstinner	set	files: - invalid_filename.patch
2008年09月27日 11:49:38	vstinner	set	files: - filename.py
2008年09月27日 01:23:00	benjamin.peterson	set	messages: + msg73911
2008年09月27日 01:15:45	gvanrossum	set	messages: + msg73910
2008年09月27日 00:48:24	benjamin.peterson	set	files: + force_unicode.patch
2008年09月27日 00:48:16	benjamin.peterson	set	files: - force_unicode.patch
2008年09月27日 00:47:19	benjamin.peterson	set	files: + force_unicode.patch messages: + msg73909
2008年09月26日 22:20:09	barry	set	priority: release blocker -> deferred blocker
2008年09月24日 00:41:01	vstinner	set	messages: + msg73688
2008年09月24日 00:01:28	gvanrossum	set	messages: + msg73680
2008年09月23日 23:05:41	benjamin.peterson	set	files: + raise_decoding_errors.patch messages: + msg73678
2008年09月21日 23:48:43	benjamin.peterson	set	files: - oslistdir_string.patch
2008年09月21日 23:48:36	benjamin.peterson	set	files: + warn_at_the_end.patch
2008年09月21日 23:48:14	benjamin.peterson	set	files: + listdir_encoding_warning.patch messages: + msg73540
2008年09月21日 23:44:28	benjamin.peterson	set	files: - listdir_bytes_warning.patch
2008年09月21日 22:05:40	amaury.forgeotdarc	set	messages: + msg73535
2008年09月21日 21:44:35	benjamin.peterson	set	files: + listdir_bytes_warning.patch messages: + msg73534
2008年09月18日 07:39:29	HWJ	set	messages: + msg73362
2008年09月18日 05:43:10	barry	set	priority: deferred blocker -> release blocker
2008年09月09日 13:11:44	barry	set	priority: release blocker -> deferred blocker
2008年09月04日 12:03:25	zegreek	set	nosy: + zegreek messages: + msg72495
2008年08月26日 18:15:20	dlitz	set	messages: + msg71991
2008年08月26日 17:37:46	dlitz	set	nosy: + dlitz
2008年08月26日 17:14:06	amaury.forgeotdarc	link	issue3688 superseder
2008年08月22日 19:05:14	gvanrossum	set	messages: + msg71769
2008年08月22日 16:54:50	vstinner	set	files: + glob1_bytes.patch messages: + msg71757
2008年08月22日 16:53:49	vstinner	set	files: + fnmatch_bytes.patch messages: + msg71756
2008年08月22日 15:33:59	vstinner	set	files: + io_byte_filename.patch messages: + msg71752
2008年08月22日 15:19:41	vstinner	set	files: + posix_path_bytes.patch messages: + msg71751
2008年08月22日 14:53:29	gvanrossum	set	messages: + msg71749
2008年08月22日 14:21:43	vstinner	set	files: + invalid_filename.patch messages: + msg71748
2008年08月21日 23:59:09	gvanrossum	set	messages: + msg71705
2008年08月21日 23:35:51	benjamin.peterson	set	messages: + msg71700
2008年08月21日 23:31:58	gvanrossum	set	messages: + msg71699
2008年08月21日 20:55:33	vstinner	set	messages: + msg71680
2008年08月21日 16:17:46	gvanrossum	set	nosy: + gvanrossum messages: + msg71655
2008年08月21日 15:09:20	pitrou	set	messages: + msg71648
2008年08月21日 14:59:54	vstinner	set	messages: + msg71647
2008年08月21日 12:55:42	pitrou	set	messages: + msg71629
2008年08月21日 10:59:28	vstinner	set	files: + filename.py messages: + msg71624
2008年08月21日 08:51:11	pitrou	set	messages: + msg71615
2008年08月21日 08:20:41	vstinner	set	messages: + msg71612
2008年08月21日 03:07:23	barry	set	priority: deferred blocker -> release blocker
2008年08月20日 09:37:30	pitrou	link	issue3616 superseder
2008年08月20日 09:37:21	pitrou	set	nosy: + vstinner messages: + msg71525
2008年08月10日 00:17:16	benjamin.peterson	set	priority: critical -> deferred blocker messages: + msg70953
2008年08月09日 17:35:42	pitrou	set	priority: critical type: crash -> behavior messages: + msg70943 nosy: + pitrou
2008年06月25日 02:32:48	benjamin.peterson	set	title: os.walk - strange bug -> os.listdir can return byte strings
2008年06月24日 14:09:31	HWJ	set	messages: + msg68689
2008年06月24日 13:37:48	amaury.forgeotdarc	set	messages: + msg68688
2008年06月24日 13:21:04	benjamin.peterson	set	files: + oslistdir_string.patch keywords: + patch messages: + msg68686
2008年06月24日 13:12:52	amaury.forgeotdarc	set	status: closed -> open resolution: not a bug -> (no value) messages: + msg68685
2008年06月24日 13:02:56	benjamin.peterson	set	status: open -> closed resolution: not a bug messages: + msg68684 nosy: + benjamin.peterson
2008年06月24日 11:45:34	amaury.forgeotdarc	set	nosy: + amaury.forgeotdarc messages: + msg68679
2008年06月24日 10:28:25	HWJ	create

homepage