homepage

This issue tracker has been migrated to GitHub , and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: PyArg_Parse*(): "z" should not accept bytes
Type: Stage:
Components: Interpreter Core Versions: Python 3.2
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: lemburg, loewis, vstinner
Priority: normal Keywords: patch

Created on 2010年06月08日 21:23 by vstinner, last changed 2022年04月11日 14:57 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
pyarg_z_no_bytes-2.patch vstinner, 2010年06月13日 20:20
Messages (12)
msg107351 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2010年06月08日 21:23
"z" is supposed to be the same format than "s" but accepts None.... Except that "z" does also accept bytes. I suppose that "s" was "fixed" between Python2 and Python3, but "z" is not fixed yet.
I think that it can be fixed in Python 3.2 without any deprecation process.
msg107357 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2010年06月08日 22:12
Hum, the patch doesn't update the documentation (at least Porting section of the What's new in Python 3.2).
msg107384 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2010年06月09日 11:06
"z" should not accept bytes... why not ?
"z" is the same as "s" with the addition that passing None
as parameter will result in the pointer to get set to NULL.
"s" accepts bytes via the buffer interface, so why should
"z" behave differently ?
If a function should only accept Unicode or None, then "Z"
should be used instead.
msg107388 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2010年06月09日 12:02
Le mercredi 09 juin 2010 13:06:53, vous avez écrit :
> "s" accepts bytes via the buffer interface
No. "s*" and "s#" do accept any buffer compatible object (including bytes and 
bytearray), but "s" doesn't.
I fixed recently the documentation about that.
msg107393 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2010年06月09日 12:23
STINNER Victor wrote:
> 
> STINNER Victor <victor.stinner@haypocalc.com> added the comment:
> 
> Le mercredi 09 juin 2010 13:06:53, vous avez écrit :
>> "s" accepts bytes via the buffer interface
> 
> No. "s*" and "s#" do accept any buffer compatible object (including bytes and 
> bytearray), but "s" doesn't.
> 
> I fixed recently the documentation about that.
Hmm, that sounds like an oversight to me. Why should "s" not accept
bytes when "s#" and "s*" do ?
The only difference between "s" and "s#" is that you work with
NUL-terminated strings as opposed to binary or text data with embedded
NULs. Even Unicode objects can contain embedded NULs, so from that
perspective there's no difference.
msg107579 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2010年06月11日 20:12
Some examples of functions using "s" format:
 * str.encode(encoding, errors), bytes.decode(encoding, errors): both arguments have to be unicode strings
 * compile(source, filename, mode, ...): filename and mode have to be unicode strings
 * crypt.crypt(word, salt): both arguments have to be unicode strings
I think that crypt() should also accept bytes, but not str.encode() nor bytes.decode().
Some examples of functions using "z" format:
 * _locale.bindtextdomain(domain, dirname): dirname uses "z" format and so accepts str, bytes or buffer compatible object. It should use PyUnicode_FSConverter() instead. But I agree that bytes is welcomed here.
 * readline.(write_history_file|read_init_file|read_history_file) functions do use "z" to parse a filename. PyUnicode_FSConverter() would also be better, but in this case "z" is better than "s" :-)
I don't know why "s" and "z" are different about bytes, but it will be difficult to change it without changing a lot ot code (all functions using these formats). I tried to reject types different than str for "z": most tests of the test suite fail. I tried to accept bytes for "s" format: "unicode".encode(b'abc') does segfault.
msg107583 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2010年06月11日 20:47
> I tried to reject types different than str for "z": 
> most tests of the test suite fail
Wait, what? No. I modified the wrong line of code :-) The whole test suite pass without any error if "z" doesn't accept bytes anymore.
msg107600 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2010年06月11日 23:00
I patched readline (r81918) and locale (r81914) modules to use PyUnicode_FSConverter(). They don't use "z" format anymore to parse a filename.
I checked last functions using "z" format (not "z#" or "z*", only "z") and I think that it's ok ("z" can be patched to reject bytes). You can get the full list using:
grep -n 'PyArg_Parse[^(]*([^"]*"[^";:]*z[^#*]' */*.c
--
About the difference between s/z and s*/s#/z*/z#: I don't understand why they are different, but I think that it's too late to change that. Too much functions (and third party modules?) suppose that s*/s#/z*/z# supports buffer compatible objects.
msg107748 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2010年06月13日 20:20
Update the patch to update also the new tests that I added in r81973 (to fix #8592).
msg108152 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2010年06月18日 23:46
@lemburg: So what is your opinion on this issue?
msg108311 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2010年06月21日 19:57
STINNER Victor wrote:
> 
> STINNER Victor <victor.stinner@haypocalc.com> added the comment:
> 
> @lemburg: So what is your opinion on this issue?
You're probably right: it's too late to change "s" to accept buffer
interface compatible objects as well.
In that case, symmetry is more important, so +1 on rejecting bytes
for "z" as well.
Please add a note to the NEWS file for this change.
msg108554 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2010年06月24日 22:11
Patch commited to 3.2 (r82200), blocked in 3.1 (r82201).
History
Date User Action Args
2022年04月11日 14:57:02adminsetgithub: 53195
2010年06月24日 22:11:00vstinnersetstatus: open -> closed
resolution: fixed
messages: + msg108554
2010年06月21日 19:57:37lemburgsetmessages: + msg108311
2010年06月18日 23:46:33vstinnersetmessages: + msg108152
2010年06月13日 20:20:19vstinnersetfiles: - pyarg_z_no_bytes.patch
2010年06月13日 20:20:14vstinnersetfiles: + pyarg_z_no_bytes-2.patch

messages: + msg107748
2010年06月11日 23:39:25vstinnerlinkissue8951 dependencies
2010年06月11日 23:00:10vstinnersetmessages: + msg107600
2010年06月11日 20:47:20vstinnersetmessages: + msg107583
2010年06月11日 20:12:28vstinnersetmessages: + msg107579
2010年06月09日 17:25:34pitrousetnosy: + loewis
2010年06月09日 12:23:57lemburgsetmessages: + msg107393
2010年06月09日 12:02:04vstinnersetmessages: + msg107388
2010年06月09日 11:06:51lemburgsetmessages: + msg107384
2010年06月09日 10:37:17vstinnersetnosy: + lemburg
2010年06月08日 22:12:03vstinnersetmessages: + msg107357
2010年06月08日 21:24:15vstinnersetfiles: + pyarg_z_no_bytes.patch
keywords: + patch
2010年06月08日 21:23:21vstinnercreate

AltStyle によって変換されたページ (->オリジナル) /