Created on 2014-03-26 17:48 by zbysz, last changed 2022-04-11 14:58 by admin. This issue is now closed.
| Files | |
|---|---|
| File name | Uploaded |
| format-bytes.patch | martin.panter, 2014-12-16 22:57 |
| format-str.patch | martin.panter, 2014-12-18 05:24 |
| Pull Requests | | |
|---|---|---|
| URL | Status | Linked |
| PR 845 | merged | vstinner, 2017-03-27 11:11 |
| Messages (24) | |||
|---|---|---|---|
| msg214903 - (view) | Author: Zbyszek Jędrzejewski-Szmek (zbysz) * | Date: 2014-03-26 17:48 | |
In Python 2, Struct.format used to be a str. In Python 3 it is bytes, which is unexpected.
Why do I expect .format to be a string:
- This format is pretty much the same as a "{}-format" - plain text
- according to documentation it is composed of things like characters from a closed set '<.=@hi...', a subset of ASCII,
- it is always called "format string" in the documentation
Why is this a problem:
- If I use a str format in constructor, I expect to get a str format,
- Comparisons are broken:
>>> struct.Struct('x').format == 'x'
False
>>> struct.Struct('x').format[0] == 'x'
False
- doctests are broken
>>> struct.Struct('x').format
'x' # in Python 2
b'x' # in Python 3
|
|||
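The two failing comparisons reported in msg214903 follow from general Python 3 rules for `bytes`, independent of the `struct` module. The sketch below is only an illustration: it uses a plain `b'x'` literal as a stand-in for what `Struct('x').format` returned at the time of the report, and the variable names are made up for the example.

```python
# Stand-in for the value Struct('x').format returned when this was reported.
raw = b'x'

# bytes and str never compare equal in Python 3, even for pure ASCII content.
# (With the interpreter's -b option this comparison also emits a BytesWarning.)
whole_equal = (raw == 'x')

# Indexing bytes yields an int (the byte value), not a length-1 string.
first_item = raw[0]
item_equal = (first_item == 'x')

# Slicing keeps the bytes type, and decoding gives text that compares as expected.
first_slice = raw[0:1]
decoded_equal = (raw.decode('ascii') == 'x')

print(whole_equal, first_item, item_equal, first_slice, decoded_equal)
```

Decoding with ASCII is safe here only because struct format characters are documented as a subset of ASCII.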
| msg214905 - (view) | Author: Benjamin Peterson (benjamin.peterson) * (Python committer) | Date: 2014-03-26 18:17 | |
I agree that's rather unfortunate. It would be backwards incompatible to change, though. |
|||
| msg214906 - (view) | Author: Zbyszek Jędrzejewski-Szmek (zbysz) * | Date: 2014-03-26 18:21 | |
Maybe a flag param for the constructor? |
|||
| msg214907 - (view) | Author: R. David Murray (r.david.murray) * (Python committer) | Date: 2014-03-26 18:26 | |
I agree that the implementation does not match the documentation in this case. Especially the part about "the format string used to create this Struct object". I don't see what having a flag would buy you: it doesn't help you in writing 2/3 shared code. I think the best we can do here is a doc change. |
|||
| msg216655 - (view) | Author: Martin Panter (martin.panter) * (Python committer) | Date: 2014-04-17 05:08 | |
This is closely related to Issue 16349. If format strings were explicitly allowed to be byte strings there would be less conflict, but documenting the data type of the "format" attribute is better than nothing. |
|||
| msg232768 - (view) | Author: Martin Panter (martin.panter) * (Python committer) | Date: 2014-12-16 22:57 | |
It seems to me that the simplest fix is to document:
1. The Struct.format attribute is a byte string
2. The input format strings for struct.pack(), the Struct class, etc., are also allowed to be byte strings, for consistency (Issue 16349)
Here is a patch that does that, and adds some simple test cases. |
|||
| msg232840 - (view) | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2014-12-17 23:56 | |
> It would be backwards incompatible to change, though.
I'm in favor of breaking compatibility with Python 3.4 and returning the format as a Unicode string. |
|||
| msg232843 - (view) | Author: Martin Panter (martin.panter) * (Python committer) | Date: 2014-12-18 00:17 | |
I originally assumed it would be a text string from the existing documentation, so changing the behaviour to match also seems reasonable. |
|||
| msg232857 - (view) | Author: Martin Panter (martin.panter) * (Python committer) | Date: 2014-12-18 05:24 | |
Here is a patch that changes over to a str() type. Is it safe to assume PyUnicode_AsUTF8() is null-terminated (like PyBytes_AS_STRING() is)? My documentation doesn’t say. |
|||
| msg232863 - (view) | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2014-12-18 08:13 | |
> Is it safe to assume PyUnicode_AsUTF8() is null-terminated?
Yes, Python ensures that the string is null-terminated.
> (like PyBytes_AS_STRING() is)
Yes, PyBytes_AS_STRING() also ends with a null byte. By the way, Unicode strings internally end with a null character. |
|||
| msg232868 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2014-12-18 10:04 | |
I think breaking compatibility should be discussed on Python-Dev. A similar (and even worse) issue is issue8934. |
|||
| msg232880 - (view) | Author: R. David Murray (r.david.murray) * (Python committer) | Date: 2014-12-18 14:21 | |
A backward compatibility break would certainly need to be discussed, IMO. |
|||
| msg232955 - (view) | Author: Raymond Hettinger (rhettinger) * (Python committer) | Date: 2014-12-20 02:11 | |
I would like to see this and issue 8934 discussed as a usability bug. As far as I can tell, the current state of affairs is an unintended by-product of a rushed effort to split the standard library into bytes APIs and Unicode APIs. I don't see any reason that we should have to live with this forever. |
|||
| msg290500 - (view) | Author: Martin Panter (martin.panter) * (Python committer) | Date: 2017-03-25 21:21 | |
A backwards-compatible way forward would be to preserve (and document) the "format" attribute as a byte string, and add a new attribute which is definitely a text string. Not sure of a good name; perhaps "Struct.text_format" or "format_str" is a start. |
|||
| msg290584 - (view) | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2017-03-27 11:14 | |
I created https://github.com/python/cpython/pull/845 to change struct.Struct.format type to str (Unicode). struct.Struct() accepts bytes and str format strings, so it's not really a backward incompatible change. It's just a minor enhancement to help development:
$ ./python
Python 3.7.0a0 (heads/master-dirty:b8a7daf, Mar 27 2017, 13:02:20)
>>> print(struct.Struct('hi').format)
hi
Without the patch:
haypo@selma$ python3
Python 3.5.2 (default, Sep 14 2016, 11:28:32)
[GCC 6.2.1 20160901 (Red Hat 6.2.1-1)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import struct
>>> print(struct.Struct('hi').format)
b'hi'
haypo@selma$ python3 -bb
Python 3.5.2 (default, Sep 14 2016, 11:28:32)
>>> import struct
>>> print(struct.Struct('hi').format)
Traceback (most recent call last):
...
BytesWarning: str() on a bytes instance
|
|||
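The statement in msg290584 that struct.Struct() accepts both bytes and str format strings can be checked with a short sketch. The version check assumes the type change applies from Python 3.7 onward, as the issue's version field suggests; `describe_format` and the variable names are hypothetical and exist only for this example.

```python
import struct
import sys

# The same format spelled as text and as bytes.
from_text = struct.Struct('hi')
from_bytes = struct.Struct(b'hi')

# Both spellings describe the same binary layout.
same_size = from_text.size == from_bytes.size


def describe_format(s):
    """Return the type name of s.format (hypothetical helper)."""
    return type(s.format).__name__


# Assumption: the attribute is str from 3.7 on and bytes on earlier 3.x.
expected = 'str' if sys.version_info >= (3, 7) else 'bytes'

print(same_size, describe_format(from_text), describe_format(from_bytes))
```

The type of the attribute does not depend on which spelling was passed to the constructor.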
| msg290595 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2017-03-27 11:55 | |
This should be discussed on Python-Dev first. I already raised this issue on Python-Dev, but I don't remember what the result was. |
|||
| msg290596 - (view) | Author: Martin Panter (martin.panter) * (Python committer) | Date: 2017-03-27 11:55 | |
Hi Victor, I’m not sure about changing the data type. As Python 3 grows older, there is potentially more code being written that you break by fixing a bug like this. It is incompatible if you used to write
>>> print(struct.Struct('hi').format.decode())
hi
I have used this decode() trick in the past to build composite format strings; e.g.: <https://bugs.python.org/issue16349#msg174083>. If you change the data type this code will raise AttributeError. At a minimum you should acknowledge it in the "porting" section of What’s New.
Also, if you make this change, maybe update the module doc string. See the end of format-str.patch.
|
|||
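Code that relied on the decode() trick from msg290596 to build composite format strings can be written so that it works with either attribute type. This is a minimal sketch, not taken from the patches in this issue; `text_format`, `header`, `payload` and `combined` are hypothetical names.

```python
import struct


def text_format(s):
    """Return s.format as str whether the attribute is bytes or str.

    Hypothetical helper: a bare .decode() raises AttributeError once the
    attribute is already a str.
    """
    fmt = s.format
    if isinstance(fmt, bytes):
        # Struct format characters are ASCII, so this decode is lossless.
        fmt = fmt.decode('ascii')
    return fmt


header = struct.Struct('H')   # one unsigned short
payload = struct.Struct('I')  # one unsigned int

# Build the composite format as text. The '<' prefix selects little-endian
# byte order with standard sizes and no alignment padding.
combined = struct.Struct('<' + text_format(header) + text_format(payload))

packed = combined.pack(1, 2)
print(combined.size, packed)
```

The byte-order prefix is added once to the composite format because it is only valid as the first character of a format string.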
| msg290601 - (view) | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2017-03-27 12:14 | |
Ok, I opened a thread on python-dev: https://mail.python.org/pipermail/python-dev/2017-March/147688.html
Martin: "At a minimum you should acknowledge it in the "porting" section of What’s New."
I wasn't sure whether the change was worth mentioning in What's New in Python 3.7. Ok, will do for the next round (I'm now waiting for more feedback on my python-dev thread and this issue). |
|||
| msg292398 - (view) | Author: Xiang Zhang (xiang.zhang) * (Python committer) | Date: 2017-04-27 04:41 | |
+1 for changing bytes to str. But struct.Struct() accepts both bytes and str, and maybe buffer objects in the future. When it gets a bytes object, converting it to a str looks unnecessary to me, and as the OP said, comparison (a theoretical use case) could still fail. Could we just keep what the user passes in? bytes (bytes-like) -> bytes, str -> str. This looks more natural to me. |
|||
| msg292409 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2017-04-27 07:05 | |
After changing the type of Struct.format to str we perhaps should deprecate accepting bytes as format. Currently this can lead to emitting a BytesWarning.
$ ./python -Wa -b
>>> import struct
>>> struct.pack('I', 12345)
b'90\x00\x00'
>>> struct.pack(b'I', 12345)
__main__:1: BytesWarning: Comparison between bytes and string
__main__:1: BytesWarning: Comparison between bytes and string
b'90\x00\x00'
|
|||
| msg292411 - (view) | Author: Xiang Zhang (xiang.zhang) * (Python committer) | Date: 2017-04-27 08:48 | |
I think the warnings can be removed, but deprecating bytes arguments sounds good. |
|||
| msg292558 - (view) | Author: Martin Panter (martin.panter) * (Python committer) | Date: 2017-04-29 03:07 | |
I don’t think the API should be expanded to accept arbitrary bytes-like objects as format strings. Struct formats are strings of ASCII-compatible characters, but not arbitrary chunks of memory. I think the main question is whether it is okay to break compatibility (Victor’s pull request, or my format-str.patch), or whether there has to be a backwards-compatible deprecation of the existing bytes attribute. FWIW I am okay with breaking compatibility, since the main documentation already implies it should be a text string. |
|||
| msg296710 - (view) | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2017-06-23 13:11 | |
New changeset f87b85f80853c580b1c8bf78a51b0e9a25f6e1a7 by Victor Stinner in branch 'master':
bpo-21071: struct.Struct.format type is now str (#845)
https://github.com/python/cpython/commit/f87b85f80853c580b1c8bf78a51b0e9a25f6e1a7 |
|||
| msg296711 - (view) | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2017-06-23 13:14 | |
Ok, I changed struct.Struct.format type to str (Unicode string). If someone wants to modify the C code to use a PyUnicodeObject rather than a char*, feel free to propose a further change. Since the initial issue is fixed, I now close the issue. Thank you all for your feedback and reviews ;-) |
|||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022-04-11 14:58:00 | admin | set | github: 65270 |
| 2017-06-23 13:14:01 | vstinner | set | status: open -> closed versions: + Python 3.7, - Python 3.4, Python 3.5 messages: + msg296711 resolution: fixed stage: needs patch -> resolved |
| 2017-06-23 13:11:14 | vstinner | set | messages: + msg296710 |
| 2017-04-29 03:07:47 | martin.panter | set | messages: + msg292558 |
| 2017-04-29 02:46:51 | martin.panter | link | issue16349 dependencies |
| 2017-04-27 08:48:52 | xiang.zhang | set | messages: + msg292411 |
| 2017-04-27 07:05:22 | serhiy.storchaka | set | messages: + msg292409 |
| 2017-04-27 04:41:57 | xiang.zhang | set | nosy: + xiang.zhang messages: + msg292398 |
| 2017-03-27 12:14:46 | vstinner | set | messages: + msg290601 |
| 2017-03-27 11:55:51 | martin.panter | set | messages: + msg290596 |
| 2017-03-27 11:55:22 | serhiy.storchaka | set | messages: + msg290595 |
| 2017-03-27 11:14:00 | vstinner | set | messages: + msg290584 |
| 2017-03-27 11:11:36 | vstinner | set | pull_requests: + pull_request743 |
| 2017-03-25 21:21:35 | martin.panter | set | messages: + msg290500 |
| 2014-12-20 02:11:00 | rhettinger | set | nosy: + rhettinger messages: + msg232955 |
| 2014-12-19 00:17:51 | Arfrever | set | nosy: + Arfrever |
| 2014-12-18 14:21:01 | r.david.murray | set | messages: + msg232880 |
| 2014-12-18 10:04:04 | serhiy.storchaka | set | nosy: + serhiy.storchaka messages: + msg232868 |
| 2014-12-18 08:13:13 | vstinner | set | messages: + msg232863 |
| 2014-12-18 05:25:00 | martin.panter | set | files: + format-str.patch messages: + msg232857 |
| 2014-12-18 00:17:19 | martin.panter | set | messages: + msg232843 |
| 2014-12-17 23:56:28 | vstinner | set | nosy: + vstinner messages: + msg232840 |
| 2014-12-16 22:57:16 | martin.panter | set | files: + format-bytes.patch keywords: + patch messages: + msg232768 |
| 2014-04-17 05:08:45 | martin.panter | set | nosy: + martin.panter messages: + msg216655 |
| 2014-03-26 18:26:38 | r.david.murray | set | assignee: docs@python components: + Documentation versions: - Python 3.1, Python 3.2, Python 3.3 nosy: + docs@python, r.david.murray messages: + msg214907 stage: needs patch |
| 2014-03-26 18:21:12 | zbysz | set | messages: + msg214906 |
| 2014-03-26 18:17:03 | benjamin.peterson | set | nosy: + benjamin.peterson messages: + msg214905 |
| 2014-03-26 17:48:53 | zbysz | create | |