
This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

classification
Title: PyUnicode_AsDecodedObject can only return unicode now
Type: behavior Stage: resolved
Components: Interpreter Core Versions: Python 3.7, Python 3.6
process
Status: closed Resolution: fixed
Dependencies: 28526 Superseder:
Assigned To: serhiy.storchaka Nosy List: Arfrever, benjamin.peterson, ezio.melotti, larry, lemburg, ncoghlan, python-dev, serhiy.storchaka, vstinner, xiang.zhang
Priority: normal Keywords: patch

Created on 2016-10-13 04:30 by xiang.zhang, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Files
File name Uploaded
PyUnicode_AsDecodedObject-crash.patch serhiy.storchaka, 2016-10-13 09:59
deprecate-str-to-str-coding-unicode-api.patch serhiy.storchaka, 2016-10-13 10:00
deprecate-str-to-str-coding-unicode-api-2.patch serhiy.storchaka, 2016-10-25 08:33
Pull Requests
URL Status Linked
PR 552 closed dstufft, 2017-03-31 16:36
Messages (17)
msg278545 - Author: Xiang Zhang (xiang.zhang) (Python committer) Date: 2016-10-13 04:30
PyUnicode_AsDecodedObject was added in f46d49e2e0f0 and became an API in 2284fa89ab08. It seems its intention was to return an arbitrary Python object. But as the code evolved, with commits 5f11621a6f51 and 123f2dc08b3e, it can now only return unicode, making it another version of PyUnicode_AsDecodedUnicode. Is this the intended behaviour?
msg278546 - Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) Date: 2016-10-13 05:50
Indeed. The only difference is that PyUnicode_AsDecodedUnicode fails for most encodings (except rot13), but PyUnicode_AsDecodedObject just crashes in a debug build. It seems to me that these functions (as well as PyUnicode_AsEncodedUnicode) shouldn't exist in Python 3. None of these functions are documented. PyUnicode_AsDecodedObject emits a Py3k warning in 2.7. PyUnicode_AsDecodedUnicode and PyUnicode_AsEncodedUnicode were added in Python 3 (2284fa89ab08), and their purpose is not clear to me. They work only with rot13, but the general PyCodec_Decode and PyCodec_Encode can be used instead. Could you please explain, Marc-André?
msg278550 - Author: Marc-Andre Lemburg (lemburg) (Python committer) Date: 2016-10-13 08:02
PyUnicode_AsDecodedObject() and PyUnicode_AsEncodedObject() were meant as C API implementations of the unicode.decode() and unicode.encode() methods in Python2. Not having PyUnicode_AsDecodedObject() documented was likely an oversight on my part.
In Python2, unicode.decode() and unicode.encode() were more or less direct interfaces to the codec registry. In Python 2.7 this was changed to issue a warning for porting to Python 3.
In Python3, the methods were changed to only return unicode objects, and to reflect this change without breaking the C API, the new PyUnicode_AsDecodedUnicode() and PyUnicode_AsEncodedUnicode() were added.
I guess the more recent changes simply didn't pay attention to this difference anymore and put restrictions on the output of PyUnicode_AsDecodedObject() and PyUnicode_AsEncodedObject() which were not originally intended, hence the crash you are seeing, Serhiy.
Going forward, C extensions in Python3 could indeed use the PyCodec_*() APIs directly.
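As a Python-level illustration of the restriction described in this message (a minimal sketch, not taken from any of the patches): in Python 3.4 and later, str.encode() accepts only text encodings that produce bytes, while codecs.encode() and codecs.decode() still go straight to the codec registry. The helper name encode_via_method is hypothetical.

```python
import codecs


def encode_via_method(text, encoding):
    """Call str.encode(); return its result, or the LookupError it raises."""
    try:
        return text.encode(encoding)
    except LookupError as exc:
        # rot_13 is a str-to-str codec, so the str method rejects it.
        return exc


# The method form refuses the non-text codec.
method_result = encode_via_method("hello", "rot_13")

# The registry-level function returns whatever the codec produces.
registry_result = codecs.encode("hello", "rot_13")
round_trip = codecs.decode(registry_result, "rot_13")

print(type(method_result).__name__)
print(registry_result, round_trip)
```

The C functions PyCodec_Encode() and PyCodec_Decode() mentioned above play the same role as codecs.encode() and codecs.decode() here.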
msg278551 - Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) Date: 2016-10-13 08:10
Shouldn't all these functions be deprecated in favour of the PyCodec_*() APIs?
msg278552 - Author: Marc-Andre Lemburg (lemburg) (Python committer) Date: 2016-10-13 08:22
On 13.10.2016 10:10, Serhiy Storchaka wrote:
> Shouldn't all these functions be deprecated in favour of the PyCodec_*() APIs?
Not all of them, since you still want to have a C API for
unicode.encode(). PyUnicode_AsEncodedUnicode() would have
to stay.
As for the others, I don't know how much use they get.
msg278554 - Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) Date: 2016-10-13 09:59
I propose the following:
1) Fix a crash in PyUnicode_AsDecodedObject by removing unicode_result() in all maintained 3.x versions (starting from 3.4? or 3.3?).
2) Deprecate PyUnicode_AsDecodedObject, PyUnicode_AsDecodedUnicode and PyUnicode_AsEncodedUnicode in 3.6, make them always fail in 3.7 and remove them in future versions. They shouldn't be widely used since they are not documented, PyUnicode_AsDecodedObject is already deprecated in 2.7, and the only supported standard encoding is rot13.
msg278628 - Author: Xiang Zhang (xiang.zhang) (Python committer) Date: 2016-10-14 03:23
+1 for 2). Patch looks good.
msg279133 - Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) Date: 2016-10-21 14:35
Marc-Andre, what are your thoughts about this?
msg279337 - Author: Marc-Andre Lemburg (lemburg) (Python committer) Date: 2016-10-24 20:03
As I already mentioned, PyUnicode_AsEncodedUnicode() needs to stay, since it's the C API for unicode.encode(). The others can be deprecated.
msg279344 - Author: Xiang Zhang (xiang.zhang) (Python committer) Date: 2016-10-25 02:16
Marc-Andre, shouldn't the C API of unicode.encode() be PyUnicode_AsEncodedString instead of PyUnicode_AsEncodedUnicode now?
BTW Serhiy, how about PyUnicode_AsEncodedObject? I don't see it in your deprecation list.
msg279363 - Author: Marc-Andre Lemburg (lemburg) (Python committer) Date: 2016-10-25 06:53
On 25.10.2016 04:16, Xiang Zhang wrote:
> Marc-Andre, shouldn't the C API of unicode.encode() be PyUnicode_AsEncodedString instead of PyUnicode_AsEncodedUnicode now?
You're right. I got confused with all the slight variations.
> BTW Serhiy, how about PyUnicode_AsEncodedObject? I don't see it in your deprecation list.
Let's see what we have:
PyUnicode_AsEncodedString(): encode to bytes (unicode.encode())
PyUnicode_AsEncodedUnicode(): encode to unicode (for e.g. rot13)
PyUnicode_AsEncodedObject(): encode to whatever the codec returns
codecs.encode() can be used for the last two.
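The three kinds of result listed above can be observed from Python with the codecs module. This is a small sketch, assuming only the standard utf-8 and rot_13 codecs; result_type is a hypothetical helper, not something from the patch.

```python
import codecs


def result_type(text, encoding):
    """Return the type name of whatever the codec's encoder produces."""
    return type(codecs.encode(text, encoding)).__name__


# str.encode() is the "encode to bytes" case.
as_bytes = "abc".encode("utf-8")

# rot_13 is the "encode to unicode" case: str in, str out.
as_str = codecs.encode("abc", "rot_13")

# codecs.encode() itself does not restrict the result type.
print(as_bytes, as_str)
print(result_type("abc", "utf-8"), result_type("abc", "rot_13"))
```

Because codecs.encode() passes through whatever the codec returns, it covers both the str-to-str case and the unrestricted case.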
msg279368 - Author: Roundup Robot (python-dev) (Python triager) Date: 2016-10-25 07:18
New changeset 71dce630dc02 by Serhiy Storchaka in branch '3.4':
Issue #28426: Fixed potential crash in PyUnicode_AsDecodedObject() in debug build.
https://hg.python.org/cpython/rev/71dce630dc02
New changeset 74569ecd67e4 by Serhiy Storchaka in branch '3.5':
Issue #28426: Fixed potential crash in PyUnicode_AsDecodedObject() in debug build.
https://hg.python.org/cpython/rev/74569ecd67e4
New changeset 6af1a26e655f by Serhiy Storchaka in branch '3.6':
Issue #28426: Fixed potential crash in PyUnicode_AsDecodedObject() in debug build.
https://hg.python.org/cpython/rev/6af1a26e655f
New changeset 1ab1fd00e9d6 by Serhiy Storchaka in branch 'default':
Issue #28426: Fixed potential crash in PyUnicode_AsDecodedObject() in debug build.
https://hg.python.org/cpython/rev/1ab1fd00e9d6 
msg279375 - Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) Date: 2016-10-25 08:33
> BTW Serhiy, how about PyUnicode_AsEncodedObject? I don't see it in your deprecation list.
Indeed. It is not as useless as the other functions (which support only the rot13 encoding), but it can be replaced with either PyUnicode_AsEncodedString or PyCodec_Encode (both exist in 2.7).
The updated patch deprecates PyUnicode_AsEncodedObject too. Please review the warning messages; I'm not sure about the wording.
msg279378 - Author: Xiang Zhang (xiang.zhang) (Python committer) Date: 2016-10-25 08:52
LGTM. I can understand the wording.
msg279557 - Author: Roundup Robot (python-dev) (Python triager) Date: 2016-10-27 18:08
New changeset 15a494886c5a by Serhiy Storchaka in branch '3.6':
Issue #28426: Deprecated undocumented functions PyUnicode_AsEncodedObject(),
https://hg.python.org/cpython/rev/15a494886c5a
New changeset 50c28727d91c by Serhiy Storchaka in branch 'default':
Issue #28426: Deprecated undocumented functions PyUnicode_AsEncodedObject(),
https://hg.python.org/cpython/rev/50c28727d91c 
msg279559 - Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) Date: 2016-10-27 18:17
Thanks Xiang.
msg279718 - Author: Arfrever Frehtes Taifersar Arahesis (Arfrever) (Python triager) Date: 2016-10-30 09:56
> New changeset 15a494886c5a by Serhiy Storchaka in branch '3.6':
> Issue #28426: Deprecated undocumented functions PyUnicode_AsEncodedObject(),
> https://hg.python.org/cpython/rev/15a494886c5a
s/that encode form str/that encode from str/ in Include/unicodeobject.h 
History
Date User Action Args
2022-04-11 14:58:38 admin set github: 72612
2017-03-31 16:36:09 dstufft set pull_requests: + pull_request850
2016-10-30 09:56:24 Arfrever set nosy: + Arfrever; messages: + msg279718
2016-10-27 18:17:34 serhiy.storchaka set status: open -> closed; resolution: fixed; messages: + msg279559; stage: patch review -> resolved
2016-10-27 18:08:18 python-dev set messages: + msg279557
2016-10-25 08:52:27 xiang.zhang set messages: + msg279378
2016-10-25 08:42:57 serhiy.storchaka set assignee: serhiy.storchaka; dependencies: + Use PyUnicode_AsEncodedString instead of PyUnicode_AsEncodedObject
2016-10-25 08:33:16 serhiy.storchaka set files: + deprecate-str-to-str-coding-unicode-api-2.patch; messages: + msg279375; versions: - Python 3.4, Python 3.5
2016-10-25 07:18:52 python-dev set nosy: + python-dev; messages: + msg279368
2016-10-25 06:53:33 lemburg set messages: + msg279363
2016-10-25 02:16:50 xiang.zhang set messages: + msg279344
2016-10-24 20:03:06 lemburg set messages: + msg279337
2016-10-21 14:35:11 serhiy.storchaka set messages: + msg279133
2016-10-14 03:23:46 xiang.zhang set messages: + msg278628
2016-10-13 10:00:21 serhiy.storchaka set files: + deprecate-str-to-str-coding-unicode-api.patch
2016-10-13 09:59:55 serhiy.storchaka set files: + PyUnicode_AsDecodedObject-crash.patch; versions: + Python 3.4; keywords: + patch; nosy: + larry; messages: + msg278554; stage: patch review
2016-10-13 08:22:18 lemburg set messages: + msg278552
2016-10-13 08:10:21 serhiy.storchaka set messages: + msg278551
2016-10-13 08:02:15 lemburg set messages: + msg278550
2016-10-13 05:50:12 serhiy.storchaka set nosy: + ezio.melotti, benjamin.peterson, serhiy.storchaka, ncoghlan; messages: + msg278546
2016-10-13 04:30:06 xiang.zhang create
