This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

classification
Title: binascii b2a functions accept strings (unicode) as data
Type:
Stage:
Components: Extension Modules
Versions: Python 3.0
process
Status: closed
Resolution: accepted
Dependencies:
Superseder:
Assigned To:
Nosy List: barry, georg.brandl, loewis, pitrou, terry.reedy
Priority: release blocker
Keywords: needs review, patch

Created on 2008-11-22 00:41 by terry.reedy, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name      Uploaded                   Description
reqbytes.diff  loewis, 2008-11-30 13:48
Messages (7)
msg76226 - Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2008-11-22 00:41
Binascii b2a_xxx functions accept 'binary data' and return ascii-encoded
bytes. The corresponding a2b_xxx functions turn the ascii-encoded bytes
back to 'binary data' (bytes). If the binary data is bytes, these
should be inverses of each other.
Somewhat surprisingly to me (because the corresponding base64 module
functions raise "TypeError: expected bytes, not str") 3.0 strings
(unicode) are accepted as 'binary data', though they will not 'round-trip'.
ASCII chars almost do:
>>> import binascii as b
>>> a='aaaa'
>>> c=b.b2a_base64(a)
>>> c
b'YWFhYQ==\n'
>>> d=b.a2b_base64(c)
>>> d
b'aaaa'
But general unicode chars generate nonsense.
>>> a='\u1000'
>>> c=b.b2a_base64(a)
>>> c
b'4YCA\n'
>>> d=b.a2b_base64(c)
>>> d
b'\xe1\x80\x80'
I also tried b2a_uu.
Is this a bug?
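
For reference, a minimal sketch of a round trip that does work: encode the string to bytes explicitly before calling b2a_base64, and decode again after a2b_base64. The choice of UTF-8 and the helper names text_to_base64 and base64_to_text are assumptions made for illustration; they are not part of binascii.

```python
import binascii

# Assumption: the caller wants UTF-8. binascii itself has no notion of a
# text encoding, so the choice is made explicitly here.
ENCODING = 'utf-8'

def text_to_base64(text):
    """Hypothetical helper: str -> base64-encoded ASCII bytes."""
    return binascii.b2a_base64(text.encode(ENCODING))

def base64_to_text(encoded):
    """Hypothetical helper: base64-encoded ASCII bytes -> str."""
    return binascii.a2b_base64(encoded).decode(ENCODING)

encoded = text_to_base64('\u1000')
restored = base64_to_text(encoded)
print(encoded)               # same bytes as in the session above: b'4YCA\n'
print(restored == '\u1000')  # the text survives the round trip
```

With the encoding step made explicit, b2a and a2b operate on bytes only and are exact inverses of each other.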
msg76233 - Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2008-11-22 08:16
I vote yes.
msg76628 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2008-11-29 22:48
It's not /exactly/ nonsense, it seems to assume a utf8 encoding pass is
necessary:
>>> b'\xe1\x80\x80'.decode('utf8') == '\u1000'
True
IMO, while accepting unicode strings instead of bytes for the a2b_xx
functions is understandable (because in practice only ASCII characters
are allowed), it is not acceptable for b2a_xx functions to accept
unicode strings instead of bytes.
In other words, it might/should be ok for
`binascii.a2b_base64('YWFh\n')` to return the same as
`binascii.a2b_base64('YWFh\n')` (that is, b'aaa'), but
`binascii.b2a_base64('aaa')` should raise a TypeError rather than
applying a utf8 encoding pass before doing the actual b2a encoding.
I think this must be fixed before 3.0 final, and is therefore a release
blocker.
msg76629 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2008-11-29 22:49
Hmm, I obviously meant:
[...] In other words, it might/should be ok for
`binascii.a2b_base64('YWFh\n')` to return the same as
`binascii.a2b_base64(b'YWFh\n')` (that is, b'aaa') [...]
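
The asymmetry described here can be checked with a short sketch. It assumes a recent Python 3 (3.3 or later, where the a2b functions are documented to accept ASCII-only strings); it does not claim to show the behaviour of 3.0 itself. The variable names are illustrative only.

```python
import binascii

# a2b direction: an ASCII-only str and the equivalent bytes decode alike
# (assumption: Python 3.3 or later).
from_str = binascii.a2b_base64('YWFh\n')
from_bytes = binascii.a2b_base64(b'YWFh\n')
print(from_str == from_bytes)

# b2a direction: a str argument is rejected instead of being silently
# encoded first.
try:
    binascii.b2a_base64('aaa')
except TypeError:
    b2a_rejects_str = True
else:
    b2a_rejects_str = False
print(b2a_rejects_str)
```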
msg76639 - Author: Martin v. Löwis (loewis) * (Python committer) Date: 2008-11-30 13:48
Here is a patch that fixes the problem.
Notice that this affects the email API; base64mime.body_encode now also
requires bytes (whereas quoprimime remains unchanged).
There are probably more functions that still incorrectly accept strings,
e.g. zlib.crc32.
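
As an illustration of the same bytes-only rule applied to zlib.crc32, here is a small sketch. It assumes a current Python 3 release, where zlib.crc32 raises TypeError for a str argument; the sample text and the UTF-8 encoding are arbitrary choices.

```python
import zlib

text = 'abc'

# Passing a str is rejected in current Python 3 releases.
try:
    zlib.crc32(text)
except TypeError:
    crc_rejects_str = True
else:
    crc_rejects_str = False

# Assumption: UTF-8 is the intended encoding; the checksum is computed
# over the resulting bytes, not over the characters.
checksum = zlib.crc32(text.encode('utf-8'))
print(crc_rejects_str, hex(checksum))
```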
msg76662 - Author: Barry A. Warsaw (barry) * (Python committer) Date: 2008-11-30 20:14
Martin, the patch looks okay to me. I vote for applying it.
msg76724 - Author: Martin v. Löwis (loewis) * (Python committer) Date: 2008-12-02 06:00
Committed as r67472.
History
Date                 User          Action  Args
2022-04-11 14:56:41  admin         set     github: 48637
2008-12-02 06:00:34  loewis        set     status: open -> closed; messages: + msg76724
2008-11-30 20:14:38  barry         set     nosy: + barry; resolution: accepted; messages: + msg76662
2008-11-30 13:58:56  loewis        set     keywords: + needs review
2008-11-30 13:48:30  loewis        set     files: + reqbytes.diff; nosy: + loewis; messages: + msg76639; keywords: + patch
2008-11-29 22:49:42  pitrou        set     messages: + msg76629
2008-11-29 22:48:03  pitrou        set     priority: release blocker; nosy: + pitrou; messages: + msg76628
2008-11-22 08:16:40  georg.brandl  set     nosy: + georg.brandl; messages: + msg76233
2008-11-22 00:41:18  terry.reedy   create
