Issue 13641: decoding functions in the base64 module could accept unicode strings

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/57850

classification

Title:	decoding functions in the base64 module could accept unicode strings
Type:	enhancement	Stage:	resolved
Components:	Library (Lib)	Versions:	Python 3.3

process

Dependencies:	Superseder:
Status:	closed	Resolution:	fixed
Assigned To:	Nosy List:	Anthony.Kong, berker.peksag, catalin.iacob, eric.araujo, gvanrossum, petri.lehtinen, pitrou, poq, python-dev, r.david.murray
Priority:	normal	Keywords:	easy, patch

Created on 2011-12-20 13:04 by pitrou, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name	Uploaded by	Uploaded on
issue13641_v1.diff	berker.peksag	2011-12-31 23:50
issue13641_v2_with_tests.diff	berker.peksag	2012-01-04 23:21
issue13641_v3_with_tests.diff	berker.peksag	2012-01-16 21:11
issue13641-alternative-v1.patch	catalin.iacob	2012-02-16 20:45
issue13641-alternative-v2.patch	catalin.iacob	2012-02-19 18:17

Messages (20)
msg149912 - (view)	Author: Antoine Pitrou (pitrou) * (Python committer)	Date: 2011-12-20 13:04
Similarly to #13637 for the binascii module, the decoding functions in the base64 module could accept ASCII-only unicode strings.
msg150445 - (view)	Author: Antoine Pitrou (pitrou) * (Python committer)	Date: 2012-01-01 22:15
Thanks for the patch, Berker. It seems a bit too simple, though. You should add some tests in Lib/test/test_base64.py and run them (using "./python -m test -v test_base64"); this will let you see whether your changes are correct.
msg150633 - (view)	Author: Berker Peksag (berker.peksag) * (Python committer)	Date: 2012-01-04 23:21
Hi Antoine, I added some tests for the b64decode function. I also wrote some tests for the b32decode and b16decode functions, and they failed. I think my patch does not work for b32decode and b16decode. I'll dig into the code and try to find a way. Thanks!
msg151646 - (view)	Author: Antoine Pitrou (pitrou) * (Python committer)	Date: 2012-01-19 17:54
Thanks for the updated patch! Two comments:
- I see no tests for map01 and altchars being passed as a str. Is this supported by the patch, or am I reading it wrong?
- Apparently b16decode is not tackled. Is that deliberate?
Thanks again.
msg153499 - (view)	Author: Catalin Iacob (catalin.iacob) *	Date: 2012-02-16 20:45
Attached alternative patch with a different approach: on input, strings are encoded as bytes and the rest of the code proceeds as before. All existing tests for bytes now test for strings as well, and there is a new test for strings with non-ASCII characters. Berker's patch was more intrusive and forgot to allow strings in _translate, leading to failures if altchars or map01 were used.
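The approach described in msg153499 (encode str input to ASCII bytes up front, then reuse the existing bytes-only code path) can be sketched roughly as follows. This is an illustration, not the attached patch: the names `to_ascii_bytes` and `decode_b64` are hypothetical, and reporting non-ASCII text as a ValueError is an assumption made for this sketch.

```python
import binascii


def to_ascii_bytes(data):
    """Hypothetical helper: return `data` as bytes, encoding str as ASCII."""
    if isinstance(data, str):
        try:
            return data.encode("ascii")
        except UnicodeEncodeError:
            # Assumption: surface non-ASCII text as a plain ValueError.
            raise ValueError(
                "string argument should contain only ASCII characters"
            ) from None
    if isinstance(data, (bytes, bytearray)):
        return bytes(data)
    raise TypeError("expected bytes or ASCII str, not %s" % type(data).__name__)


def decode_b64(data):
    """Decode base64 given as bytes or ASCII-only str; always returns bytes."""
    # After normalisation, the unchanged bytes-only decoder does the work.
    return binascii.a2b_base64(to_ascii_bytes(data))
```

With this shape, only the entry point of each decoding function changes; everything after the normalisation step still sees bytes.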
msg153500 - (view)	Author: R. David Murray (r.david.murray) * (Python committer)	Date: 2012-02-16 21:14
Um. I'm inclined to think that #13637 was a mistake. Functions that accept bytes and return bytes and also accept string and return string seem uncontroversial. However, accepting bytes or string and returning bytes is not an obviously good idea, and IMO at least merits some discussion. In fact, I thought it had been discussed, specifically in the context of the b2a/a2b functions, and been rejected.
msg153501 - (view)	Author: Antoine Pitrou (pitrou) * (Python committer)	Date: 2012-02-16 21:17
> However, accepting bytes or string and returning bytes is not an
> obviously good idea, and IMO at least merits some discussion.

Why? "a" in "a2b" means ASCII, and unicode is as valid a container for ASCII text as bytes is.
msg153504 - (view)	Author: (poq)	Date: 2012-02-16 21:40
FWIW, I was surprised by the return type of b64encode when I first used it in Python 3. It seems to me that b64encode turns binary data into text and thus intuitively should take bytes and return str. Similarly it seems intuitive to me for b64decode to take str as input and return bytes, as it turns text back into binary data.
msg153505 - (view)	Author: R. David Murray (r.david.murray) * (Python committer)	Date: 2012-02-16 22:14
OK, I skimmed the thread I was remembering, and while it was discussing str->str and bytes->bytes primarily, the only pronouncement I could find was that functions should not accept a mix of bytes and string. So I guess I withdraw my objection, although it still makes me a bit uncomfortable.
msg153604 - (view)	Author: Catalin Iacob (catalin.iacob) *	Date: 2012-02-17 21:52
My current patch allows mixing of bytes and str for the data to be decoded and the altchars or map01 parameter. Given David's observation in msg153505 I'll update the patch to require that both the data and altchars/map01 have the same type.
msg153713 - (view)	Author: Catalin Iacob (catalin.iacob) *	Date: 2012-02-19 18:17
Attached v2 of the patch, where mixing str and binary data for altchars or map01 raises TypeError. I also added a note to the docs for each of the changed functions saying that it also accepts strings (but didn't update the docstrings). When writing the docs, the new functionality seemed hard to describe; maybe that means this issue only complicates things and is not worth it, or maybe it just means I don't have experience writing docs. But, regardless of having worked on a patch, I have to admit that I'm also not 100% sure this issue is a good idea. I do think that either both this issue and #13637 should be accepted or both rejected.
msg153714 - (view)	Author: Antoine Pitrou (pitrou) * (Python committer)	Date: 2012-02-19 18:21
I think trying to prevent mixed argument types is complete overkill. There's no ambiguity since they all have to be ASCII anyway. So I would prefer to commit issue13641-alternative-v1.patch.
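To make the "mixed argument types" point concrete, here is a small check written against the standard `base64` module, assuming Python 3.3 or later (where the v1 behaviour was committed, per msg153794). The payload and the `results` list are illustrative choices, not taken from the patch or its tests.

```python
import base64

payload = b"\xfb\xff"
# Encode with '-' and '_' standing in for '+' and '/'.
encoded = base64.b64encode(payload, altchars=b"-_")
encoded_text = encoded.decode("ascii")

# Try every str/bytes combination for the data and for altchars.
results = [
    base64.b64decode(data, altchars=alt)
    for data in (encoded, encoded_text)
    for alt in (b"-_", "-_")
]
```

All four combinations should decode to the same bytes, which is the "no ambiguity" argument: both arguments are restricted to ASCII whichever type carries them.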
msg153757 - (view)	Author: R. David Murray (r.david.murray) * (Python committer)	Date: 2012-02-20 02:13
OK, I'm back to being 100% on the side of rejecting both of these changes. ASCII is not unicode, it is bytes. You can decode it to unicode but it is not unicode. Those transformations operate bytes to bytes, not bytes to unicode. We made the bytes/unicode separation to avoid the problem where you have a working program that unexpectedly gets non-ASCII input and blows up with a unicode error. IMO these patches are reintroducing that problem. The programmer should have to explicitly encode to ASCII if they are inadvisedly working with it in a string as part of a wire protocol (why else would they be using these transforms?).
msg153759 - (view)	Author: Antoine Pitrou (pitrou) * (Python committer)	Date: 2012-02-20 02:21
> OK, I'm back to being 100% on the side of rejecting both of these
> changes. ASCII is not unicode, it is bytes. You can decode it to
> unicode but it is not unicode. Those transformations operate bytes to
> bytes, not bytes to unicode.

ASCII is just a subset of the unicode character set.

> We made the bytes/unicode separation to avoid the problem where you
> have a working program that unexpectedly gets non-ASCII input and
> blows up with a unicode error.

How is blowing up with a unicode error worse than blowing up with a ValueError? Both indicate wrong input. At worst the code could catch UnicodeError and re-raise it as ValueError, but I don't see the point.

> The programmer should have to explicitly encode to ASCII if they are
> inadvisedly working with it in a string as part of a wire protocol
> (why else would they be using these transforms?).

Inadvisedly? There are many situations where you can have base64 data in some unicode strings.
msg153793 - (view)	Author: Roundup Robot (python-dev) (Python triager)	Date: 2012-02-20 18:33
New changeset c760bd844222 by Antoine Pitrou in branch 'default': Issue #13641: Decoding functions in the base64 module now accept ASCII-only unicode strings. http://hg.python.org/cpython/rev/c760bd844222
msg153794 - (view)	Author: Antoine Pitrou (pitrou) * (Python committer)	Date: 2012-02-20 18:34
I've committed issue13641-alternative-v1.patch. I really think practicality beats purity here and, furthermore, there's no associated danger (non-ASCII data is rejected both as bytes and str).
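A quick way to see the committed behaviour, assuming Python 3.3 or later: each decoder accepts the ASCII text form of its own encoder's output and still returns bytes. The `pairs` list and the `payload` value are illustrative and do not come from the test suite.

```python
import base64

payload = b"hello"
pairs = [
    (base64.b64encode, base64.b64decode),
    (base64.b32encode, base64.b32decode),
    (base64.b16encode, base64.b16decode),
]

decoded_from_text = []
for encode, decode in pairs:
    text = encode(payload).decode("ascii")   # encoders return bytes; make a str
    decoded_from_text.append(decode(text))   # str in, bytes out
```

Note that only the decoders changed: the encoders still take bytes and return bytes.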
msg153827 - (view)	Author: R. David Murray (r.david.murray) * (Python committer)	Date: 2012-02-21 01:06
Non-ASCII binary data should not be rejected unless validate is true. So what are you going to do with non-ASCII-range unicode in that case? Ignore it as well? That can't be right. I believe this should be discussed on python-dev.
msg153828 - (view)	Author: R. David Murray (r.david.murray) * (Python committer)	Date: 2012-02-21 01:10
I disagree with this commit. Reopening pending discussion on python-dev.
msg153830 - (view)	Author: Antoine Pitrou (pitrou) * (Python committer)	Date: 2012-02-21 01:23
> Non-ascii binary data should not be being rejected unless validate
> is true. So what are you going to do with non-ascii-range unicode in
> that case? Ignore it as well? That can't be right.

It's not ignored, it raises ValueError. Since the common case is to feed valid (not invalid) baseXX data to these functions, that's a very benign limitation.
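The asymmetry being debated here can be observed directly. The sketch below assumes Python 3.3 or later and the default lenient decoding of `base64.b64decode`; the `error_type` helper and the sample inputs are hypothetical, introduced only for this illustration.

```python
import base64
import binascii

noisy_bytes = b"aGVs\xffbG8="   # base64 for b"hello" with one stray non-ASCII byte
noisy_text = "aGVs\u00ffbG8="   # same data, but the stray character is in a str


def error_type(func, *args, **kwargs):
    """Hypothetical helper: return the exception class raised, or None."""
    try:
        func(*args, **kwargs)
    except Exception as exc:
        return type(exc)
    return None


# bytes input, validate left at its default: the stray byte is discarded.
lenient = base64.b64decode(noisy_bytes)
# bytes input, validate=True: the stray byte is an error.
strict_error = error_type(base64.b64decode, noisy_bytes, validate=True)
# str input: non-ASCII text is rejected before any decoding happens.
text_error = error_type(base64.b64decode, noisy_text)
```

So a stray non-ASCII byte is silently dropped in bytes input by default, while the equivalent character in str input raises an error regardless of `validate`, which is the limitation msg153830 calls benign.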
msg153847 - (view)	Author: Guido van Rossum (gvanrossum) * (Python committer)	Date: 2012-02-21 04:18
Aside: I, too, at first thought this would be a bad idea because it brings back the Python 2 issue of accepting some but not all Unicode strings. But then I realized that by their nature these functions only accept a very specific set of characters -- so the restriction to (a subset of) ASCII is intrinsic to the functionality, and there is no possibility of confusion. If anything, accepting bytes is more likely to be confusing (they could be EBCDIC! :-). So no objection here. And a slight preference for ValueError.

History
Date	User	Action	Args
2022-04-11 14:57:24	admin	set	github: 57850
2012-02-25 15:41:16	r.david.murray	set	status: open -> closed
2012-02-21 04:18:15	gvanrossum	set	nosy: + gvanrossum messages: + msg153847
2012-02-21 01:23:45	pitrou	set	messages: + msg153830
2012-02-21 01:10:03	r.david.murray	set	status: closed -> open messages: + msg153828
2012-02-21 01:06:41	r.david.murray	set	messages: + msg153827
2012-02-20 18:34:54	pitrou	set	status: open -> closed resolution: fixed stage: patch review -> resolved
2012-02-20 18:34:40	pitrou	set	messages: + msg153794
2012-02-20 18:33:48	python-dev	set	nosy: + python-dev messages: + msg153793
2012-02-20 02:21:45	pitrou	set	messages: + msg153759
2012-02-20 02:13:31	r.david.murray	set	messages: + msg153757
2012-02-19 18:21:54	pitrou	set	messages: + msg153714
2012-02-19 18:17:17	catalin.iacob	set	files: + issue13641-alternative-v2.patch messages: + msg153713
2012-02-17 21:52:48	catalin.iacob	set	messages: + msg153604
2012-02-16 22:14:10	r.david.murray	set	messages: + msg153505
2012-02-16 21:40:53	poq	set	nosy: + poq messages: + msg153504
2012-02-16 21:17:49	pitrou	set	messages: + msg153501
2012-02-16 21:14:32	r.david.murray	set	nosy: + r.david.murray messages: + msg153500
2012-02-16 20:45:57	catalin.iacob	set	files: + issue13641-alternative-v1.patch nosy: + catalin.iacob messages: + msg153499
2012-01-19 17:54:21	pitrou	set	messages: + msg151646 stage: patch review
2012-01-16 21:11:29	berker.peksag	set	files: + issue13641_v3_with_tests.diff
2012-01-04 23:21:55	berker.peksag	set	files: + issue13641_v2_with_tests.diff messages: + msg150633
2012-01-01 22:15:40	pitrou	set	messages: + msg150445
2011-12-31 23:50:05	berker.peksag	set	files: + issue13641_v1.diff nosy: + berker.peksag keywords: + patch
2011-12-30 20:50:55	eric.araujo	set	nosy: + eric.araujo
2011-12-28 10:59:10	Anthony.Kong	set	nosy: + Anthony.Kong
2011-12-28 07:22:14	petri.lehtinen	set	nosy: + petri.lehtinen
2011-12-20 13:04:47	pitrou	create
