This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

classification
Title: Wrong str->bytes conversion in Lib/encodings/idna.py
Type: enhancement
Stage: test needed
Components: Library (Lib), Unicode
Versions: Python 3.11

process
Status: open
Resolution:
Dependencies: 7475
Superseder:
Assigned To:
Nosy List: belopolsky, doerwalter, ezio.melotti, loewis, pitrou, r.david.murray, vstinner
Priority: normal
Keywords:

Created on 2008-06-29 01:03 by pitrou, last changed 2022-04-11 14:56 by admin.

Messages (7)
msg68931 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2008-06-29 01:03
Lib/encodings/idna.py claims to do the following when `input` is a
string object (lines 183-184, and see comment line 178: "IDNA allows
decoding to operate on Unicode strings, too"):
 # Force to bytes
 input = bytes(input)
This is obviously wrong, lacking an encoding parameter. It doesn't seem
to be covered in the test suite, and I don't know what the proper
semantics should be, so I leave it to someone else to find a fix.
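For illustration (not part of the original report): in Python 3, calling bytes() on a str without an encoding raises TypeError, so the quoted branch cannot succeed for string input. A minimal sketch follows; the sample value and variable names are invented for this example.

```python
# Hypothetical sample input; any str behaves the same way.
name = "python.org"

try:
    bytes(name)  # same call shape as the quoted line in idna.py
    message = None
except TypeError as exc:
    # CPython reports: "string argument without an encoding"
    message = str(exc)

# Supplying an encoding makes the conversion well defined.
encoded = bytes(name, "ascii")

print(message)
print(encoded)
```

Whether "ascii" is the right encoding for the codec is exactly the open question raised in the report; it is used here only to show a call that succeeds.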
msg69219 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2008-07-03 18:05
Martin, you seem to be the author of that module.
msg124895 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2010-12-30 00:48
Martin's original code (r32301) was pretty clear:
 32301 loewis # IDNA allows decoding to operate on Unicode strings, too.
 32301 loewis if isinstance(input, unicode):
 32301 loewis labels = dots.split(input)
 32301 loewis else:
 32301 loewis # Must be ASCII string
 32301 loewis unicode(input, "ascii")
 32301 loewis labels = input.split(".")
but the py3k port, r55215, was clearly incomplete and the log message is explicit about it:
r55215 | guido.van.rossum | 2007-05-09 19:40:37 -0400 (2007-05-09) | 3 lines
Random modifications that slightly improve the chances of this not blowing up.
Walter will fix it for real.
I hope I picked the right Walter for the "nosy" list.
msg124899 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2010-12-30 01:53
Arguably, it is not a bug if a codec's decode method rejects unicode strings with a TypeError. The 2.x implementation seems to allow decoding of ASCII-only unicode labels joined by arbitrary RFC 3490 separators. I am not sure what the use case for this behavior would be. In any case, supporting this would be a feature and its acceptance would depend on the outcome of #7475.
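As a rough illustration of "arbitrary RFC 3490 separators" (not taken from the thread): RFC 3490 section 3.1 lists four code points that count as label separators. The sketch below splits a name on any of them; the names DOTS and split_labels are assumptions made for this example.

```python
import re

# The four label separators from RFC 3490 section 3.1:
# U+002E full stop, U+3002 ideographic full stop,
# U+FF0E fullwidth full stop, U+FF61 halfwidth ideographic full stop.
DOTS = re.compile("[\u002E\u3002\uFF0E\uFF61]")


def split_labels(name):
    """Split a domain name into labels on any RFC 3490 separator."""
    return DOTS.split(name)


labels = split_labels("www\u3002python\uff0eorg")
print(labels)  # ['www', 'python', 'org']
```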
msg124913 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-12-30 10:55
> Arguably, it is not a bug if codec's decode method rejects unicode
> strings with a TypeError.
Agreed, but it would be better if it did so deliberately and explicitly, rather than as a result of a bogus forward-port ;)
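One way a deliberate, explicit rejection could look, as a sketch only: the helper name require_bytes and its error message are hypothetical and are not proposed anywhere in this thread.

```python
def require_bytes(data):
    """Return data as bytes, rejecting str with an explicit TypeError.

    Hypothetical helper for illustration; not part of encodings.idna.
    """
    if isinstance(data, str):
        raise TypeError("idna decoding expects a bytes-like object, not str")
    # memoryview() itself raises TypeError for objects that do not
    # support the buffer protocol (for example int or list).
    return memoryview(data).tobytes()


print(require_bytes(bytearray(b"python.org")))

try:
    require_bytes("python.org")
except TypeError as exc:
    print("rejected:", exc)
```

The point of the explicit check is that the error names the actual problem instead of surfacing whatever a failed bytes() conversion happens to report.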
msg144682 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2011-09-30 10:20
I agree that the codec shouldn't "decode" unicode strings. However, the operation performed is still meaningful: users may type ACE (ascii-compatibly-encoded) DNS names into a user interface, and the application may then represent this as a "proper" Unicode name.
So I propose these changes:
- remove support for bytes in codec, but only so for 3.3 (it's actually no change in behavior, since it will continue to raise TypeErrors)
- add a function decode_idna to the module, for users that wish to un-IDNA string objects.
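A minimal sketch of what such a decode_idna function might look like, assuming it takes an ACE-form str and returns the Unicode form. No signature or behavior was specified in this thread, so everything below is an assumption; it relies only on the existing "idna" codec.

```python
def decode_idna(name):
    """Hypothetical helper: convert an ACE-form str to its Unicode form."""
    if not isinstance(name, str):
        raise TypeError(
            "decode_idna() expects str, got %s" % type(name).__name__
        )
    # ACE names are pure ASCII; anything else raises UnicodeEncodeError here.
    raw = name.encode("ascii")
    # Delegate the actual work to the existing bytes -> str codec.
    return raw.decode("idna")


print(decode_idna("xn--mnchen-3ya.de"))
```

A user interface could call this on a name typed in ACE form to display it in its Unicode form, which is the use case described above. This sketch accepts only the ASCII full stop as a separator; handling the other RFC 3490 separators is left open.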
msg144687 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2011-09-30 11:48
+1. decode_idna is likely to be useful to the email package.
History
Date                 User            Action  Args
2022-04-11 14:56:35  admin           set     github: 47482
2021-11-24 16:20:06  iritkatriel     set     versions: + Python 3.11, - Python 3.4
2013-02-23 06:46:54  ezio.melotti    set     type: behavior -> enhancement
                                             versions: + Python 3.4, - Python 3.3
2011-09-30 11:48:24  r.david.murray  set     messages: + msg144687
2011-09-30 10:35:58  vstinner        set     nosy: + vstinner
2011-09-30 10:20:04  loewis          set     messages: + msg144682
2011-09-29 23:05:12  ezio.melotti    set     nosy: + ezio.melotti
2011-07-19 12:50:40  pitrou          set     nosy: + r.david.murray
2010-12-30 10:55:12  pitrou          set     messages: + msg124913
2010-12-30 01:53:47  belopolsky      set     dependencies: + codecs missing: base64 bz2 hex zlib hex_codec ...
                                             messages: + msg124899
                                             versions: + Python 3.3, - Python 3.1
2010-12-30 00:48:10  belopolsky      set     nosy: + belopolsky, doerwalter
                                             messages: + msg124895
2009-05-16 20:34:11  ajaksu2         set     priority: normal
                                             stage: test needed
                                             versions: + Python 3.1, - Python 3.0
2008-07-03 18:05:46  pitrou          set     nosy: + loewis
                                             messages: + msg69219
2008-06-29 01:03:31  pitrou          create
