Message114847

Author: baikie
Recipients: baikie, ezio.melotti, lemburg, loewis, vstinner
Date: 2010-08-24 22:59:29
SpamBayes Score: 1.8274237e-07
Marked as misclassified: No
Message-id: <20100824225934.GA4097@dbwatson.ukfsn.org>
In-reply-to: <4C72FF16.6040800@v.loewis.de>

Content:
> > It's about environments, not applications
>
> Still, my question remains. Is it a theoretical problem (i.e. one
> of your imagination), or a real one (i.e. one you observed in real
> life, without explicitly triggering it)? If real: what was the
> specific environment, and what was the specific host name?
Yes, I did reproduce the problem on my own system (Ubuntu 8.04).
No, it is not from a real application, nor do I know anyone with
their network configured like this (except possibly Dan "djbdns"
Bernstein: http://cr.yp.to/djbdns/idn.html ).
I reported this bug to save anyone who *is* in such an
environment from crashing applications and erroneous name
resolution.
> > That means that when a decoded hostname contains a non-ASCII
> > character which is not prohibited by IDNA/Nameprep, that string
> > will, when used in a subsequent call, not refer to the hostname
> > that was actually received, because it will be re-encoded using a
> > different codec.
>
> Again, I fail to see the problem in this. It won't happen in
> real life. However, if you worried that this could be abused,
> I think it should decode host names as ASCII, not as UTF-8.
> Then it will be symmetric again (IIUC).
That would be an improvement. The idea of the patches I posted
is to combine this with the existing surrogateescape mechanism,
which handles situations like this perfectly well. I don't see
how getting a UnicodeError is better than getting a string with
some lone surrogates in it. In fact, my understanding of
PEP 383 is that it is better to get the lone surrogates.
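
To illustrate what I mean, here is a quick sketch (not taken from the
patches I posted; the hostname bytes are invented) comparing the two
behaviours in Python 3:

    raw = b"caf\xc3\xa9.example"   # hypothetical hostname as received

    # Decoding as UTF-8 and re-encoding with a different codec (IDNA)
    # produces bytes that no longer match what was actually received:
    name = raw.decode("utf-8")              # 'café.example'
    print(name.encode("idna"))              # b'xn--caf-dma.example' != raw

    # Decoding as ASCII with surrogateescape maps the non-ASCII bytes
    # to lone surrogates instead of raising UnicodeError, and encoding
    # the same way round-trips to the original bytes:
    name2 = raw.decode("ascii", "surrogateescape")
    print(ascii(name2))                     # 'caf\udcc3\udca9.example'
    print(name2.encode("ascii", "surrogateescape") == raw)   # True

That lossless round trip is the same behaviour PEP 383 already gives
for file names and environment variables, which is why I would rather
see the lone surrogates than an exception.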