Message70858
| Author |
janssen |
| Recipients |
gvanrossum, janssen, jimjjewett, loewis, mgiuca, orsenthil, pitrou, thomaspinckney3 |
| Date |
2008-08-07.21:17:06 |
| Message-id |
<1218143828.6.0.619997562476.issue3300@psf.upfronthosting.co.za> |
| In-reply-to |
| Content |
My main fear with this patch is that "unquote" will come to be seen as
unreliable, because naive software trying to parse URLs will encounter
uses of percent-encoding where the encoded octets are not in fact
UTF-8-encoded text. They're just some set of bytes. A secondary concern
is that it will silently produce invalid data: a string that was not
encoded as UTF-8, but whose bytes happen to form valid UTF-8 sequences,
will be decoded to the wrong string value.
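
Both concerns can be sketched as follows. This is only an illustration,
assuming the urllib.parse API as it exists in current Python 3 (which
may differ from the patch under discussion); the sample strings are
made up for the example.

```python
from urllib.parse import unquote, unquote_to_bytes

# Case 1: "é" percent-encoded from Latin-1 is the single octet 0xE9,
# which is not a valid UTF-8 sequence on its own.
latin1_quoted = "caf%E9"
raw = unquote_to_bytes(latin1_quoted)   # the octets, no charset assumed
as_utf8 = unquote(latin1_quoted)        # default: utf-8, errors="replace"
as_latin1 = unquote(latin1_quoted, encoding="latin-1")

# Case 2: the Latin-1 text "Ã©" percent-encodes to %C3%A9, which also
# happens to be the valid UTF-8 encoding of "é".
ambiguous = "%C3%A9"
guessed = unquote(ambiguous)            # decodes without complaint
intended = unquote(ambiguous, encoding="latin-1")

print(repr(raw), repr(as_utf8), repr(as_latin1))
print(repr(guessed), repr(intended))
```

In the first case the UTF-8 default yields a replacement character
rather than the original text; in the second it yields a plausible but
different string, with nothing to signal the mismatch.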
Now, I have to confess that I don't know how common these use cases are
in actual URL usage. It would be nice if there were some organization
that had a large collection of URLs, and could provide a test set we
could run a scanner over :-).
As a workaround, though, I've sent a message off to Larry Masinter to
ask about this case. He's one of the authors of the URI spec. |
|