Message71073
Author: gvanrossum
Recipients: gvanrossum, janssen, jimjjewett, lemburg, loewis, mgiuca, orsenthil, pitrou, thomaspinckney3
Date: 2008-08-12 23:43:45
Message-id: <ca471dc20808121643q22abc4e5mb04b416531f28ebe@mail.gmail.com>
In-reply-to: <1218554817.69.0.715071509169.issue3300@psf.upfronthosting.co.za>

> Matt Giuca <matt.giuca@gmail.com> added the comment:
> By the way, what is the current status of this bug? Is anybody waiting
> on me to do anything? (Re: Patch 9)
I'll be reviewing it today or tomorrow. From a brief look, I worry
that the implementation is pretty slow -- a method call for each
character plus a map() call sounds pretty bad.
> To recap my previous list of outstanding issues raised by the review:
>
>> Should unquote accept a bytes/bytearray as well as a str?
> Currently, does not. I think it's meaningless to do so (and how to
> handle >127 bytes, if so?)
The bytes > 127 would be translated as themselves; this follows
logically from how stuff is parsed -- %% and %FF are translated,
everything else is not. But I don't really care, I doubt there's a
need.
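The byte-level behavior described above (decode only `%` followed by two hex digits, pass every other byte through, including bytes above 127) can be sketched with a bytes regular expression. `unquote_bytes_sketch` is a hypothetical helper for illustration, not part of the patch.

```python
import re

# A percent sign followed by exactly two hex digits, matched on bytes.
_HEX_ESCAPE = re.compile(rb"%([0-9A-Fa-f]{2})")


def unquote_bytes_sketch(data: bytes) -> bytes:
    """Decode %XX escapes; copy all other bytes (including >127) unchanged."""
    return _HEX_ESCAPE.sub(lambda m: bytes([int(m.group(1), 16)]), data)


print(unquote_bytes_sketch(b"a%20b%FF\xe9%zz"))  # b'a b\xff\xe9%zz'
```

A malformed escape such as `%zz` is left as-is rather than raising, which matches the "everything else is not translated" rule.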
>> Lib/email/utils.py:
>> Should encode_rfc2231 with charset=None accept strings with non-ASCII
>> characters, and just encode them to UTF-8?
> Currently does. Suggestion to restrict to ASCII on the review tracker;
> simple fix.
I think I agree with that comment; it seems wrong to return UTF-8
without declaring that charset in the header. The alternative would be to
default charset to UTF-8 if there are any non-ASCII chars in the input.
I'd be okay with that too.
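The alternative (default the charset to UTF-8 only when the input contains non-ASCII characters, so the charset is always declared) might look roughly like this. `encode_rfc2231_sketch` is a hypothetical stand-in, not the actual `email.utils.encode_rfc2231`; the `charset'language'value` output form follows RFC 2231.

```python
from urllib.parse import quote


def encode_rfc2231_sketch(s, charset=None, language=None):
    """Hypothetical variant: choose UTF-8 automatically for non-ASCII input."""
    if charset is None and not s.isascii():
        # Never emit UTF-8 bytes without naming the charset in the header.
        charset = "utf-8"
    encoded = quote(s, safe="", encoding=charset or "ascii")
    if charset is None and language is None:
        return encoded  # pure-ASCII value: no charset/language prefix needed
    return "%s'%s'%s" % (charset or "", language or "", encoded)


print(encode_rfc2231_sketch("a b"))     # a%20b
print(encode_rfc2231_sketch("\u00e9"))  # utf-8''%C3%A9
```

The restrict-to-ASCII option from the review would instead let the `"ascii"` encoding step raise `UnicodeEncodeError` for such input.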
>> Should quote raise a TypeError if given a bytes with encoding/errors
>> arguments? (Motivation: TypeError is what you usually raise if you
>> supply too many args to a function).
> Resolved. Raises TypeError.
>
>> Lib/urllib/parse.py:
>> (As discussed above) Should quote accept safe characters outside the
>> ASCII range (thereby potentially producing invalid URIs)?
> Resolved? Implemented, but too messy and not worth it just to produce
> invalid URIs, so NOT in patch.
Agreed, safe should be ASCII chars only.
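The two resolved quote() points above (TypeError for bytes plus encoding/errors, ASCII-only safe) can be sketched as a thin front end over `urllib.parse.quote_from_bytes`. `quote_sketch` is a hypothetical name, and its "utf-8"/"strict" defaults are illustrative only, since the errors default is still being discussed.

```python
from urllib.parse import quote_from_bytes


def quote_sketch(string, safe="/", encoding=None, errors=None):
    """Hypothetical quote() front end illustrating the resolved review points."""
    # safe must be ASCII only; non-ASCII safe chars could yield invalid URIs.
    if isinstance(safe, str):
        if not safe.isascii():
            raise ValueError("safe must contain only ASCII characters")
        safe = safe.encode("ascii")
    if isinstance(string, str):
        # encoding/errors only make sense for str input (defaults illustrative).
        string = string.encode(encoding or "utf-8", errors or "strict")
    elif encoding is not None or errors is not None:
        # Like passing too many arguments, so TypeError rather than ValueError.
        raise TypeError("encoding and errors are not supported for bytes input")
    return quote_from_bytes(string, safe)


print(quote_sketch("a b/\u00e9"))  # a%20b/%C3%A9
print(quote_sketch(b"a b"))        # a%20b
```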
> That's only two very minor yes/no issues remaining. Please comment.
I believe patch 9 still has errors defaulting to strict for quote().
Weren't you going to change that?
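The errors default matters when the chosen encoding cannot represent a character. Using the `urllib.parse.quote` that later Python 3 releases ship (not patch 9 itself), `errors="replace"` turns an unencodable character into "?", which is then escaped as %3F, while `errors="strict"` raises instead of silently losing data.

```python
from urllib.parse import quote

text = "caf\u00e9 \u20ac"  # the euro sign has no Latin-1 encoding

# "replace": the euro sign becomes "?" and is then percent-encoded as %3F.
print(quote(text, encoding="latin-1", errors="replace"))  # caf%E9%20%3F

# "strict": encoding fails loudly instead.
try:
    quote(text, encoding="latin-1", errors="strict")
except UnicodeEncodeError as exc:
    print("strict failed:", exc.reason)
```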
Regarding using UTF-8 as the default encoding, I still think this is the
right thing to do -- while the tables shown by Bill indicate that
there's still a lot of Latin-1 out there, UTF-8 is definitely gaining
on it, and I expect that Python apps, especially Py3k apps, are much
more likely to follow (and hopefully reinforce! :-) this trend than to
lag behind.
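What a UTF-8 default means in practice can be shown with `urllib.parse.quote` and `unquote` as later shipped in Python 3 (the patch under discussion may differ). The same text produces different escapes under UTF-8 and Latin-1, and decoding must use the encoding that produced them.

```python
from urllib.parse import quote, unquote

name = "caf\u00e9"

utf8_form = quote(name)                         # UTF-8 is the default
latin1_form = quote(name, encoding="latin-1")   # legacy Latin-1 escapes

print(utf8_form, latin1_form)                   # caf%C3%A9 caf%E9

# Round trips only work when both sides agree on the encoding.
assert unquote(utf8_form) == name
assert unquote(latin1_form, encoding="latin-1") == name
```

Decoding `caf%E9` with the UTF-8 default yields a U+FFFD replacement character, since `unquote` defaults to `errors="replace"`.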