[Python-Dev] urllib.quote and unquote - Unicode issues

Guido van Rossum guido at python.org
Wed Jul 30 20:00:09 CEST 2008


On Wed, Jul 30, 2008 at 10:33 AM, Bill Janssen <janssen at parc.com> wrote:
>> It looks like all other APIs in the Py3k version of
>> urllib treat URLs as text.
>
> The URL is text, a string of ASCII characters. We're just talking
> about urllib.quote() and urllib.unquote(), which are there to support
> the text-ization of binary values, and the de-text-ization.
>
>> I think that would break too much code, without a good way to
>> automatically fix it.
>
> You'd rather break Python? Somehow I don't think so.

Let's stop the rhetoric, or I'll have to beat you over the head with
the Zen of Python. :-)

urllib is not meant as a reference implementation of any RFC; it is
meant as a practical tool for Python users writing web apps (servers
and clients).

> Here's the signature I'm proposing:
>
> quote() -- takes string or bytes, and produces string.
>
> If input is a string, looks to optional "encoding" parameter to
> determine character set encoding to use to transform it to byte before
> quoting it. If "encoding" is not specified, defaults to UTF-8.

No contest here, since it supports the common string->string use case.
E.g. quote('a%b') returns 'a%25b'.
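
For illustration, here is a minimal sketch of that string->string case. It assumes the functions as they exist in today's Python 3, where they live in urllib.parse rather than under the urllib.quote spelling used in this thread; the sample inputs are made up.

```python
from urllib.parse import quote

# str in, str out: the text is encoded (UTF-8 by default) and then
# percent-escaped, so a literal '%' becomes '%25'.
quoted_default = quote('a%b')

# The optional "encoding" parameter picks the bytes that get escaped.
quoted_utf8 = quote('é')                        # é is C3 A9 in UTF-8
quoted_latin1 = quote('é', encoding='latin-1')  # é is E9 in Latin-1

# bytes in, str out: the bytes are escaped as they are, no encoding step.
quoted_bytes = quote(b'\xe9')

print(quoted_default, quoted_utf8, quoted_latin1, quoted_bytes)
```

Whatever the input type, the result is always a str, which is what makes the common case painless.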

> unquote() -- takes string, produces bytes or string
>
> If optional "encoding" parameter is specified, decodes bytes with
> that encoding and returns string. Otherwise, returns bytes.

The default of returning bytes will break almost all uses. Most code
will use the unquoted result as a text string, not as bytes -- e.g. a
server has to unquote the values it receives from a form (whether POST
or GET), but almost always the unquoted values are text, e.g.
someone's name or address, or a draft email message.

(Aside: I dislike functions that have a different return type based on
the value of a parameter.)
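
To make the text-versus-bytes distinction concrete, here is a small sketch using the functions as they exist in today's Python 3 (urllib.parse), which is not necessarily the design on the table at this point in the thread. There, unquote() always returns str and a separate unquote_to_bytes() always returns bytes, so neither function changes its return type based on a parameter. The form value below is a made-up example.

```python
from urllib.parse import unquote, unquote_to_bytes

# Hypothetical value of a "name" field as a server might receive it.
form_value = 'Jos%C3%A9%20Garc%C3%ADa'

# Text result: the percent-escaped bytes are decoded, UTF-8 by default.
name = unquote(form_value)

# Raw result: same input, but the caller gets the undecoded bytes.
raw = unquote_to_bytes(form_value)

# A non-default encoding can be requested without changing the return type.
legacy = unquote('%E9', encoding='latin-1')

print(type(name).__name__, name)
print(type(raw).__name__, raw)
print(type(legacy).__name__, legacy)
```

Code that treats the form value as text uses the first call; code that really needs the octets asks for them explicitly with the second.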
-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

