[Python-Dev] Type-converting functions, esp. unicode() vs. unistr()

18 Jan 2001 21:17:36 -0500

> I hope you don't mind that i'm taking this over to python-dev,
> because it led me to discover a more general issue (see below).

No -- in fact I wanted to see this here! (My mail backlog seems to be
clearing -- or maybe it was only a temporary unclogging... :-)

> For the others on python-dev, here's the background: MAL was
> about to check in the unistr() function, described as follows:
> 
> > This patch adds a utility function unistr() which works just like
> > the standard builtin str() -- only that the return value will
> > always be a Unicode object.
> > 
> > The patch also adds a new object level C API PyObject_Unicode()
> > which complements PyObject_Str().
> 
> I responded:
> > Why are unistr() and unicode() two separate functions?
> > 
> > str() performs one task: convert to string. It can convert anything,
> > including strings or Unicode strings, numbers, instances, etc.
> > 
> > The other type-named functions e.g. int(), long(), float(), list(),
> > tuple() are similar in intent.
> > 
> > Why have unicode() just for converting strings to Unicode strings,
> > and unistr() for converting everything else to a Unicode string?
> > What does unistr(x) do differently from unicode(x) if x is a string?
> 
> MAL responded:
> > unistr() is meant to complement str() very closely. unicode()
> > works as constructor for Unicode objects which can also take
> > care of decoding encoded data. str() and unistr() don't provide
> > this capability but instead always assume the default encoding.
> > 
> > There's also a subtle difference in that str() and unistr() 
> > try the tp_str slot which unicode() doesn't. unicode()
> > supports any character buffer which str() and unistr() don't.
> 
> Okay, given this explanation, i still feel fairly confident
> that unicode() should subsume unistr(). Many of the other
> type-named functions try various slots:
> 
> int() looks for __int__
> float() looks for __float__
> long() looks for __long__
> str() looks for __str__
> 
> In testing this i also discovered the following:
> 
> >>> class Foo:
> ...     def __int__(self):
> ...         return 3
> ... 
> >>> f = Foo()
> >>> int(f)
> 3
> >>> long(f) 
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
> AttributeError: Foo instance has no attribute '__long__'
> >>> float(f)
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
> AttributeError: Foo instance has no attribute '__float__'
> 
> This is kind of surprising. How about:
> 
> int() looks for __int__
> float() looks for __float__, then tries __int__
> long() looks for __long__, then tries __int__
> str() looks for __str__
> unicode() looks for __unicode__, then tries __str__
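The lookup order proposed above can be sketched in pure Python. This is only an illustrative model, not the interpreter's C implementation: the helper names (convert, to_int, to_long, to_float) are hypothetical, and special methods are looked up on the type, which is an assumption of this sketch.

```python
# Hypothetical helpers modelling the proposed fallback order.
# None of these names are real builtins.

_MISSING = object()  # sentinel: "the type does not define this method"


def call_special(obj, name):
    """Call the special method `name` if obj's type defines it."""
    method = getattr(type(obj), name, None)
    if method is None:
        return _MISSING
    return method(obj)


def convert(obj, *names):
    """Try each special method in order and return the first result."""
    for name in names:
        result = call_special(obj, name)
        if result is not _MISSING:
            return result
    raise TypeError("object supports none of: " + ", ".join(names))


def to_int(obj):
    return convert(obj, "__int__")


def to_long(obj):
    # __long__ first, then fall back to __int__
    return convert(obj, "__long__", "__int__")


def to_float(obj):
    # __float__ first, then fall back to __int__ and widen the result
    return float(convert(obj, "__float__", "__int__"))


class Foo:
    def __int__(self):
        return 3
```

With this order, to_long(Foo()) and to_float(Foo()) succeed for a class that defines only __int__, instead of raising AttributeError as in the session above.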

For the numeric types this could perhaps be done by calling
PyNumber_Long() from PyNumber_Float(), calling PyNumber_Int() from
PyNumber_Long(). Complex is a bit of an exception -- there's no
PyNumber_Complex(), just because I felt that nobody would need it. :-)

> The extra parameter to unicode() is very similar to the extra
> parameter to int(), so i think there is a natural parallel here.

Makes sense.

> Hmm... what about the other types?
> 
> Wow!! __complex__ can produce a segfault!
> 
> >>> complex
> <built-in function complex>
> >>> class Foo:
> ...     def __complex__(self): return 3
> ... 
> >>> Foo()
> <__main__.Foo instance at 0x81e8684>
> >>> f = _
> >>> complex(f)
> Segmentation fault (core dumped)
> 
> This happens because builtin_complex first retrieves and saves
> the PyNumberMethods of the argument (in this case, from the
> instance), then tries to call __complex__ (in this case, returning 3),
> and THEN coerces the result using nbr->nb_float if the result is
> not complex! (This calls the instance's nb_float method on the
> integer object 3!!)

Thanks! Fixed now in CVS.

> I think __complex__ should probably look for __complex__, then
> __float__, then __int__.
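A minimal sketch of that lookup order, which also avoids the ordering mistake described above by coercing a non-complex result through the result's own conversion rather than through the original argument's methods. The helper name to_complex is hypothetical, and this models the behaviour discussed in this thread, not what any particular release of complex() does.

```python
def to_complex(obj):
    """Try __complex__, then __float__, then __int__ (hypothetical helper)."""
    if isinstance(obj, complex):
        return obj
    for name in ("__complex__", "__float__", "__int__"):
        method = getattr(type(obj), name, None)
        if method is None:
            continue
        result = method(obj)
        if isinstance(result, complex):
            return result
        # Coerce using the *result's* own conversion. The crash came from
        # applying the original argument's saved method table to the result.
        return complex(float(result))
    raise TypeError("cannot convert object to complex")


class Foo:
    def __complex__(self):
        return 3  # deliberately not a complex number


class Bar:
    def __int__(self):
        return 2
```

Here to_complex(Foo()) yields a complex value built from the integer 3, and to_complex(Bar()) falls through to __int__.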

I made it call PyNumber_Float(), which could be made smarter as
explained above.

> One could argue for __list__, __tuple__, or __dict__, but that
> seems much weaker; the Pythonic way has always been to implement
> __getitem__ instead.

Yes -- since __list__ etc. aren't used, let's not add them.

> There is no built-in dict(); if it existed
> i suppose it would do the opposite of x.items(); again a weak
> argument, though i might have found such a function useful once
> or twice.
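A function doing "the opposite of x.items()" could look roughly like this. The name dict_from_items is hypothetical and chosen only for illustration; the sketch assumes the input is an iterable of (key, value) pairs and that later pairs override earlier ones with the same key.

```python
def dict_from_items(pairs):
    """Build a dictionary from an iterable of (key, value) pairs."""
    result = {}
    for key, value in pairs:
        result[key] = value  # a later pair with the same key wins
    return result


original = {"a": 1, "b": 2}
rebuilt = dict_from_items(original.items())
```

Round-tripping through items() gives back an equal dictionary, which is the inverse relationship described above.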

Yeah, it's not very common. Dict comprehensions anyone?

    d = {k:v for k,v in zip(range(10), range(10))} # :-)

> And that about covers the built-in types for data.

Thanks!

--Guido van Rossum (home page: http://www.python.org/~guido/)