String changing size on failure?

Wed Nov 1 16:34:20 EDT 2017

On 11/1/17 4:17 PM, MRAB wrote:
> On 2017-11-01 19:26, Ned Batchelder wrote:
>>   From David Beazley 
>> (https://twitter.com/dabeaz/status/925787482515533830):
>>
>>       >>> a = 'n'
>>       >>> b = 'ñ'
>>       >>> sys.getsizeof(a)
>>      50
>>       >>> sys.getsizeof(b)
>>      74
>>       >>> float(b)
>>      Traceback (most recent call last):
>>         File "<stdin>", line 1, in <module>
>>      ValueError: could not convert string to float: 'ñ'
>>       >>> sys.getsizeof(b)
>>      77
>>
>> Huh?
>
> It's all explained in PEP 393.
>
> It's creating an additional representation (UTF-8 + zero-byte
> terminator) of the value and is caching that, so there'll then be the
> bytes for 'ñ' and the bytes for the UTF-8 (0xC3 0xB1 0x00).
>
> When the string is ASCII, the bytes of the UTF-8 representation are
> identical to those of the original string, so it can share them.

That explains why b is larger than a to begin with, but it doesn't 
explain why float(b) is changing the size of b.
--Ned.
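
For reference, the 3-byte jump in the session above (74 to 77) matches the
size of the cached UTF-8 copy MRAB describes: two encoded bytes plus a zero
terminator. The sketch below only checks that arithmetic. The helper name
utf8_cache_overhead is made up for illustration and is not part of any
library. The sizes reported by sys.getsizeof are CPython implementation
details, and whether float() fills the cache may depend on the interpreter
version, so the final printed pair may or may not differ on your build.

```python
import sys


def utf8_cache_overhead(s):
    """Extra bytes a cached UTF-8 copy would add (hypothetical helper).

    Assumption, taken from the explanation quoted above: an ASCII string
    shares its buffer with its UTF-8 form (0 extra bytes); any other
    string stores the encoded bytes plus one zero-byte terminator.
    """
    if all(ord(ch) < 128 for ch in s):
        return 0
    return len(s.encode("utf-8")) + 1


b = 'ñ'

# The encoded form is the two bytes 0xC3 0xB1.
print(b.encode("utf-8"))

# Predicted overhead for the two strings from the session.
print(utf8_cache_overhead('n'), utf8_cache_overhead(b))

# Observe the actual sizes around a failed float() call.
before = sys.getsizeof(b)
try:
    float(b)
except ValueError:
    pass
after = sys.getsizeof(b)

# Version dependent: the two numbers are equal if no cache was created.
print(before, after)
```

If the two sizes on the last line differ, the difference should equal
utf8_cache_overhead(b) under the assumption stated in the docstring.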