RE Module Performance

Wed Jul 24 09:40:55 EDT 2013

On Saturday, 13 July 2013 01:13:47 UTC+2, Michael Torrie wrote:
> On 07/12/2013 09:59 AM, Joshua Landau wrote:
>> > If you're interested, the gist of it is that strings now use a
>> > variable number of bytes to encode their values depending on whether
>> > values outside of the ASCII range and some other range are used, as an
>> > optimisation.
>>
>> "Variable number of bytes" is a problematic way of saying it. UTF-8 is a
>> variable-number-of-bytes encoding scheme where each character takes 1,
>> 2, 3, or 4 bytes, depending on the Unicode character. As you can
>> imagine, this sort of encoding scheme would be very slow to do slicing
>> with (looking up a character at a certain position). Python uses
>> fixed-width encoding schemes, so it preserves O(1) lookup speeds,
>> but Python will use 1, 2, or 4 bytes for every character in the string,
>> depending on what is needed. Just in case the OP might have
>> misunderstood what you are saying.
>>
>> jmf sees the case where a string is promoted from one width to another,
>> and thinks that the brief slowdown in string operations to accomplish
>> this is a problem. In reality I have never seen anyone use the types of
>> string operations his pseudo-benchmarks use, and in general Python 3's
>> string behavior is pretty fast. And apparently much more correct than
>> if jmf's ideas of Unicode were implemented.

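The difference described in the quoted message, between a variable-width
encoding such as UTF-8 and the fixed-width storage Python picks per string,
can be checked with a minimal sketch. It assumes CPython 3.3 or later; the
helper names utf8_widths and bytes_per_char are hypothetical, and measuring
with sys.getsizeof relies on a CPython implementation detail, not on anything
the language guarantees.

```python
import sys


def utf8_widths(text):
    """Return how many bytes each character needs when encoded as UTF-8."""
    return [len(ch.encode("utf-8")) for ch in text]


def bytes_per_char(ch, n=1000):
    """Estimate the in-memory width of one character (CPython 3.3+ only).

    Two strings made of the same character are compared, so the fixed
    object header cancels out and only the per-character cost remains.
    """
    if len(ch) != 1:
        raise ValueError("expected a single character")
    return (sys.getsizeof(ch * (2 * n)) - sys.getsizeof(ch * n)) // n


if __name__ == "__main__":
    sample = "a\u00e9\u20ac\U0001F600"  # a, e-acute, euro sign, an emoji

    # UTF-8: the width varies from character to character.
    print(utf8_widths(sample))  # [1, 2, 3, 4]

    # In-memory storage: one width for the whole string, chosen from the
    # widest character it contains.
    for ch in sample:
        print(hex(ord(ch)), bytes_per_char(ch))

    # Indexing works on code points, whatever width was chosen.
    print(sample[3] == "\U0001F600")  # True
```

On such an interpreter the loop should report 1 byte per character for the
first two characters, 2 for the euro sign, and 4 for the emoji. Appending a
wider character to a narrow string is what forces the promotion mentioned
above.
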
------
Sorry, but you do not understand Unicode: what a Unicode
Transformation Format (UTF) is, what the goal of a UTF is, and
why it is important for an implementation to work with a UTF.
A short example: writing an editor properly with something like
the FSR is simply impossible.
jmf