[Python-Dev] thoughts on the bytes/string discussion

Ian Bicking ianb at colorstudy.com
Sat Jun 26 00:26:20 CEST 2010


On Fri, Jun 25, 2010 at 4:02 PM, Guido van Rossum <guido at python.org> wrote:
> On Fri, Jun 25, 2010 at 1:43 PM, Glyph Lefkowitz
> > I'd like a version of 'decode' which would give me a type that was, in
> every
> > respect, unicode, and responded to all protocols exactly as other
> > unicode objects (or "str objects", if you prefer py3 nomenclature ;-))
> do,
> > but wouldn't actually copy any of that memory unless it really needed to
> > (for example, to pass to a C API that expected native wide characters),
> and
> > that would hold on to the original bytes so that it could produce them on
> > demand if encoded to the same encoding again. So, as others in this
> thread
> > have mentioned, the 'ABC' really implies some stuff about C APIs as well.
> > I'm not sure about the exact performance impact of such a class, which is
> > why I'd like the ability to implement it *outside* of the stdlib and see
> how
> > it works on a project, and return with a proposal along with some data.
> > There are also different ways to implement this, and other optimizations
> > (like ropes) which might be better.
> > You can almost do this today, but the lack of things like the
> hypothetical
> > "__rcontains__" does make it impossible to be totally transparent about
> it.
>> But you'd still have to validate it, right? You wouldn't want to go on
> using what you thought was wrapped UTF-8 if it wasn't actually valid
> UTF-8 (or you'd be worse off than in Python 2). So you're really just
> worried about space consumption. I'd like to see a lot of hard memory
> profiling data before I got overly worried about that.
>
It wasn't my profiling, but I seem to recall that Fredrik Lundh specifically
benchmarked ElementTree with all-unicode and sometimes-ascii-bytes, and
found that using Python 2 strs in some cases provided notable advantages. I
know Stefan copied ElementTree in this regard in lxml, maybe he also did a
benchmark or knows of one?
-- 
Ian Bicking | http://blog.ianbicking.org
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-dev/attachments/20100625/607e5b2e/attachment.html>


More information about the Python-Dev mailing list

AltStyle によって変換されたページ (->オリジナル) /