[Python-3000] string API growth [was: Re: PEP 3138- String representation in Python 3000]

Thu May 15 02:58:15 CEST 2008

Jim Jewett writes:
 > Maybe I'm missing something, but it seems to me that there are only a
 > few logical combinations; 
There are lots of logical combinations, but most of them fall under
"general transform". Is that what you mean?
 > if the below is wrong, maybe that is one
 > reason unicode seems more complex than it should.
 > 
 > Encoding: str -> ByteString
 > (staticmethod) ByteString.encode(my_string, encoding=?)
 > ==
 > my_string.encode(encoding=?)
 > 
 > Decoding: ByteString -> str
 > my_bytes.decode(encoding=?)
 > ==
 > (staticmethod) str.decode(my_bytes, encoding=?)
+1
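
For concreteness, here is roughly how that pairing looks with the str
and bytes types in Python 3. This is only a sketch: the sample string
and the choice of UTF-8 are mine, and the constructor spellings stand
in for the proposed static methods, which are the quoted proposal's
spelling rather than an existing API.

```python
my_string = "naïve café"

# Encoding: str -> bytes
my_bytes = my_string.encode("utf-8")
assert bytes(my_string, "utf-8") == my_bytes   # constructor spelling

# Decoding: bytes -> str
assert my_bytes.decode("utf-8") == my_string
assert str(my_bytes, "utf-8") == my_string     # constructor spelling
```
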
 > General Transforming:
 > # Why insist on type-preservation?
 > # Why even make these methods?
 > my_string.transform(fn) == fn(my_string)
 > my_bytes.transform(fn) == fn(my_bytes)
Make them methods only if they are "like" codecs, by which I mean
(more or less) invertible, stream-oriented transformations, e.g.,
 my_bytes.gzip()
That's a pretty weak reason, though.
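
To illustrate what I mean by "invertible" and "stream-oriented", here
is a sketch using the standard gzip and zlib modules as plain
functions (there is no my_bytes.gzip() method; that spelling is
hypothetical). The sample data is made up.

```python
import gzip
import zlib

my_bytes = b"spam and eggs " * 100

# Invertible: bytes -> bytes, and back again.
packed = gzip.compress(my_bytes)
assert gzip.decompress(packed) == my_bytes

# Stream-oriented: input can be fed in chunks as it arrives.
chunks = [b"spam ", b"and ", b"eggs"]
compressor = zlib.compressobj()
streamed = b"".join(compressor.compress(c) for c in chunks)
streamed += compressor.flush()
assert zlib.decompress(streamed) == b"".join(chunks)
```
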
 > Transcoding: ByteString -> ByteString
 > # If you care how it is represented, it is no longer unicode;
 > # it is a specific (ByteString) representation
 > my_bytes.recode(old_encoding=?, new_encoding=?)
 > 
 > # Can the old encoding often be inferred?
 > # Or should it always be written because of EIBTI?
(1) I agree this is the obvious connotation of "transcode" in the
 codec context.
(2) This usage is too specialized to deserve treatment at this level,
 especially since for most purposes
 my_bytes.decode(old_encoding).encode(new_encoding)
 will be perfectly sufficient.
(3) old_encoding should not be inferred as part of .decode() or
 .recode(): generic inference is unreliable, and domain-specific
 heuristics often improve on it greatly. Inference belongs in a
 separate method/function.