[Python-Dev] Reintroduce or drop completely hex, bz2, rot13, ... codecs

Wed Jun 9 10:41:29 CEST 2010

Victor Stinner wrote:
> There are two opposite issues in the bug tracker:
>
> #7475: codecs missing: base64 bz2 hex zlib ...
> -> reintroduce the codecs removed from Python3
>
> #8838: Remove codecs.readbuffer_encode()
> -> remove the last part of the removed codecs
>
> If I understood correctly, the question is: should the codecs module only
> contain encoding codecs, or also contain other kinds of codecs?

Sorry, but I can only repeat what I've already mentioned
a few times on the tracker items: this is a misunderstanding.
The codec system does not mandate a specific type combination
(and that's by design). Only the helper methods .encode() and
.decode() on bytes and str objects in Python3 do, in order to
provide type safety.

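To illustrate the point, here is a minimal sketch using the generic codecs.encode()/codecs.decode() functions, which accept whatever types a codec supports. It assumes a Python 3 release in which the bytes-to-bytes codecs are registered under the names hex_codec and zlib_codec and the str-to-str codec under rot_13; the variable names are only illustrative.

```python
import codecs

# str -> bytes: an ordinary text encoding.
utf8_bytes = codecs.encode("abc", "utf-8")

# bytes -> bytes: assumes hex_codec and zlib_codec are registered
# (they are in Python 3 releases that ship the binary transforms).
hex_bytes = codecs.encode(b"abc", "hex_codec")
zlib_roundtrip = codecs.decode(codecs.encode(b"abc", "zlib_codec"), "zlib_codec")

# str -> str: assumes the rot_13 text transform is registered.
rot13_text = codecs.encode("abc", "rot_13")

print(utf8_bytes, hex_bytes, zlib_roundtrip, rot13_text)
```

The codec registry itself is the same in all four calls; only the input and output types differ per codec.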
> Encoding codec API is now strict (encode: str->bytes, decode: bytes->str),
> so it's not possible to reuse str.encode() or bytes.decode() for the other
> codecs. Marc-Andre Lemburg proposed to add .transform() and .untransform()
> methods to str, bytes and bytearray types. If I understood correctly, it would
> look like:
>
> >>> b'abc'.transform("hex")
> '616263'
> >>> '616263'.untransform("hex")
> b'abc'

No, .transform() and .untransform() will be the interface to same-type
codecs, i.e. ones that convert bytes to bytes or str to str. As with
.encode()/.decode(), these helper methods also enforce type safety
of the return type.

The above example will read:
 >>> b'abc'.transform("hex")
 b'616263'
 >>> b'616263'.untransform("hex")
 b'abc'

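A minimal sketch of these semantics, written as plain functions because the methods themselves are only a proposal here. The functions transform() and untransform() and their same-type check are hypothetical stand-ins built on codecs.encode()/codecs.decode(); the codec names hex_codec and rot_13 assume a Python 3 release that ships them.

```python
import codecs


def transform(data, codec_name):
    """Hypothetical stand-in for the proposed .transform() method."""
    result = codecs.encode(data, codec_name)
    # Enforce the same-type rule: bytes -> bytes or str -> str only.
    if type(result) is not type(data):
        raise TypeError(
            f"{codec_name!r} is not a same-type codec: "
            f"{type(data).__name__} -> {type(result).__name__}"
        )
    return result


def untransform(data, codec_name):
    """Hypothetical stand-in for the proposed .untransform() method."""
    result = codecs.decode(data, codec_name)
    if type(result) is not type(data):
        raise TypeError(
            f"{codec_name!r} is not a same-type codec: "
            f"{type(data).__name__} -> {type(result).__name__}"
        )
    return result


hex_data = transform(b"abc", "hex_codec")
print(hex_data)
print(untransform(hex_data, "hex_codec"))
print(transform("abc", "rot_13"))
```

A text encoding such as utf-8 changes the type (str to bytes), so transform("abc", "utf-8") raises TypeError in this sketch.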
> I suppose that each codec will have a different list of accepted input and
> output types. Example:
>
> bz2: encode:bytes->bytes, decode:bytes->bytes
> rot13: encode:str->str, decode:str->str
> hex: encode:bytes->str, decode: str->bytes

hex will do bytes->bytes in both directions, just like it does
in Python2.

The methods to be used will be .transform() for the encode direction
and .untransform() for the decode direction.

> And so "abc".encode("bz2") would raise a TypeError.

Yes.

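The rejection can be sketched as follows. The helper name str_encode_rejects is hypothetical, and the exact exception type is an assumption that varies between Python 3 releases (TypeError or LookupError), so the sketch catches both; bz2_codec assumes a release where that codec is registered.

```python
import codecs


def str_encode_rejects(codec_name):
    """Return True if str.encode() refuses the named codec (hypothetical helper)."""
    try:
        "abc".encode(codec_name)
    except (TypeError, LookupError):
        # Note: LookupError is also what an unknown codec name raises.
        return True
    return False


print(str_encode_rejects("bz2_codec"))
print(str_encode_rejects("utf-8"))

# The generic codecs functions still reach the codec with bytes input.
compressed = codecs.encode(b"abc", "bz2_codec")
print(codecs.decode(compressed, "bz2_codec"))
```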
> --
>
> In my opinion, we should not mix codecs of different kinds (compression,
> cipher, etc.) because the input and output types are different. It would make
> more sense to create a standard API for each kind of codec. Existing examples
> of standard APIs in Python: hashlib, shutil.make_archive(), database API, etc.

If you want, you can have those as well, but then you'd
have to introduce new APIs or modules, whereas the codec
interface has existed for quite a while in Python2 and
is in regular use.

For most applications, the very simple-to-use codec interface
to these codecs is all that is needed, so I don't see a strong
case for adding new interfaces, e.g.

    hex_data = data.transform('hex')

looks clean and neat.

-- 
Marc-Andre Lemburg
eGenix.com