[Python-Dev] [Python-3000] Betas today - I hope

Fri Jun 13 15:26:47 CEST 2008

M.-A. Lemburg wrote:
> On 2008-06-13 11:32, Walter Dörwald wrote:
>> M.-A. Lemburg wrote:
>>> On 2008-06-12 16:59, Walter Dörwald wrote:
>>>> M.-A. Lemburg wrote:
>>>>> .transform() and .untransform() use the codecs to apply same-type
>>>>> conversions. They do apply type checks to make sure that the
>>>>> codec does indeed return the same type.
>>>>>
>>>>> E.g. text.transform('xml-escape') or data.transform('base64').
>>>>
>>>> So what would a base64 codec do with the errors argument?
>>>
>>> It could use it to e.g. try to recover as much data as possible
>>> from broken input data.
>>>
>>> Currently (in Py2.x), it raises an exception if you pass in anything
>>> but "strict".
>>>
>>>>>> I think for transformations we don't need the full codec machinery:
>>>>> > ...
>>>>>
>>>>> No need to invent another wheel :-) The codecs already exist for
>>>>> Py2.x and can be used by the .encode()/.decode() methods in Py2.x
>>>>> (where no type checks occur).
>>>>
>>>> By using a new API we could get rid of old warts. For example: Why
>>>> does the stateless encoder/decoder return how many input
>>>> characters/bytes it has consumed? It must consume *all* bytes anyway!
>>>
>>> No, it doesn't and that's the point in having those return values :-)
>>>
>>> Even though the encoder/decoders are stateless, that doesn't mean
>>> they have to consume all input data. The caller is responsible to
>>> make sure that all input data was in fact consumed.
>>>
>>> You could for example have a decoder that stops decoding after
>>> having seen a block end indicator, e.g. a base64 line end or
>>> XML closing element.
>>
>> So how should the UTF-8 decoder know that it has to stop at a closing
>> XML element?
>
> The UTF-8 decoder doesn't support this, but you could write a codec
> that applies this kind of detection, e.g. to not try to decode
> partial UTF-8 byte sequences at the end of input, which would then
> result in an error.
>
>>> Just because all codecs that ship with Python always try to decode
>>> the complete input doesn't mean that the feature isn't being used.
>>
>> I know of no other code that does. Do you have an example for this use?
>
> I already gave you a few examples.

Maybe I was unclear: I meant real-world examples, not hypothetical ones.

>>> The interface was designed to allow for the above situations.
>>
>> Then could we at least have a new codec method that does:
>>
>> def statelessencode(self, input):
>>     (output, consumed) = self.encode(input)
>>     assert len(input) == consumed
>>     return output
>
> You mean as method to the Codec class ?

No, I meant as a method for the CodecInfo class.

> Sure, we could do that, but please use a different name,
> e.g. .encodeall() and .decodeall() - .encode() and .decode()
> are already stateless (and so would the new methods be), so
> "stateless" isn't all that meaningful in this context.

I like the names encodeall/decodeall!

> We could also add such a check to the PyCodec_Encode() and _Decode()
> functions. They currently do not apply the above check.
>
> In Python, those two functions are exposed as codecs.encode()
> and codecs.decode().

This change will probably have to wait for the 2.7 cycle.

Servus,
 Walter
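
For readers following the thread, here is a minimal sketch of what the proposed encodeall()/decodeall() check might look like. It is written as module-level helper functions rather than CodecInfo methods, and the function signatures and the choice of ValueError over a bare assert are assumptions made for illustration; none of this is an existing codecs API. It relies only on codecs.lookup() and the (output, consumed) tuple that the stateless encode/decode functions return.

```python
import codecs


def encodeall(encoding, input, errors="strict"):
    """Hypothetical helper: encode and insist that all input was consumed."""
    info = codecs.lookup(encoding)
    # Stateless codec functions return (output, number of input items consumed).
    output, consumed = info.encode(input, errors)
    if consumed != len(input):
        raise ValueError(
            "codec %r consumed %d of %d input items"
            % (info.name, consumed, len(input))
        )
    return output


def decodeall(encoding, input, errors="strict"):
    """Hypothetical helper: decode and insist that all input was consumed."""
    info = codecs.lookup(encoding)
    output, consumed = info.decode(input, errors)
    if consumed != len(input):
        raise ValueError(
            "codec %r consumed %d of %d input items"
            % (info.name, consumed, len(input))
        )
    return output


# Usage: the built-in codecs consume everything, so the check passes.
encoded = encodeall("utf-8", "h\xe4l")
decoded = decodeall("utf-8", encoded)

# Partial consumption does occur at a lower level: with final=False,
# codecs.utf_8_decode() leaves an incomplete trailing sequence unconsumed
# instead of raising an error.
partial_text, partial_consumed = codecs.utf_8_decode(b"abc\xc3", "strict", False)
```

In the last call the decoder reports fewer consumed bytes than it was given, which is the kind of situation the consumed count in the return value exists to signal.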