Message 97535 - Python tracker

Author	lemburg
Recipients	Rhamphoryncus, amaury.forgeotdarc, bupjae, ezio.melotti, lemburg, vstinner
Date	2010-01-10.18:14:49
Message-id	<4B4A1914.3050308@egenix.com>
In-reply-to	<1262997921.1.0.403720271527.issue5127@psf.upfronthosting.co.za>

Content

Amaury Forgeot d'Arc wrote:
> 
> Amaury Forgeot d'Arc <amauryfa@gmail.com> added the comment:
> 
>> I don't see the point in changing the various conversion APIs in the
>> unicode database to return Py_UCS4 when there are no conversions that
>> map code points between BMP and non-BMP.
> 
> For consistency: if Py_UNICODE_ISPRINTABLE is changed to take Py_UCS4, Py_UNICODE_TOLOWER should also take Py_UCS4, and must return the same type.
> 
>> In order to solve the problem in question (unicode_repr() failing), 
>> we should change the various property checking APIs to accept Py_UCS4
>> input data. This needlessly increases the type database size without
>> real benefit.
> [I'm not sure to understand. For me the 'real benefit' is that it solves the problem in question.]

The problem in question is already solved by just changing the property
checking APIs. Changing the conversion APIs fixes a non-problem, since there
are no mappings that would require Py_UCS4 on a UCS2 build.

> Yes this increases the type database: there are 300 more "case" statements in _PyUnicode_ToNumeric(), and the PyUnicode_TypeRecords array needs 1068 more bytes.
> On Windows, VS9.0 release build, unicodectype.obj grows from 86Kb to 94Kb; python32.dll is exactly 1.5Kb larger (from 2219Kb to 2221.5Kb);
> the memory usage of the just-started interpreter is about 32K larger (around 5M). These look reasonable figures to me.
> 
>> For that to work properly we'll have to either make sure that
>> extensions get recompiled if they use these changed APIs, or we
>> provide an additional set of UCS2 APIs that extend the Py_UNICODE
>> input value to a Py_UCS4 value before calling the underlying Py_UCS4
>> API.
> 
> Extensions that use these changed APIs need to be recompiled, or they won't load: existing modules link with symbols like _PyUnicodeUCS2_IsPrintable, when the future interpreter will define _PyUnicode_IsPrintable.

Hmm, that's a good point.
OK, you got me convinced: let's go for it then.
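
To illustrate why the property checks need a Py_UCS4-wide argument, here is a rough model in Python rather than the C code under discussion. All function names are hypothetical and are not CPython APIs. `to_utf16_units` imitates how a UCS2 build stores a non-BMP character as a surrogate pair, `is_printable_ucs4` stands in for a check that accepts the full code point, and `is_printable_ucs2` stands in for the kind of widening wrapper mentioned in the thread. It assumes a modern Python 3, where `str.isprintable()` works on full code points.

```python
# Illustrative model only: these helpers are hypothetical, not CPython APIs.

HIGH_START = 0xD800  # first high (lead) surrogate
LOW_START = 0xDC00   # first low (trail) surrogate


def to_utf16_units(text):
    """Split text into 16-bit code units, as a UCS2 build would store it."""
    units = []
    for ch in text:
        cp = ord(ch)
        if cp > 0xFFFF:
            # Non-BMP code point: encode as a surrogate pair.
            cp -= 0x10000
            units.append(HIGH_START + (cp >> 10))
            units.append(LOW_START + (cp & 0x3FF))
        else:
            units.append(cp)
    return units


def combine_surrogates(high, low):
    """Rebuild the full code point from a surrogate pair."""
    return 0x10000 + ((high - HIGH_START) << 10) + (low - LOW_START)


def is_printable_ucs4(code_point):
    """Property check that accepts the full code point range."""
    return chr(code_point).isprintable()


def is_printable_ucs2(unit):
    """Compatibility wrapper: widen a 16-bit unit, then call the UCS4 check."""
    return is_printable_ucs4(unit & 0xFFFF)


# U+1D7D8 MATHEMATICAL DOUBLE-STRUCK DIGIT ZERO lies outside the BMP.
units = to_utf16_units("\U0001D7D8")
per_unit = [is_printable_ucs2(u) for u in units]
whole = is_printable_ucs4(combine_surrogates(units[0], units[1]))
print([hex(u) for u in units], per_unit, whole)
```

Checked one 16-bit unit at a time, both halves of the pair look unprintable, because lone surrogates are never printable. Only the check that receives the combined code point can report the character as printable.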

History
Date	User	Action	Args
2010-01-10 18:14:52	lemburg	set	recipients: + lemburg, amaury.forgeotdarc, Rhamphoryncus, vstinner, ezio.melotti, bupjae
2010-01-10 18:14:50	lemburg	link	issue5127 messages
2010-01-10 18:14:49	lemburg	create
