Message 97447 - Python tracker

In-reply-to
Author	amaury.forgeotdarc
Recipients	Rhamphoryncus, amaury.forgeotdarc, bupjae, ezio.melotti, lemburg, vstinner
Date	2010-01-09.00:45:18
Message-id	<1262997921.1.0.403720271527.issue5127@psf.upfronthosting.co.za>

Content

> I don't see the point in changing the various conversion APIs in the
> unicode database to return Py_UCS4 when there are no conversions that
> map code points between BMP and non-BMP.
For consistency: if Py_UNICODE_ISPRINTABLE is changed to take Py_UCS4, Py_UNICODE_TOLOWER should also take Py_UCS4, and must return the same type.
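
The consistency argument can be illustrated from Python rather than through the C macros themselves. This is only a sketch: the helper `fits_in_16_bits` is a hypothetical name, and the example assumes Python 3.3 or later, where `str` always holds full code points. It shows that lowercasing a non-BMP letter yields another non-BMP letter, so a conversion function that accepts a full code point also needs a return type wide enough to hold one.

```python
# Sketch only: uses Python's str methods, not the C macros discussed above.

def fits_in_16_bits(code_point):
    """Return True if the code point lies in the BMP (fits a 16-bit unit)."""
    return code_point <= 0xFFFF

upper = 0x10400                  # DESERET CAPITAL LETTER LONG I (non-BMP)
lower = ord(chr(upper).lower())  # its lowercase mapping, also non-BMP

print(hex(upper), "->", hex(lower))
print(fits_in_16_bits(upper), fits_in_16_bits(lower))
```

Neither the input nor the result fits in 16 bits, which is why the argument and the return value should share the same wide type.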
> In order to solve the problem in question (unicode_repr() failing), 
> we should change the various property checking APIs to accept Py_UCS4
> input data. This needlessly increases the type database size without
> real benefit.
[I'm not sure I understand. For me, the 'real benefit' is that it solves the problem in question.]
Yes, this increases the type database: there are 300 more "case" statements in _PyUnicode_ToNumeric(), and the PyUnicode_TypeRecords array needs 1068 more bytes.
On Windows, VS9.0 release build, unicodectype.obj grows from 86Kb to 94Kb; python32.dll is exactly 1.5Kb larger (from 2219Kb to 2221.5Kb);
the memory usage of the just-started interpreter is about 32K larger (out of around 5M). These look like reasonable figures to me.
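
To make the benefit concrete, here is a hedged Python sketch of what can go wrong when a property check only sees 16 bits of a code point. The function `truncate_to_ucs2` is hypothetical: it merely simulates passing a Py_UCS4 value through a 16-bit parameter and is not how CPython performs the lookup.

```python
import unicodedata

def truncate_to_ucs2(code_point):
    """Simulate squeezing a code point through a 16-bit parameter (hypothetical)."""
    return code_point & 0xFFFF

full = 0x1D7CE                      # a non-BMP digit character
truncated = truncate_to_ucs2(full)  # 0xD7CE, an unrelated BMP code point

full_is_digit = chr(full).isdigit()
truncated_is_digit = chr(truncated).isdigit()

print(unicodedata.name(chr(full)))
print(hex(truncated), full_is_digit, truncated_is_digit)
```

The truncated value lands on a different character with different properties, so the answer for the original code point is lost. Accepting the full 32-bit value avoids this.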
> For that to work properly we'll have to either make sure that
> extensions get recompiled if they use these changed APIs, or we
> provide an additional set of UCS2 APIs that extend the Py_UNICODE
> input value to a Py_UCS4 value before calling the underlying Py_UCS4
> API.
Extensions that use these changed APIs need to be recompiled, or they won't load: existing modules link against symbols like _PyUnicodeUCS2_IsPrintable, whereas the future interpreter will define _PyUnicode_IsPrintable.

History
Date	User	Action	Args
2010-01-09 00:45:21	amaury.forgeotdarc	set	recipients: + amaury.forgeotdarc, lemburg, Rhamphoryncus, vstinner, ezio.melotti, bupjae
2010-01-09 00:45:21	amaury.forgeotdarc	set	messageid: <1262997921.1.0.403720271527.issue5127@psf.upfronthosting.co.za>
2010-01-09 00:45:19	amaury.forgeotdarc	link	issue5127 messages
2010-01-09 00:45:18	amaury.forgeotdarc	create
