Author lemburg
Recipients belopolsky, eric.smith, ezio.melotti, lemburg, mark.dickinson, skrah, vstinner
Date 2010-11-29.09:41:33
Message-id <4CF3754C.4080705@egenix.com>
In-reply-to <1291011789.32.0.244648538312.issue10557@psf.upfronthosting.co.za>
Content
Alexander Belopolsky wrote:
> 
> Alexander Belopolsky <belopolsky@users.sourceforge.net> added the comment:
> 
> After a bit of svn archeology, it does appear that Arabic-Indic digits' support was deliberate at least in the sense that the feature was tested for when the code was first committed. See r15000.

As I mentioned on python-dev
(http://mail.python.org/pipermail/python-dev/2010-November/106077.html),
this support was added intentionally.

> The test migrated from file to file over the last 10 years, but it is still present in test_float.py:
> 
> self.assertEqual(float(b" \u0663.\u0661\u0664 ".decode('raw-unicode-escape')), 3.14)
> 
> (It should probably be now rewritten using a string literal.)
> 
> I am now attaching the patch (issue10557.diff) that fixes the bug without sacrificing non-ASCII digit support.
> If this approach is well-received, I would like to replace all calls to PyUnicode_EncodeDecimal() with the calls to the new _PyUnicode_EncodeDecimalUTF8() and deprecate Latin-1-oriented PyUnicode_EncodeDecimal().

It would be better to first copy the Unicode string and iterate over
the copy, replacing any decimal code points with their ASCII
equivalents, and only then call the UTF-8 encoder.

The code as it stands is very inefficient, since it will most likely
run the memcpy() part for every code point after the first non-ASCII
decimal one.

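As a rough illustration of the approach suggested above, here is a Python sketch rather than the C patch under discussion. The helper name `to_ascii_decimals` is hypothetical; it assumes `unicodedata.decimal()` is an acceptable way to map each decimal code point to its value, and it does the replacement in a single pass before handing the result to the UTF-8 encoder.

```python
import unicodedata


def to_ascii_decimals(text):
    """Return a copy of text with every decimal digit replaced by ASCII 0-9.

    Hypothetical helper for illustration only; not part of CPython.
    """
    out = []
    for ch in text:
        # unicodedata.decimal() returns the digit value (0-9) for any
        # character with Numeric_Type=Decimal, or the default otherwise.
        value = unicodedata.decimal(ch, None)
        out.append(str(value) if value is not None else ch)
    return "".join(out)


# Normalize first, then encode once; no per-character buffer shuffling.
normalized = to_ascii_decimals(" \u0663.\u0661\u0664 ")
encoded = normalized.encode("utf-8")
```

With this ordering, the encoder only ever sees ASCII digits, so the cost of the replacement does not depend on where the first non-ASCII digit appears.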
> For the future, I note that starting with Unicode 6.0.0, the Unicode Consortium promises that
> 
> """
> Characters with the property value Numeric_Type=de (Decimal) only occur in contiguous ranges of 10 characters, with ascending numeric values from 0 to 9 (Numeric_Value=0..9).
> """
> 
> This makes it very easy to check a numeric string does not contain a mix of digits from different scripts.

I'm not sure why you'd want to check for such ranges.

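For reference, the check described in the quoted text could look roughly like the following. This is a sketch under the assumption that the Unicode 6.0.0 guarantee holds for the database shipped with the interpreter; the function names are hypothetical. Because each digit block is a contiguous run of ten code points with ascending values, subtracting a digit's value from its code point yields the block's zero, which identifies the block.

```python
import unicodedata


def digit_block_starts(text):
    """Collect the code point of the 'zero' of every digit block used in text.

    Hypothetical helper; relies on decimal digits occurring in contiguous
    ranges of 10 with ascending values.
    """
    starts = set()
    for ch in text:
        value = unicodedata.decimal(ch, None)
        if value is not None:
            # Code point minus digit value gives the block's zero.
            starts.add(ord(ch) - value)
    return starts


def mixes_digit_scripts(text):
    """Return True if text contains decimal digits from more than one block."""
    return len(digit_block_starts(text)) > 1
```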
> I still believe that proper API should require explicit choice of language or locale before allowing digits other than 0-9 just as int() would not accept hexadecimal digits without explicit choice of base >= 16. But this would be a subject of a feature request.

Since when do we require a locale or language to be specified when
using Unicode?

The codecs, Unicode methods and other Unicode support features
happily work with all kinds of languages, mixed or not, without any
such specification.

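A short illustration of that behaviour, as observed in Python 3 (a sketch, not an exhaustive statement of what the numeric constructors accept): digits from different scripts are converted without any locale or language argument, whereas hexadecimal digits do need an explicit base, which is the asymmetry raised in the quoted text.

```python
# Arabic-Indic digits, as in the test quoted earlier in this message.
assert float(" \u0663.\u0661\u0664 ") == 3.14
assert int("\u0661\u0662\u0663") == 123

# Mixed scripts in one string: Arabic-Indic three followed by Devanagari three.
assert int("\u0663\u0969") == 33

# String methods are equally script-agnostic.
assert "\u0663\u0969".isdecimal()

# Hexadecimal digits, by contrast, require an explicit base.
assert int("ff", 16) == 255
```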
History
Date                 User     Action  Args
2010-11-29 09:41:37  lemburg  set     recipients: + lemburg, mark.dickinson, belopolsky, vstinner, eric.smith, ezio.melotti, skrah
2010-11-29 09:41:33  lemburg  link    issue10557 messages
2010-11-29 09:41:33  lemburg  create