Created on 2010-11-27 20:29 by belopolsky, last changed 2022-04-11 14:57 by admin.
| Files | | |
|---|---|---|
| File name | Uploaded | Description |
| issue10552.diff | belopolsky, 2010-11-27 21:15 | |
| issue10552a.diff | belopolsky, 2010-11-29 18:36 | |
| 10552-remove-apple-files.txt | akuchling, 2013-11-10 18:24 | Remove problematic mapping files before parsing |
| 10552-remove-apple-files-v2.txt | martin.panter, 2015-01-13 05:57 | |
| Messages (15) | |||
|---|---|---|---|
| msg122549 | Author: Alexander Belopolsky (belopolsky) (Python committer) | Date: 2010-11-27 20:29 | |
$ ../../python.exe gencodec.py MAPPINGS/VENDORS/MISC/ build/
converting APL-ISO-IR-68.TXT to build/apl_iso_ir_68.py and build/apl_iso_ir_68.mapping
converting ATARIST.TXT to build/atarist.py and build/atarist.mapping
converting CP1006.TXT to build/cp1006.py and build/cp1006.mapping
converting CP424.TXT to build/cp424.py and build/cp424.mapping
Traceback (most recent call last):
  File "gencodec.py", line 421, in <module>
    convertdir(*sys.argv[1:])
  File "gencodec.py", line 391, in convertdir
    pymap(mappathname, map, dirprefix + codefile,name,comments)
  File "gencodec.py", line 355, in pymap
    code = codegen(name,map,encodingname,comments)
  File "gencodec.py", line 268, in codegen
    precisions=(4, 2))
  File "gencodec.py", line 152, in python_mapdef_code
    mappings = sorted(map.items())
TypeError: unorderable types: NoneType() < int()

It does appear to have been updated for 3.x:

$ python2.7 gencodec.py MAPPINGS/VENDORS/MISC/ build/
Traceback (most recent call last):
  File "gencodec.py", line 35, in <module>
    UNI_UNDEFINED = chr(0xFFFE)
ValueError: chr() arg not in range(256) |
|||
| msg122559 | Author: Alexander Belopolsky (belopolsky) (Python committer) | Date: 2010-11-27 21:15 | |
Attached patch addresses the issue by using -1 instead of None for missing codes. Comparison of generated encoding files to those in Lib/encodings shows only whitespace changes except one which appears to be a change on the unicode.org side:

diff -b build/koi8_u.py ../../Lib/encodings/koi8_u.py
1c1
< """ Python Character Mapping Codec koi8_u generated from 'MAPPINGS/VENDORS/MISC/KOI8-U.TXT' with gencodec.py.
---
> """ Python Character Mapping Codec koi8_u generated from 'python-mappings/KOI8-U.TXT' with gencodec.py.
221c221
< '\u0491' # 0xAD -> CYRILLIC SMALL LETTER GHE WITH UPTURN
---
> '\u0491' # 0xAD -> CYRILLIC SMALL LETTER UKRAINIAN GHE WITH UPTURN
237c237
< '\u0490' # 0xBD -> CYRILLIC CAPITAL LETTER GHE WITH UPTURN
---
> '\u0490' # 0xBD -> CYRILLIC CAPITAL LETTER UKRAINIAN GHE WITH UPTURN
308d307
< |
|||
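The failure reported above, and the idea of replacing None with an integer placeholder, can be reproduced in a few lines of Python 3. This is only a sketch: the `mapping` dictionary is a made-up stand-in whose keys mix None with integers, the helpers `try_sort` and `normalize` are hypothetical, and the real data structures inside gencodec.py may look different. The constant name `MISSING_CODE` is the one the thread settles on.

```python
# Hypothetical stand-in for a code mapping with one missing code (None).
# The real structures in gencodec.py may differ.
MISSING_CODE = -1  # integer placeholder for a missing code

mapping = {0x41: 0x0041, None: 0x04, 0x42: 0x0042}


def try_sort(m):
    """Return the sorted items, or None if the keys cannot be ordered."""
    try:
        return sorted(m.items())
    except TypeError:
        # Python 3 refuses to order None against an int.
        return None


def normalize(m):
    """Replace a None key with the integer sentinel so all keys are ints."""
    return {MISSING_CODE if k is None else k: v for k, v in m.items()}


print(try_sort(mapping))             # sorting the mixed keys fails
print(try_sort(normalize(mapping)))  # the sentinel entry sorts first
```

Python 2 allowed ordering None against integers, which is presumably why the script only started failing once it was run under 3.x.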
| msg122565 | Author: Marc-Andre Lemburg (lemburg) (Python committer) | Date: 2010-11-27 22:09 | |
Alexander Belopolsky wrote:
>
> Alexander Belopolsky <belopolsky@users.sourceforge.net> added the comment:
>
> Attached patch addresses the issue by using -1 instead of None for missing codes. Comparison of generated encoding files to those in Lib/encodings shows only whitespace changes except one which appears to be a change on the unicode.org side:

Please use a global constant instead of the literal -1, e.g. MISSING_CODE. Thanks.

> diff -b build/koi8_u.py ../../Lib/encodings/koi8_u.py
> 1c1
> < """ Python Character Mapping Codec koi8_u generated from 'MAPPINGS/VENDORS/MISC/KOI8-U.TXT' with gencodec.py.
> ---
>> """ Python Character Mapping Codec koi8_u generated from 'python-mappings/KOI8-U.TXT' with gencodec.py.
> 221c221
> < '\u0491' # 0xAD -> CYRILLIC SMALL LETTER GHE WITH UPTURN
> ---
>> '\u0491' # 0xAD -> CYRILLIC SMALL LETTER UKRAINIAN GHE WITH UPTURN
> 237c237
> < '\u0490' # 0xBD -> CYRILLIC CAPITAL LETTER GHE WITH UPTURN
> ---
>> '\u0490' # 0xBD -> CYRILLIC CAPITAL LETTER UKRAINIAN GHE WITH UPTURN
> 308d307
> <

That's just a comment and doesn't change the semantics of the codec. |
|||
| msg122585 | Author: Alexander Belopolsky (belopolsky) (Python committer) | Date: 2010-11-27 23:02 | |
Attached patch uses MISSING_CODE as Marc-Andre suggested. There are still errors, apparently because parsecodes() may return either an int or a tuple. I think only mac encodings are affected, so I would like to commit the current patch before tackling this issue.

$ ../../python.exe gencodec.py MAPPINGS/VENDORS/APPLE/ build/ mac_
converting ARABIC.TXT to build/mac_arabic.py and build/mac_arabic.mapping
converting CELTIC.TXT to build/mac_celtic.py and build/mac_celtic.mapping
converting CENTEURO.TXT to build/mac_centeuro.py and build/mac_centeuro.mapping
converting CHINSIMP.TXT to build/mac_chinsimp.py and build/mac_chinsimp.mapping
Traceback (most recent call last):
  File "gencodec.py", line 424, in <module>
    convertdir(*sys.argv[1:])
  File "gencodec.py", line 394, in convertdir
    pymap(mappathname, map, dirprefix + codefile,name,comments)
  File "gencodec.py", line 358, in pymap
    code = codegen(name,map,encodingname,comments)
  File "gencodec.py", line 271, in codegen
    precisions=(4, 2))
  File "gencodec.py", line 155, in python_mapdef_code
    mappings = sorted(map.items())
TypeError: unorderable types: tuple() < int() |
|||
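One way to sort a mapping whose keys are a mix of ints and tuples is to pass a key function that wraps bare ints in a one-element tuple. The sketch below only illustrates that technique: the `mapping` contents and the helper names `sort_key` and `sorted_items` are hypothetical, and this is not the fix adopted in this issue, which leaves the Mac problem open.

```python
# Hypothetical keys: single code points as ints, multi-code sequences as tuples.
mapping = {0x0041: 0x41, (0x0041, 0x0301): 0x81, 0x0042: 0x42}


def sort_key(item):
    """Order entries by key, treating a bare int as a one-element tuple."""
    key, _value = item
    return key if isinstance(key, tuple) else (key,)


def sorted_items(m):
    """Sort (key, value) pairs without comparing an int to a tuple directly."""
    return sorted(m.items(), key=sort_key)


for key, value in sorted_items(mapping):
    print(key, hex(value))
```

Because tuples compare element by element, a single code point sorts immediately before any sequence that starts with the same code point.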
| msg122586 | Author: Alexander Belopolsky (belopolsky) (Python committer) | Date: 2010-11-27 23:03 | |
Please ignore Makefile changes in the patch. |
|||
| msg122829 | Author: Alexander Belopolsky (belopolsky) (Python committer) | Date: 2010-11-29 16:57 | |
Martin, I believe you were the last to update the unicode database. (See r85371.) Did you use python2.x to generate it, or do you have your own private copy of these tools? I noticed that genwincodecs.bat refers to c:\python26\python in the 2.7 branch and c:\python30\python in py3k. Could this be an indication that these tools are out of date?

What is the plan for maintaining these tools? Should fixes be done in 2.7 and the 3.x version be generated by 2to3? Or should fixes go to py3k and be backported to 2.7 when they don't add new features? |
|||
| msg122837 | Author: Marc-Andre Lemburg (lemburg) (Python committer) | Date: 2010-11-29 18:21 | |
gencodec.py is only rarely used, namely when adding new codecs based on Unicode mapping files. It is not run regularly on the files from ftp.unicode.org and only updated on demand. AFAIK, it was last used on Python2 and never on Python3, hence the errors you find with it.

BTW: You appear to have a comma appended to the constant, that doesn't belong there:

+# Placeholder for a missing codepoint
+MISSING_CODE = -1,
+

Perhaps that's causing the second error you are seeing. |
|||
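The concern about the stray comma is easy to check in isolation: in Python, a trailing comma after an expression builds a one-element tuple, so the two assignments below bind different types. The name `WITH_COMMA` is made up for this illustration.

```python
MISSING_CODE = -1   # an int
WITH_COMMA = -1,    # trailing comma: a one-element tuple, (-1,)

print(type(MISSING_CODE).__name__)
print(type(WITH_COMMA).__name__)
print(WITH_COMMA == (-1,))
```

A tuple-valued constant mixed into a mapping of ints would itself be unorderable against those ints in Python 3, which is why the comma was a plausible suspect.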
| msg122842 | Author: Alexander Belopolsky (belopolsky) (Python committer) | Date: 2010-11-29 18:36 | |
On Mon, Nov 29, 2010 at 1:21 PM, Marc-Andre Lemburg <report@bugs.python.org> wrote:
..
> BTW: You appear to have a comma appended to the constant, that doesn't
> belong there:
>
> +# Placeholder for a missing codepoint
> +MISSING_CODE = -1,
> +
>
> Perhaps that's causing the second error you are seeing.

No, that comma was a left-over from the attempt to fix the mac_chinsimp error. The trace that I reported was generated with MISSING_CODE = -1. I am replacing the patch.

Is it ok to commit a partial fix? It may take longer to fix the mac error. |
|||
| msg122843 | Author: Marc-Andre Lemburg (lemburg) (Python committer) | Date: 2010-11-29 18:37 | |
Alexander Belopolsky wrote:
>
> Alexander Belopolsky <belopolsky@users.sourceforge.net> added the comment:
>
> On Mon, Nov 29, 2010 at 1:21 PM, Marc-Andre Lemburg
> <report@bugs.python.org> wrote:
> ..
>> BTW: You appear to have a comma appended to the constant, that doesn't
>> belong there:
>>
>> +# Placeholder for a missing codepoint
>> +MISSING_CODE = -1,
>> +
>>
>> Perhaps that's causing the second error you are seeing.
>
> No, that comma was a left-over from the attempt to fix the
> mac_chinsimp error. The trace that I reported was generated with
> MISSING_CODE = -1. I am replacing the patch.
>
> Is it ok to commit a partial fix? It may take longer to fix the mac error.

Sure, we won't need that script anytime soon and if we do, we can just as well use the Python2 version. |
|||
| msg122850 | Author: Alexander Belopolsky (belopolsky) (Python committer) | Date: 2010-11-29 18:52 | |
On Mon, Nov 29, 2010 at 1:38 PM, Marc-Andre Lemburg <report@bugs.python.org> wrote:
..
> Sure, we won't need that script anytime soon and if we do, we
> can just as well use the Python2 version.

That may not be true. I compared the 2.7 and py3k versions and the latter has some new features:

* unidata_version changed from 5.2.0 to 6.0.0
* Unihan data is read from zip file
* added processing of DerivedCoreProperties

These changes don't affect gencodec.py, but it may be inconvenient to run makeunicodedata.py and gencodec.py using different versions of Python.

I'll check that all non-mac encodings are correctly generated before committing. |
|||
| msg122858 | Author: Martin v. Löwis (loewis) (Python committer) | Date: 2010-11-29 19:48 | |
> These changes don't affect gencodec.py, but it may be inconvenient to
> run makeunicodedata.py and gencodec.py using different versions of
> Python.

As MAL explains: these are completely unrelated, independent tools, and gencodec isn't run more than once per decade (or so). I only ever run makeunicodedata, and I have been using Python 3 to run it.

The mappings are not supposed to ever change once produced. In particular, new versions of Unicode cannot affect them, since the existing characters all map fine to existing code points, which will not change their meaning per Unicode stability criteria. |
|||
| msg122916 | Author: Alexander Belopolsky (belopolsky) (Python committer) | Date: 2010-11-30 16:57 | |
Committed in revision 86891. Keeping open to address Mac issue. |
|||
| msg202543 | Author: A.M. Kuchling (akuchling) (Python committer) | Date: 2013-11-10 18:24 | |
For the Mac issue, we could just delete the mapping files before processing them. I've attached a patch that modifies the Makefile. |
|||
| msg233902 | Author: Martin Panter (martin.panter) (Python committer) | Date: 2015-01-13 05:57 | |
Here is a new version of Kuchling’s patch. I restored some mapping files which do not give any errors (including the mac_turkish codec, which is actually documented), and removed both readme files. |
|||
| msg406955 | Author: Irit Katriel (iritkatriel) (Python committer) | Date: 2021-11-24 19:59 | |
I don't think Martin's patch has been applied. Is it needed? |
|||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022-04-11 14:57:09 | admin | set | github: 54761 |
| 2021-11-24 21:35:33 | vstinner | set | nosy: - vstinner |
| 2021-11-24 19:59:51 | iritkatriel | set | nosy: + iritkatriel; messages: + msg406955 |
| 2015-01-13 05:57:30 | martin.panter | set | files: + 10552-remove-apple-files-v2.txt; versions: + Python 3.4; nosy: + martin.panter, vstinner; messages: + msg233902; components: + Unicode |
| 2014-12-31 16:22:37 | akuchling | set | nosy: - akuchling |
| 2014-06-29 23:08:51 | belopolsky | set | nosy: + ronaldoussoren, ned.deily, hynek |
| 2014-06-29 23:07:44 | belopolsky | set | assignee: belopolsky -> |
| 2013-11-10 18:24:50 | akuchling | set | files: + 10552-remove-apple-files.txt; nosy: + akuchling; messages: + msg202543 |
| 2010-12-30 22:14:16 | georg.brandl | unlink | issue7962 dependencies |
| 2010-11-30 16:57:48 | belopolsky | set | nosy: lemburg, loewis, belopolsky, ezio.melotti; messages: + msg122916; priority: normal -> low; assignee: belopolsky; components: + macOS; stage: commit review -> needs patch |
| 2010-11-29 20:22:31 | belopolsky | unlink | issue10575 dependencies |
| 2010-11-29 19:48:38 | loewis | set | messages: + msg122858 |
| 2010-11-29 18:52:32 | belopolsky | set | messages: + msg122850 |
| 2010-11-29 18:37:58 | lemburg | set | messages: + msg122843 |
| 2010-11-29 18:36:58 | belopolsky | set | files: - issue10552a.diff |
| 2010-11-29 18:36:46 | belopolsky | set | files: + issue10552a.diff; messages: + msg122842 |
| 2010-11-29 18:21:55 | lemburg | set | messages: + msg122837 |
| 2010-11-29 16:57:45 | belopolsky | set | messages: + msg122829 |
| 2010-11-29 16:45:33 | belopolsky | link | issue10575 dependencies |
| 2010-11-27 23:03:04 | belopolsky | set | messages: + msg122586 |
| 2010-11-27 23:02:25 | belopolsky | set | files: + issue10552a.diff; messages: + msg122585; stage: commit review |
| 2010-11-27 22:16:02 | ezio.melotti | set | nosy: + ezio.melotti |
| 2010-11-27 22:09:48 | lemburg | set | messages: + msg122565 |
| 2010-11-27 21:15:09 | belopolsky | set | files: + issue10552.diff; nosy: + loewis; messages: + msg122559; keywords: + patch |
| 2010-11-27 20:31:17 | belopolsky | link | issue7962 dependencies |
| 2010-11-27 20:29:09 | belopolsky | create | |