Message93265
| Author |
christoph |
| Recipients |
christoph, ezio.melotti, ggenellina, lemburg |
| Date |
2009-09-29 09:20:11 |
| Message-id |
<1254216013.11.0.953096626201.issue6412@psf.upfronthosting.co.za> |
| In-reply-to |
| Content |
> * U+0027 APOSTROPHE
hardcoded (see below)
> * U+00AD SOFT HYPHEN (SHY)
has the general category "Format (Cf)" and is thus included automatically
> * U+2019 RIGHT SINGLE QUOTATION MARK
hardcoded (see below)
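The category difference between these three characters can be checked with the standard unicodedata module. The following is only a small sketch, written for Python 3 (the session in this message is Python 2); the helper name is_format_char is hypothetical and not part of the patch:

```python
import unicodedata


def is_format_char(ch):
    """Return True if ch has the general category Cf (Format)."""
    return unicodedata.category(ch) == "Cf"


# U+00AD SOFT HYPHEN is Cf, so a category-based rule picks it up.
# The two apostrophe-like characters are punctuation, so a rule based
# on Cf alone does not cover them.
for ch in ("\u00ad", "\u0027", "\u2019"):
    print("U+%04X %s %s" % (ord(ch), unicodedata.category(ch), unicodedata.name(ch)))
```

This only illustrates why SHY needs no special handling while the apostrophes do; it says nothing about how the patch itself performs the lookup.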
I hardcoded some characters into Tools/unicode/makeunicodedata.py:
>>> print ' '.join([u':', u'\xb7', u'\u0387', u'\u05f4', u'\u2027',
... u'\ufe13', u'\ufe55', u'\uff1a'] + [u"'", u'.', u'\u2018', u'\u2019',
... u'\u2024', u'\ufe52', u'\uff07', u'\uff0e'])
: · · ״ ‧ ︓ ﹕ ： ' . ‘ ’ ․ ﹒ ＇ ．
These cannot currently be extracted automatically, because the script has
access to neither DerivedCoreProperties.txt nor the source file for the
property "Word_Break(C) = MidLetter or MidNumLet".
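If such a source file were made available to the script (for example the Unicode Character Database's auxiliary/WordBreakProperty.txt), the extraction could look roughly like the sketch below. It is written for Python 3 and is not what the patch does; the function name chars_with_values and the SAMPLE text are hypothetical, and SAMPLE is a made-up excerpt that only imitates the "code point(s) ; value # comment" layout of UCD property files:

```python
import re

# Made-up excerpt imitating the layout of a UCD property file.
SAMPLE = """\
# illustrative data only
0027          ; MidNumLet # APOSTROPHE
003A          ; MidLetter # COLON
0041..005A    ; ALetter   # LATIN CAPITAL LETTER A..Z
00B7          ; MidLetter # MIDDLE DOT
2018..2019    ; MidNumLet # LEFT/RIGHT SINGLE QUOTATION MARK
"""

# A single code point or a "first..last" range, then ";" and the value.
LINE_RE = re.compile(r"^([0-9A-F]{4,6})(?:\.\.([0-9A-F]{4,6}))?\s*;\s*(\w+)")


def chars_with_values(text, wanted):
    """Return the characters whose property value is in `wanted`."""
    found = []
    for line in text.splitlines():
        # Drop trailing comments and skip blank or comment-only lines.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        match = LINE_RE.match(line)
        if match is None or match.group(3) not in wanted:
            continue
        first = int(match.group(1), 16)
        last = int(match.group(2) or match.group(1), 16)
        found.extend(chr(cp) for cp in range(first, last + 1))
    return found


mid_chars = chars_with_values(SAMPLE, {"MidLetter", "MidNumLet"})
print(" ".join(mid_chars))
```

With the real data file in place of SAMPLE, the result could replace the hardcoded list, assuming the file's layout matches the pattern used here.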
As I said, the patch is only a second-best solution; the correct
approach would be to implement the word-breaking algorithm as described in
the newest standard. This patch is just an improvement over the current
situation. |
|