
This issue tracker has been migrated to GitHub and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

Author tchrist
Recipients tchrist
Date 2011-08-11 19:18:30
SpamBayes Score 1.4691809e-06
Marked as misclassified No
Message-id <1313090311.62.0.0473644856742.issue12731@psf.upfronthosting.co.za>
In-reply-to
Content
You cannot use Python's lib re for handling Unicode regular expressions, because it violates UTS #18 (Unicode Regular Expressions), specifically requirement RL1.2a on compatibility properties. RL1.2a and its Annex C spell out exactly what \w is allowed to match: alphabetic characters, marks, decimal digits, connector punctuation, and join controls. Python has its own idea. Because this is a clear violation of the standard, it is misleading and wrong for Python to claim that the re.UNICODE flag makes \w and friends match Unicode. Here are the failed test cases when the attached file is run under v3.2; there are further failures when it is run under v2.7.
FAIL lib re found non alphanumeric string café
FAIL lib re found non alphanumeric string K
FAIL lib re found non alphanumeric string ͅ
FAIL lib re found non alphanumeric string ְ
FAIL lib re found non alphanumeric string 0
FAIL lib re found non alphanumeric string 𐍁
FAIL lib re found non alphanumeric string Unicode
FAIL lib re found non alphanumeric string 𐐔𐐯𐑅𐐨𐑉𐐯𐐻
FAIL lib re found non alphanumeric string connector‿punctuation
FAIL lib re found non alphanumeric string Ὰͅ_Στο_Διάολο
FAIL lib re found non alphanumeric string 𐌰𐍄𐍄𐌰‿𐌿𐌽𐍃𐌰𐍂‿𐌸𐌿‿𐌹𐌽‿𐌷𐌹𐌼𐌹𐌽𐌰𐌼
FAIL lib re found all alphanumeric string 123
FAIL lib re found all alphanumeric string 123
FAIL lib re found all alphanumeric string ¼½¾
FAIL lib re found all alphanumeric string ⑶
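The comparison behind these failures can be sketched in Python. This is a hypothetical reconstruction, not the attached test file: the helper names (is_uts18_word, is_re_word, check) are illustrative only. The UTS #18 Annex C definition of \w is [\p{alpha}\p{gc=Mark}\p{digit}\p{gc=Connector_Punctuation}\p{Join_Control}]. Because the standard unicodedata module does not expose the Other_Alphabetic property, the sketch approximates \p{alpha} with general categories L* and Nl, so characters such as U+24C0 CIRCLED LATIN CAPITAL LETTER K are not covered. The samples use escapes so the combining and compatibility characters survive copying.

```python
import re
import unicodedata

# UTS #18 Annex C defines \w as
#   [\p{alpha}\p{gc=Mark}\p{digit}\p{gc=Connector_Punctuation}\p{Join_Control}]
# Join_Control is exactly U+200C ZERO WIDTH NON-JOINER and U+200D ZERO WIDTH JOINER.
JOIN_CONTROL = {"\u200c", "\u200d"}


def is_uts18_word_char(ch: str) -> bool:
    # Approximation (assumption): \p{alpha} is taken as general categories L* plus Nl,
    # because unicodedata does not expose Other_Alphabetic (e.g. U+24C0 is missed).
    gc = unicodedata.category(ch)
    return (
        gc.startswith("L")      # letters
        or gc == "Nl"           # letter numbers
        or gc.startswith("M")   # marks: Mn, Mc, Me
        or gc == "Nd"           # decimal digits (\p{digit})
        or gc == "Pc"           # connector punctuation, e.g. "_" and U+203F
        or ch in JOIN_CONTROL
    )


def is_uts18_word(s: str) -> bool:
    # True if every character of a non-empty string is a UTS #18 word character.
    return bool(s) and all(is_uts18_word_char(ch) for ch in s)


def is_re_word(s: str) -> bool:
    # In Python 3, str patterns are Unicode-aware by default (re.UNICODE is implied).
    return re.fullmatch(r"\w+", s) is not None


def check(samples):
    """Return FAIL lines where re's \\w disagrees with the UTS #18 sketch."""
    failures = []
    for s in samples:
        expected, actual = is_uts18_word(s), is_re_word(s)
        if expected and not actual:
            failures.append(f"FAIL lib re found non alphanumeric string {s}")
        elif actual and not expected:
            failures.append(f"FAIL lib re found all alphanumeric string {s}")
    return failures


if __name__ == "__main__":
    samples = [
        "cafe\u0301",                  # e + COMBINING ACUTE ACCENT (Mn)
        "connector\u203fpunctuation",  # U+203F UNDERTIE (Pc)
        "\u00bc\u00bd\u00be",          # vulgar fractions (No)
        "\u2476",                      # PARENTHESIZED DIGIT THREE (No)
        "caf\u00e9",                   # precomposed e-acute: both definitions agree
    ]
    for line in check(samples):
        print(line)
```

Run as a script, it prints four FAIL lines in the report's format; the precomposed café passes under both definitions. The disagreements come from re treating \w as "alphanumeric or underscore" (so numeric symbols like ¼ count, while combining marks and connector punctuation other than "_" do not).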
Note that Matthew Barnett's regex lib for Python handles all of these cases in conformance with The Unicode Standard.
History
Date                 User     Action  Args
2011-08-11 19:18:31  tchrist  set     recipients: + tchrist
2011-08-11 19:18:31  tchrist  set     messageid: <1313090311.62.0.0473644856742.issue12731@psf.upfronthosting.co.za>
2011-08-11 19:18:31  tchrist  link    issue12731 messages
2011-08-11 19:18:30  tchrist  create
