Message141916
| Author |
tchrist |
| Recipients |
tchrist |
| Date |
2011-08-11.18:48:19 |
| Message-id |
<1313088501.39.0.822875623158.issue12728@psf.upfronthosting.co.za> |
| In-reply-to |
| Content |
The Python re library is broken in its approach to case-insensitive matching. It compares the lowercase mappings of the pattern and the string, which is wrong: a case-insensitive comparison must use the Unicode casefolds, not the Unicode casemaps, or it produces wrong answers. I include a small test case that illustrates this bug. The bug exists on both 2.7 and 3.2, and on both wide and narrow builds. For comparison, I also show results using Matthew Barnett's regex library, which gets all 5 tests correct where re gets all 5 wrong.
A sample run is:
FAIL: re pattern Ι is not the same as string ͅ
PASS: regex pattern Ι is indeed the same as string ͅ
FAIL: re pattern Μ is not the same as string μ
PASS: regex pattern Μ is indeed the same as string μ
FAIL: re pattern s is not the same as string s
PASS: regex pattern s is indeed the same as string s
FAIL: re pattern ΣΤΙΓΜΑΣ is not the same as string στιγμας
PASS: regex pattern ΣΤΙΓΜΑΣ is indeed the same as string στιγμας
FAIL: re pattern POST is not the same as string post
PASS: regex pattern POST is indeed the same as string post
re lib passed 0 of 5 tests
regex lib passed 5 of 5 tests |
|
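The difference between comparing lowercase mappings and comparing casefolds can be sketched in a few lines. This is not the test case attached to the report. It is a minimal illustration that assumes Python 3.3 or later, where `str.casefold()` exists (it was added after this message was written). The specific code points in `PAIRS` and the helper names `same_by_lower` and `same_by_casefold` are assumptions chosen for illustration.

```python
# Illustrative pairs only: these exact code points are assumptions,
# not necessarily the ones used in the report's attached test case.
PAIRS = [
    ("\N{GREEK CAPITAL LETTER MU}", "\N{MICRO SIGN}"),
    ("\N{LATIN SMALL LETTER LONG S}", "s"),
    ("\N{GREEK CAPITAL LETTER IOTA}", "\N{COMBINING GREEK YPOGEGRAMMENI}"),
]


def same_by_lower(a, b):
    """Compare using simple lowercase mappings (the approach criticized above)."""
    return a.lower() == b.lower()


def same_by_casefold(a, b):
    """Compare using Unicode casefolds via str.casefold() (Python 3.3+)."""
    return a.casefold() == b.casefold()


for a, b in PAIRS:
    # ascii() keeps the output readable even for combining characters.
    print(ascii(a), ascii(b),
          "lower:", same_by_lower(a, b),
          "casefold:", same_by_casefold(a, b))
```

For each of these pairs the lowercase comparison reports a mismatch while the casefold comparison reports a match, which is the kind of discrepancy the sample run shows between re and regex.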
History
|
|---|
| Date |
User |
Action |
Args |
| 2011-08-11 18:48:21 | tchrist | set | recipients:
+ tchrist |
| 2011-08-11 18:48:21 | tchrist | set | messageid: <1313088501.39.0.822875623158.issue12728@psf.upfronthosting.co.za> |
| 2011-08-11 18:48:20 | tchrist | link | issue12728 messages |
| 2011-08-11 18:48:20 | tchrist | create |
|