Created on 2013-10-21 12:01 by serhiy.storchaka, last changed 2022-04-11 14:57 by admin. This issue is now closed.
| Files | |
|---|---|
| File name | Uploaded |
| re_mk_bitmap.patch | serhiy.storchaka, 2013-10-21 12:01 |
| re_optimize_charset.patch | serhiy.storchaka, 2013-10-24 19:24 |
| re_optimize_charset_2.patch | serhiy.storchaka, 2013-10-25 21:02 |
| Messages (6) | |||
|---|---|---|---|
| msg200755 | Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) | Date: 2013-10-21 12:01 | |
Here is a patch which speeds up compiling regular expressions with big charsets.

Microbenchmark:

$ ./python -m timeit "from sre_compile import compile; r = '[%s]' % ''.join(map(chr, range(256, 2**16, 255)))" "compile(r, 0)"

Unpatched (but with issue19327 fixed): 119 msec per loop
Patched: 59.6 msec per loop

Compiling regular expressions with big charsets was the main cause of the slow import of the email.message module (issue11454). |
|||
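The microbenchmark above can also be run from a script. The following is a minimal sketch, not part of the patch: it assumes a current Python 3, where `re.compile` caches compiled patterns, so it calls `re.purge()` before each compilation to force a real compile. The helper names `build_big_charset_pattern` and `time_compile` are hypothetical, and absolute timings will differ from the 2013 figures quoted in the message.

```python
import re
import timeit


def build_big_charset_pattern():
    """Build the same big charset as the microbenchmark in the message."""
    # Every 255th code point from 256 up to (but excluding) 2**16.
    return '[%s]' % ''.join(map(chr, range(256, 2**16, 255)))


def time_compile(pattern, number=3, repeat=3):
    """Return the best observed time in seconds for one uncached compile.

    Assumption: clearing the cache with re.purge() is enough to make
    re.compile() do the full compilation work on every call.
    """
    def compile_uncached():
        re.purge()
        return re.compile(pattern)

    best = min(timeit.repeat(compile_uncached, number=number, repeat=repeat))
    return best / number


if __name__ == "__main__":
    seconds = time_compile(build_big_charset_pattern())
    print("%.3f msec per compile" % (seconds * 1000))
```

The measurement includes the small cost of `re.purge()` itself, which is why the sketch is only suitable for rough comparisons between interpreter builds.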
| msg201166 | Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) | Date: 2013-10-24 19:24 | |
Here is a more complex patch which optimizes charset compiling. It affects small charsets too. Big charsets now support the same optimizations as small charsets. The optimized bitmap can now be used even if the charset contains category items or non-BMP characters.

$ ./python -m timeit "from sre_compile import compile; r = '[0-9]+'" "compile(r, 0)"
Unpatched: 1000 loops, best of 3: 457 usec per loop
Patched: 1000 loops, best of 3: 368 usec per loop

$ ./python -m timeit "from sre_compile import compile; r = '[ \t\n\r\v\f]+'" "compile(r, 0)"
Unpatched: 1000 loops, best of 3: 490 usec per loop
Patched: 1000 loops, best of 3: 413 usec per loop

$ ./python -m timeit "from sre_compile import compile; r = '[0-9A-Za-z_]+'" "compile(r, 0)"
Unpatched: 1000 loops, best of 3: 760 usec per loop
Patched: 1000 loops, best of 3: 527 usec per loop

$ ./python -m timeit "from sre_compile import compile; r = r'[^\ud800-\udfff]*'" "compile(r, 0)"
Unpatched: 100 loops, best of 3: 2.07 msec per loop
Patched: 1000 loops, best of 3: 1.44 msec per loop

$ ./python -m timeit "from sre_compile import compile; r = '[\u0410-\u042f\u0430-\u043f\u0404\u0406\u0407\u0454\u0456\u0457\u0490\u0491]+'" "compile(r, 0)"
Unpatched: 100 loops, best of 3: 8.24 msec per loop
Patched: 100 loops, best of 3: 2.13 msec per loop

$ ./python -m timeit "from sre_compile import compile; r = '[%s]' % ''.join(map(chr, range(256, 2**16, 255)))" "compile(r, 0)"
Unpatched: 10 loops, best of 3: 119 msec per loop
Patched: 10 loops, best of 3: 24.1 msec per loop |
|||
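For readers unfamiliar with the term, the general idea of a charset bitmap can be sketched as follows. This is a conceptual illustration only: it is not the code from the patch and does not mirror the internals of `sre_compile`. It assumes a charset whose members all have code points below 256, and the names `make_bitmap` and `in_bitmap` are hypothetical.

```python
def make_bitmap(chars):
    """Pack characters with code points below 256 into one integer bitmap."""
    bitmap = 0
    for ch in chars:
        code = ord(ch)
        if code >= 256:
            raise ValueError("character %r does not fit in a 256-bit bitmap" % ch)
        # Set the bit whose index equals the character's code point.
        bitmap |= 1 << code
    return bitmap


def in_bitmap(bitmap, ch):
    """Test membership with a shift and a mask instead of scanning ranges."""
    code = ord(ch)
    return code < 256 and ((bitmap >> code) & 1) == 1


if __name__ == "__main__":
    digits = make_bitmap("0123456789")
    print(in_bitmap(digits, "7"), in_bitmap(digits, "a"))
```

The appeal of such a representation is that a membership test costs the same regardless of how many literals or ranges the charset was written with.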
| msg201292 | Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) | Date: 2013-10-25 21:02 | |
The updated patch addresses Antoine's comments and fixes one bug of my own. |
|||
| msg201419 | Author: Roundup Robot (python-dev) (Python triager) | Date: 2013-10-27 06:22 | |
New changeset d5498d9d9bb0 by Serhiy Storchaka in branch 'default': Issue #19329: Optimized compiling charsets in regular expressions. http://hg.python.org/cpython/rev/d5498d9d9bb0 |
|||
| msg201420 | Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) | Date: 2013-10-27 06:24 | |
Thank you Antoine for your review. |
|||
| msg230335 | Author: Roundup Robot (python-dev) (Python triager) | Date: 2014-10-31 11:55 | |
New changeset ebd48b4f650d by Serhiy Storchaka in branch '2.7': Backported the optimization of compiling charsets in regular expressions https://hg.python.org/cpython/rev/ebd48b4f650d |
|||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022-04-11 14:57:52 | admin | set | github: 63528 |
| 2014-10-31 11:55:20 | python-dev | set | messages: + msg230335 |
| 2013-10-27 06:24:34 | serhiy.storchaka | set | status: open -> closed; resolution: fixed; messages: + msg201420; stage: patch review -> resolved |
| 2013-10-27 06:22:02 | python-dev | set | nosy: + python-dev; messages: + msg201419 |
| 2013-10-25 21:02:01 | serhiy.storchaka | set | files: + re_optimize_charset_2.patch; messages: + msg201292 |
| 2013-10-24 19:24:58 | serhiy.storchaka | set | files: + re_optimize_charset.patch; messages: + msg201166; title: Faster compiling of big charset regexpes -> Faster compiling of charset regexpes |
| 2013-10-21 12:01:44 | serhiy.storchaka | set | dependencies: + re doesn't work with big charsets |
| 2013-10-21 12:01:18 | serhiy.storchaka | create | |