-
Notifications
You must be signed in to change notification settings - Fork 1.7k
WIP: Add RE2 regexp engine support #2012
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
324b72c
to
6d72abd
Compare
I've just force-pushed updated PR.
- New commit with fixed error handling for the rx operator.
get_pattern
was renamed togetPattern
to adhere to naming convention used elsewhere in ModSecurity.- verifyCC conversion to the new classes, which was done very sloppily earlier, has been re-done properly.
6d72abd
to
c1aa2f4
Compare
1f5e168
to
26f08cb
Compare
c1aa2f4
to
d55b588
Compare
b9ded66
to
d55b588
Compare
Since my changes have been essentially merged into v3/dev/re2
, I updated the PR to target v3/master
branch now.
Overall, the performance for the libpcre users was not impacted by this pull request:
The chart was generated using the code merged into v3/dev/re2.
d2f3df2
to
b5cf105
Compare
Ping.
Just in case, even though it's marked WIP, it's more than ready for the first round of review.
moves Pcre to backends/pcre.cc moves PegexMatch to regex_match.h
This enables unit tests to compare the matching groups as well, not just binary match-no match.
Previously, searchAll would stop search when it encountered an empty matching group in any position. This means that, for example, regular expression "(a)(b?)(c)" would match string "ac", but the resulting group list would be ["ac", "a"]. After this change, the resulting list for the aforementioned regular expression becomes ["ac", "a", "", "c"] like it should've been. Additionally, this also changes behaviour for multiple matches. For example, when "aaa00bbb" is matched by "[a-z]*", previously only "aaa" would be returned. Now the matching list is ["aaa", "", "", "bbb", ""]. The old behaviour was confusing and almost certainly a bug. The new behaviour is the same as in Python's re.findall. For reference, though, Go does it somewhat differently: empty matches at the end of non-empty matches are ignored, so in Go above example is ["aaa", "", "bbb"] instead.
RE2 doesn't support certain features, like negative lookaround, so when a regular expression cannot be compiled with RE2, it's compiled with libpcre instead. This has some runtime cost, as this fallback is implemented with an extra heap object and virtual function calls. When RE2 is not enabled, however, everything works as it did before.
Ubuntu 14.04 doesn't have RE2 package altogether, and Ubuntu 16.04 RE2 package is too old. Ubuntu 18.04 RE2 package might work, but this Ubuntu verison it's not supported by Travis yet. So build RE2 from source.
b5cf105
to
95f16ea
Compare
Rebased on top of the current master.
eddie4
commented
Sep 28, 2020
Very interesting commit. We recently ran into performance issues with modsecurity. Any news on how much improvement we might see?
Very interesting commit. We recently ran into performance issues with modsecurity. Any news on how much improvement we might see?
Hi @eddie4,
That is not the single front that we have on the performance side. Check the branch v3/dev/3.1-experimental. We have had great progress there. This is still on the Queue, to be merged once other pieces of v3.1 are complete.
Very interesting commit. We recently ran into performance issues with modsecurity. Any news on how much improvement we might see?
@eddie4 in my experience, it doesn't really make it much faster in average, however, it eliminates ReDoS worst cases.
@zimmerle please ping me when it's time to rebase this on top v3/dev/3.1-experimental
.
mathetake
commented
Apr 28, 2025
hi - sorry to ping on this really old patch but I wonder by any chance we could see this happening in the future ...
Hi @mathetake,
thanks for bringing this up.
As you can see GH status, this PR is very old therefore there are many conflicts which needs to resolve.
I think the best would be that we add a new PR based on this one, but the base would be the current status.
I try to handle this task soon - or feel free to send a new PR.
mathetake
commented
May 2, 2025
cool, thank you for the response @airween - i won't be able to work on this personally anytime soon, so feel free to take it over. I would love to test it out, review etc
Uh oh!
There was an error while loading. Please reload this page.
See #1996.
Every commit has detailed description.
I've implemented the fallback approach so far: if RE2 support is enabled, every regular expression is compiled with RE2, and if that fails, regular expression is compiled with libpcre instead. Please see #1996 for more detailed discussion why this approach is necessary.
There's also lots of refactoring of regexp related code.
Open questions:
--with-re
is supplied?