-
Notifications
You must be signed in to change notification settings - Fork 1.1k
unix-ffi: re: convert to PCRE2 #737
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
4ed294e to
4c2dccb
Compare
(this is needed as we are deprecating pcre in openwrt and we are converting any user to pcre2)
Thanks for the contribution.
What's the difference between libpcre2-posix, libpcre2-8, libpcre2-16 and libpcre2-32?
It would be possible to support both the old PCRE and PCRE2, but I don't think it's worth the effort.
The POSIX variant just provide wrapper to have POSIX interface for pcre2 API.
Pcre had the same variant and wasn't used here.
The -16 and -32 variant are used to support 16bit and 32bit string (UTF-16 and UTF-32)
UTF-8 should be enough and what is commonly used when migrating from pcre to pcre2
Supporting both versions would bloat the code with lots of ifdef and IMHO it's not worth it and pcre won't receive any update and would pose security risks.
The -16 and -32 variant are used to support 16bit and 32bit string (UTF-16 and UTF-32)
UTF-8 should be enough and what is commonly used when migrating from pcre to pcre2
Aah, I see, thanks for the info. Indeed we should use the -8 variant.
I see that the test_re.py provided in this re directory doesn't pass anymore. It needs to pass.
@dpgeorge i remember it did pass correctly. Maybe i'm running it wrong? my test was compiling the lib and run the test using the compiled library. Something wrong in this?
I'm simply running the following in the unix-ffi/re directory with this PR checked out:
$ micropython
MicroPython v1.20.0-557-g5aec051f9 on 2023年10月05日; linux [GCC 13.2.1] version
Use Ctrl-D to exit, Ctrl-E for paste mode
>>> import re
>>> re.search(r"a+", "caaab").group(0)
''
That's not correct output.
@dpgeorge mhh i'm confused
root@OpenWrt:~# micropython
MicroPython v1.20.0 on 2023年10月04日; linux [GCC 12.3.0] version
Use Ctrl-D to exit, Ctrl-E for paste mode
>>> import re
>>> re.search(r"a+", "caaab").group(0)
'aaa'
Also by putting the uncompiled re.py in /usr/lib/micropython/unix produce the same output
4c2dccb to
772bf91
Compare
@dpgeorge I finally manage to repro the problem... It seems to be a problem with PCRE2_SIZE that is size_t and this change between 32bit and 64bit. Using ULONG handled both systems. (my test were done on a 32bit system)
bbfc3f8 to
c35e738
Compare
unix-ffi/re/re.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this call to sizeof to get the size be done at global module level, so it only has to be done once?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Better now?
Thanks, I can confirm that it now works on my (64-bit) machine.
54e92a9 to
1cbe8c4
Compare
BKPepe
commented
Oct 16, 2023
So, @dpgeorge it can be merged, right? :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes LGTM, thanks @Ansuel!
PCRE is marked as EOL and won't receive any new security update. Convert the re module to PCRE2 API to enforce security. Additional dependency is now needed with uctypes due to changes in how PCRE2 return the match_data in a pointer and require special handling. The converted module is tested with the test_re.py with no regression. Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
1cbe8c4 to
d8e163b
Compare
Thanks for updating, this looks good now.
Merged.
PCRE is marked as EOL and won't receive any new security update.
Convert the re module to PCRE2 API to enforce security. Additional dependency is now needed with uctypes due to changes in how PCRE2 return the match_data in a pointer and require special handling.
The converted module is tested with the test_re.py with no regression.