RegEx engine module that builds NFA/DFA and uses it for matching
RegEx engine module for PureBasic
When matching, the engine always chooses the longest match among several possible matches. No backtracking is required during this process, since all alternatives are checked simultaneously. For the regular expression "0123x|0123y" and the string "0123y", the engine therefore does not start again from the beginning at "0" after the failure at "x"; it is already positioned after "3" and only has to read "y". In contrast to an NFA, a DFA does not try anything out, i.e. the DFA reads "y" directly.
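To make the "no backtracking" point concrete, here is a minimal sketch in Python. It is not the module's code: it only handles alternations of plain literals (built as a trie, which is already a DFA for that case), and the names `build_literal_dfa` and `longest_match` are made up for this illustration.

```python
def build_literal_dfa(words):
    """Build a DFA (as a trie) that accepts exactly the given literal words."""
    transitions = [{}]   # transitions[state][char] -> next state; state 0 is the start
    accepting = set()
    for word in words:
        state = 0
        for ch in word:
            if ch not in transitions[state]:
                transitions.append({})
                transitions[state][ch] = len(transitions) - 1
            state = transitions[state][ch]
        accepting.add(state)
    return transitions, accepting


def longest_match(dfa, text):
    """Return the length of the longest accepted prefix of text (0 if none)."""
    transitions, accepting = dfa
    state, best = 0, 0
    for pos, ch in enumerate(text):
        state = transitions[state].get(ch)
        if state is None:          # dead end: no alternative can continue
            break
        if state in accepting:     # remember the longest match seen so far
            best = pos + 1
    return best


dfa = build_literal_dfa(["0123x", "0123y"])
print(longest_match(dfa, "0123y"))  # 5
```

Both alternatives share the states for "0123", so every input character is read exactly once. After "3" the DFA simply follows the "y" transition; there is no failed attempt at "x" to undo.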
It is also possible to pass multiple regular expressions to the engine and set unique ID numbers for them to be able to determine which regular expression matched in case of a match. This functionality makes it easy to create lexers. This is also the real motivation that led to this project. However, the engine is kept flexible in its usage and can easily be used for other purposes as well, which is why I don't consider it exclusively a lexer engine and therefore didn't name it that way.
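The idea of several regular expressions with ID numbers can be sketched as follows. This is only an illustration in Python: it uses the backtracking `re` module and tries the rules one after another instead of building a combined NFA/DFA, and the class `MultiRegex` with its `add` and `match` methods is hypothetical, not the module's API. The tie-break (the rule added last wins when several rules match the same string) is an assumption taken from the note in the 1.2.0 release post.

```python
import re


class MultiRegex:
    """Several regular expressions, each tagged with an ID number."""

    def __init__(self):
        self._rules = []   # (compiled pattern, regex_id) in the order added

    def add(self, pattern, regex_id):
        self._rules.append((re.compile(pattern), regex_id))

    def match(self, text, pos=0):
        """Return (length, regex_id) of the longest match at pos, or (0, None)."""
        best_len, best_id = 0, None
        for compiled, regex_id in self._rules:
            m = compiled.match(text, pos)
            # ">=" lets a rule that was added later win a tie;
            # empty matches are ignored.
            if m and m.end() > pos and m.end() - pos >= best_len:
                best_len, best_id = m.end() - pos, regex_id
        return best_len, best_id


lexer = MultiRegex()
lexer.add(r"[A-Za-z_]\w*", 1)   # identifier
lexer.add(r"if", 2)             # keyword, added later so it wins ties
lexer.add(r"\d+", 3)            # number

print(lexer.match("if x"))   # (2, 2)
print(lexer.match("iffy"))   # (4, 1)
```

"if" is matched by both the identifier rule and the keyword rule with the same length, so the keyword ID is returned; for "iffy" the identifier rule gives the longer match and wins.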
You can find some code examples, the listing of the supported syntax for the regular expressions, further information and the module itself on GitHub.
Currently, the project is in beta phase, so I would be very grateful for feedback from you.
Features that are currently planned for upcoming versions: Click
These features will not be included in the current beta phase of version 1.0.0, but only in the next minor versions, e.g. 1.1.0, 1.2.0 etc.
Note that this project does not intend to provide a RegEx engine to replace more feature-rich engines like the PCRE natively integrated in PureBasic. So there will be no support for capturing groups, backreferences, word boundaries and anchors.
- GitHub project page: Click
Last edited by Sicro on Sun Aug 24, 2025 12:54 pm, edited 4 times in total.
Why OpenSource should have a license :: PB-CodeArchiv-Rebirth :: Pleasant-Dark (syntax color scheme) :: RegEx-Engine (compiles RegExes to NFA/DFA)
Manjaro Xfce x64 (Main system) :: Windows 10 Home (VirtualBox) :: Newest PureBasic version
Re: RegEx engine module that builds NFA/DFA and uses it for matching
1.0.0-beta.2 released
Download: Click
Improved
- Unicode case unfolding table rewritten, resulting in 44% smaller executables.
- Character classes now produce much more compact NFAs, resulting in faster NFA matching, faster DFA creation, and less memory consumption.
- The more compact NFAs also benefit the DFAs, as they are now much smaller as well.
- Fixed: Escaping opening square brackets within character classes was not possible.
Last edited by Sicro on Sat Dec 31, 2022 3:37 pm, edited 3 times in total.
- Kwai chang caine
- Always Here
- Posts: 5505
- Joined: Sun Nov 05, 2006 11:42 pm
- Location: Lyon - France
Re: RegEx engine module that builds NFA/DFA and uses it for matching
Post by Kwai chang caine »
Hello SICRO
First, thanks for sharing this nice code 8)
Unfortunately, like surely many people, I don't understand one word of RegEx :oops:
But the worst is that it interests me all the same :mrgreen:
I'll take advantage of your splendid work to ask you a question, if you allow me :wink:
Can your work translate a RegEx into something understandable by the first fool to come along? (like me, for example :mrgreen: )
And if not, do you know of a very, very simple freeware, for babies, that can create a RegEx without learning these complex rules? :oops:
Congratulations again on your work 8)
The happiness is a road...
Not a destination
Re: RegEx engine module that builds NFA/DFA and uses it for matching
[Quote] Kwai chang caine wrote: Thu Oct 27, 2022 4:36 pm
Can your work translate a RegEx into something understandable by the first fool to come along? (like me, for example :mrgreen: )
If you have trouble imagining how a RegEx works, you can use a RegEx visualizer or a tool that explains the RegEx.
[Quote] Kwai chang caine wrote: Thu Oct 27, 2022 4:36 pm
And if not, do you know of a very, very simple freeware, for babies, that can create a RegEx without learning these complex rules? :oops:
You don't need a tool for that. If you experiment a bit with RegEx, you will quickly understand how it works. Start simple, like "a|b", and experiment further.
If you want to discuss RegEx in general further, it is better to create a new thread in this forum.
Thanks!
Re: RegEx engine module that builds NFA/DFA and uses it for matching
Post by Kwai chang caine »
Thanks a lot for your answer 8)
Re: RegEx engine module that builds NFA/DFA and uses it for matching
1.0.0-beta.2 has been re-released.
The SimpleCaseUnfolding table was a bit buggy. I didn't want to release a new beta version number for that.
Beta 3 is already in progress and will probably be the last beta version.
Re: RegEx engine module that builds NFA/DFA and uses it for matching
Final 1.0.0 released
After some testing and improvements, I have now decided to release the final version 1.0.0.
Download: Click
New
- Added ASCII mode, which restricts the predefined character classes to ASCII characters only (encoding is still UCS-2) and restricts case-insensitive mode to uppercase and lowercase letters only, instead of applying Unicode case-folding.
- Developer tools added.
- More error handling added.
- Fixed: Processing of RegExes like (a*)* or (a*)+ triggered an infinite loop.
- Parsing of RegEx modes was fixed.
- Memory leaks have been fixed.
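The difference between case-insensitive matching in ASCII mode and with Unicode case folding can be illustrated with a short Python sketch. It is an assumption-laden illustration, not the module's implementation: the helper names are made up, and Python's `str.casefold()` performs full case folding, which may differ in detail from the simple case folding table used by the module.

```python
def ascii_fold(ch):
    """Fold only the ASCII letters A-Z to a-z; leave every other character alone."""
    return chr(ord(ch) + 32) if "A" <= ch <= "Z" else ch


def unicode_fold(ch):
    """Fold a character using Python's Unicode case folding."""
    return ch.casefold()


def same_ignoring_case(a, b, ascii_mode):
    """Compare two single characters case-insensitively in the chosen mode."""
    fold = ascii_fold if ascii_mode else unicode_fold
    return fold(a) == fold(b)


print(same_ignoring_case("K", "k", ascii_mode=True))        # True
print(same_ignoring_case("\u212a", "k", ascii_mode=False))  # True
print(same_ignoring_case("\u212a", "k", ascii_mode=True))   # False
```

U+212A is the Kelvin sign, which Unicode folds to a plain "k". With Unicode case folding it therefore matches "k"; when only ASCII letters are treated as having upper and lower case, it does not.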
Last edited by Sicro on Mon Dec 23, 2024 11:36 am, edited 1 time in total.
Re: RegEx engine module that builds NFA/DFA and uses it for matching
1.1.0-beta.1 released
Download: Click
New
- Single-byte mode: creates much smaller NFAs/DFAs by only supporting characters that do not need the second byte in the UCS-2 encoding (\x01 up to \xFF).
- Code example that visualizes the NFAs/DFAs as Graphviz diagrams.
- AddNfa() now checks whether the regExId parameter has a valid value.
- Matching with an NFA and creating a DFA is faster. The extent of the speed increase depends on how many alternative paths have to be followed during NFA processing (e.g. a large number of paths are followed by the very complex RegEx character class \w).
- Fixed: Outside of character classes, identical characters were unnecessarily added twice to the NFA when both case-insensitive mode and ASCII mode were enabled and the character has no uppercase or lowercase variant.
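As a small illustration of the character range named for single-byte mode, the following Python sketch checks whether a text only uses characters from \x01 up to \xFF, i.e. characters whose UCS-2 code unit does not need the second byte. The function name is hypothetical and the check is not part of the module.

```python
def fits_single_byte_mode(text):
    """True if every character lies in the range \\x01..\\xFF."""
    return all(0x01 <= ord(ch) <= 0xFF for ch in text)


print(fits_single_byte_mode("Gr\u00fc\u00dfe"))  # True  (ü = \xFC, ß = \xDF)
print(fits_single_byte_mode("\u20ac"))           # False (the euro sign is U+20AC)
```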
Last edited by Sicro on Mon Dec 23, 2024 11:39 am, edited 2 times in total.
- loulou2522
- Enthusiast
- Posts: 553
- Joined: Tue Oct 14, 2014 12:09 pm
Re: RegEx engine module that builds NFA/DFA and uses it for matching
Post by loulou2522 »
Thanks for explaining what NFAs/DFAs are.
Happy Christmas
Re: RegEx engine module that builds NFA/DFA and uses it for matching
You're welcome. :)
If you are interested in the NFA and DFA, I recommend that you experiment with the new code example that creates diagrams using the Graphviz tool. The diagrams are interesting to look at.
I wish you a Merry Christmas too.
Re: RegEx engine module that builds NFA/DFA and uses it for matching
1.1.0-beta.2 released
Download: Click
Fixed
- Due to the implementation of single-byte mode, case-sensitive mode was not processed correctly with user-defined character classes (negated or not). This was also the case when single-byte mode was not activated.
- User-defined negated character classes were not processed correctly with single-byte mode activated.
Re: RegEx engine module that builds NFA/DFA and uses it for matching
Final 1.1.0 released
Download: Click
In the next version, I will implement the feature of being able to create a minimal DFA from an existing DFA. Such a DFA contains only the states that are really required, which makes the DFA smaller and lets it consume less memory. The number of transitions that a minimal DFA has to perform for a given input does not change, but it may still be faster, for example because it fits better into the CPU cache. It will be interesting. :)
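What DFA minimization does can be sketched in Python with a simple partition refinement (Moore's algorithm). This is a generic textbook approach and not necessarily what the module will use. The function name and the dictionary layout are assumptions for this example; it also assumes that every state is reachable and appears as a key in `transitions`. In a lexer, accepting states with different regExId values would have to start in separate blocks, which this sketch leaves out.

```python
def minimize_dfa(transitions, accepting, start):
    """Merge equivalent states.

    transitions: {state: {symbol: next_state}}
    Returns (new_transitions, new_accepting, new_start).
    """
    states = list(transitions)
    symbols = sorted({s for row in transitions.values() for s in row})
    # Initial partition: accepting states versus non-accepting states.
    block_of = {q: (1 if q in accepting else 0) for q in states}
    while True:
        signatures, refined = {}, {}
        for q in states:
            # A state's signature: its own block plus the block reached on
            # each symbol (None when there is no transition).
            sig = (block_of[q],) + tuple(
                block_of.get(transitions[q].get(a)) for a in symbols)
            refined[q] = signatures.setdefault(sig, len(signatures))
        done = len(signatures) == len(set(block_of.values()))
        block_of = refined
        if done:   # no block was split in this round
            break
    new_transitions = {}
    for q in states:
        row = new_transitions.setdefault(block_of[q], {})
        for a, target in transitions[q].items():
            row[a] = block_of[target]
    new_accepting = {block_of[q] for q in accepting}
    return new_transitions, new_accepting, block_of[start]


# A 7-state DFA for "0123x|0123y" with two separate accepting end states.
dfa = {0: {"0": 1}, 1: {"1": 2}, 2: {"2": 3}, 3: {"3": 4},
       4: {"x": 5, "y": 6}, 5: {}, 6: {}}
min_t, min_acc, min_start = minimize_dfa(dfa, {5, 6}, 0)
print(len(dfa), "->", len(min_t))  # 7 -> 6
```

The two accepting end states behave identically and are merged into one. Matching "0123y" still takes five transitions, which is the point made above: the minimal DFA is smaller, not shorter to walk through.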
Re: RegEx engine module that builds NFA/DFA and uses it for matching
1.2.0 released
Download: Click
New
- New public constant #ModuleVersion$ for determining the module version.
- The speed increase implemented in version 1.1.0 has been reverted because, in the case of multiple AddNfa() calls whose RegEx could match the same string, the regExId of the last AddNfa() call was not always taken.
- skinkairewalker
- Addict
- Posts: 800
- Joined: Fri Dec 04, 2015 9:26 pm
Re: RegEx engine module that builds NFA/DFA and uses it for matching
Post by skinkairewalker »
Can the RegEx engine help me indent a code file by removing multiple spaces between tokens to make parsing easier?
Is it possible to indent or detect multiple spaces? Is there any effective algorithm for that?
For example:
Code:
Print(" IGNORE SPACES INSIDE STRINGS ");
Print( 1 * 2)
Private MyVar = "12345"
Public testando= 1234567
Re: RegEx engine module that builds NFA/DFA and uses it for matching
[Quote] skinkairewalker wrote: Tue Nov 04, 2025 8:04 pm
Code:
Print( 1 * 2)
Private MyVar = "12345"
Public testando= 1234567
You define what you want to ignore as tokens as well and skip these tokens when they are found, until the RegEx::Match() function returns tokens that you want. In our discussion via PM, I already wrote code for you with a function NextToken(skip_Whitespace_Newline_Comment = #True) that optionally skips whitespace, comments or line breaks.
[Quote] skinkairewalker wrote: Tue Nov 04, 2025 8:04 pm
Code:
Print(" IGNORE SPACES INSIDE STRINGS ");
Here you can process the string token afterwards with PB string functions or with string pointers. An alternative would be to switch to a second lexer when encountering a quote, which then processes special tokens that you use to compose the string token, but skips unwanted tokens.
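The approach of treating whitespace as a token and skipping it can be sketched like this. It is a Python illustration using the `re` module, not the code from the PM discussion and not the RegEx::Match() API; the token IDs, the rule set and the generator `tokens` are hypothetical. Because a whole string literal is a single token, the spaces inside it are never seen as whitespace tokens.

```python
import re

# Token IDs (hypothetical names for this sketch)
WHITESPACE, STRING, NUMBER, IDENT, SYMBOL = range(1, 6)

RULES = [
    (re.compile(r"\s+"), WHITESPACE),
    (re.compile(r'"[^"\n]*"'), STRING),      # the whole string is one token
    (re.compile(r"\d+"), NUMBER),
    (re.compile(r"[A-Za-z_]\w*"), IDENT),
    (re.compile(r"[()=*;]"), SYMBOL),
]


def tokens(text, skip_whitespace=True):
    """Yield (token_id, lexeme) pairs; whitespace tokens are dropped if requested."""
    pos = 0
    while pos < len(text):
        best = None
        for pattern, token_id in RULES:
            m = pattern.match(text, pos)
            # Keep the longest match among all rules.
            if m and (best is None or m.end() > best[0].end()):
                best = (m, token_id)
        if best is None:
            raise ValueError(f"unexpected character {text[pos]!r} at position {pos}")
        m, token_id = best
        pos = m.end()
        if skip_whitespace and token_id == WHITESPACE:
            continue   # ignore this token and look for the next one
        yield token_id, m.group()


line = 'Print(  "  IGNORE   SPACES  " )  ;'
print(" ".join(lexeme for _, lexeme in tokens(line)))
# Print ( "  IGNORE   SPACES  " ) ;
```

Joining the remaining lexemes with a single space normalizes the spacing between tokens, while the string token keeps its inner spaces untouched.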