C++ Regular Expressions with Boost

Boost is a free source code library for C++. After downloading and unzipping, you need to run the bootstrap batch file or script and then run b2 --with-regex to compile Boost’s regex library. Then add the folder into which you unzipped Boost to the include path of your C++ compiler. Add the stage\lib subfolder of that folder to your linker’s library path. Then you can add #include <boost/regex.hpp> to your C++ code to make use of Boost regular expressions.

If you use C++Builder then you should download the Boost libraries for your specific version of C++Builder from Embarcadero. The best way to do this is through the GetIt package manager. The version of Boost you get depends on your version of C++Builder and exactly which C++ toolchain you’re using. The classic Win32 compiler is forever stuck on Boost 1.39. This was the only Win32 compiler in XE3 to XE8. It’s the one you use in C++Builder 10 and later if you select "Use ‘classic’ Borland compiler" in the project options for the Windows 32-bit platform. C++Builder XE3 added Win64 support using a Clang-based toolchain. It uses Boost 1.50 in C++Builder XE3 through XE6 and Boost 1.55 in XE7 through 10.2. The new Clang-based Win32 compiler in C++Builder 10 and later uses the same version of Boost as the Win64 compiler. In 10.3 they use Boost 1.68 and in 10.4 and later (including C++Builder 12) they use Boost 1.70. C++Builder 11 introduced a new "Windows 64-bit Modern" or "Win64x" platform. This platform uses Boost 1.85 in C++Builder 11 and 12.

This website covers Boost 1.38, 1.39, and 1.42 through the latest 1.89. Boost 1.40 introduced many new regex features borrowed from Perl 5.10. But it also introduced some serious bugs that weren’t fixed until Boost 1.42. So we completely ignore Boost 1.40 and 1.41. We still cover Boost 1.38 and 1.39 (which have identical regex features) because the "classic Borland" Win32 C++Builder compiler is stuck on this version. If you’re using another compiler then you should definitely use Boost 1.42 or later to avoid what are now old bugs. You should preferably use Boost 1.47 or later as this version changes certain behaviors involving backreferences that may change how some of your regexes behave if you later upgrade from pre-1.47 to post-1.47.
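
If a project needs to guard against the releases mentioned above, a compile-time check is one option. The sketch below works on a number encoded the same way as the BOOST_VERSION macro from <boost/version.hpp> (major * 100000 + minor * 100 + patch). The struct and function names are made up for this illustration and are not part of Boost.

```cpp
// Same encoding as BOOST_VERSION: major * 100000 + minor * 100 + patch.
// Example: 104700 stands for Boost 1.47.0.
struct VersionParts {
    int major_no;
    int minor_no;
    int patch_no;
};

// Hypothetical helper (not part of Boost): split an encoded version number.
constexpr VersionParts decode_version(int encoded) {
    return VersionParts{encoded / 100000, (encoded / 100) % 1000, encoded % 100};
}

// True unless the version is 1.40 or 1.41, the releases with the regex bugs.
constexpr bool avoids_buggy_releases(int encoded) {
    return encoded < 104000 || encoded >= 104200;
}

// True for 1.47 and later, which use the changed backreference behavior.
constexpr bool has_1_47_backreference_behavior(int encoded) {
    return encoded >= 104700;
}

static_assert(decode_version(104700).minor_no == 47, "1.47.0 decodes to minor 47");
static_assert(!avoids_buggy_releases(104100), "1.41 is one of the buggy releases");
```

In a real build you would include <boost/version.hpp> and pass BOOST_VERSION to these helpers, for example inside a static_assert, so that an unsuitable Boost release stops the build instead of surprising you at run time.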

In practice, you’ll mostly use Boost’s ECMAScript grammar. It’s the default grammar and offers far more features than the other grammars. Whenever the tutorial on this website mentions Boost without mentioning any grammars then what is written applies to the ECMAScript grammar and may or may not apply to any of the other grammars. You’ll really only use the other grammars if you want to reuse existing regular expressions from old POSIX code or UNIX scripts.
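
Selecting a grammar is a matter of passing a flag when the regex object is constructed. The sketch below uses std::regex so that it compiles without Boost installed. The helper names full_match and check_grammars are made up for this illustration. Boost offers the same flag names on boost::regex, so replacing std with boost should give the Boost equivalent, but treat that as an assumption to verify against your Boost version.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: compile `pattern` with an explicitly chosen grammar
// and report whether it matches the whole of `subject`.
bool full_match(const std::string& subject, const std::string& pattern,
                std::regex::flag_type grammar) {
    const std::regex re(pattern, grammar);
    return std::regex_match(subject, re);
}

// A few cases to call from main().
void check_grammars() {
    // ECMAScript is the default grammar; here it is requested explicitly.
    assert(full_match("aaa", "a+", std::regex::ECMAScript));
    // POSIX extended syntax also treats + as "one or more".
    assert(full_match("aaa", "a+", std::regex::extended));
    // In POSIX basic syntax + is an ordinary character, so with
    // std::regex::basic the same pattern should only match the literal text "a+".
}
```

The point of the sketch is that the pattern string alone does not tell you what a regex means. An old POSIX pattern pasted into code that uses the default grammar may compile without error and still match something different.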

Boost And Regex Standards

The Boost documentation likes to talk about being compatible with Perl and JavaScript and how boost::regex was standardized as std::regex in C++11. Visual C++ and C++Builder include the Dinkumware implementation of std::regex. C++Builder 11 and 12 use the libc++ implementation if you select the "Windows 64-bit Modern" or "Win64x" platform. If we compare these two implementations with Boost then we find that the class and function templates are almost the same. Your C++ compiler will compile code using boost::regex just as happily as the same code using std::regex. So all the code examples given in the std::regex topic on this website work just fine with Boost if you replace std with boost.
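
One way to make use of this source compatibility is a namespace alias, so that a single build setting decides which library interprets your regexes. This is only a sketch. The macro USE_BOOST_REGEX, the alias rx, and the functions first_number and check_first_number are invented for this example. Without the macro the code uses std::regex and needs no Boost installation.

```cpp
#include <cassert>
#include <string>

#ifdef USE_BOOST_REGEX           // hypothetical macro, set by your build
  #include <boost/regex.hpp>
  namespace rx = boost;
#else
  #include <regex>
  namespace rx = std;
#endif

// Returns the first run of ASCII digits in `text`, or an empty string if none.
std::string first_number(const std::string& text) {
    static const rx::regex digits("[0-9]+");
    rx::smatch m;
    return rx::regex_search(text, m, digits) ? m.str() : std::string();
}

// A few cases to call from main().
void check_first_number() {
    assert(first_number("Boost 1.89") == "1");
    assert(first_number("no digits here").empty());
}
```

The simple pattern used here should behave the same in all three libraries. That the code compiles against both namespaces says nothing about patterns that rely on features or behaviors where the libraries differ.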

But when you run your C++ application then it can make a big difference whether it is Dinkumware, libc++, or Boost that is interpreting your regular expressions. Though they offer the same six grammars, their syntax and behavior are not the same between the three libraries. Boost defines regex_constants::perl which is not part of the C++11 standard. This is not actually an additional grammar but simply a synonym for ECMAScript and JavaScript. There are major differences in the regex flavors used by actual JavaScript and actual Perl. So it’s obvious that a library treating these as one flavor or grammar can’t be compatible with either. Boost’s ECMAScript grammar is a cross between the actual JavaScript and Perl flavors, with a bunch of Boost-specific features and peculiarities thrown in. Dinkumware’s and libc++’s ECMAScript grammars are closer to actual JavaScript, but still have significant behavioral differences, including a few key differences from each other. They didn’t borrow any features from Perl that JavaScript doesn’t have.

The table below highlights the most important differences between the ECMAScript grammars in the Dinkumware and libc++ implementations of std::regex, boost::regex, and actual JavaScript and Perl. Some are obvious differences in feature sets. But others are subtle differences in behavior that may bite you unexpectedly.

| Feature | Dinkumware | libc++ | boost::regex | JavaScript | Perl |
|---|---|---|---|---|---|
| Dot matches line breaks | never | never | default | never | option |
| Anchors match at line breaks | always | never | default | option | option |
| Line break characters | CR, LF | CR, LF, LS, PS | CR, LF, FF, NEL, LS, PS | CR, LF, LS, PS | LF |
| Backreferences to non-participating groups | Match empty string | fail | fail since 1.47 | Match empty string | fail |
| Backreferences to parent group | Match empty string | fail | fail since 1.78 | Match empty string | Match previous iteration |
| Empty character class | error | fail | Not possible | fail | Not possible |
| Free-spacing mode | no | no | YES | no | YES |
| Mode modifiers | no | no | YES | groups | YES |
| Possessive quantifiers | no | no | YES | YES | YES |
| Named capture | no | no | .NET angle & quote | .NET angle | Python and .NET angle & quote |
| Recursion | no | no | atomic | no | backtracking |
| Subroutines | no | no | backtracking | no | backtracking |
| Conditionals | no | no | YES | no | YES |
| Atomic groups | no | no | YES | no | YES |
| Atomic groups backtrack capturing groups | n/a | n/a | no | n/a | YES |
| Start and end of word boundaries | no | no | YES | no | no |
| Standard POSIX classes | YES | YES | YES | no | YES |
| Single letter POSIX classes | no | no | YES | no | no |
| Unicode categories | no | no | no | with /x | YES |
| Unicode scripts | no | no | no | with /x | YES |
| Unicode binary properties | no | no | no | with /x | YES |
| Unicode blocks | no | no | no | no | YES |
| Unicode property sets | no | no | no | no | YES |
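
Because several rows in the table concern behavior rather than syntax, it can be worth probing the library you actually link against. The sketch below tests three of those rows with std::regex and its default ECMAScript grammar. The function names are invented for this example. Only the dot probe carries an assertion, since the table lists "never" for both std::regex implementations. The other two results depend on the implementation, so the sketch only returns them.

```cpp
#include <cassert>
#include <regex>

// Row "Dot matches line breaks": does a.b match across a line feed?
bool dot_matches_line_feed() {
    return std::regex_match("a\nb", std::regex("a.b"));
}

// Row "Anchors match at line breaks": can ^ and $ match around the second
// line of the subject without any special option?
bool anchors_match_at_line_breaks() {
    return std::regex_search("first\nsecond", std::regex("^second$"));
}

// Row "Backreferences to non-participating groups": group 1 is optional and
// does not take part in the match of "b". The result tells you whether \1
// then matches the empty string (true) or fails (false).
bool backref_to_unset_group_matches_empty() {
    return std::regex_match("b", std::regex("(a)?\\1b"));
}

// Expectation taken from the table for the two std::regex implementations.
void check_dot_row() {
    assert(!dot_matches_line_feed());
}
```

Printing the return values of the last two functions from a small test program shows which column of the table your toolchain falls under. Replacing std with boost should turn the same probes into a check of your Boost version, in which case the dot assertion is expected to fail because Boost lets the dot match line breaks by default.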
