You can download XRegExp bundled with all addons as xregexp-all.js. Alternatively, you can download the individual addon scripts from GitHub. XRegExp's npm package uses xregexp-all.js.
The Unicode Base script adds base support for Unicode matching via the \p{…} syntax. À la carte token addon packages add support for Unicode categories, scripts, and other properties. All Unicode tokens can be inverted using \P{…} or \p{^…}. Token names are case insensitive, and any spaces, hyphens, and underscores are ignored. You can omit the braces for token names that are a single letter.
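The loose name-matching rules above can be illustrated with a small normalizer. This is a minimal sketch assuming plain string normalization; `normalizeTokenName` and `parseTokenBody` are hypothetical helpers for illustration, not XRegExp functions, and this is not claimed to be how XRegExp implements token lookup.

```javascript
// Hypothetical helper (not part of XRegExp): applies the loose matching rules
// described above. Names are case-insensitive, and spaces, hyphens, and
// underscores are ignored.
function normalizeTokenName(name) {
  return name.replace(/[ _-]+/g, '').toLowerCase();
}

// Hypothetical helper: splits a token body such as "^Letter" (from \p{^Letter})
// into its normalized name and whether it was inverted with the ^ prefix.
function parseTokenBody(body) {
  const inverted = body.startsWith('^');
  return {
    name: normalizeTokenName(inverted ? body.slice(1) : body),
    inverted,
  };
}

console.log(normalizeTokenName('Currency_Symbol')); // "currencysymbol"
console.log(normalizeTokenName('currency symbol')); // "currencysymbol"
console.log(normalizeTokenName('Currency-Symbol')); // "currencysymbol"
console.log(parseTokenBody('^Letter'));             // { name: 'letter', inverted: true }
```

Under these rules, all three spellings of the currency-symbol category resolve to the same lookup key.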
```javascript
// Categories
XRegExp('\\p{Sc}\\pN+'); // Sc = currency symbol, N = number
// Can also use the full names \p{Currency_Symbol} and \p{Number}

// Scripts
XRegExp('\\p{Cyrillic}');
XRegExp('[\\p{Latin}\\p{Common}]');
// Can also use the Script= prefix to match ES2018: \p{Script=Cyrillic}

// Properties
XRegExp('\\p{ASCII}');
XRegExp('\\p{Assigned}');

// In action...
const unicodeWord = XRegExp("^\\pL+$"); // L = letter
unicodeWord.test("Русский"); // true
unicodeWord.test("日本語"); // true
unicodeWord.test("العربية"); // true
XRegExp("^\\p{Katakana}+$").test("カタカナ"); // true
```
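For comparison, here is a sketch of the same checks written with JavaScript's native ES2018 property escapes instead of XRegExp. The native syntax is stricter: it requires the `u` flag, braces are always required, names must match exactly (case-sensitive, no ignored spaces or underscores), scripts need the `Script=` (or `sc=`) prefix, and the `\p{^…}` inversion form is not available, so use `\P{…}`.

```javascript
// Native ES2018 property escapes: need the `u` flag and, unlike XRegExp,
// require braces and exact, case-sensitive names.
const unicodeWord = /^\p{L}+$/u; // L = Letter

console.log(unicodeWord.test('Русский')); // true
console.log(unicodeWord.test('日本語'));   // true
console.log(unicodeWord.test('العربية')); // true

// Scripts need the Script= (or sc=) prefix natively.
console.log(/^\p{Script=Katakana}+$/u.test('カタカナ')); // true

// Inversion uses \P{…}; the \p{^…} form is XRegExp-only.
console.log(/^\P{L}+$/u.test('123')); // true

// Currency symbol followed by digits (XRegExp: \p{Sc}\pN+).
console.log(/\p{Sc}\p{N}+/u.test('$100')); // true

// A bare script name is rejected natively.
try {
  new RegExp('\\p{Cyrillic}', 'u');
} catch (e) {
  console.log(e instanceof SyntaxError); // true
}
```

XRegExp's looser syntax, such as `\pL` and `\p{Cyrillic}`, is translated by the library, so it is not portable to native regex literals.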
By default, \p{…} and \P{…} support the Basic Multilingual Plane (i.e. code points up to U+FFFF). You can opt-in to full 21-bit Unicode support (with code points up to U+10FFFF) on a per-regex basis by using flag A. In XRegExp, this is called astral mode. You can automatically add flag A for all new regexes by running XRegExp.install('astral'). When in astral mode, \p{…} and \P{…} always match a full code point rather than a code unit, using surrogate pairs for code points above U+FFFF.
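The surrogate pairs mentioned above follow from standard UTF-16 arithmetic: subtract 0x10000, then split the remaining 20 bits into a high half (added to 0xD800) and a low half (added to 0xDC00). This sketch works through that arithmetic for U+1F4A9. `toSurrogatePair` is a hypothetical helper name, not XRegExp code.

```javascript
// Hypothetical helper: split an astral code point into its UTF-16 surrogate pair.
function toSurrogatePair(codePoint) {
  if (codePoint < 0x10000 || codePoint > 0x10ffff) {
    throw new RangeError('Not an astral code point');
  }
  const offset = codePoint - 0x10000;    // fits in 20 bits
  const high = 0xd800 + (offset >> 10);  // top 10 bits
  const low = 0xdc00 + (offset & 0x3ff); // bottom 10 bits
  return [high, low];
}

const hex = (n) => 'U+' + n.toString(16).toUpperCase();

const [high, low] = toSurrogatePair(0x1f4a9); // pile of poo
console.log(hex(high), hex(low)); // U+D83D U+DCA9

// The pair forms one code point but occupies two UTF-16 code units.
const s = String.fromCharCode(high, low);
console.log(s.length, s.codePointAt(0) === 0x1f4a9); // 2 true
```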
```javascript
// Using flag A to match astral code points
XRegExp('^\\pS$').test('💩'); // -> false
XRegExp('^\\pS$', 'A').test('💩'); // -> true

// Using surrogate pair U+D83D U+DCA9 to represent U+1F4A9 (pile of poo)
XRegExp('^\\pS$', 'A').test('\uD83D\uDCA9'); // -> true

// Implicit flag A
XRegExp.install('astral');
XRegExp('^\\pS$').test('💩'); // -> true
```
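Native JavaScript draws the same code unit versus code point distinction with the `u` flag. This sketch uses native regexes as a stand-in for XRegExp's astral mode to show why a single-character pattern fails on U+1F4A9 without code point semantics.

```javascript
// Native analogue of XRegExp's astral mode: the `u` flag makes the regex
// operate on code points instead of UTF-16 code units.
const poo = '\u{1F4A9}'; // U+1F4A9, stored as the pair \uD83D\uDCA9

console.log(poo.length);      // 2 (code units)
console.log([...poo].length); // 1 (code points)

// Without `u`, `.` matches one code unit, so a single dot can't span both halves.
console.log(/^.$/.test(poo));  // false
console.log(/^.$/u.test(poo)); // true

// With `u`, \p{S} matches the whole code point (U+1F4A9 is in category So).
console.log(/^\p{S}$/u.test(poo));            // true
console.log(/^\p{S}$/u.test('\uD83D\uDCA9')); // true
```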
Opting in to astral mode disables the use of \p{…} and \P{…} within character classes. In astral mode, use an alternation such as (\pL|[0-9_])+ instead of [\pL0-9_]+.
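The alternation workaround should accept the same strings as the character class it replaces. This sketch checks that equivalence using native `u`-flag regexes, which do allow property escapes inside classes. It is a stand-in, not XRegExp, and it uses a non-capturing group `(?:…)` where the text uses a capturing one. The sample strings are arbitrary.

```javascript
// Workaround form: alternation instead of a character class.
const viaAlternation = /^(?:\p{L}|[0-9_])+$/u;
// Class form: allowed natively with the `u` flag.
const viaClass = /^[\p{L}0-9_]+$/u;

const samples = ['h\u00E9llo_42', 'Привет', '日本語_1', 'has space', 'a-b', ''];
for (const sample of samples) {
  // Both patterns should agree on every sample.
  console.log(JSON.stringify(sample), viaAlternation.test(sample), viaClass.test(sample));
}
```

Neither regex uses the `g` flag, so repeated `test()` calls do not carry `lastIndex` state between samples.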
See API: XRegExp.matchRecursive.
See API: XRegExp.build.