PostgreSQL 7.4 and later use the exact same regular expression engine that was developed by Henry Spencer for Tcl 8.2. This means that PostgreSQL supports the same three regular expressions flavors: Tcl Advanced Regular Expressions, POSIX Extended Regular Expressions, and POSIX Basic Regular Expressions. Just like in Tcl, AREs are the default. You can set PostgreSQL’s regex_flavor run-time parameter if you want to change default flavor.

The regular expressions tutorial and regular expressions reference on this website do not mention PostgreSQL specifically. Everything they say about Tcl’s regex flavor also applies to PostgreSQL. This is not true for the replacement text flavor. PostgreSQL’s regexp_replace function uses a slightly different replacement text syntax than Tcl’s regsub command. This is explained below.

The Tilde Operator

The tilde infix operator returns true or false depending on whether a regular expression can match part of a string, or not. For example, 'subject' ~ 'regexp' returns false, while 'subject' ~ '\\w' returns true. If the regex must match the whole string then you’ll need to use anchors. So 'subject' ~ '^\\w$' returns false, while 'subject' ~ '^\\w+$' returns true. There are 4 variations of this operator:

~ attempts a case sensitive match.
~* attempts a case insensitive match.
!~ attempts a case sensitive match, and returns true if the regex does not match any part of the subject string.
!~* attempts a case insensitive match, and returns true if the regex does not match any part of the subject string.

While only case sensitivity can be toggled by the operator, all other options can be set using mode modifiers at the start of the regular expression. Mode modifiers override the operator type. (?c)regex is always case sensitive, regardless of the operator it is used with.

The most common use of this operator is to select rows based on whether a column matches a regular expression, e.g.:

select * from mytable where mycolumn ~* 'regexp'

Regular Expressions as Literal PostgreSQL Strings

The backslash is used to escape characters in PostgreSQL strings. So a regular expression like \w that contains a backslash becomes '\\w' when written as a literal string in a PostgreSQL statement. To match a single literal backslash, you’ll need the regex \\ which becomes '\\\\' in PostgreSQL.

PostgreSQL Regexp Functions

With the substring(string from pattern) function, you can extract part of a string or column. It takes two parameters: the string you want to extract the text from, and the pattern the extracted text should match. If there is no match, substring() returns null. E.g. substring('subject' from 'regexp') returns null. If there is a match, and the regex has one or more capturing groups, then the text matched by the first capturing group is returned. E.g. substring('subject' from 's(\\w)') returns 'u'. If there is a match, but the regex has no capturing groups, the whole regex match is returned. E.g. substring('subject' from 's\\w') returns 'su'. If the regex matches the string more than once, only the first match is returned. Since the substring() function doesn’t take a "flags" parameter, you’ll need to toggle any matching options using mode modifiers at the start of the regex.

This function is particularly useful to extract information from columns. For example, to extract the first number from the column mycolumn for each row, use:

select substring(mycolumn from '\d+') from mytable

The regexp_match(string, pattern[, flags]) function is very similar to the substring function. But instead of returning a string, it returns an array. If the regex has one or more capturing groups then the array contains the text captured of by each capturing group. If the regex has no capturing groups then it returns an array with the overall regex match as the only element. You have the option to pass a string with mode modifier letters supported by Tcl as flags.

The regexp_matches(string, pattern[, flags]) function does exactly the same as the regexp_match function, unless you pass the 'g' flag. Then it returns as many rows as the regex finds matches in the string. Each row is then an array with either the text captured by all the capturing groups, or the overall regex match if there are no capturing groups.

With regexp_replace(subject, pattern, replacement [, flags]) you can replace regex matches in a string. If you omit the flags parameter then only the first match is replaced. The 'g' flag (for "global") causes all regex matches in the string to be replaced. You can add all the mode modifier letters supported by Tcl to the flags argument. So 'gix' replaces all matches case insensitively and ignores whitespace in the regex.

You can use the backreferences 1円 through 9円 in the replacement text to re-insert the text matched by a capturing group into the regular expression. \& re-inserts the whole regex match. This is different from Tcl, which uses & without the backslash and 0円 to insert the whole regex match. You can escape a backslash with another backslash. A backslash not followed by a digit between 1 and 9, an ampersand, or another backslash is a literal.

E.g. regexp_replace('subject', '(\w)\w', '\&1円', 'g') returns 'susbjbecet'.

PostgreSQL provides two functions to split a string using a regular expression. regexp_split_to_table(subject, pattern[, flags]) returns the split string as a new table. regexp_split_to_array(subject, pattern[, flags]) returns the split string as an array of text. Both functions split strings in the same way. The return format is the only difference. The returned table or array consists of the text before the first regex match, the text between any regex match and the following regex match, and the text after the last regex match. If regex matches are adjacent to the start or end of the text or to each other then the table or array will have empty rows or elements. If the regex finds no matches then the result is one row or element with the original subject string. Capturing groups in the regex do not affect the result. These functions always split on all matches. Attempting to pass the flag 'g' is an error.

New Regexp Functions in PostgreSQL 15

The regexp_like(string, pattern[, flags]) has the same result as the title operator if no flags are specified. It has the same result as the ~* operator if only the flag 'i' is specified. The main benefit of this function over the operator is that you can specify other flags too.

regexp_substr(subject, pattern[, position[, occurrence[, flags[, subexpr]]]]) returns a string with the part of subject matched by the regular expression. If the match attempt fails then NULL is returned. The position parameter specifies the character position in the source string at which the match attempt should start. The first character has position 1. The occurrence parameter specifies which match to get. Set it to 1 to get the first match. If you specify a higher number then PostgreSQL will continue to attempt to match the regex starting at the end of the previous match, until it found as many matches as you specified. The last match is then returned. If there are fewer matches then NULL is returned. The flags parameter lets you pass mode modifier letters outside the regex. The subexpr parameter lets you specify the number of a capturing group in the regex. The function then returns the start or end of the text captured by that group within the regex match specified by occurrence. The last 4 parameters are optional.

regexp_instr(subject, pattern[, position[, occurrence[, endoption[, flags[, subexpr]]]]]) returns the beginning or ending position of a regex match in the source string. This function takes the same parameters as regexp_substr, plus one more. Set return_option to zero or omit the parameter to get the position of the first character in match. Set it to one to get the position of the first character after the match. The first character in the string has position 1. regexp_instr returns zero if the match cannot be found. The last 5 parameters are optional.

regexp_count(subject, pattern[, position[, flags]]) returns the number of times the regex can be matched in the source string starting from position if specified, or from the start of the string if not. It returns zero if the regex finds no matches at all.