Outside a character class, a backslash followed by a digit greater than 0 (and possibly further digits) is a back reference to a capturing subpattern earlier (i.e. to its left) in the pattern, provided there have been that many previous capturing left parentheses.
However, if the decimal number following the backslash is less than 10, it is always taken as a back reference, and causes an error only if there are not that many capturing left parentheses in the entire pattern. In other words, the parentheses that are referenced need not be to the left of the reference for numbers less than 10. A "forward back reference" can make sense when a repetition is involved and the subpattern to the right has participated in an earlier iteration. See the section escape sequences for further details of the handling of digits following a backslash.
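A forward back reference can be illustrated with a small sketch. The pattern and variable names below are illustrative assumptions and do not come from the manual text; the anchors ^ and $ are added only so the whole subject has to match.

```php
<?php
// Group 2 is opened to the right of the \2 that refers to it.
// Iteration 1: \2 is unset, so the first alternative fails and (one) matches.
// Iteration 2: \2 now holds "one", so "\2two" can match "onetwo".
$forwardPattern = '/^(\2two|(one))+$/';

$forwardMatches = preg_match($forwardPattern, 'oneonetwo');
// Here the first iteration can use neither alternative, so nothing matches.
$forwardFails = preg_match($forwardPattern, 'twoone');

var_dump($forwardMatches, $forwardFails);
```

The reference only succeeds because the repetition gives group 2 a chance to be set before \2 is tried again.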
A back reference matches whatever actually matched the capturing
subpattern in the current subject string, rather than
anything matching the subpattern itself. So the pattern
(sens|respons)e and \1ibility
matches "sense and sensibility" and "response and responsibility",
but not "sense and responsibility". If case-sensitive (caseful)
matching is in force at the time of the back reference, then
the case of letters is relevant. For example,
((?i)rah)\s+\1
matches "rah rah" and "RAH RAH", but not "RAH rah", even
though the original capturing subpattern is matched
case-insensitively (caselessly).
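The two examples above can be checked with preg_match(). This is a minimal sketch: the variable names are illustrative, and the ^ and $ anchors are an addition so that each subject must match as a whole.

```php
<?php
// \1 must repeat the text that group 1 actually captured.
$sensePattern = '/^(sens|respons)e and \1ibility$/';
$senseResults = [];
foreach ([
    'sense and sensibility',
    'response and responsibility',
    'sense and responsibility',
] as $subject) {
    $senseResults[$subject] = preg_match($sensePattern, $subject);
}

// (?i) applies only inside group 1; the back reference itself is caseful.
$rahPattern = '/^((?i)rah)\s+\1$/';
$rahResults = [];
foreach (['rah rah', 'RAH RAH', 'RAH rah'] as $subject) {
    $rahResults[$subject] = preg_match($rahPattern, $subject);
}

var_dump($senseResults, $rahResults);
```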
There may be more than one back reference to the same subpattern.
If a subpattern has not actually been used in a
particular match, then any back references to it always
fail. For example, the pattern
(a|(bc))\2
always fails if it starts to match "a" rather than "bc".
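A short sketch of the unset-group case follows. The subjects and variable names are illustrative assumptions; the pattern is anchored so that the engine cannot skip the leading "a" and start matching further along.

```php
<?php
// Group 2 only participates when the "bc" alternative is taken.
$unsetPattern = '/^(a|(bc))\2$/';

// "bc" is captured by group 2, then \2 repeats it.
$bcbc = preg_match($unsetPattern, 'bcbc');

// Taking the "a" branch leaves group 2 unset, so \2 cannot match.
$onlyA = preg_match($unsetPattern, 'a');
$abc   = preg_match($unsetPattern, 'abc');

var_dump($bcbc, $onlyA, $abc);
```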
Because there may be up to 99 back references, all digits
following the backslash are taken as part of a potential
back reference number. If the pattern continues with a digit
character, then some delimiter must be used to terminate the
back reference. If the PCRE_EXTENDED option
is set, this can be whitespace. Otherwise an empty comment can be used.
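The two delimiters mentioned above might be used as follows. This is a sketch with illustrative patterns: the goal in both is "group 1 again, then a literal digit 1".

```php
<?php
// An empty comment (?#) ends the back reference number before the literal "1".
$withComment = preg_match('/^(a)\1(?#)1$/', 'aa1');

// With the x (PCRE_EXTENDED) modifier, pattern whitespace is ignored
// and can serve as the delimiter instead.
$withExtended = preg_match('/^(a)\1 1$/x', 'aa1');

var_dump($withComment, $withExtended);
```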
A back reference that occurs inside the parentheses to which
it refers fails when the subpattern is first used, so, for
example, (a\1) never matches. However, such references can
be useful inside repeated subpatterns. For example, the pattern
(a|b\1)+
matches any number of "a"s and also "aba", "ababba" etc. At
each iteration of the subpattern, the back reference matches
the character string corresponding to the previous iteration.
In order for this to work, the pattern must be such
that the first iteration does not need to match the back
reference. This can be done using alternation, as in the
example above, or by a quantifier with a minimum of zero.
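The iteration-by-iteration behaviour can be tried out with a sketch like the one below. The subject list and variable names are illustrative, and the pattern is anchored so that the whole subject has to be consumed.

```php
<?php
// Each pass through the group may reuse what the previous pass captured.
$nestedPattern = '/^(a|b\1)+$/';

$nestedResults = [];
foreach (['aaa', 'aba', 'ababba', 'ab', 'b'] as $subject) {
    $nestedResults[$subject] = preg_match($nestedPattern, $subject);
}

// "aba":    pass 1 captures "a", pass 2 matches "b" . "a".
// "ababba": pass 1 "a", pass 2 "ba", pass 3 "b" . "ba".
// "ab", "b": the back reference has nothing usable to repeat.
var_dump($nestedResults);
```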
The \g escape sequence can be used for absolute and relative referencing of subpatterns.
This escape sequence must be followed by an unsigned number or a negative
number, optionally enclosed in braces. The sequences \1, \g1 and \g{1} are synonymous
with one another. The use of this pattern with an unsigned number can
help remove the ambiguity inherent when using digits following a
backslash. The sequence helps to distinguish back references from octal
characters and also makes it easier to have a back reference followed
by a literal number, e.g. \g{2}1.
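As a quick check of the equivalence described above, the following sketch runs the three spellings against the same subject and then uses braces to place a literal digit after a reference. Patterns and variable names are illustrative.

```php
<?php
// Three spellings of "whatever group 1 captured".
$synonyms = [
    '/^(a)\1$/',
    '/^(a)\g1$/',
    '/^(a)\g{1}$/',
];
$synonymResults = [];
foreach ($synonyms as $pattern) {
    $synonymResults[] = preg_match($pattern, 'aa');
}

// Braces make it unambiguous that the reference is to group 2,
// followed by the literal character "1".
$followedByDigit = preg_match('/^(a)(b)\g{2}1$/', 'abb1');

var_dump($synonymResults, $followedByDigit);
```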
The use of the \g sequence with a negative number
signifies a relative reference. For example, (foo)(bar)\g{-1}
would match the sequence "foobarbar" and (foo)(bar)\g{-2}
matches "foobarfoo". This can be useful in long patterns as an alternative
to keeping track of the number of subpatterns in order to reference
a specific previous subpattern.
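The relative forms can be verified with a short sketch. The variable names are illustrative, and anchors are added so that only the full strings count as matches.

```php
<?php
// \g{-1} is the most recently opened group, \g{-2} the one before it.
$lastGroup     = preg_match('/^(foo)(bar)\g{-1}$/', 'foobarbar');
$previousGroup = preg_match('/^(foo)(bar)\g{-2}$/', 'foobarfoo');

// Swapping the subjects shows that the two references are not interchangeable.
$lastGroupWrong = preg_match('/^(foo)(bar)\g{-1}$/', 'foobarfoo');

var_dump($lastGroup, $previousGroup, $lastGroupWrong);
```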
Back references to the named subpatterns can be achieved by
(?P=name), \k<name>, \k'name', \k{name}, \g{name}, \g<name> or \g'name'.
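A sketch of several of the named forms follows. The group name "word" and the subjects are illustrative assumptions. Only (?P=name), the \k variants and \g{name} are exercised here; \g<name> and \g'name' are left out of this sketch.

```php
<?php
// Each pattern should match a word repeated after a single space.
$namedPatterns = [
    '/^(?<word>\w+) (?P=word)$/',
    '/^(?<word>\w+) \k<word>$/',
    '/^(?<word>\w+) \k\'word\'$/',
    '/^(?<word>\w+) \k{word}$/',
    '/^(?<word>\w+) \g{word}$/',
];

$repeated = [];
$different = [];
foreach ($namedPatterns as $pattern) {
    $repeated[]  = preg_match($pattern, 'hello hello');
    $different[] = preg_match($pattern, 'hello world');
}

var_dump($repeated, $different);
```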
DEFINE offers a similar capability: it lets a named subpattern be defined once and reused elsewhere in the pattern.
Example:
(?(DEFINE)(?<myname>\bvery\b))(?&myname)\p{Pd}(?&myname).
The expression above matches "very-very" in the following sentence:
Define is very-very handy sometimes.
          ^-------^
How it works: (?(DEFINE)(?<myname>\bvery\b)) defines a subpattern named "myname" equal to "\bvery\b" without matching anything itself. The block "(?&myname)\p{Pd}(?&myname)" is therefore equivalent to "\bvery\b\p{Pd}\bvery\b".
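The DEFINE example can be run as in the sketch below. The u modifier is an assumption added here so that \p{Pd} is evaluated with Unicode semantics; the trailing dot from the example line is left out, and the variable names are illustrative.

```php
<?php
// The DEFINE group only declares "myname"; (?&myname) calls it twice.
$definePattern = '/(?(DEFINE)(?<myname>\bvery\b))(?&myname)\p{Pd}(?&myname)/u';
$sentence = 'Define is very-very handy sometimes.';

$found = preg_match($definePattern, $sentence, $defineMatch);

// $defineMatch[0] holds the overall match.
var_dump($found, $defineMatch[0]);
```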
The escape sequence \g used as a backreference may not always behave as expected.
The following numbered backreferences refer to the text matching the specified capture group, as documented:
\1
\g1
\g{1}
\g-1
\g{-1}
However, the following variants act as subroutine calls, re-running the subpattern code itself instead of matching the text it captured:
\g<1>
\g'1'
\g<-1>
\g'-1'
With named backreferences, we may also use the \k escape sequence as well as the (?P=...) construct. The following combinations also refer to the text matching the named capture group, as documented:
\g{name}
\k{name}
\k<name>
\k'name'
(?P=name)
However, these act as subroutine calls, re-running the subpattern code instead of matching the captured text:
\g<name>
\g'name'
In the following example, the capture group searches for a single letter 'a' or 'b', and then the backreference looks for the same letter. Thus, the patterns are expected to match 'aa' and 'bb', but not 'ab' nor 'ba'.
<?php
/* Matches to the following patterns are replaced by 'xx' in the subject string 'aa ab ba bb'. */
$patterns = [
# numbered backreferences (absolute)
'/([ab])\1/', // 'xx ab ba xx'
'/([ab])\g1/', // 'xx ab ba xx'
'/([ab])\g{1}/', // 'xx ab ba xx'
'/([ab])\g<1>/', // 'xx xx xx xx' # unexpected behavior, backreference matches both 'a' and 'b'.
"/([ab])\g'1'/", // 'xx xx xx xx' # unexpected behavior, backreference matches both 'a' and 'b'.
'/([ab])\k{1}/', // 'aa ab ba bb' # No group with name "1", backreference to unset group always fails.
'/([ab])\k<1>/', // 'aa ab ba bb' # No group with name "1", backreference to unset group always fails.
"/([ab])\k'1'/", // 'aa ab ba bb' # No group with name "1", backreference to unset group always fails.
'/([ab])(?P=1)/', // NULL # Regex error: "subpattern name must start with a non-digit", (?P=) expects name not number.
# numbered backreferences (relative)
'/([ab])\-1/', // 'aa ab ba bb'
'/([ab])\g-1/', // 'xx ab ba xx'
'/([ab])\g{-1}/', // 'xx ab ba xx'
'/([ab])\g<-1>/', // 'xx xx xx xx' # unexpected behavior, backreference matches both 'a' and 'b'.
"/([ab])\g'-1'/", // 'xx xx xx xx' # unexpected behavior, backreference matches both 'a' and 'b'.
'/([ab])\k{-1}/', // 'aa ab ba bb' # No group with name "-1", backreference to unset group always fails.
'/([ab])\k<-1>/', // 'aa ab ba bb' # No group with name "-1", backreference to unset group always fails.
"/([ab])\k'-1'/", // 'aa ab ba bb' # No group with name "-1", backreference to unset group always fails.
'/([ab])(?P=-1)/', // NULL # Regex error: "subpattern name expected", (?P=) expects name not number.
# named backreferences
'/(?<name>[ab])\g{name}/', // 'xx ab ba xx'
'/(?<name>[ab])\g<name>/', // 'xx xx xx xx' # unexpected behavior, backreference matches both 'a' and 'b'.
"/(?<name>[ab])\g'name'/", // 'xx xx xx xx' # unexpected behavior, backreference matches both 'a' and 'b'.
'/(?<name>[ab])\k{name}/', // 'xx ab ba xx'
'/(?<name>[ab])\k<name>/', // 'xx ab ba xx'
"/(?<name>[ab])\k'name'/", // 'xx ab ba xx'
'/(?<name>[ab])(?P=name)/', // 'xx ab ba xx'
];
foreach ($patterns as $pat) {
    echo " '$pat',\t// " . var_export(@preg_replace($pat, 'xx', 'aa ab ba bb'), true) . PHP_EOL;
}
?>