Capturing optional regex segment with PHP

Question 1

I need to check the end of a URL for the possible existence of /news_archive or /news_archive/5 in PHP. The below snippet does exactly what I want, but I know that I could achieve this with one preg_match rather than two. How can I improve this code to treat the /5 as an optional segment and capture it if it exists?

if (preg_match('~/[0-9A-Za-z_-]+_archive/[0-9]+$~', $_SERVER['HTTP_REFERER'], $matches) || preg_match('~/[0-9A-Za-z_-]+_archive$~', $_SERVER['HTTP_REFERER'], $matches)) {
 $page_info['parent_page']['page_label'] = ltrim($matches[0], '/');
}

Answer

Consider your first pattern:

~/[0-9A-Za-z_-]+_archive/[0-9]+$~

Let's break it down:

1. / a literal slash
2. [0-9A-Za-z_-]+ one or more of 0-9, A-Z, a-z, _ or -
3. _archive a literal string _archive
4. / a literal slash again
5. [0-9]+ one or more digits
6. $ the end of the string must follow the one or more digits

So basically you want to make #4 and #5 optional. To be more specific, you want either both 4 and 5, or neither 4 nor 5.

Consider this:

(a[b]+)?

This means that you have one a followed by one or more b, and that this grouped a/b entity is optional.
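To see the optional group behave this way on a toy input, here is a minimal sketch. The leading x and the ^ and $ anchors are assumptions added only so that the match is unambiguous; they are not part of the pattern above.

```php
<?php
// Toy pattern: 'x', then an optional group made of one 'a' followed by one or more 'b'.
$pattern = '~^x(a[b]+)?$~';

$results = [];
foreach (['x', 'xab', 'xabbb', 'xa'] as $subject) {
    // preg_match returns 1 on a match and 0 on no match.
    $results[$subject] = preg_match($pattern, $subject);
}

var_dump($results);
```

The first three subjects should match. The last one should not, because a lone a without any b cannot satisfy the group, and skipping the group leaves an unmatched a before the end of the string.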

Letting a be #4 and b be digits like in #5, we're left with:

(/[0-9]+)?

Or:

~/[0-9A-Za-z_-]+_archive(/[0-9]+)?$~

This will capture the entire group though, like /5:

php -r "preg_match('~/[0-9A-Za-z_-]+_archive(/[0-9]+)?$~', '/news_archive/5', $m); var_dump($m);"
array(2) {
 [0] =>
 string(15) "/news_archive/5"
 [1] =>
 string(2) "/5"
}

You can just add another group to remedy that though:

~/[0-9A-Za-z_-]+_archive(/([0-9]+))?$~

Example:

php -r "preg_match('~/[0-9A-Za-z_-]+_archive(/([0-9]+))?$~', '/news_archive/44', $m); var_dump($m);"
array(3) {
 [0] =>
 string(16) "/news_archive/44"
 [1] =>
 string(3) "/44"
 [2] =>
 string(2) "44"
}
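Applied to the code in the question, the two preg_match calls can collapse into one. The following is only a sketch: parse_archive_url is a hypothetical helper name and the example.com URLs are made-up stand-ins for $_SERVER['HTTP_REFERER']. One detail worth knowing is that when the optional segment is absent, PHP leaves the trailing unmatched groups out of $matches by default, so index 2 has to be read defensively.

```php
<?php
// Hypothetical helper (not from the original post): returns the label and the
// optional page number, or null when the URL does not end in an archive path.
function parse_archive_url(string $url): ?array
{
    $pattern = '~/[0-9A-Za-z_-]+_archive(/([0-9]+))?$~';
    if (preg_match($pattern, $url, $matches) !== 1) {
        return null;
    }
    return [
        // Same label the question's code builds: the whole match minus the leading slash.
        'page_label' => ltrim($matches[0], '/'),
        // Group 2 is missing from $matches when the optional segment did not match.
        'page_number' => $matches[2] ?? null,
    ];
}

$with = parse_archive_url('http://example.com/news_archive/5');
$without = parse_archive_url('http://example.com/news_archive');
$none = parse_archive_url('http://example.com/about');
var_dump($with, $without, $none);
```

The ?? operator requires PHP 7.0 or later and the nullable return type requires 7.1; on older versions an isset($matches[2]) check would do the same job.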

You could technically make the outside group a non-capturing group (like (?:/([0-9]+))?), but I don't think the added complication is worth not grabbing the / part too.
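For comparison, a minimal sketch of that non-capturing variant. The only change is the (?: at the start of the outer group, which moves the digits into group 1.

```php
<?php
// Same pattern, but the outer group is non-capturing: (?: ... )
$pattern = '~/[0-9A-Za-z_-]+_archive(?:/([0-9]+))?$~';

preg_match($pattern, '/news_archive/44', $m);

// $m[0] holds the whole match and $m[1] holds just the digits.
var_dump($m);
```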

(By the way, sorry if you're familiar with regex and you found this excessive. I tend to take a very verbose approach to any regex related question :).)

Comment 1

This is a fantastic response, and certainly more than I expected. In a good way. I am fairly unfamiliar with regex itself, so the thorough analysis was a pleasant and refreshing lesson! Thank you.

Comment 2

@davo0105 Glad I could help! :)

Corbin · Accepted Answer · 2012-10-05 19:16:28Z
