I need to check whether a URL ends in /news_archive or /news_archive/5 in PHP. The snippet below does exactly what I want, but I know I could achieve this with one preg_match call rather than two. How can I improve this code to treat the /5 as an optional segment and capture it if it exists?

if (preg_match('~/[0-9A-Za-z_-]+_archive/[0-9]+$~', $_SERVER['HTTP_REFERER'], $matches) || preg_match('~/[0-9A-Za-z_-]+_archive$~', $_SERVER['HTTP_REFERER'], $matches)) {
 $page_info['parent_page']['page_label'] = ltrim($matches[0], '/');
}
asked Oct 5, 2012 at 16:48

1 Answer

Consider your first pattern:

~/[0-9A-Za-z_-]+_archive/[0-9]+$~

Let's break it down:

  1. / : a literal slash
  2. [0-9A-Za-z_-]+ : one or more of 0-9, A-Z, a-z, _ or -
  3. _archive : the literal string _archive
  4. / : a literal slash again
  5. [0-9]+ : one or more digits
  6. $ : the end of the string must follow the one or more digits

So basically you want to make #4 and #5 optional. To be more specific, you want either both 4 and 5, or neither 4 nor 5.

Consider this:

(a[b]+)?

This means that you have one a followed by one or more b, and that this grouped a/b entity is optional.
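A quick way to convince yourself of that all-or-nothing behaviour is a toy check like the one below. The leading x and the ^ and $ anchors are my additions, there only so that a stray a or b cannot be silently skipped by the match.

```php
<?php
// Toy pattern: a literal x, optionally followed by the whole group "a then one or more b".
$pattern = '~^x(a[b]+)?$~';

$results = [];
foreach (['x', 'xab', 'xabbb', 'xa', 'xb'] as $subject) {
    // preg_match returns 1 on a match and 0 on no match.
    $results[$subject] = preg_match($pattern, $subject);
}

var_dump($results);
```

The subjects x, xab and xabbb match, while xa and xb do not: the group is taken whole or skipped whole.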

Letting a be #4 and b be digits like in #5, we're left with:

(/[0-9]+)?

Or, dropped into the full pattern:

~/[0-9A-Za-z_-]+_archive(/[0-9]+)?$~

This captures the entire optional group, slash included, so $m[1] is /5 rather than 5:

php -r "preg_match('~/[0-9A-Za-z_-]+_archive(/[0-9]+)?$~', '/news_archive/5', $m); var_dump($m);"
array(2) {
 [0] =>
 string(15) "/news_archive/5"
 [1] =>
 string(2) "/5"
}

You can just add another group to remedy that though:

~/[0-9A-Za-z_-]+_archive(/([0-9]+))?$~

Example:

php -r "preg_match('~/[0-9A-Za-z_-]+_archive(/([0-9]+))?$~', '/news_archive/44', $m); var_dump($m);"
array(3) {
 [0] =>
 string(16) "/news_archive/44"
 [1] =>
 string(3) "/44"
 [2] =>
 string(2) "44"
}
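To tie this back to the snippet in the question, here is a rough sketch of the single-call version. The function name parseArchiveReferer, the page_number key and the example.com URLs are made up for illustration; only the pattern and the ltrim call come from the code above.

```php
<?php
// Hypothetical helper: returns the label and the optional page number,
// or null when the URL does not end in an *_archive segment.
function parseArchiveReferer(string $url): ?array
{
    $pattern = '~/[0-9A-Za-z_-]+_archive(/([0-9]+))?$~';

    if (!preg_match($pattern, $url, $m)) {
        return null;
    }

    return [
        // Same label the original code produced: the whole match minus the leading slash.
        'page_label'  => ltrim($m[0], '/'),
        // Trailing groups that did not take part in the match are absent from $m.
        'page_number' => isset($m[2]) ? (int) $m[2] : null,
    ];
}

var_dump(parseArchiveReferer('http://example.com/news_archive'));
var_dump(parseArchiveReferer('http://example.com/news_archive/5'));
var_dump(parseArchiveReferer('http://example.com/contact'));
```

In the question's code you would pass $_SERVER['HTTP_REFERER'] as the argument, after checking that it is set, since browsers do not always send a referer.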

You could technically make the outer group non-capturing, as in (?:/([0-9]+))?, so that only the digits are captured, but I don't think the extra syntax is worth it just to avoid capturing the / part as well.
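For completeness, a minimal sketch of that non-capturing variant, using the same test string as the example above:

```php
<?php
// (?: ... ) groups without capturing, so the digits become group 1 instead of group 2.
$pattern = '~/[0-9A-Za-z_-]+_archive(?:/([0-9]+))?$~';

preg_match($pattern, '/news_archive/44', $m);

var_dump($m);
```

Here $m holds only two entries: the full match /news_archive/44 at index 0 and the digits 44 at index 1.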

(By the way, sorry if you're familiar with regex and you found this excessive. I tend to take a very verbose approach to any regex related question :).)

answered Oct 5, 2012 at 19:16
  • This is a fantastic response, and certainly more than I expected. In a good way. I am fairly unfamiliar with regex itself, so the thorough analysis was a pleasant and refreshing lesson! Thank you. Commented Oct 5, 2012 at 22:43
  • @davo0105 Glad I could help! :) Commented Oct 6, 2012 at 3:58
