\$\begingroup\$

I have been struggling with a regular expression that matches path names. The embedded / in the pattern makes this awkward with the default delimiters, but writing the pattern as qr{...} with braces avoids the problem.

First the convention I have imposed:

Every path in the set looks like:

/ifmxdev/files/file.0123

but that top directory can take a suffix, for example:

/ifmxdev_test/files/file.8765

The final suffix is always exactly 4 digits.

After much struggle and an hour of composing this plea, as well as my own thoughts, I came up with a truly ugly but working pattern:

$rawfile_pattern = qr{/ifmxdev[_0-9A-Za-z]*/files/file.\d{4}};
if (! $fname =~ $rawfile_pattern) {....

Now I'd just like some help in making that pattern more elegant, mainly compacting that bracketed section of the regex.

Jamal
asked Jun 12, 2014 at 22:39
\$\endgroup\$
  • Just use \w for that character class: qr{/ifmxdev\w*/files/file\.\d{4}};. And don't forget to escape the period in the suffix. Finally, you might want to add some anchors ^ and $, but that's up to you. Commented Jun 12, 2014 at 22:41
  • Thanks, Miller - this did it. And extra thanks for reminding me to escape that period. In the jumble of trying everything, that got lost by the wayside. Commented Jun 16, 2014 at 13:53

2 Answers

\$\begingroup\$

It probably doesn't matter, but the first element of that path can be ifmxdev______ or ifmxdev_x_x_x with that regex.

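I suggest you use the Unicode property alnum, which is roughly \w without the underscore: letters and digits. For ASCII-only input that amounts to [A-Za-z0-9]; \p{PosixAlnum} restricts the match to exactly that range.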

qr{ \A /ifmxdev (?:_\p{alnum}+)? /files /file\.\d{4} \z }x
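
To see the difference in practice, here is a minimal sketch that compares the \w* pattern from the comment (with anchors added for a fair comparison) against the pattern above. The variable names and sample paths are made up for illustration.

```perl
use strict;
use warnings;

# Pattern from the comment, with \A and \z added here for comparison.
my $loose  = qr{ \A /ifmxdev \w* /files/file\.\d{4} \z }x;
# Pattern from this answer: an optional suffix of one underscore
# followed by one or more alphanumerics.
my $strict = qr{ \A /ifmxdev (?:_\p{alnum}+)? /files /file\.\d{4} \z }x;

# Hypothetical sample paths.
for my $path (qw(
    /ifmxdev/files/file.0123
    /ifmxdev_test/files/file.8765
    /ifmxdev______/files/file.0123
    /ifmxdev_x_x_x/files/file.0123
)) {
    printf "%-32s loose=%s strict=%s\n", $path,
        ($path =~ $loose  ? 'yes' : 'no'),
        ($path =~ $strict ? 'yes' : 'no');
}
```

The first two paths match both patterns; the last two match only the \w* version.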
answered Jun 13, 2014 at 8:57
\$\endgroup\$
\$\begingroup\$

Here are some options:

As Miller mentioned in the comments, instead of [_0-9A-Za-z]* you could use

\w*

Or you could use a negated character class

[^/]*

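which is looser than \w*: it accepts any character other than /, including a hyphen or a space. Both classes stop at the next /, so I would not expect a meaningful speed difference between them; benchmark if you run this match a lot of times.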
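
When choosing between the two classes, keep in mind that they do not accept the same directory names. A minimal sketch, with anchors added and hypothetical sample paths:

```perl
use strict;
use warnings;

my $word_class = qr{ \A /ifmxdev \w*   /files/file\.\d{4} \z }x;
my $not_slash  = qr{ \A /ifmxdev [^/]* /files/file\.\d{4} \z }x;

# Hypothetical directory names: the second contains a hyphen,
# the third contains a space.
my @paths = (
    '/ifmxdev_test/files/file.8765',
    '/ifmxdev-test/files/file.8765',
    '/ifmxdev test/files/file.8765',
);

for my $path (@paths) {
    printf "%-32s \\w*=%s [^/]*=%s\n", $path,
        ($path =~ $word_class ? 'yes' : 'no'),
        ($path =~ $not_slash  ? 'yes' : 'no');
}
```

Only the first path matches with \w*, while all three match with [^/]*.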

You could also write /files/file as

(/files?){2}

I think that is less readable though, and it is also looser: it accepts /file/files, /files/files and /file/file as well. If you did use it, you might want a non-capturing group, which might make it even harder to read:

(?:/files?){2}

Borodin also shows the free-format modifier /x. That causes whitespace to be ignored in your regular expression. This might help with readability.

Last, as Miller also mentions, you might want to add anchors: ^ and $. This would prevent something like "/ifmxdev/files/file.01231234" from matching.

qr{ ^ /ifmxdev[^/]* (?:/files?){2} \.\d{4} $ }x
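
A minimal sketch of how this final pattern behaves, using hypothetical inputs. It also uses !~ for the negated test: in the question's `! $fname =~ $rawfile_pattern`, the ! binds more tightly than =~, so it negates $fname before the match is attempted.

```perl
use strict;
use warnings;

my $rawfile_pattern = qr{ ^ /ifmxdev[^/]* (?:/files?){2} \.\d{4} $ }x;

# Hypothetical inputs.
my @names = (
    '/ifmxdev/files/file.0123',      # the intended form
    '/ifmxdev/files/file.01231234',  # too many digits for the anchored pattern
    '/ifmxdev/file/files.0123',      # each "files?" is matched independently
    "/ifmxdev/files/file.0123\n",    # $ allows one trailing newline; \z would not
);

for my $fname (@names) {
    # Make the trailing newline visible in the output.
    (my $shown = $fname) =~ s/\n/\\n/g;

    # !~ is true when the pattern does not match.
    if ($fname !~ $rawfile_pattern) {
        print "rejected: $shown\n";
    }
    else {
        print "accepted: $shown\n";
    }
}
```

Only the second name is rejected. If the third and fourth should be rejected too, spelling out /files/file and using \A and \z instead of ^ and $ would tighten the pattern.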

Here is a nice FAQ from Stack Overflow.

And if you are interested in more information, I highly recommend Mastering Regular Expressions. I'm still working my way through it but it has taught me a lot.

PS. I would have posted more links, but I don't have enough rep points yet ;-)

answered Jul 8, 2014 at 16:05
\$\endgroup\$
