I have been struggling with a regular expression involving path names. Immediately, this is a bit troublesome, owing to the embedded / in the pattern, but braces to the rescue.
First the convention I have imposed:
Every path in the set looks like:
/ifmxdev/files/file.0123
but that top directory can take a suffix, for example:
/ifmxdev_test/files/file.8765
The final suffix is exactly 4 digits.
After much struggle and an hour of composing this plea, as well as my own thoughts, I came up with a truly ugly but working pattern:
$rawfile_pattern = qr{/ifmxdev[_0-9A-Za-z]*/files/file.\d{4}};
if ($fname !~ $rawfile_pattern) {....
Now I'd just like some help in making that pattern more elegant, mainly compacting that bracketed section of the regex.
2 Answers
It probably doesn't matter, but the first element of that path can be ifmxdev______ or ifmxdev_x_x_x with that regex.
I suggest you use the Unicode property alnum, which for ASCII text is [A-Za-z0-9], that is, \w without the underscore:
qr{ \A /ifmxdev (?:_\p{alnum}+)? /files /file\.\d{4} \z }x
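To see what this pattern accepts and rejects, here is a minimal sketch that runs it against a few names. The helper name is_rawfile and the last two sample paths are made up for illustration; the first two paths come from the question.

```perl
use strict;
use warnings;

# Pattern from the answer above: at most one "_suffix", anchored at both ends.
my $rawfile_pattern = qr{ \A /ifmxdev (?:_\p{alnum}+)? /files /file\.\d{4} \z }x;

# Hypothetical helper: returns 1 if the path follows the convention, else 0.
sub is_rawfile {
    my ($fname) = @_;
    return $fname =~ $rawfile_pattern ? 1 : 0;
}

for my $fname (
    '/ifmxdev/files/file.0123',          # from the question
    '/ifmxdev_test/files/file.8765',     # from the question
    '/ifmxdev______/files/file.0123',    # underscores only (illustrative)
    '/ifmxdev_x_x_x/files/file.0123',    # several suffixes (illustrative)
) {
    printf "%-34s %s\n", $fname, is_rawfile($fname) ? 'ok' : 'rejected';
}
```

The two names from the question are accepted, while the underscore-only and multi-suffix names are rejected. If your convention allows several suffixes, the group would need a * instead of a ?.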
Here are some options:
As Miller mentioned in the comments, instead of [_0-9A-Za-z]* you could use \w*.
Or you could use a negated character class, [^/]*, which might be slightly more efficient if you are doing this operation a lot of times, since a greedy [^/]* simply runs to the next slash. Any difference is likely to be small, and note that [^/]* is looser than \w*: it accepts any character other than /.
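The two character classes do not accept the same set of names, which may matter more than speed. A small sketch of the difference, assuming the rest of the pattern stays as in the question (with the period escaped); the helper names and the hyphenated path are invented for illustration:

```perl
use strict;
use warnings;

my $word_class    = qr{/ifmxdev\w*/files/file\.\d{4}};
my $negated_class = qr{/ifmxdev[^/]*/files/file\.\d{4}};

# Hypothetical helpers returning 1 on a match and 0 otherwise.
sub matches_word    { return $_[0] =~ $word_class    ? 1 : 0 }
sub matches_negated { return $_[0] =~ $negated_class ? 1 : 0 }

for my $fname (
    '/ifmxdev_test/files/file.8765',    # word characters only
    '/ifmxdev-test/files/file.8765',    # hyphen is not a word character
) {
    printf "%-32s \\w*: %d  [^/]*: %d\n",
        $fname, matches_word($fname), matches_negated($fname);
}
```

Both classes accept the underscore suffix, but only [^/]* accepts the hyphenated name, so pick the one that matches the naming convention you actually want to enforce.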
You could also write /files/file as (/files?){2}.
I think that is less readable though. And if you did use it, you might want to use a non-capturing group, which also might make it harder to read: (?:/files?){2}.
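One thing to watch with the quantified group: it is looser than the literal /files/file, because the optional s applies to both repetitions. A minimal sketch, with a made-up helper name and anchors added only so that each test string is checked in isolation:

```perl
use strict;
use warnings;

# Only the compact fragment, anchored so nothing else can sneak in.
my $compact = qr{\A(?:/files?){2}\z};

# Hypothetical helper returning 1 on a match and 0 otherwise.
sub matches_compact { return $_[0] =~ $compact ? 1 : 0 }

for my $fragment ('/files/file', '/file/files', '/files/files', '/file/file') {
    printf "%-13s %s\n", $fragment,
        matches_compact($fragment) ? 'matches' : 'no match';
}
```

All four fragments match, so the compact form accepts three spellings the literal one would reject. Whether that matters depends on how strict the check needs to be.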
Borodin also shows the free-format modifier /x. That causes whitespace to be ignored in your regular expression. This might help with readability.
Last, as Miller also mentions, you might want to add anchors: ^ and $. This would prevent something like "/ifmxdev/files/file.01231234" from matching.
qr{ ^ /ifmxdev[^/]* (?:/files?){2} \.\d{4} $ }x
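To illustrate what the anchors buy you, here is a short sketch comparing the pattern above with and without them. The helper names and the /tmp path are invented for the example; the eight-digit name is the one mentioned above.

```perl
use strict;
use warnings;

my $unanchored = qr{ /ifmxdev[^/]* (?:/files?){2} \.\d{4} }x;
my $anchored   = qr{ ^ /ifmxdev[^/]* (?:/files?){2} \.\d{4} $ }x;

# Hypothetical helpers returning 1 on a match and 0 otherwise.
sub matches_unanchored { return $_[0] =~ $unanchored ? 1 : 0 }
sub matches_anchored   { return $_[0] =~ $anchored   ? 1 : 0 }

for my $fname (
    '/ifmxdev/files/file.0123',          # well formed
    '/ifmxdev/files/file.01231234',      # too many digits
    '/tmp/ifmxdev/files/file.0123',      # extra leading directory
) {
    printf "%-30s unanchored: %d  anchored: %d\n",
        $fname, matches_unanchored($fname), matches_anchored($fname);
}
```

Without anchors all three names match, because the pattern only has to be found somewhere inside the string. With ^ and $ only the first one does. Note that $ still allows a single trailing newline, so if the names might not be chomped, the \A and \z used in the other answer are the stricter choice.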
Here is a nice FAQ from Stack Overflow.
And if you are interested in more information, I highly recommend Mastering Regular Expressions. I'm still working my way through it but it has taught me a lot.
PS. I would have posted more links, but I don't have enough rep points yet ;-)
\w for that character class: qr{/ifmxdev\w*/files/file\.\d{4}};. And don't forget to escape the period in the suffix. Finally, you might want to add some anchors, ^ and $, but that's up to you.
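A quick sketch of why escaping the period matters: an unescaped . matches any character except a newline, so the original pattern accepts names that have no dot at all. The helper names and the second test path are made up for illustration.

```perl
use strict;
use warnings;

my $unescaped = qr{/ifmxdev[_0-9A-Za-z]*/files/file.\d{4}};   # as in the question
my $escaped   = qr{/ifmxdev\w*/files/file\.\d{4}};            # period escaped

# Hypothetical helpers returning 1 on a match and 0 otherwise.
sub matches_unescaped { return $_[0] =~ $unescaped ? 1 : 0 }
sub matches_escaped   { return $_[0] =~ $escaped   ? 1 : 0 }

for my $fname ('/ifmxdev/files/file.0123', '/ifmxdev/files/fileX0123') {
    printf "%-26s unescaped: %d  escaped: %d\n",
        $fname, matches_unescaped($fname), matches_escaped($fname);
}
```

Only the unescaped pattern accepts fileX0123.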