I have a regular expression that extracts a GUID and a couple of other attributes (type, name and version). Please review the regex for possible optimizations and improvements.
Strings always follow this pattern:
/publication/guid/type/name;version=1234
Regex
(([a-f0-9]+\-)+[a-f0-9]+)\/(.*?)\/(.*?);version=(\d*)
Test records
Extract the GUID, type, name and version from each string.
/publication/d40a4e4c-d6a3-45ae-98b3-924b31d8712a/collection/content1;version=1520623346833
Expected output:
- d40a4e4c-d6a3-45ae-98b3-924b31d8712a
- collection
- content1
- 1520623346833
/publication/d40a4e4c-d6a3-45ae-98b3-924b31d8712a/article/testContent;version=1520623346891
Expected output:
- d40a4e4c-d6a3-45ae-98b3-924b31d8712a
- article
- testContent
- 1520623346891
Code
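The language is F#, but the regex works in C# too. I would also like to use the same regex in Node.js, so it should be language-agnostic.
open System.Text.RegularExpressions
// Record type assumed here; the original post does not show its definition.
type Entity = { id: string; eType: string; name: string; version: string }
let matchEntity (m: Match) =
    { id = m.Groups.[1].Value; eType = m.Groups.[3].Value; name = m.Groups.[4].Value; version = m.Groups.[5].Value }
// Verbatim string (@"...") so the backslashes reach the regex engine unchanged.
let regex = Regex(@"(([a-f0-9]+\-)+[a-f0-9]+)\/(.*?)\/(.*?);version=(\d*)")
matchEntity (regex.Match("/publication/d40a4e4c-d6a3-45ae-98b3-924b31d8712a/collection/content1;version=1520623346833"))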
- Code added. The languages are C# and F#, but the regex should be language-agnostic; I use it in Node.js too, so no language-specific implementation is required. – App2015, Mar 18, 2018 at 5:31
- It doesn't work for me on regexr. – Raystafarian, Mar 18, 2018 at 5:55
- It works. i.sstatic.net/gyZnT.png – App2015, Mar 18, 2018 at 6:09
1 Answer
TL;DR list of adjustments:
- .NET regular expressions support named capture groups; make use of them.
- Make use of the GUID format specification.
- Simplify retrieval with non-capturing groups.
- Make assumptions explicit in character classes. Prefer negated character classes to non-greedy matching.
I propose the following regular expression instead:
(?<guid>[a-f0-9]{8}(?:\-[a-f0-9]{4}){3}\-[a-f0-9]{12})\/(?<type>[^\/]+)\/(?<name>[^;]+);version=(?<version>\d*)
While this regex is somewhat longer, it matches both examples in 62 steps (as opposed to 117 for the original). This may seem like only a minor improvement, but it is not the only benefit:
This regex uses named capturing groups, which allow a much more natural and clear extraction. Instead of accessing groups by magic indices, you can access them by name, so the construction of matchEntity is accomplished as follows:
let matchEntity (m: Match) =
    { id = m.Groups.["guid"].Value;
      eType = m.Groups.["type"].Value;
      name = m.Groups.["name"].Value;
      version = m.Groups.["version"].Value }
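Since the question mentions Node.js, the same idea carries over to JavaScript, whose regex engine accepts the `(?<name>...)` syntax wherever ES2018 named capture groups are implemented. The following is only a minimal sketch under that assumption; `parseEntity` and the shape of the returned object are illustrative names, not part of the original code:

```javascript
// Same pattern as proposed above, written as a JavaScript regex literal.
const entityPattern =
  /(?<guid>[a-f0-9]{8}(?:\-[a-f0-9]{4}){3}\-[a-f0-9]{12})\/(?<type>[^\/]+)\/(?<name>[^;]+);version=(?<version>\d*)/;

// parseEntity is a hypothetical helper name chosen for this example.
function parseEntity(path) {
  const match = entityPattern.exec(path);
  if (match === null) {
    return null; // the input does not follow the expected pattern
  }
  // Named groups are exposed on match.groups instead of numeric indices.
  const { guid, type, name, version } = match.groups;
  return { id: guid, eType: type, name, version };
}

console.log(
  parseEntity(
    "/publication/d40a4e4c-d6a3-45ae-98b3-924b31d8712a/collection/content1;version=1520623346833"
  )
);
```

Returning `null` for a non-matching input is a design choice of this sketch; throwing an error would work just as well if malformed paths are unexpected.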
Last but not least, this regex no longer matches malformed GUIDs such as abc-def, which the original pattern accepted.
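To illustrate the stricter GUID handling, the sketch below runs both patterns against a made-up path. The `abc-def` segment is an assumed example, not taken from the question, and JavaScript is used only because the question mentions Node.js:

```javascript
// Original pattern from the question.
const original = /(([a-f0-9]+\-)+[a-f0-9]+)\/(.*?)\/(.*?);version=(\d*)/;

// Proposed pattern with the 8-4-4-4-12 GUID layout spelled out.
const proposed =
  /(?<guid>[a-f0-9]{8}(?:\-[a-f0-9]{4}){3}\-[a-f0-9]{12})\/(?<type>[^\/]+)\/(?<name>[^;]+);version=(?<version>\d*)/;

// Hypothetical input whose "GUID" consists of two short hex groups.
const malformed = "/publication/abc-def/collection/content1;version=1";
const wellFormed =
  "/publication/d40a4e4c-d6a3-45ae-98b3-924b31d8712a/collection/content1;version=1520623346833";

console.log(original.test(malformed));  // true: any hyphen-separated hex groups are accepted
console.log(proposed.test(malformed));  // false: the group lengths do not fit
console.log(proposed.test(wellFormed)); // true
```

Note that neither pattern is anchored, so a first group longer than eight hex digits would still match on its last eight characters. If that matters, putting a literal `\/` in front of the GUID group is one possible way to tighten it.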
- +1. Recently, support for named RegExp capture groups has also been added to JS. – ComFreek, Mar 20, 2018 at 7:35