My goal
I am parsing a string that contains token:value pairs into a type.
Example:
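mutable struct Foo
    bar
    baz
    qux
    # Zero-argument constructor so that Foo() works; every field starts as nothing
    Foo() = new(nothing, nothing, nothing)
end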
function parse(s::AbstractString)::Foo
    f = Foo()
    bar_pattern = r"bar:(\w*)"
    baz_pattern = r"baz:(\d*)"
    qux_pattern = r"qux:(\w*)"
    f.bar = match(bar_pattern, s)[1]
    f.baz = match(baz_pattern, s)[1]
    f.qux = match(qux_pattern, s)[1]
    return f
end
Problem
This works but only if the patterns are actually present. When match can't find the pattern, it returns nothing, which of course can't be indexed with [1] or accessed with captures. The result is an error.
I want the fields of the returned struct to either get the matched result (the "capture") directly, or remain empty or set to nothing, should the match be unable to find the pattern.
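To make the failure concrete, here is a minimal sketch. The input string is made up for illustration and deliberately lacks a baz token:

```julia
s = "bar:hello qux:world"    # hypothetical input without a baz token

m = match(r"baz:(\d*)", s)
@show m === nothing          # no match, so match returns nothing

# Indexing nothing has no getindex method, so this throws a MethodError
err = try
    m[1]
catch e
    e
end
@show err isa MethodError
```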
I could do something like this:
function safeparse(s::AbstractString)::Foo
    f = Foo()
    bar_pattern = r"bar:(\w*)"
    baz_pattern = r"baz:(\d*)"
    qux_pattern = r"qux:(\w*)"
    if !isnothing(match(bar_pattern, s))
        f.bar = match(bar_pattern, s)[1]
    end
    if !isnothing(match(baz_pattern, s))
        f.baz = match(baz_pattern, s)[1]
    end
    if !isnothing(match(qux_pattern, s))
        f.qux = match(qux_pattern, s)[1]
    end
    return f
end
But that approach seems ugly and becomes verbose very quickly if more patterns are introduced.
Question
Is there a nicer but readable way to achieve this?
Preferably without combining/changing the regex patterns or too much regex magic. However, I am open to that route too if it is the only nice (less verbose) way. I am of course also open to general tips.
To keep things simple, just assume that the patterns my example is looking for only appear 0 or 1 times. However, if the only way to make this nicer involves writing another function like safematch, which does the check for nothing and returns the captured value or nothing, I would want that to also work with multiple matches somehow and stay a bit more general.
Answer
The easiest thing I can think of is to use eachmatch. When there is no match at all, it returns an empty iterator:
julia> collect(eachmatch(r"(bar:(\w*))", "bla bar:werfsd 1223 bar:skdf"))
2-element Array{RegexMatch,1}:
RegexMatch("bar:werfsd", 1="bar:werfsd", 2="werfsd")
RegexMatch("bar:skdf", 1="bar:skdf", 2="skdf")
julia> collect(eachmatch(r"(bar:(\w*))", "bla "))
RegexMatch[]
Then you can simply combine those into a dictionary, for example:
julia> Dict(re => [m[1] for m in eachmatch(re, s)] for re in patterns)
Dict{Regex,Array{T,1} where T} with 3 entries:
r"qux:(\w*)" => Union{Nothing, SubString{String}}[]
r"baz:(\d*)" => SubString{String}["33"]
r"bar:(\w*)" => SubString{String}["werfsd", "skdf"]
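The REPL line above relies on s and patterns already being defined. A self-contained sketch could look like the following, where the input string and the name captures are assumptions chosen for illustration, not taken from the question:

```julia
# Hypothetical input: two bar tokens, one baz token, no qux token
s = "bla bar:werfsd baz:33 bar:skdf"

patterns = [r"bar:(\w*)", r"baz:(\d*)", r"qux:(\w*)"]

# One vector of captures per pattern; a pattern without matches gives an empty vector
captures = Dict(re => [m[1] for m in eachmatch(re, s)] for re in patterns)

for re in patterns
    println(re, " => ", captures[re])
end
```

Because every pattern maps to a vector, zero, one, and several occurrences are all handled the same way, and no check for nothing is needed.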
Without further information about how you want to organize your data structure when more than one value occurs, I can't really say more. Perhaps you can make use of merge.
Give types to struct fields, and don't use mutable structs unless necessary. Without knowing more, I suggest
struct Foo
    bar::Union{String, Nothing}
    baz::Union{String, Nothing}
    qux::Union{String, Nothing}
end
function Foo(; bar=nothing, baz=nothing, qux=nothing)
    return Foo(convert(Union{String, Nothing}, bar),
               convert(Union{String, Nothing}, baz),
               convert(Union{String, Nothing}, qux))
end
which also takes care of converting the SubString from the regex match, in case this is relevant.
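As a rough sketch of how this struct could be combined with a small helper for the 0-or-1 case: the names firstcapture and parsefoo below are hypothetical (neither is part of Base), and the sample input is made up.

```julia
struct Foo
    bar::Union{String, Nothing}
    baz::Union{String, Nothing}
    qux::Union{String, Nothing}
end

# Keyword constructor as suggested above; unspecified fields default to nothing
function Foo(; bar=nothing, baz=nothing, qux=nothing)
    return Foo(convert(Union{String, Nothing}, bar),
               convert(Union{String, Nothing}, baz),
               convert(Union{String, Nothing}, qux))
end

# Hypothetical helper: first capture group of the first match,
# or nothing when the pattern does not occur in s
function firstcapture(re::Regex, s::AbstractString)
    m = match(re, s)
    return m === nothing ? nothing : m[1]
end

# Hypothetical parser name, chosen to avoid shadowing Base.parse
function parsefoo(s::AbstractString)::Foo
    return Foo(bar = firstcapture(r"bar:(\w*)", s),
               baz = firstcapture(r"baz:(\d*)", s),
               qux = firstcapture(r"qux:(\w*)", s))
end

f = parsefoo("bar:hello baz:42")   # made-up input without a qux token
@show f.bar f.baz f.qux
```

Adding a new token then costs one field and one line in parsefoo. If tokens may repeat, the helper could collect eachmatch results into a vector instead, with the field types changed accordingly.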