\$\begingroup\$

My goal

I am parsing from a string which contains token:value pairs into a type.

Example:

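mutable struct Foo
 bar
 baz
 qux
 Foo() = new(nothing, nothing, nothing) # every field starts out as nothing
end
function parse(s::AbstractString)::Foo
 f = Foo()
 bar_pattern = r"bar:(\w*)"
 baz_pattern = r"baz:(\d*)"
 qux_pattern = r"qux:(\w*)"
 # match returns nothing when the pattern is absent, so indexing with [1] throws
 f.bar = match(bar_pattern, s)[1]
 f.baz = match(baz_pattern, s)[1]
 f.qux = match(qux_pattern, s)[1]
 return f
end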

Problem

This works, but only if every pattern is actually present. When match can't find a pattern, it returns nothing, which of course can't be indexed with [1] or queried for its captures. The result is an error.

I want the fields of the returned struct to either get the matched result (the "capture") directly, or remain empty or set to nothing, should the match be unable to find the pattern.

I could do something like this:

function safeparse(s::AbstractString)::Foo
 f = Foo()
 bar_pattern = r"bar:(\w*)"
 baz_pattern = r"baz:(\d*)"
 qux_pattern = r"qux:(\w*)"
 if !isnothing(match(bar_pattern, s))
 f.bar = match(bar_pattern, s)[1]
 end
 if !isnothing(match(baz_pattern, s))
 f.baz = match(baz_pattern, s)[1]
 end
 if !isnothing(match(qux_pattern, s))
 f.qux = match(qux_pattern, s)[1]
 end
 return f
end

But that approach seems ugly, calls match twice per pattern, and becomes verbose very quickly as more patterns are introduced.

Question

Is there a nicer but readable way to achieve this?

Preferably without combining/changing the regex patterns or too much regex magic, however I am open to that route too if it is the only nice (less verbose) way. I am of course also open to general tips.

To keep things simple, just assume that the patterns my example is looking for appear only 0 or 1 times. However, if the only way to make this nicer involves writing another function like safematch, which checks for nothing and returns either the captured value or nothing, I would want that function to also work with multiple matches somehow and stay a bit more general.

asked Dec 7, 2020 at 2:57
\$\endgroup\$

1 Answer

\$\begingroup\$

The easiest thing I can think of is to use eachmatch -- when there is no match at all, it returns an empty iterator:

julia> collect(eachmatch(r"(bar:(\w*))", "bla bar:werfsd 1223 bar:skdf"))
2-element Array{RegexMatch,1}:
 RegexMatch("bar:werfsd", 1="bar:werfsd", 2="werfsd")
 RegexMatch("bar:skdf", 1="bar:skdf", 2="skdf")
julia> collect(eachmatch(r"(bar:(\w*))", "bla "))
RegexMatch[]

Then you can simply combine those into a dictionary, for example:

julia> Dict(re => [m[1] for m in eachmatch(re, s)] for re in patterns)
Dict{Regex,Array{T,1} where T} with 3 entries:
 r"qux:(\w*)" => Union{Nothing, SubString{String}}[]
 r"baz:(\d*)" => SubString{String}["33"]
 r"bar:(\w*)" => SubString{String}["werfsd", "skdf"]
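The snippet above relies on patterns and s, which are not defined anywhere in this answer. A self-contained sketch, assuming the three patterns from the question and a made-up input string (both are assumptions, and found is just an illustrative name), could look like this:

```julia
# Assumed inputs: the answer does not show how `patterns` and `s` were defined.
patterns = [r"bar:(\w*)", r"baz:(\d*)", r"qux:(\w*)"]
s = "bla bar:werfsd 1223 baz:33 bar:skdf"

# For each pattern, collect the first capture group of every match.
# A pattern that never matches maps to an empty vector instead of throwing.
found = Dict(re => [m[1] for m in eachmatch(re, s)] for re in patterns)
```

With this input, the qux pattern maps to an empty vector, so callers can check isempty rather than guarding against nothing.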

Without further information about how you want to organize your data structure when more than one value occurs, I can't really say more. Perhaps you can make use of merge.
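For the simpler case the question describes, where each pattern occurs at most once, a small helper can hide the nothing check. This is only a sketch: firstcapture is a hypothetical name, not a function from Base.

```julia
# Hypothetical helper (not part of Base): return the first capture group of the
# first match of `re` in `s`, or `nothing` when the pattern does not occur.
function firstcapture(re::Regex, s::AbstractString)
    m = match(re, s)  # call match only once and reuse the result
    return m === nothing ? nothing : m[1]
end
```

Each field assignment then becomes a single line such as f.bar = firstcapture(bar_pattern, s).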

Give types to struct fields, and don't use mutable structs unless necessary. Without knowing more, I suggest

struct Foo
 bar::Union{String, Nothing}
 baz::Union{String, Nothing}
 qux::Union{String, Nothing}
end
function Foo(;bar=nothing, baz=nothing, qux=nothing) 
 return Foo(convert(Union{String, Nothing}, bar),
 convert(Union{String, Nothing}, baz),
 convert(Union{String, Nothing}, qux))
end

which also takes care of converting the SubString from the regex match, in case this is relevant.
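To show how the keyword constructor could be used for parsing, here is a minimal sketch. The struct and constructor repeat the ones above; foo_patterns and parsefoo are hypothetical names introduced for illustration, and the sketch assumes each pattern occurs at most once.

```julia
struct Foo
    bar::Union{String, Nothing}
    baz::Union{String, Nothing}
    qux::Union{String, Nothing}
end

# Keyword constructor from the answer: any field not passed stays `nothing`.
function Foo(; bar=nothing, baz=nothing, qux=nothing)
    return Foo(convert(Union{String, Nothing}, bar),
               convert(Union{String, Nothing}, baz),
               convert(Union{String, Nothing}, qux))
end

# Hypothetical table mapping each field name to its pattern.
foo_patterns = (bar = r"bar:(\w*)", baz = r"baz:(\d*)", qux = r"qux:(\w*)")

# Hypothetical parser: collect a keyword argument only for patterns that match.
function parsefoo(s::AbstractString)
    kwargs = Dict{Symbol, Any}()
    for (name, re) in pairs(foo_patterns)
        m = match(re, s)
        if m !== nothing
            kwargs[name] = m[1]  # SubString; converted to String by the constructor
        end
    end
    return Foo(; kwargs...)
end
```

For example, parsefoo("bar:abc qux:xyz") returns Foo("abc", nothing, "xyz"). Supporting a new token then only requires a new struct field, a keyword argument, and an entry in the pattern table.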

answered Dec 19, 2020 at 15:00
\$\endgroup\$
