I have a list of path I want to filter out. I want to keep only paths matching a specific pattern and remove all paths matching another specific pattern.
To achieve this I pattern matched on each pattern. Here is the code:
let stringToSpan (s: string) =
System.ReadOnlySpan<char>(s.ToCharArray())
let matchWildcard pattern text =
System.IO.Enumeration.FileSystemName.MatchesSimpleExpression(stringToSpan pattern, stringToSpan text)
let filterByPattern (includePattern: string Option) (excludePattern: string Option) (paths: string seq) :string seq =
paths
|> Seq.filter
(fun path ->
let shouldKeep =
match includePattern with
| Some pattern -> matchWildcard pattern path
| None -> true
if not shouldKeep then
false
else
match excludePattern with
| Some pattern -> matchWildcard pattern path |> not
| None -> true)
You can try this snippet with:
seq {"test"; "foo/bar"; "foo/toto"}
|> filterByPattern (Some "foo/*") (Some "*toto*")
|> printfn "%A"
// output: foo/bar
If it matters, this snippet is used inside a dotnet tool to clone multiple git repositories. You can find this tool on GitHub.
I think that we could improve this snippet a bit. Can we?
3 Answers 3
This could be simplified by using straightforward composition:
let matches (x: string) (y: string) =
System.IO.Enumeration.FileSystemName.MatchesSimpleExpression
( System.MemoryExtensions.AsSpan x
, System.MemoryExtensions.AsSpan y
)
seq {"test"; "foo/bar"; "foo/toto"}
|> Seq.filter (matches "foo/*")
|> Seq.filter (matches "*toto*" >> not)
|> printfn "%A"
And that's it! Just filter it twice.
This way you don't need to figure out what each of the arguments to filterByPattern
mean. If you really want to make a function out of it:
let includeThenExclude include' exclude =
Seq.filter (matches include') >> Seq.filter (matches exclude >> not)
If you only sometimes need to include or exclude, instead of Option
s:
let including x = matches x |> Seq.filter
let excluding x = matches x >> not |> Seq.filter
let includingFoo =
seq {"test"; "foo/bar"; "foo/toto"}
|> including "foo/*"
let includingFooExcludingToto =
seq {"test"; "foo/bar"; "foo/toto"}
|> including "foo/*"
|> excluding "*toto*"
Lets focus on filterByPattern
.
A Predicate is a function returning a boolean ('T -> bool
). Since the predicate host most of the logic, lets start by spliting the Predicate from the filter to work on it.
let patternPredicate (includePattern: string Option) (excludePattern: string Option) (path: string) :bool =
let shouldKeep =
match includePattern with
| Some pattern -> matchWildcard pattern path
| None -> true
if not shouldKeep then
false
else
match excludePattern with
| Some pattern -> matchWildcard pattern path |> not
| None -> true
let filterByPattern (includePattern: string Option) (excludePattern: string Option) (paths: string seq) :string seq =
Seq.filter (patternPredicate includePattern excludePattern) paths
Alright now we will stop modifying filterByPattern and focus on patternPredicate. Since we are only dealing with boolean maybe there is a refactoring to do here. Lets switch from pattern matching to basic boolean logic.
let patternPredicate (includePattern: string Option) (excludePattern: string Option) (path: string) :bool =
let shouldKeep = includePattern.IsNone || (includePattern.IsSome && matchWildcard includePattern.Value path)
if not shouldKeep then
false
else
excludePattern.IsNone || (excludePattern.IsSome && (not (matchWildcard excludePattern.Value path))))
shouldKeep
can be seen as (not a) or (a and b)
where a is the option state and b the result of the matching.
Since we don't have to check twice the value of a, (not a) or (a and b)
can be simplify to (not a) or b
. If you want to verify this logic, you can use a Truth Table or rely on a tool like decode.fr's Boolean Expressions Calculator
Lets apply this to our function!
let patternPredicate (includePattern: string Option) (excludePattern: string Option) (path: string) :bool =
let shouldKeep = includePattern.IsNone || (matchWildcard includePattern.Value path)
if not shouldKeep then
false
else
excludePattern.IsNone || (not (matchWildcard excludePattern.Value path))
Thats better! Now lets get rid of this if/else
clause and some parenthesis also.
let patternPredicate (includePattern: string Option) (excludePattern: string Option) (path: string) :bool =
(includePattern.IsNone || matchWildcard includePattern.Value path) &&
(excludePattern.IsNone || not (matchWildcard excludePattern.Value path))
So the complete snippet is now:
let stringToSpan (s: string) =
System.ReadOnlySpan<char>(s.ToCharArray())
let matchWildcard pattern text =
System.IO.Enumeration.FileSystemName.MatchesSimpleExpression(stringToSpan pattern, stringToSpan text)
let patternPredicate (includePattern: string Option) (excludePattern: string Option) path =
(includePattern.IsNone || matchWildcard includePattern.Value path) &&
(excludePattern.IsNone || not (matchWildcard excludePattern.Value path))
let filterByPattern (includePattern: string Option) (excludePattern: string Option) (paths: string seq) :string seq =
Seq.filter (patternPredicate includePattern excludePattern) paths
seq {"test"; "foo/bar"; "foo/toto"}
|> filterByPattern (Some "foo/*") (Some "*toto*")
|> printfn "%A"
// output: foo/bar
I would avoid using Option .IsNone
and .Value
, mainly because .Value
will throw an exception if you made a mistake and called it at the wrong time. Almost all F# code will avoid that property and use pattern matching instead or some other function.
Here's another way to write the patternPredicate
function, which I renamed to includePath
do describe what it actually does:
let includePath (includePattern: string option) (excludePattern: string option) path =
let isIncluded =
match includePattern with
| Some includePattern -> matchWildcard includePattern path
| None -> true
let isExcluded =
lazy
match excludePattern with
| Some excludePattern -> matchWildcard excludePattern path
| None -> false
isIncluded && not isExcluded.Value
My approach here was to first build up the values isIncluded
and isExcluded
so that we can write isIncluded && not isExcluded.Value
. This breaks the problem down into understandable and explicit parts.
Note the use of a lazy value for isExcluded
. This means that it won't be evaluated unless isIncluded
is true, so we don't need to call matchWildcard
unnecessarily.
-
\$\begingroup\$ I really like the use of
lazy
here. \$\endgroup\$aloisdg– aloisdg2021年05月22日 14:02:52 +00:00Commented May 22, 2021 at 14:02
.ToCharArray()
in yourstringToSpan
function. You're forcing it to allocate a character array on the heap when it's not needed. If there's nothing built into F# to support this, you should callSystem.MemoryExtensions.AsSpan(s)
in this function instead. \$\endgroup\$