I'm starting to play with F# to do some data parsing, and I was able to create a function that groups a list of strings into subgroups. The code looks like this:
    let breakBy lines pattern =
        let rec breakByRec lines pattern acc =
            match lines with
            | [] -> acc
            | head::tail ->
                match pattern head with
                | true ->
                    let newGroup = [head]
                    let newList = newGroup :: acc
                    breakByRec tail pattern newList
                | false ->
                    let lastGroup = List.head acc
                    let newGroup = head :: lastGroup
                    let newList = newGroup :: List.tail acc
                    breakByRec tail pattern newList
        breakByRec lines pattern [[]]
However, I feel that it could be improved, especially in the way I handle intermediate values when concatenating the lists and in the data structure used to return the grouped data.
The goal, in the end, is to insert each of these "groups" as a single entity in the database. Is there a more suitable data structure that I should look at?
Below is a sample ready to be used and an example output.
    open System.Text.RegularExpressions

    let data = [
        "Name=John"; "Age=29"; "City=San Francisco"
        "Name=Jane"; "Age=28"; "City=New York"
        "Name=Mike"; "Age=35"; "City=Miami"
    ]

    let matchName (line: string) = Regex.IsMatch(line, "^Name=")
    let people = breakBy data matchName
    printfn "%A" people

    (* The output looks like this:
    val people : string list list =
      [ ["City=Miami"; "Age=35"; "Name=Mike"];
        ["City=New York"; "Age=28"; "Name=Jane"];
        ["City=San Francisco"; "Age=29"; "Name=John"];
        [] ]
    *)
Any suggestions are appreciated.
Maybe this would be better with one of the type providers - the CSV type provider is probably a close match here. – John Palmer, Mar 28, 2016 at 3:56
2 Answers
The fact that a function named breakBy also reverses the order of the data violates the Principle of Least Surprise. In my opinion, it's a bug.
breakBy (Regex "^Name=").IsMatch can be thought of as a transformation to be applied to a list. Therefore, to facilitate currying, the order of the parameters to breakBy should be reversed, so that the predicate comes before the data.
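To illustrate the point about parameter order, here is a minimal sketch of a predicate-first variant. It is not the original poster's code: the names isGroupStart and breakByName are made up for this example, a lambda stands in for the Regex method group, and the sketch also assumes that any lines before the first match should form their own group. It keeps the input order, so it addresses the reversal as well.

```fsharp
open System.Text.RegularExpressions

// Predicate first, data last (illustrative signature, not the original code).
let breakBy (isGroupStart: string -> bool) (lines: string list) : string list list =
    // Groups are built in reverse while folding, then flipped back at the end.
    let addLine groups line =
        match groups with
        | current :: rest when not (isGroupStart line) -> (line :: current) :: rest
        | _ -> [line] :: groups   // a matching line (or the very first line) opens a new group
    lines
    |> List.fold addLine []
    |> List.map List.rev
    |> List.rev

// Partial application gives a reusable list transformation.
let breakByName : string list -> string list list =
    breakBy (fun line -> Regex.IsMatch(line, "^Name="))

let people =
    [ "Name=John"; "Age=29"; "Name=Jane"; "Age=28" ]
    |> breakByName
```

With the predicate first, breakByName can be dropped straight into a pipeline, which is awkward when the data is the first argument.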
    match pattern head with
    | true -> ...
    | false -> ...

I don't see any reason to use pattern matching here; if would work the same and is simpler:

    if pattern head then ...
    else ...
I think the empty group at the end of the output shouldn't be there; get rid of it.
Since lines is a list, you can use List.foldBack to simplify the code:

    let breakBy lines pattern =
        let processLine line (head, tail) =
            let head' = line::head
            if pattern line then
                ([], head'::tail)
            else
                (head', tail)
        snd <| List.foldBack processLine lines ([], [])
I'm assuming that the input always has to start with a matching line. If that's not necessarily true, the last line can't be just snd.
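If the input may begin with lines that precede the first match, one possible way to handle it is sketched below. This assumes those leading lines should be kept as a group of their own rather than dropped, which the question does not specify; the name breakByKeepingLeading is made up for this example.

```fsharp
// Hypothetical variant: instead of discarding the unfinished group with snd,
// prepend it to the result when it is non-empty.
let breakByKeepingLeading (lines: string list) (pattern: string -> bool) : string list list =
    let processLine line (head, tail) =
        let head' = line :: head
        if pattern line then ([], head' :: tail) else (head', tail)
    match List.foldBack processLine lines ([], []) with
    | [], groups -> groups                    // input started with a matching line (or was empty)
    | leading, groups -> leading :: groups    // lines seen before the first match
```

When the input does start with a matching line, this behaves the same as the snd version.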
This also does not reverse the inputs, as 200_success suggested.