3

I want to read a file into a collection of objects. The data (it's a Leica PTX file if you're curious) is formatted as follows:

640 [begin item #1: number of columns]
480 [number of rows]
0 0 0 [position information: 3x3 matrix]
0 0 0 [position ctd]
0 0 0 [position ctd]
1 0 0 0 [calibration information: 3x4 matrix]
0 1 0 0 [calibration ctd]
0 0 1 0 [calibration ctd]
1.1 2.2 3.3 0.5 [point 1: X Y Z I]
-0.2 2.3 1.4 0.2 [point 2: X Y Z I]
...
3.9 1.2 -7.7 0.8 [point n: X Y Z I]
640 [begin item #2: number of columns]
480 [number of rows]
0 0 0 [position information]
0 0 0
0 0 0
1 0 0 0 [calibration information]
0 1 0 0 
0 0 1 0
0.2 1.3 -2.1 0.4 [point 1 of item #2]
... [etc]

I.e. Two lines each containing a single float/double (as text) signal the beginning of a new item, and no other lines are exactly one number.

It's easy to do if there's only one item per file, it's a simple fold operation. It would also be relatively straightforward to do with a while type loop that keeps state, but I'm new to functional programming and wondering if there isn't a more concise and elegant way to do this functionally with standard tools (like reduce, fold, split/partition, etc). The best solutions I've come up with so far involve "peeking" ahead using lists or arrays, but I'd like something that allows for the possibility of a sequence that is consumed as it iterates for maximum generality. (I'm using F# and .NET but I don't see this as a language-specific problem.)

asked Oct 26, 2015 at 9:33
4
  • 1
    What have you tried so far? Where are you stuck? Can you share some example data? Commented Oct 26, 2015 at 10:57
  • Thanks for your comment. I edited the question. Does it make it more clear what I'm struggled with? Commented Oct 26, 2015 at 17:05
  • 2
    Looks like a job for a parser combinator library. I'm not an F# guru but Googling "F# parser combinators" turns up this Commented Oct 26, 2015 at 22:54
  • @BenjaminHodgson I think learning about parser combinators is an overkill for parsing something that's this simple. Commented Oct 27, 2015 at 14:02

3 Answers 3

2

If the whole file is small enough to fit into memory, then I think one way would be to parse the file into a list and then process the list piece by piece, something like:

let parseItem (columnsString::rowsString::rest) =
 let columns = int columnsString
 let rows = int rowsString
 let position, rest = parsePosition rest
 let calibration, rest = parseCalibration rest
 let points, rest = parsePoints columns rows rest
 (columns, rows, position, calibration, points), rest
let rec parseItems = function
 | [] -> []
 | lines ->
 let item, restLines = parseItem lines
 let restItems = parseItems restLines
 item :: restItems

As you surely noticed, this code is pretty repetitive, but you can simplify it using the state workflow (known as state monad in Haskell) from ExtCore:

let parseItem =
 state {
 let! columns = parseInt
 let! rows = parseInt
 let! position = parsePosition
 let! calibration = parseCalibration
 let! points = parsePoints columns rows
 return columns, rows, position, calibration, points
 }
let rec parseItems = function
 | [] -> [], []
 | lines ->
 state {
 let! item = parseItem
 let! restItems = parseItems
 return item::restItems
 } <| lines
answered Oct 26, 2015 at 18:26
1
  • Thanks - passing rest back from the inner function calls is exactly the sort of thing I was trying to think of, and couldn't. Commented Oct 29, 2015 at 17:13
3

I'd wager that the most popular tool in the Haskell world for this sort of problem is parser combinators. The classic library for this is parsec, but for a machine generated file format the attoparsec library is generally a better choice (faster parsing at the cost of less friendly error reporting—a good tradeoff when you're reading large files that were not written by humans).

There is an excellent tutorial demonstrating attoparsec's use by parsing log files.

Attoparsec is popular enough that somebody tried to port it to F#, but I've never even written an F# program, so I definitely don't have any way of judging the quality of that library. There's also a port of the regular (non-"atto") Haskell parsec library to F#, that might be worth looking at—it also comes with a tutorial.

If you wish to understand a little bit of how parser combinators work under the hood, Graham Hutton and Erik Meijer wrote two simple tutorials showing simplified internals. The first, I believe, is an abridged version of the second:

These are in Haskell but if you know F# you will likely be able to follow a lot of it.

answered Nov 7, 2015 at 1:40
1

This isn't exactly one of functional programming's best use cases, but most languages have a construct that will create an infinite stream of lines, which you generally feed into a fold with an initial state. Here's an example in Scala (sorry, I don't know F#):

val lines = io.Source.stdin.getLines
val state = lines.foldLeft(initialState)(processLine)

Basically, the processLine function is whatever you would put in your while loop, and it returns a new state each time through. Inside it, you can use split, regexes, pattern matching, or even parser combinators if you're brave.

I can't claim the functional solution ends up more elegant with something this stateful, but it does have the advantage that the wrong path is usually obvious pretty quickly.

answered Oct 26, 2015 at 15:45

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.