I have some experience writing small tools in Haskell and I find it very intuitive to use, especially for writing filters (using interact
) that process their standard input and pipe it to standard output.
Recently I tried to use one such filter on a file that was about 10 times larger than usual and I got a Stack space overflow
error.
After doing some reading (e.g. here and here) I have identified two guidelines to save stack space (experienced Haskellers, please correct me if I write something that is not correct):
- Avoid recursive function calls that are not tail-recursive (this is valid for all functional languages that support tail-call optimization).
- Introduce
seq
to force early evaluation of sub-expressions so that expressions do not grow too large before they are reduced (this is specific to Haskell, or at least to languages using lazy evaluation).
After introducing five or six seq
calls in my code my tool runs smoothly again (also on the larger data). However, I find the original code was a bit more readable.
Since I am not an experienced Haskell programmer I wanted to ask if introducing seq
in this way is a common practice, and how often one will normally see seq
in Haskell production code. Or are there any techniques that allow to avoid using seq
too often and still use little stack space?
-
1Optimizations like the kind you described are almost always going to make the code a bit less elegant.Robert Harvey– Robert Harvey2012年10月06日 16:25:24 +00:00Commented Oct 6, 2012 at 16:25
-
@Robert Harvey: Are there any alternative techniques to keep stack usage low? I mean I imagine I have to rewrite my functions differently but I have no clue whether there are well-established techniques. My first attempt was to use tail-recursive functions, which helped but did not allow me to solve my problem completely.Giorgio– Giorgio2012年10月06日 19:17:31 +00:00Commented Oct 6, 2012 at 19:17
1 Answer 1
Unfortunately there are cases when one has to use seq
in order to get a efficient/well working program for large data. So in many cases, you cannot do without it in production code. You can find more information in Real World Haskell, Chapter 25. Profiling and optimization.
However, there are possibilities how to avoid using seq
directly. This can make code cleaner and more robust. Some ideas:
- Use conduit, pipes or iteratees instead of
interact
. Lazy IO is known to have problems with managing resources (not just memory) and iteratees are designed exactly to overcome this. (I'd suggest to avoid lazy IO alltogether no matter how large your data is - see The problem with lazy I/O.) - Instead of using
seq
directly use (or design your own) combinators such as foldl' or foldr' or strict versions of libraries (such as Data.Map.Strict or Control.Monad.State.Strict) that are designed for strict computations. - Use BangPatterns extension. It allows to replace
seq
with strict pattern matching. Declaring strict constructor fields could be also useful in some cases. - It's also possible to use Strategies for forcing evaluation. Strategies library is mostly aimed at parallel computations, but has methods for forcing a value to WHNF (
rseq
) or full NF (rdeepseq
) as well. There are many utility methods for working with collections, combining strategies etc.
-
+1: Thanks for the useful hints and links. Point 3 seems quite interesting (and the easiest solution for me to use right now). Regarding suggestion 1, I do not see how avoiding lazy IO can improve things: As far as I understand lazy IO should be better for a filter that is supposed to process a (possibly very long) stream of data.Giorgio– Giorgio2012年10月15日 21:05:11 +00:00Commented Oct 15, 2012 at 21:05
-
2@Giorgio I added a link to Haskell Wiki about problems with Lazy IO. With lazy IO you can have very hard time managing resources. For example, if you don't fully read the input (like due to lazy evaluation), the file handle remains open. And if you go and close the file handle manually, it often happens that due to lazy evaluation reading it is postponed and you close the handle before reading the whole input. And, it's often quite hard to avoid memory problems with lazy IO.Petr– Petr2012年10月16日 05:07:52 +00:00Commented Oct 16, 2012 at 5:07
-
I recently had this problem and my program was running out of file descriptors. So I replaced lazy IO with strict IO using strict
ByteString
.Giorgio– Giorgio2015年01月10日 09:03:33 +00:00Commented Jan 10, 2015 at 9:03
Explore related questions
See similar questions with these tags.