13

Most of languages, using garbage collectors (possibly all of them), have one major issue related to parallel computations: garbage collector has to stop all running threads in order to delete unused objects. Haskell has garbage collector too. But, due to purity, Haskell guarantees that no computation changes the original data, it produces a copy instead, and then makes the changes. I suppose, in that way, GC does not has to stop all threads to do its job. I'm just curious: does Haskell have the same issue with garbage collection or not?

Erik Kaplun
38.5k15 gold badges102 silver badges113 bronze badges
asked Mar 3, 2012 at 16:36
2
  • 4
    Haskell has mutable data. It's "the world's finest imperative programming language" after all. It's simply not as endorsed and widely used. Also, the runtime system may also require synchronization for the GC to run, and that part is most certainly impure even in Haskell. Commented Mar 3, 2012 at 16:42
  • 5
    For Haskell there is in fact normally quite a lot of updates because of lazy evaluation, i.e., when a thunk is overwritten with the value it represents. Commented Mar 3, 2012 at 16:50

2 Answers 2

19

GHC's garbage collector is parallel but not concurrent. That means that it can use all threads to perform garbage collection, but it has to stop all threads to do that. Concurrent garbage collection is much harder to implement (and usually has a larger performance overhead).

It is somewhat ironic that Haskell does indeed use lots of mutable objects, namely thunks (unevaluated expressions). Mutable objects cannot freely be duplicated (and even for immutable objects too much duplication should be kept in check).

For programs running on multiple cores having a truly concurrent collector would be nice, but you can also get decent benefits by doing heap-local garbage collection. The idea is that data that is not shared between multiple CPUs can be collected only by the owning CPU. This is typically the case for short-lived data. The Simons have done some recent work in this area. See their paper "Multicore Garbage Collection with Local Heaps" (PDF). This paper also shows some ways how you can exploit immutability in a way similar to what you propose.

Edit: I forgot to mention that Erlang basically does exactly what you propose. Each Erlang process has its own heap and sending a message copies the data from one process to the other. For this reason each Erlang process can do its own GC independently of all the other processes. (The downside is that Erlang doesn't give you shared memory concurrency.)

answered Mar 3, 2012 at 17:05
Sign up to request clarification or add additional context in comments.

1 Comment

nominolo, thanks for good exhaustive answer. Link to Simons' research is valuable too, I'll read it shortly.
2

Seval GC implementations may run in parallel without "stopping the world" as you suggest (I think latest Oracle JVM don't stop the world and is multi-threaded, for an example; and most JVM are not "stop-the-world").

Recall that GC implementations may vary widely from one language implementation to another one.

There is considerable litterature about GC, and still many research papers on parallel garbage collection.

Start with a good book like the GC handbook (by Richard Jones, Antony Hosking, Eliot Moss). Or at least wikipedia Garbage Collection page.

Purely functional languages like Haskell rely heavily on a very performant GC. Other languages have different constraints (for instance, write barriers matters less with Haskell than with Java, but Haskell programs probably allocate more than Java).

For parallel GC, the devil is very much in the details.

answered Mar 3, 2012 at 16:46

1 Comment

Please not that in GC research (and some other areas) the terms "concurrent" and "parallel" are not interchangeable. Concurrent means that GC can happen while the user threads are running; parallel means multiple GC threads do work at the same time, but no user threads may run at the same time. I don't think the Hotspot VM has a truly concurrent GC. It has various mostly-concurrent GCs that do some GC work concurrently (scanning), and then have a shorter stop-the-world phase (which may be parallel).

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.