Most of languages, using garbage collectors (possibly all of them), have one major issue related to parallel computations: garbage collector has to stop all running threads in order to delete unused objects. Haskell has garbage collector too. But, due to purity, Haskell guarantees that no computation changes the original data, it produces a copy instead, and then makes the changes. I suppose, in that way, GC does not has to stop all threads to do its job. I'm just curious: does Haskell have the same issue with garbage collection or not?
-
4Haskell has mutable data. It's "the world's finest imperative programming language" after all. It's simply not as endorsed and widely used. Also, the runtime system may also require synchronization for the GC to run, and that part is most certainly impure even in Haskell.user395760– user3957602012年03月03日 16:42:12 +00:00Commented Mar 3, 2012 at 16:42
-
5For Haskell there is in fact normally quite a lot of updates because of lazy evaluation, i.e., when a thunk is overwritten with the value it represents.augustss– augustss2012年03月03日 16:50:00 +00:00Commented Mar 3, 2012 at 16:50
2 Answers 2
GHC's garbage collector is parallel but not concurrent. That means that it can use all threads to perform garbage collection, but it has to stop all threads to do that. Concurrent garbage collection is much harder to implement (and usually has a larger performance overhead).
It is somewhat ironic that Haskell does indeed use lots of mutable objects, namely thunks (unevaluated expressions). Mutable objects cannot freely be duplicated (and even for immutable objects too much duplication should be kept in check).
For programs running on multiple cores having a truly concurrent collector would be nice, but you can also get decent benefits by doing heap-local garbage collection. The idea is that data that is not shared between multiple CPUs can be collected only by the owning CPU. This is typically the case for short-lived data. The Simons have done some recent work in this area. See their paper "Multicore Garbage Collection with Local Heaps" (PDF). This paper also shows some ways how you can exploit immutability in a way similar to what you propose.
Edit: I forgot to mention that Erlang basically does exactly what you propose. Each Erlang process has its own heap and sending a message copies the data from one process to the other. For this reason each Erlang process can do its own GC independently of all the other processes. (The downside is that Erlang doesn't give you shared memory concurrency.)
1 Comment
Seval GC implementations may run in parallel without "stopping the world" as you suggest (I think latest Oracle JVM don't stop the world and is multi-threaded, for an example; and most JVM are not "stop-the-world").
Recall that GC implementations may vary widely from one language implementation to another one.
There is considerable litterature about GC, and still many research papers on parallel garbage collection.
Start with a good book like the GC handbook (by Richard Jones, Antony Hosking, Eliot Moss). Or at least wikipedia Garbage Collection page.
Purely functional languages like Haskell rely heavily on a very performant GC. Other languages have different constraints (for instance, write barriers matters less with Haskell than with Java, but Haskell programs probably allocate more than Java).
For parallel GC, the devil is very much in the details.
1 Comment
Explore related questions
See similar questions with these tags.