I decided to write a singly-linked list, and had the plan going in to make the internal linked node structure immutable.
I ran into a snag though. Say I have the following linked nodes (from previous add
operations):
1 -> 2 -> 3 -> 4
and say I want to append a 5
.
To do this, since node 4
is immutable, I need to create a new copy of 4
, but replace its next
field with a new node containing a 5
. The problem is now, 3
is referencing the old 4
; the one without the appended 5
. Now I need to copy 3
, and replace its next
field to reference the 4
copy, but now 2
is referencing the old 3
...
Or in other words, to do an append, the entire list seems to need to be copied.
My questions:
Is my thinking correct? Is there any way to do an append without copying the entire structure?
Apparently "Effective Java" contains the reccomendation:
Classes should be immutable unless there's a very good reason to make them mutable...
Is this a good case for mutability?
I don't think this is a duplicate of the suggested answer since I'm not talking about the list itself; that obviously has to be mutable to conform to the interface (without doing something like keeping the new list internally and retrieving it via a getter. On second thought though, even that would require some mutation; it would just be kept to a minimum). I'm talking about whether or not the internals of the list must be immutable.
7 Answers 7
With lists in functional languages, you nearly always work with a head and tail, the first element and the remainder of the list. Prepending is much more common because, as you surmised, appending requires copying the entire list (or other lazy data structures that don't precisely resemble a linked list).
In imperative languages, appending is much more common, because it tends to feel more natural semantically, and you don't care about invalidating references to previous versions of the list.
As an example of why prepending doesn't require copying the entire list, consider you have:
2 -> 3 -> 4
Prepending a 1
gives you:
1 -> 2 -> 3 -> 4
But note that it doesn't matter if someone else is still holding a reference to 2
as the head of their list, because the list is immutable and the links only go one way. There's no way to tell the 1
is even there if you only have a reference to 2
. Now, if you appended a 5
onto either list, you'd have to make a copy of the entire list, because otherwise it would appear on the other list as well.
-
Ahh, good point on why prepending doesn't require a copy; I didn't think of that.Carcigenicate– Carcigenicate2015年08月12日 15:20:09 +00:00Commented Aug 12, 2015 at 15:20
You are correct, appending requires copying the whole list if you are unwilling to mutate any nodes in-place. For we need to set the next
pointer of the (now) second-to-last node, which in an immutable setting creates a new node, and then we need to set the next
pointer of the third-to-last node, and so on.
I think the core problem here is not immutability, nor is the append
operation ill-advised. Both are perfectly fine in their respective domains. Mixing them is bad: The natural (efficient) interface for an immutable list emphasized manipulation at the front of the list, but for mutable lists it is often more natural to build the list by successively appending the items from first to last.
Therefore I suggest that you make up your mind: Do you want an ephemeral interface or a persistent one? Do most operations produce a new list and leave an unmodified version accessible (persistent), or do you write code like this (ephemeral):
list.append(x);
list.prepend(y);
Both choices are fine, but the implementation should mirror the interface: A persistent data structure benefits from immutable nodes, while an ephemeral one needs internal mutability to actually fulfill the performance promises it implicitly makes. java.util.List
and other interfaces are ephemeral: Implementing them on an immutable list is inappropriate and in fact a performance hazard. Good algorithms on mutable data structures are often quite different from good algorithms on immutable data structures, so dressing up an immutable data structure as a mutable one (or vice versa) invites bad algorithms.
While a persistent list has some disadvantages (no efficient appending), this need not be a serious problem when programming functionally: Many algorithms can be formulated efficiently by shifting the mindset and using higher-order functions such as map
or fold
(to name two relatively primitive ones), or by prepending repeatedly. Furthermore, nobody is forcing you to use only this data structure: When others (ephemeral, or persistent but more sophisticated) are more appropriate, use them. I should also note that persistent lists have some advantages for other workloads: They share their tails, which can conserve memory.
You must watch this great video from 2015 React conf by Immutable.js's creator Lee Byron. It will give you pointers and structures to understand how to implement an efficient immutable list which do not duplicate content. The basics ideas are : - as long as two lists use identical nodes (same value, same next node), the same node is used - when the lists start differing, a structure is created at the node of divergence, which holds pointers to the next particular node of each list
This image from a react tutorial may be clearer than my broken English :enter image description here
If you have a singly linked list you will be working with the front if it more than you would with the back.
Functional languages like prolog and haskel provide easy ways to get the front element and the rest of the array. Appending to the back is a O(n) operation with copies each node.
-
1I've used Haskell. Afaik, it also circumvents part of the problem by using laziness. I'm doing appends because I thought that's what expected by the
List
interface (I could be wrong there though). I don't think that bit really answers the question either though. The entire list would still need to be copied; it would just make access to the last element added faster since a full traversal wouldn't be required.Carcigenicate– Carcigenicate2015年08月12日 14:56:04 +00:00Commented Aug 12, 2015 at 14:56 -
Creating a class according to an interface that doesn't match the underlying data structure/algorithm is a invitation to pain and inefficiencies.ratchet freak– ratchet freak2015年08月12日 14:59:42 +00:00Commented Aug 12, 2015 at 14:59
-
The JavaDocs explicitly use the word "append". You're saying it's better to ignore that for the sake of implementation?Carcigenicate– Carcigenicate2015年08月12日 15:01:37 +00:00Commented Aug 12, 2015 at 15:01
-
2@Carcigenicate no I'm saying that it's a mistake to try and fit a singly linked list with immutable nodes into the mold of
java.util.List
ratchet freak– ratchet freak2015年08月12日 15:02:50 +00:00Commented Aug 12, 2015 at 15:02 -
1With laziness there would not be a linked list anymore; there is no list, hence there is no mutation (append). Instead there is some code that generates the next item upon request. But it cannot be used everywhere where a list is used. Namely, if you write a method in Java that expects to mutate a list that is passed in as an argument, that list has to be mutable in the first place. The generator approach requires a complete restructuring (reorganization) of code to make it work - and to make it possible to do away with the list physically.rwong– rwong2015年08月12日 15:03:10 +00:00Commented Aug 12, 2015 at 15:03
As others have pointed out, you are correct that an immutable singly-linked list requires copying the entire list when performing an append operations.
Often you can use the workaround of implementing your algorithm in terms of cons
(prepend) operations, and then reverse the final list once. This still needs to copy the list once, but the complexity overhead is linear in the length of the list, whereas by repeatedly using append you can easily get a quadratic complexity.
Difference lists (see e.g. here) are an interesting alternative. A difference list wraps a list and provides an append operation in constant time. You basically work with the wrapper as long as you need to append, and then convert back to a list when you are finished. This is somehow similar to what you do when you use a StringBuilder
to construct a string, and at the end get the result as a String
(immutable!) by calling toString
. One difference is that a StringBuilder
is mutable, but a difference list is immutable. Also, when you convert a difference list back to a list, you still have to construct the entire new list but, again, you only need to do this once.
It should be quite easy to implement an immutable DList
class that provides a similar interface as Haskell's Data.DList
.
It's not strictly Java, but I suggest you read this article on an immutable, but performant indexed persistent data structure written in Scala:
http://www.codecommit.com/blog/scala/implementing-persistent-vectors-in-scala
Since it's a Scala data structure, it can be used from Java too (with a bit more verbosity). It's based on a data structure available in Clojure, and I'm sure there's a more "native" Java library offering it as well.
Also, a note on constructing immutable data structures: what you usually want is some sort of "Builder", which allows you to "mutate" your "under-construction" data structure by appending elements to it (within a single thread); after you're done appending, you call a method on the "under-construction" object such as .build()
or .result()
, which "builds" the object, giving you an immutable data structure that you can then safely share.
-
1There's a question about Java ports of PersistentVector at stackoverflow.com/questions/15997996/…James_pic– James_pic2015年08月13日 10:03:11 +00:00Commented Aug 13, 2015 at 10:03
An approach which can sometimes be helpful would be to have two classes of list-holding objects--a forward-list object with a final
reference to the first node in a forward-linked list and an initially-null non-final
reference which (when non-null) identifies a reverse-list object holding the same items in the opposite order, and a reverse-list object with a final
reference to the last item of a reverse-linked list and an initially-null non-final reference which (when non-null) identifies a forward-list object holding the same items in opposite order.
Prepending an item to a forward-list or appending an item to a reverse-list would merely require creating a new node which links to the node identified by the final
reference and creating a new list object of the same type as the original, with a final
reference to that new node.
Appending an item to a forward-list or prepending to a reverse-list would require having a list of the opposite type; the first time that is done with a particular list it should create a new object of the opposite type and store a reference; repeating the action should reuse the list.
Note that the external state of a list object will be considered the same whether its reference to an opposite-type list is null or identifies an opposite-order list. There would be no need to make any variables final
, even when using multi-threaded code, since every list object would have a final
reference to a complete copy of its contents.
If code on one thread creates and cache a reversed copy of a list, and the field in which that copy is cached isn't volatile, it would be possible that code on another thread might not see the cached list, but the only adverse effect from that would be that the other thread would need to do extra work building another reversed copy of the list. Because such action would at worst impair efficiency, but would not affect correctness, and because volatile
variables create efficiency impairments of their own, it will often be better to have the variable be non-volatile and accept the possibility of occasional redundant operations.
CopyOnWritexxx
classes used for multi-threading. Nobody expects collections to be immutable really (although it does create some quirks)