I've recently started Cracking the Code Interview and I just got to LinkedLists. I did problem 2.1 to remove duplicate entries from the book's implementation of LinkedList.
When I time the removal of dupes, I see the book's implementation is faster than Java LinkedList.
I've implemented four dupe removal methods, each with a different parameter. The book's LinkedList implementation is referred to as Node.
static void removeDupes(LinkedList<Integer> list) {
HashSet<Integer> integerHashSet = new HashSet<>();
for (int i = 0; i < list.size(); i++) {
if (integerHashSet.contains(list.get(i))) {
list.remove(list.get(i));
} else {
integerHashSet.add(list.get(i));
}
}
}
static void removeDupes(ArrayList<Integer> list) {
HashSet<Integer> integerHashSet = new HashSet<>();
for (int i = 0; i < list.size(); i++) {
if (integerHashSet.contains(list.get(i))) {
list.remove(list.get(i));
} else {
integerHashSet.add(list.get(i));
}
}
}
static void removeDupes(Node currentNode) {
HashSet<Integer> integerHashSet = new HashSet<>();
while(currentNode != null) {
if (integerHashSet.contains(currentNode.data)) {
currentNode.prev.next = currentNode.next;
if (currentNode.next != null) {
currentNode.next.prev = currentNode.prev;
}
} else {
integerHashSet.add(currentNode.data);
}
currentNode = currentNode.next;
}
}
static void removeDupesNoBuffer(Node currentNode) {
while (currentNode != null) {
Node runnerNode = currentNode;
while(runnerNode.next != null) {
if (runnerNode.next.data == currentNode.data) {
runnerNode.next = runnerNode.next.next;
if (runnerNode.next != null) {
runnerNode.next.prev = runnerNode;
}
} else {
runnerNode = runnerNode.next;
}
}
currentNode = currentNode.next;
}
}
Book's LinkedList implementation:
public class Node {
Node prev;
Node next;
int data;
Node(int d) {
data = d;
}
void add(int d) {
Node newNode = new Node(d);
Node currentNode = this;
while (currentNode.next != null) {
currentNode = currentNode.next;
}
currentNode.next = newNode;
newNode.prev = currentNode;
}
}
Each List or Node, I populated with a length of 100000, where each odd index was 0 and everything else was unique.
My results were:
- Node: 18ms
- Node No Buffer: 5154ms
- LinkedList: 2734ms
- ArrayList: 1392ms
What am I not understanding?
EDIT:
When I swapped the remove dupes with the LinkedList parameter to an enhanced for loop as pointed out by the comments and solution, it became just as fast.
static void removeDupes(LinkedList<Integer> list) {
HashSet<Integer> integerHashSet = new HashSet<>();
for(Integer i : list) {
if (integerHashSet.contains(i)) {
list.remove(i);
} else {
integerHashSet.add(i);
}
}
}
1 Answer 1
The reason is that your doing it inefficently. If you use indices to iterate over a linked list the list has to count each node reference to find the correct one. And then it acts on that one.
Try it like this and see the difference. Counting even values of a linked list of N items. The second iteration will be much faster.
First, a indexed for loop.
List<Integer> list1 = new LinkedList<>();
Random r = new Random();
int N = 100_000;
for(int i = 0; i < N; i++) {
list1.add(r.nextInt(10000));
}
// copy the list
List<Integer> list2 = new LinkedList<>(list1);
System.out.println("Starting list1");
int sumEvens = 0;
for(int i = 0; i < list1.size(); i++) {
if (list1.get(i) % 2 == 0) {
sumEvens++;
}
}
System.out.printf("There were %d even values%n", sumEvens);
Now an enhanced forloop which uses an iterator.
System.out.println("Starting list2");
sumEvens = 0;
for(int i : list2) {
if ( i%2 == 0) {
sumEvens++;
}
}
System.out.printf("There were %d even values%n", sumEvens);
Finally, Arraylists are fast at accessing but slow at deleting. This is because when an item is deleted, all subsequent items must be copied up one cell to close the gap. But a linked list can delete an item by simply having the previous node point to the following node. So how a list is to be used drives the selection of the list implementation.
4 Comments
Iterator<Integer> it = list.iterator(); Check out the Iterator interface in the Java Doc. This is important if you want to delete items as you iterate across them. If you use the enhanced for loop you will get a ConcurrentModificationException. Examples of both are covered on this site. All you got to do is search.ConcurrentModificationException on the ArrayList. Also, the enhanced for loop on the LinkedList didn't throw the exception either.list2.remove(Integer.valueOf(i)) in my second example. Or for(Integer i : list2) Remember, there are two versions of remove. One for indices and one for objects.
list.get(i)is O(N). Always use node.next instead or the linked list iterator