ArrayLists seem to be sorted with TimSort where the underlying list is not always consistent during sorting. It can happen that list entries disappear or appear twice upon calling the comparator.
In our comparator we are comparing keys for which we are using a function to get a value to compare for this key. As this function is used in other contexts we have a test whether the key is actually present in the list (something that wouldn't be necessary in the sorting):
if (keys.contains(itemId)) {
...
As keys is the list we are sorting it can happen in the comparator that a key is not found in the list due to the internal mechanics of TimSort.
Question: Is this mentioned somewhere in Javadoc (couldn't find it) that you are not supposed to access underlying list in Comparator? Is this a poor implementation of TimSort that should sort a copy? Or was it a stupid idea in the first place to access underlying list in comparator?
The program below, provided by T.J. Crowder, demonstrates that the contents of the underlying list may not be consistent during calls to the Comparator. (This program demonstrates the phenomenon in question, but it isn't representative of the actual application that's affected by the issue.)
import java.util.*;
public class Example {
private static String[] chars = {
"a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m", "n", "o", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z"
};
private List<String> list;
private String[] entries;
private Example() {
this.entries = new String[1000];
for (int n = 0; n < 1000; ++n) {
this.entries[n] = chars[n % chars.length] + n;
}
// Ensure it's an ArrayList, specifically
this.list = new ArrayList<String>(Arrays.asList(this.entries));
}
public static void main(String[] args) {
(new Example()).run();
}
class ListComparator implements Comparator<String> {
public int compare(String a, String b) {
for (String s : entries) {
int i1 = Example.this.list.indexOf(s);
if (i1 == -1) {
System.out.println(s + ": Missing");
} else {
int i2 = Example.this.list.lastIndexOf(s);
if (i2 != i1) {
System.out.println(s + ": Duplicated, at " + i1 + " and " + i2);
}
}
}
return a.compareTo(b);
}
}
private void run() {
this.list.sort(new ListComparator());
}
}
Here are the first few lines of output from a run:
b1: Missing a52: Duplicated, at 2 and 32 b27: Missing a52: Duplicated, at 2 and 32 c2: Missing a52: Duplicated, at 2 and 32 c2: Missing c28: Missing a52: Duplicated, at 2 and 32 b53: Duplicated, at 5 and 33 c28: Missing d29: Missing a52: Duplicated, at 2 and 32 b53: Duplicated, at 5 and 33 d3: Missing d29: Missing a52: Duplicated, at 2 and 32 b53: Duplicated, at 5 and 33 d3: Missing d29: Missing e30: Missing
1 Answer 1
A bit of history here: in JDK 7, the TimSort algorithm replaced the previous "legacy merge sort" algorithm. In JDK 8, Collections.sort()
delegates to the new default method List.sort()
. This default method is overridden by ArrayList
, which does the sort in-place. The previous Collections.sort()
implementation would copy the list to a temporary array, perform the sort on that temporary array, and then copy the elements from the temporary array back to the original list.
If the sort comparator looks in the list being sorted, then its behavior will certainly be affected by the new in-place sorting behavior of ArrayList introduced in JDK 8. The change from the "legacy merge sort" to TimSort in JDK 7 probably has no impact in this case, since JDK 7 still did the sorting on a temporary copy.
The copy-sort-copyback behavior of List.sort()
is described in an "Implementation Requirements" section which specifies the behavior of the default method implementation, but which isn't part of the interface's contract that is imposed on all implementations. Thus, ArrayList
(and other subclasses) are free to change this behavior. I note that there is no documentation for the overriding implementation ArrayList.sort()
. I suppose it would be a small improvement if some documentation were added that specified the in-place sorting behavior.
If the in-place sorting of ArrayList
is a problem, you could copy the list before sorting it:
List<Key> list = ... ;
List<Key> newList = new ArrayList<>(list);
newList.sort(keyComparator); // uses the old list
list = newList;
Alternatively, perhaps you could give some more details about the way the comparator works, and we might be able to figure out a way to rewrite it so that it doesn't need to look in the list being sorted. (I'd suggest asking another question for this.)
-
Interesting points. The in-place sorting wouldn't be a problem if all elements remained in the list. As mentioned above one shouldn't of course change the order or the content of the list in the comparator but accessing the elements can be necessary if we are sorting keys rather than values. Once you know about the behavior it's easy to fix. As you suggested too I just copy the list before sorting.Thomas M– Thomas M2019年01月08日 06:34:16 +00:00Commented Jan 8, 2019 at 6:34
-
@ThomasM I’m curious; what does sorting keys instead of values entail? Also, the contains() method does a linear search so it would seem like it would make the comparator, and thus the sort, run extremely slowly. I guess I’m still confused about what this comparator does.Stuart Marks– Stuart Marks2019年01月08日 08:30:15 +00:00Commented Jan 8, 2019 at 8:30
-
We are sorting columns of a listbox so after the sorting the listbox is refreshed based on the newly ordered list of keys. The comparator is instantiated with the name of the column and the sort order. When comparing the compare function gets two keys and the comparator gets the values of the respective listbox column to compare. Does that make sense now?Thomas M– Thomas M2019年01月14日 07:41:06 +00:00Commented Jan 14, 2019 at 7:41
-
@ThomasM Thanks. It's starting to make sense what you're trying to do. But I'm still sort of guessing about why the comparator needs to look into the original list in order to perform the comparison. Is a key some kind of unique ID for an entry in the listbox? Isn't there some kind of row data structure that maintains the association between the key and the various column values?Stuart Marks– Stuart Marks2019年01月16日 01:37:42 +00:00Commented Jan 16, 2019 at 1:37
-
It is a generic viewing web solution for various archiving systems (IBM OnDemand, Oracle WCC, CSV files, generic JDBC databases, etc.). For all these systems there are so called adapters which are also responsible for holding the data. The core viewer does not know how the data is stored and is not making a local copy. When the user wants to view certain data he gets a dynamically created search form. The search is executed within the adapter which returns a list of opaque keys. Whenever the core needs data it uses these opaque keys to obtain the data from the adapter.Thomas M– Thomas M2019年01月16日 08:06:10 +00:00Commented Jan 16, 2019 at 8:06
keys.contains(itemId)
statement related to the ArrayList you are trying to sort?Comparator
will never delete elements in a List but if you explicitly remove them in the Comparator implementation. While it could be the case in aTreeSet
for objects with the same rank in terms of comparison. If you don't understand the behavior, please provide a minimal, short reproducible example.sort
algorithm may temporarily create an inconsistent list while running. Accessing the list during the sort (for instance, from a comparator) is not recommended as doing so may observe the inconsistent state, in the form of apparently-duplicated or -missing entries."