I've become a bit rusty in Java. This is an attempt at brushing up my skills.
In computer science, merge sort is an efficient, general-purpose, comparison-based sorting algorithm.
Questions
- How can I improve my test cases? (TestNG)
- I've used Java-11 here, can I use better idioms?
- Can this implementation improved for performance?
- Can style / readability improved?
Code
Sorting.java
import java.util.Arrays;
import java.util.Comparator;
import java.util.Objects;
public class Sorting {
/**
* Day 08 - Merge Sort
*
* @param data data array
* @param comparator comparator of given data type
* @param ascending sort order
* @param <T> data type
* @return sorted array (in place)
*/
public static <T> T[] mergeSort(T[] data, Comparator<T> comparator, boolean ascending) {
Objects.requireNonNull(data);
Objects.requireNonNull(comparator);
if (data.length <= 1) return data;
var actualComparator = ascending ? comparator : comparator.reversed();
return mergeSort(data, actualComparator, 0, data.length - 1);
}
private static <T> T[] mergeSort(T[] data, Comparator<T> comparator, int start, int stop) {
if (start >= stop) return data;
int pivot = start + (stop - start) / 2;
mergeSort(data, comparator, start, pivot);
mergeSort(data, comparator, pivot + 1, stop);
return merge(data, comparator, start, pivot, stop);
}
private static <T> T[] merge(T[] data, Comparator<T> comparator, int start, int pivot, int stop) {
T[] left = Arrays.copyOfRange(data, start, pivot + 1);
T[] right = Arrays.copyOfRange(data, pivot + 1, stop + 1);
int lPos = 0;
int rPos = 0;
int pos = start;
while (lPos < left.length && rPos < right.length) {
if (comparator.compare(left[lPos], right[rPos]) <= 0) {
data[pos++] = left[lPos++];
} else {
data[pos++] = right[rPos++];
}
}
while (lPos < left.length) {
data[pos++] = left[lPos++];
}
while (rPos < right.length) {
data[pos++] = right[rPos++];
}
return data;
}
}
TestMergeSorting.java
import org.testng.Assert;
import org.testng.annotations.Test;
public class TestMergeSorting {
@Test
public void testMergeSort() {
Assert.assertEquals(asc(6, 5, 9, 3, 2, 1), ar(1, 2, 3, 5, 6, 9));
Assert.assertEquals(asc(1, 1, 1, 1, 2, 1), ar(1, 1, 1, 1, 1, 2));
Assert.assertEquals(asc(2, 1, 1, 1, 1, 2, 1), ar(1, 1, 1, 1, 1, 2, 2));
Assert.assertEquals(asc(1, 1, 1, 1, 1, 2, 1), ar(1, 1, 1, 1, 1, 1, 2));
Assert.assertEquals(asc(3, 2, 1), ar(1, 2, 3));
Assert.assertEquals(asc(1, 2, 3), ar(1, 2, 3));
Assert.assertEquals(asc(2, 0, -1, 2, 1), ar(-1, 0, 1, 2, 2));
Assert.assertEquals(dsc(1, 2, 3), ar(3, 2, 1));
Assert.assertEquals(dsc(3, 2, 1), ar(3, 2, 1));
Assert.assertEquals(dsc(2, 0, -1, 2, 1), ar(2, 2, 1, 0, -1));
Assert.assertEquals(asc(2, 1), ar(1, 2));
Assert.assertEquals(asc(1), ar(1));
Assert.assertEquals(asc(), ar());
Assert.assertEquals(dsc(1), ar(1));
Assert.assertEquals(dsc(), ar());
}
@Test(expectedExceptions = {NullPointerException.class})
public void testNullExcept() {
Sorting.mergeSort(null, Integer::compare, true);
}
private static Integer[] ar(Integer... objects) {
return objects;
}
private static Integer[] asc(Integer... objects) {
return Sorting.mergeSort(objects, Integer::compare, true);
}
private static Integer[] dsc(Integer... objects) {
return Sorting.mergeSort(objects, Integer::compare, false);
}
}
3 Answers 3
Naming
Prefer to use conventional names. That is, the same names used by the
Arrays
class in the Java standard library:
data => a
comparator => c
start => fromIndex
stop => toIndex
I've never seen the center element in merge sort being referred to as
the pivot
. I think a more fitting name is mid
.
Inclusive start, exclusive end
It is better to use inclusive start indices and exclusive stop
indices, matching the convention established by
Arrays.sort
and many other array functions in the Java standard library.
This way, if you make your other mergeSort
overload public:
public static <T> void mergeSort(T[] a,
int fromIndex, int toIndex,
Comparator<T> c) {
It becomes very easy for people to use it because the signature
matches the Arrays.sort(T[], int, int, Comparator)
function.
Garbage generation
The two calls to Arrays.copyOfRange
in merge
creates temporary
arrays which causes unnecessary garbage collection pressure. It is
more efficient to preallocate a temporary buffer and then pass that
one around.
I've rewritten your merge
function to use such a temporary
buffer. It makes the code a little more complex, but I think it is
worth it.
Modified code with comments removed:
import java.util.Arrays;
import java.util.Comparator;
import java.util.Objects;
public class Sorting {
public static <T> void mergeSort(T[] a,
Comparator<? super T> c,
boolean ascending) {
Comparator<? super T> actualComparator = ascending ? c : c.reversed();
mergeSort(a, 0, a.length, actualComparator);
}
public static <T> void mergeSort(T[] a,
int fromIndex, int toIndex,
Comparator<? super T> c) {
Objects.requireNonNull(a);
Objects.requireNonNull(c);
if (fromIndex > toIndex) {
throw new IllegalArgumentException(
"fromIndex(" + fromIndex + ") > toIndex(" + toIndex + ")");
}
if (fromIndex < 0) {
throw new ArrayIndexOutOfBoundsException(fromIndex);
}
if (toIndex > a.length) {
throw new ArrayIndexOutOfBoundsException(toIndex);
}
T[] buf = Arrays.copyOf(a, a.length);
mergeSortInternal(a, fromIndex, toIndex, c, buf);
}
private static <T> void mergeSortInternal(T[] a,
int fromIndex, int toIndex,
Comparator<? super T> c,
T[] buf) {
if (toIndex - fromIndex <= 1)
return;
int mid = fromIndex + (toIndex - fromIndex) / 2;
mergeSortInternal(a, fromIndex, mid, c, buf);
mergeSortInternal(a, mid, toIndex, c, buf);
merge(a, fromIndex, mid, toIndex, c, buf);
}
private static <T> void merge(T[] a,
int fromIndex, int mid, int toIndex,
Comparator<? super T> c,
T[] buf) {
System.arraycopy(a, fromIndex, buf, fromIndex, toIndex - fromIndex);
int lPos = fromIndex;
int rPos = mid;
int lEnd = mid;
int rEnd = toIndex;
int pos = fromIndex;
while (lPos < lEnd && rPos < rEnd) {
if (c.compare(buf[lPos], buf[rPos]) <= 0) {
a[pos++] = buf[lPos++];
} else {
a[pos++] = buf[rPos++];
}
}
while (lPos < lEnd) {
a[pos++] = buf[lPos++];
}
while (rPos < rEnd) {
a[pos++] = buf[rPos++];
}
}
}
-
\$\begingroup\$ Are you really proposing to rename the perfectly fine
comparator
to the meaninglessc
because in one implementation of the standard library they did the same? I assume he should also introduce the age-old overflow error they had for over a decade in the standard library as well? \$\endgroup\$Voo– Voo2019年12月28日 20:05:58 +00:00Commented Dec 28, 2019 at 20:05 -
\$\begingroup\$ Yes. One letter names are fine if it is clear what they mean. \$\endgroup\$Gaslight Deceive Subvert– Gaslight Deceive Subvert2019年12月28日 21:17:39 +00:00Commented Dec 28, 2019 at 21:17
-
\$\begingroup\$ I like your answer but I don't like
a
andc
:) \$\endgroup\$JaDogg– JaDogg2019年12月28日 22:26:36 +00:00Commented Dec 28, 2019 at 22:26 -
\$\begingroup\$ @Björn So what does
a
stand for then? \$\endgroup\$Voo– Voo2019年12月28日 23:01:18 +00:00Commented Dec 28, 2019 at 23:01 -
\$\begingroup\$ a is the array to merge sort. \$\endgroup\$Gaslight Deceive Subvert– Gaslight Deceive Subvert2019年12月28日 23:18:57 +00:00Commented Dec 28, 2019 at 23:18
One thing I noticed, the sorting routine is modifying the array and returning the array. I would suggest either make the functions void
or create a new array to do the merge in. A sort function with a void
return type automatically tells the user that the sort is in place. Where as, a return type of an array, tells the user that a new array is created, but that the original array isn't touched.
- If you are going to provide more than one variant of the method, then I recommend that you provide every possible variant. The indices are there or not (and defaults to the whole array). The comparator is optional (and defaults to the natural comparator). The ascending flag is optional (and defaults to true). Eight variants, or 16 if you want copying and in-place versions.
- I believe the "right" array is unneeded. The original data is can't be overwritten.
- It is worth remembering that below some size, bubble sort (or variations thereof) can be the fastest sort! (This may not matter as an exercise in brushing up skills, but it you want to actually use it...)
-
\$\begingroup\$ I think newer versions of Java comes with timsort which is better. :) This is only an exercise anyway 🥳 \$\endgroup\$JaDogg– JaDogg2019年12月28日 16:05:10 +00:00Commented Dec 28, 2019 at 16:05