Document hash() limitations in BucketSort; add tests showing distribution behavior #6511

Original file line number	Diff line number	Diff line change
Expand Up		@@ -95,6 +95,22 @@ private <T extends Comparable<T>> T[] concatenateBuckets(Iterable<List<T>> bucke
		* This is done by "normalizing" the element within the range of the array's minimum (min) and maximum (max) values,
		* and then mapping this normalized value to a specific bucket index.
		*
	*<p><b>Important limitations:</b>
	*<ul>
	* <li>This method uses {@code compareTo} as if it provided a numeric difference.
	* For numeric types, {@code compareTo} only reports order (−1, 0, 1), not the actual distance.
	* This often collapses distribution into one or two buckets.</li>
	* <li>For non-numeric {@code Comparable} types (for example {@code String}), bucket indices depend on lexicographic
	* code-point differences, which are not a proportional measure of spacing. Distribution is therefore arbitrary and uneven.</li>
	* <li>If {@code min.equals(max)}, the computed "range" is 0. Then {@code element.compareTo(min) / 0}
	* yields {@code NaN}, which Java coerces to 0 when cast to {@code int}.
	* Practically, all elements collapse into bucket 0 in this case.</li>
	* </ul>
	*
	* <p>Despite these limitations, the sort remains correct because each bucket is sorted internally and concatenated.
	* This method should be regarded as a simplified demonstration rather than a
	* general-purpose bucketing strategy for arbitrary {@code Comparable<T>} values.</p>
	*
		* @param element the element of the array
		* @param min the minimum value in the array
		* @param max the maximum value in the array
Expand Down

72 changes: 72 additions & 0 deletions src/test/java/com/thealgorithms/sorts/BucketSortHashBehaviorTest.java

Show comments View file Open in desktop

Copy link

Collaborator

@DenizAltunkapan DenizAltunkapan Aug 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any reason why you didn't put your Tests in the exististing BucketSortTest ? @Leogricci

Copy link

Author

@Leogricci Leogricci Aug 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the issue only related to the particular case of the hash function i thought it might be better to create another test class, but I realize now it just makes it harder to find. Sorry for the inconvenience, I will modify it now.

Original file line number	Diff line number	Diff line change
		@@ -0,0 +1,72 @@
	package com.thealgorithms.sorts;

	import static org.junit.jupiter.api.Assertions.assertArrayEquals;

	import java.util.Arrays;
	import org.junit.jupiter.api.Test;

	public class BucketSortHashBehaviorTest {

	private static <T extends Comparable<T>> int pseudoHash(final T element, final T min, final T max, final int numberOfBuckets) {
	// Reproduces the production hash() logic
	double range = max.compareTo(min);
	double normalizedValue = element.compareTo(min) / range; // -1/0/1 divided by -1/0/1
	return (int) (normalizedValue * (numberOfBuckets - 1));
	}

	@Test // Test case when all numbers are equal
	void sortStillCorrectWhenAllEqual() {
	Integer[] arr = {1, 1, 1, 1, 1};
	Integer[] expected = arr.clone();

	new BucketSort().sort(arr);
	assertArrayEquals(expected, arr);

	// Observe bucket mapping (all collapse to index 0)
	Integer min = 1;
	Integer max = 1;
	int numberOfBuckets = Math.max(arr.length / 10, 1); // same as BUCKET_DIVISOR rule
	int idx = pseudoHash(1, min, max, numberOfBuckets);
	// idx will be 0 because NaN cast to int -> 0 in Java
	System.out.println("All-equal case -> bucket index: " + idx);
	}

	@Test // Test case with non-equal integers
	void sortStillCorrectNonEqualIntegers() {
	Integer[] arr = {20, 40, 30, 10};
	Integer[] expected = {10, 20, 30, 40};

	new BucketSort().sort(arr);
	assertArrayEquals(expected, arr);

	Integer min = Arrays.stream(arr).min(Integer::compareTo).get();
	Integer max = Arrays.stream(arr).max(Integer::compareTo).get();
	int numberOfBuckets = Math.max(arr.length / 10, 1); // often 1 here; bump to 4 to demonstrate
	numberOfBuckets = 4;

	for (Integer x : arr) {
	int idx = pseudoHash(x, min, max, numberOfBuckets);
	System.out.println("Value " + x + " -> bucket " + idx);
	}
	// Expect only two distinct buckets because compareTo gives -1/0/1
	}

	@Test // Test case when the Array contains Strings
	void sortStillCorrectWhenStrings() {
	String[] arr = {"apple", "banana", "carrot"};
	String[] expected = arr.clone();

	new BucketSort().sort(arr);
	assertArrayEquals(expected, arr);

	String min = Arrays.stream(arr).min(String::compareTo).get();
	String max = Arrays.stream(arr).max(String::compareTo).get();
	int numberOfBuckets = 4;

	for (String s : arr) {
	int idx = pseudoHash(s, min, max, numberOfBuckets);
	System.out.println("Value \"" + s + "\" -> bucket " + idx);
	}
	// Buckets reflect only lexicographic order, not a numeric spacing
	}
	}

Navigation Menu

Search code, repositories, users, issues, pull requests...

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Document hash() limitations in BucketSort; add tests showing distribution behavior #6511

Are you sure you want to change the base?

Document hash() limitations in BucketSort; add tests showing distribution behavior #6511

Filter by extension

Uh oh!

Uh oh!

Diff view

Diff view

There are no files selected for viewing

@DenizAltunkapan DenizAltunkapan Aug 29, 2025

Choose a reason for hiding this comment

Uh oh!

@Leogricci Leogricci Aug 29, 2025

Choose a reason for hiding this comment

Uh oh!