Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

Document hash() limitations in BucketSort; add tests showing distribution behavior #6511

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
Leogricci wants to merge 8 commits into TheAlgorithms:master
base: master
Choose a base branch
Loading
from Leogricci:docs/hash-warning
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
16 changes: 16 additions & 0 deletions src/main/java/com/thealgorithms/sorts/BucketSort.java
View file Open in desktop
Original file line number Diff line number Diff line change
Expand Up @@ -95,6 +95,22 @@ private <T extends Comparable<T>> T[] concatenateBuckets(Iterable<List<T>> bucke
* This is done by "normalizing" the element within the range of the array's minimum (min) and maximum (max) values,
* and then mapping this normalized value to a specific bucket index.
*
*<p><b>Important limitations:</b>
*<ul>
* <li>This method uses {@code compareTo} as if it provided a numeric difference.
* For numeric types, {@code compareTo} only reports order (−1, 0, 1), not the actual distance.
* This often collapses distribution into one or two buckets.</li>
* <li>For non-numeric {@code Comparable} types (for example {@code String}), bucket indices depend on lexicographic
* code-point differences, which are not a proportional measure of spacing. Distribution is therefore arbitrary and uneven.</li>
* <li>If {@code min.equals(max)}, the computed "range" is 0. Then {@code element.compareTo(min) / 0}
* yields {@code NaN}, which Java coerces to 0 when cast to {@code int}.
* Practically, all elements collapse into bucket 0 in this case.</li>
* </ul>
*
* <p>Despite these limitations, the sort remains correct because each bucket is sorted internally and concatenated.
* This method should be regarded as a simplified demonstration rather than a
* general-purpose bucketing strategy for arbitrary {@code Comparable<T>} values.</p>
*
* @param element the element of the array
* @param min the minimum value in the array
* @param max the maximum value in the array
Expand Down
View file Open in desktop
Copy link
Collaborator

@DenizAltunkapan DenizAltunkapan Aug 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any reason why you didn't put your Tests in the exististing BucketSortTest ? @Leogricci

Copy link
Author

@Leogricci Leogricci Aug 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the issue only related to the particular case of the hash function i thought it might be better to create another test class, but I realize now it just makes it harder to find. Sorry for the inconvenience, I will modify it now.

DenizAltunkapan reacted with thumbs up emoji
Original file line number Diff line number Diff line change
@@ -0,0 +1,72 @@
package com.thealgorithms.sorts;

import static org.junit.jupiter.api.Assertions.assertArrayEquals;

import java.util.Arrays;
import org.junit.jupiter.api.Test;

public class BucketSortHashBehaviorTest {

private static <T extends Comparable<T>> int pseudoHash(final T element, final T min, final T max, final int numberOfBuckets) {
// Reproduces the production hash() logic
double range = max.compareTo(min);
double normalizedValue = element.compareTo(min) / range; // -1/0/1 divided by -1/0/1
return (int) (normalizedValue * (numberOfBuckets - 1));
}

@Test // Test case when all numbers are equal
void sortStillCorrectWhenAllEqual() {
Integer[] arr = {1, 1, 1, 1, 1};
Integer[] expected = arr.clone();

new BucketSort().sort(arr);
assertArrayEquals(expected, arr);

// Observe bucket mapping (all collapse to index 0)
Integer min = 1;
Integer max = 1;
int numberOfBuckets = Math.max(arr.length / 10, 1); // same as BUCKET_DIVISOR rule
int idx = pseudoHash(1, min, max, numberOfBuckets);
// idx will be 0 because NaN cast to int -> 0 in Java
System.out.println("All-equal case -> bucket index: " + idx);
}

@Test // Test case with non-equal integers
void sortStillCorrectNonEqualIntegers() {
Integer[] arr = {20, 40, 30, 10};
Integer[] expected = {10, 20, 30, 40};

new BucketSort().sort(arr);
assertArrayEquals(expected, arr);

Integer min = Arrays.stream(arr).min(Integer::compareTo).get();
Integer max = Arrays.stream(arr).max(Integer::compareTo).get();
int numberOfBuckets = Math.max(arr.length / 10, 1); // often 1 here; bump to 4 to demonstrate
numberOfBuckets = 4;

for (Integer x : arr) {
int idx = pseudoHash(x, min, max, numberOfBuckets);
System.out.println("Value " + x + " -> bucket " + idx);
}
// Expect only two distinct buckets because compareTo gives -1/0/1
}

@Test // Test case when the Array contains Strings
void sortStillCorrectWhenStrings() {
String[] arr = {"apple", "banana", "carrot"};
String[] expected = arr.clone();

new BucketSort().sort(arr);
assertArrayEquals(expected, arr);

String min = Arrays.stream(arr).min(String::compareTo).get();
String max = Arrays.stream(arr).max(String::compareTo).get();
int numberOfBuckets = 4;

for (String s : arr) {
int idx = pseudoHash(s, min, max, numberOfBuckets);
System.out.println("Value \"" + s + "\" -> bucket " + idx);
}
// Buckets reflect only lexicographic order, not a numeric spacing
}
}

AltStyle によって変換されたページ (->オリジナル) /