I was trying to make a sorting algorithm for an array of integers. Here's are the steps/theory:
It turns an array into a HashSet
, iterates over every integer value from the minimum value in the array to the maximum. It carries an array that starts with 1 element, the minimum. While iterating to higher values through increments, if it finds a value in that HashSet that is equal to the value, it pushes that value to the array and continues iterating until it hits the max value.
Here is a JS implementation of what I mean with a shuffled array from 1 to a 1000 with 2 integers missing (which are 224
and 507
that I manually deleted, so 998 elements in the array)
const log = console.log
let arr = [71,825,522,901,579,219,251,6,672,9,578,81,195,315,64,763,538,500,772,75,713,127,800,766,185,609,945,182,511,803,671,365,309,920,159,572,225,405,403,123,77,283,370,421,98,937,76,437,916,21,15,376,641,218,870,122,878,449,138,475,142,893,966,152,394,749,134,687,924,595,859,100,824,333,524,104,850,358,254,407,751,646,933,443,61,736,699,745,784,773,454,388,922,615,119,950,528,607,166,816,596,600,741,292,299,926,291,724,711,550,639,627,462,176,161,902,156,31,258,983,624,667,257,598,837,350,163,558,115,351,214,717,51,99,810,238,375,441,228,848,97,770,169,808,664,764,137,1000,946,849,799,354,856,688,38,321,868,845,40,217,366,360,820,682,456,904,73,935,701,807,726,594,493,363,861,14,539,301,439,665,928,434,999,260,330,758,307,237,995,153,476,669,542,270,102,978,915,481,440,815,177,203,728,563,490,406,748,265,145,526,661,362,171,192,359,754,173,155,280,425,684,415,791,266,503,90,865,903,19,364,940,863,533,199,317,710,442,554,178,734,174,463,706,435,725,997,702,547,36,882,930,847,485,328,670,927,224,187,722,12,683,515,140,50,760,658,871,514,649,608,809,862,636,355,41,757,562,794,148,491,322,495,715,753,261,418,556,811,605,747,319,663,938,357,704,900,492,857,325,777,629,264,887,812,29,65,49,619,504,836,662,83,628,698,604,230,121,653,220,762,689,212,361,11,1,465,918,141,259,194,543,385,497,279,973,737,712,467,103,79,170,209,914,775,520,519,110,459,941,744,25,470,52,656,131,383,342,382,931,569,846,638,352,48,339,586,297,395,60,318,529,335,603,986,453,535,793,552,346,587,557,472,44,271,331,675,774,537,635,525,461,697,416,445,248,952,985,457,839,17,263,631,94,268,386,377,886,298,120,781,278,223,464,471,206,546,393,139,679,408,881,874,183,707,398,3,967,196,189,305,906,883,344,634,92,723,714,136,483,245,829,826,532,730,827,830,626,124,984,401,782,720,452,898,567,236,994,612,677,242,790,948,56,840,23,934,39,565,96,314,241,30,469,412,204,879,795,107,125,282,981,414,304,716,601,787,191,62,281,894,151,235,959,801,998,574,499,226,128,789,921,160,295,517,540,765,308,873,78,296,834,501,813,549,22,399,284,971,450,486,960,709,502,955,771,28,482,68,833,340,694,129,426,561,311,911,835,113,685,423,780,84,494,735,516,822,796,909,432,57,381,876,571,555,369,496,963,969,872,739,643,232,972,623,988,786,473,190,752,42,367,16,327,585,854,742,144,591,866,126,243,479,647,633,353,252,59,573,805,13,436,424,982,74,2,34,430,290,215,197,621,534,877,559,593,592,884,288,7,977,923,875,806,648,767,674,508,274,545,889,566,613,895,956,577,944,53,729,618,976,802,614,606,962,400,718,905,66,518,105,844,149,632,211,819,286,58,957,460,262,611,993,964,792,891,645,18,419,719,164,731,560,253,202,630,858,943,67,409,267,932,269,444,814,387,527,992,55,974,411,951,372,564,590,468,818,275,755,389,657,617,338,625,45,690,368,154,80,256,681,990,216,429,277,474,821,130,996,273,740,167,249,949,334,642,26,179,768,54,150,85,912,804,175,294,234,337,101,8,106,240,20,233,853,695,447,108,838,146,205,379,551,510,86,817,93,541,652,111,239,640,374,410,487,488,864,95,349,320,184,35,531,597,785,306,4,135,644,72,255,227,880,738,289,89,132,954,326,336,303,783,654,272,438,855,548,536,907,521,678,422,343,246,384,523,925,208,200,899,947,616,929,828,158,332,207,221,888,913,779,703,896,310,582,466,588,293,378,373,580,610,402,622,867,420,673,851,427,841,392,509,247,345,890,897,91,987,478,660,37,581,506,147,696,287,637,732,965,112,33,87,553,843,970,743,63,570,484,477,5,312,576,692,892,489,32,47,750,910,676,860,390,396,316,505,761,650,756,708,276,961,180,979,568,455,193,323,919,82,347,968,659,324,458,975,480,70,727,666,451,285,942,433,575,602,769,746,776,46,118,417,397,231,651,831,908,512,162,693,210,201,391,869,980,778,186,530,448,404,24,356,939,114,788,498,686,117,759,989,250,936,885,691,109,680,832,620,842,823,513,88,10,431,329,181,168,705,172,43,668,165,721,198,413,371,798,544,313,852,583,213,599,446,589,157,229,380,917,991,348,341,27,302,222,584,116,797,133,958,428,143,188,953,733,300,69,700,655
];
const sort = to_sort => {
let parts = new Set(to_sort);
function getMinMax(arr) {
let min = arr[0];
let max = arr[0];
let i = arr.length;
while (i--) {
min = arr[i] < min ? arr[i] : min;
max = arr[i] > max ? arr[i] : max;
}
return { min, max };
}
let minMax = getMinMax(to_sort);
const sortArray = (value,array) => {
if (value > minMax.max) {
return array;
}
if (parts.has(value)) {
array.push(value);
}
return sortArray(value+1,array)
}
return sortArray(minMax.min+1,[minMax.min])
}
console.time()
log(sort(arr));
console.timeEnd()
Limitations I know of are:
There cannot be any duplicates in the array, I could use a
HashMap
if there are.Bad performance for arrays where the integer value difference is significant, which means more iterating and a possible
call stack exceeded
error.Uses a decent amount of memory
It takes about 8 milliseconds to sort the array when testing it online
After all of this context, I would like to know:
What sort of algorithm is this (or based on)?
Does this have any performance benefit compared to other algorithms?
3 Answers 3
Review
Style
Use
const
for constant variables.arr
,parts
,minMax
all should be constants.Spaces between operators eg
value+1
isvalue + 1
, and after commas eg[71,825,522,901
should be[71, 825, 522, 901
JavaScript uses camelCase, avoid using snake_case.
to_sort
should betoSort
Naming is inconsistent. You call the array,
toSort
, andarr
. Try and use the same name for the same data.Inconsistent use of semicolons.
Design
Some design changes could be considered.
Finding min max
The function getMinMax
can be simplified by using Math.min
, and Math.max
, or you can take advantage of the fact that the min and max values will always be different. See rewrites.
Exit recursion
A very common (what I consider) mistake in recursive algorithms is the placement of the exit clause.
You have if (value > minMax.max) { return arr; }
as first line inside the call. This can be done before the recursion call. See rewrites
Though a very minor optimization in this case, there are many cases where this can provide significant improvements.
Call stack limit
Recursion in JS is always a problem as there is a limit to the call stack well below the array size limit. To avoid the call stack overflow it is always better to use the simpler loop. For simple algorithms a loop often provides more performant execution.
Consider the known knowns
As it is required that all values be unique it is highly probable that the min and max values are already known, this also applies to the array, the same set may need sorting many times. As an optimization you can accept both the hash map and the min and max.
You could also provide an option to give the common divisor so that a range of values can be sorted eg [0.3, 0.1, 0.2, 0.4]
would use the divisor 0.1
Questions
"What sort of algorithm is this (or based on)?"
Your sort is not a real sort as it relies on the uniqueness of each item in the array and the items must be numeric (or at least convertible to unique ordered numeric values) with a common divisor (in this case 1)
Because of this it can sort an array of unique values in \$O(n)\$, even if you consider arrays with irregular spaced values \$n\$ is scaled by the mean of the differences, such that arr = [0, 2 ** 31 -1]
will require 2 ** 31 -1
steps to sort it is still \$O(n)\$.
However your function should only be considered an option if the mean difference is less than \$log(n)\$ if all other conditions are met.
This sort is a type of Non-comparison sort as the name suggests it does not compare values.
"Does this have any performance benefit compared to other algorithms?"
Yes, there are many situation that require the sorting of a known set of unique values. If the interval is constant between items it is well worth considering it as a sort option.
Considering that the min, max, and hash map can be known ahead of time the algorithm can be considered very useful in these circumstances.
Rewrites
Two alternative min max functions
function getMinMax(arr) {
return {
min: Math.min(...arr),
max: Math.max(...arr),
};
}
Or slightly faster then your original.
function getMinMax(arr) {
var min = arr[0], max = min;
for (const val of arr) { val < min ? min = val : val > max && (max = val) }
return { min, max };
}
The Sort
Adding options for known min
, max
and hash map
the function can be written to avoid the overheads to calculate these values each time. Useful when sorting the same data many times.
Using recursion
function sort(arr, min, max, map = new Set(arr)) {
[min, max] = min === undefined ? minMax(arr) : [min, max];
return sort(min + 1, [min]);
function minMax(arr) {
var min = arr[0], max = min;
for (const val of arr) { val < min ? min = val : val > max && (max = val) }
return [min, max];
}
function sort(value, arr) {
map.has(value) && arr.push(value);
return ++value > max ? arr : sort(value, arr);
}
}
console.log(sort([6, 0, 1, 9, 3, 4, 5, 2, 7, 8], 0, 9) + "");
Using while loop
This version also adds a divisor optional parameter.
function sort(arr, min, max, divisor = 1, map = new Set(arr)) {
[min, max] = min === undefined ? getMinMax(arr) : [min, max];
const res = [min];
var value = min + divisor;
while (value <= max) {
map.has(value) && res.push(value);
value += divisor;
}
return res;
function getMinMax(arr) {
var min = arr[0], max = min;
for (const val of arr) { val < min ? min = val : val > max && (max = val) }
return [min, max];
}
}
console.log(sort([12, 0, 2, 18, 6, 8, 10, 4, 14, 16], 0, 18, 2) + "");
Use const
instead of let
to prevent accidental overwrite of variables.
Use consistent variable names. Seems weird to combine snake_case and camelCase.
For empty input, min and max is undefined. You can check this case and return empty array directly.
The getMinMax function is hidden inside the sort function and only called once, you can unwrap it and just embed it's body directly within the sort function.
But personally, I wouldn't bother with the getMinMax function and just use Math.min and Math.max. That single loop seems like premature optimization.
The sortArray function does not need to be recursive, it can be a simple loop. In which case it would be called only once and thus could be also unwrapped and it's body embedded directly in the sort function. This would also avoid call stack exceeded problem.
Pushing to the array is not necessary. You know from the start how many items will be in the output - same as in the input. Resize the array accordingly and keep track of index where to write the next element.
I don't know if this has any name or if it has been used for something real world.
So let me just try to describe the complexity of your algorithm.
Seems that the algorithm is O(n-min+max) in time, so it may theoretically be faster for certain inputs then common sorting methods that usually are between O(n*log(n)) and O(n^2) inclusive. (n - number of input elements, min - minimum value, max - maximum value). Question is for how many elements and how small max-min difference this starts to pay off and whether it's worth it at all. Could be, but you must be working in a context where you know that your inputs are satisfying the criteria that lead to its good performance. Well, at least most of the time.
The memory complexity is O(n) which is minimum memory complexity for any not in-place sorting method. Although here we need some extra memory for the Set. And of course in your recursive implementation also for all the stack frames.
PS: your code has some indentation issues. Not sure if this comes from copy pasting but it's always nice to read properly indented code.
Name of Algorithm
What you had implemented is not a correct sort algorithm (as it cannot handle duplicate items). So there won't be any sort algorithm work like this. The most similar one I can see would be Counting sort . Counting sort also track how many occurrence of each integer in the array so it would work with duplicate values.
Time complexity
Time complexity of your program is \$O\left(max-min+size\right)\$. As this algorithm is not a compare based sorting algorithm, the lower bound of time complexity \$\Omega\left(size\cdot\log size\right)\$ for compare based algorithms not applied here. It would work better than compare based algorithms (quick sort for example) when given input only contains a small range of integers.
Implementation
The hash set is not required by getting \$min\$ / \$max\$. A simple loop on input array would work. And your loop may iterate over the given array, not from \$min\$ to \$max\$. You may find a lot of implementations if you search "Counting sort". So, I believe, I don't need to add another one myself here.
Explore related questions
See similar questions with these tags.