Given a sorted array, return the 'closest' element to the input 'x'.
I do understand the merits of unit testing in separate files, but I deliberately added it to the main method for personal convenience, so don't consider that in your feedback.
I'm looking for request code review, optimizations and best practices.
public final class ClosestToK {
private ClosestToK() { }
/**
* Given a sorted array returns the 'closest' element to the input 'x'.
* 'closest' is defined as Math.min(Math.abs(x), Math.abs(y))
*
* Expects a sorted array, and if array is not sorted then results are unpredictable.
*
* If two values are equi-distant then the greater value is returned.
*
* @param a The sorted array a
* @return The nearest element
*/
public static int getClosestK(int[] a, int x) {
int low = 0;
int high = a.length - 1;
while (low <= high) {
int mid = (low + high) / 2;
// test lower case
if (mid == 0) {
if (a.length == 1) {
return a[0];
}
return Math.min(Math.abs(a[0] - x), Math.abs(a[1] - x)) + x;
}
// test upper case
if (mid == a.length - 1) {
return a[a.length - 1];
}
// test equality
if (a[mid] == x || a[mid + 1] == x) {
return x;
}
// test perfect range.
if (a[mid] < x && a[mid + 1] > x) {
return Math.min(Math.abs(a[mid] - x), Math.abs(a[mid + 1] - x)) + x;
}
// keep searching.
if (a[mid] < x) {
low = mid + 1;
} else {
high = mid;
}
}
throw new IllegalArgumentException("The array cannot be empty");
}
public static void main(String[] args) {
// normal case.
int[] a1 = {10, 20, 30, 40};
assertEquals(30, getClosestK(a1, 28));
// equidistant
int[] a2 = {10, 20, 30, 40};
assertEquals(30, getClosestK(a2, 25));
// edge case lower boundary
int[] a3 = {10, 20, 30, 40};
assertEquals(10, getClosestK(a3, 5));
int[] a4 = {10, 20, 30, 40};
assertEquals(10, getClosestK(a4, 10));
// edge case higher boundary
int[] a5 = {10, 20, 30, 40};
assertEquals(40, getClosestK(a5, 45));
int[] a6 = {10, 20, 30, 40};
assertEquals(40, getClosestK(a6, 40));
// case equal to
int[] a7 = {10, 20, 30, 40};
assertEquals(30, getClosestK(a7, 30));
}
}
7 Answers 7
Here's my solution which seems to work fine :
public static int getClosestK(int[] a, int x) {
int low = 0;
int high = a.length - 1;
if (high < 0)
throw new IllegalArgumentException("The array cannot be empty");
while (low < high) {
int mid = (low + high) / 2;
assert(mid < high);
int d1 = Math.abs(a[mid ] - x);
int d2 = Math.abs(a[mid+1] - x);
if (d2 <= d1)
{
low = mid+1;
}
else
{
high = mid;
}
}
return a[high];
}
Here's the principle : the interval [low, high]
will contain the closest element to x
at any time. At the beginning, this interval is the whole array ([low, high] = [0, length-1]
). At each iteration, we'll make it strictly smaller. When the range is limited to a single element, this is the element we are looking for.
In order to make the range smaller, at each iteration, we'll consider mid
the middle point of [low, high]
. Because of the way mid
is computed, mid+1
is also in the range. We'll check if the closest value is at mid
or mid+1
and update high
or low
accordingly. One can check that the range actually gets smaller.
Edit to answer to comments:
As spotted by @Vick-Chijwani , this code doesn't handle perfectly the scenarios where an element appears multiple times in the input. One can add the following working tests to the code :
// case similar elements
int[] a8 = {10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10};
assertEquals(10, getClosestK(a8, 9));
assertEquals(10, getClosestK(a8, 10));
assertEquals(10, getClosestK(a8, 11));
but this one will fail :
int[] a9 = {1, 2, 100, 100, 101};
assertEquals(3, getClosestK(a9, 2)); // actually returns 100
-
\$\begingroup\$ What if
a[mid] == a[mid+1]
? I imagine you'd have to walk to the left or right until they are unequal, then proceed to computed1
andd2
. But then you lose the worst-caseO(log n)
guarantee. \$\endgroup\$Vicky Chijwani– Vicky Chijwani2016年05月13日 09:32:37 +00:00Commented May 13, 2016 at 9:32 -
\$\begingroup\$ Hi @VickyChijwani, thanks for your feedback. I'm AFK for a few days but I'll try to think about it and let you know asap :-) \$\endgroup\$SylvainD– SylvainD2016年05月13日 19:18:47 +00:00Commented May 13, 2016 at 19:18
-
\$\begingroup\$ Thanks! No hurry. The idea of walking left and right that I suggested, seems to work well enough for my use-case. \$\endgroup\$Vicky Chijwani– Vicky Chijwani2016年05月13日 19:21:25 +00:00Commented May 13, 2016 at 19:21
-
\$\begingroup\$ @VickyChijwani Everything seems to work fine. I've added an edit to my answer. Please let me know if you need more details. Basically, when 2 (consecutive) elements are equal, we have
d1 == d2
(which is equal to 0 if this is the element we where looking for) and we reduce the search range accordingly :low = mid+1;
. Please let me know if you see an issue. \$\endgroup\$SylvainD– SylvainD2016年05月17日 09:15:14 +00:00Commented May 17, 2016 at 9:15 -
\$\begingroup\$ Your solution returns
100
fora = [1, 2, 100, 100, 101]
andx = 3
, which is incorrect. The reason is that, whend1 == d2
, you don't have enough information to decide which half of the array to check next, so you must somehow make them unequal before proceeding. \$\endgroup\$Vicky Chijwani– Vicky Chijwani2016年05月17日 09:30:21 +00:00Commented May 17, 2016 at 9:30
Are you allowed to use java.util.Arrays.binarySearch ? If you don't need to detect whether or not the array is sorted, this will give you the one or two closest elements quickly.
public static int getClosestK(int[] a, int x) {
int idx = java.util.Arrays.binarySearch(a, x);
if ( idx < 0 ) {
idx = -idx - 1;
}
if ( idx == 0 ) { // littler than any
return a[idx];
} else if ( idx == a.length ) { // greater than any
return a[idx - 1];
}
return d(x, a[idx - 1]) < d(x, a[idx]) ? a[idx - 1] : a[idx];
}
private static int d(int lhs, int rhs) {
return Math.abs(lhs - rhs);
}
Your solution is complex because you are trying to mix at least two different problems in a single algorithm. Sorry my examples are in C++, but I hope you get the algorithmic idea.
Here is what a standard binary search looks like:
/**
* \brief Find the first non-zero element of an ordered array
*
* @param a array to search in
* @param s offset to start searching from
*
* @return offset of the first non-zero element, \e a.size() if none found
*/
template <class A>
size_t binary_search(const A& a, size_t s = 0)
{
size_t from = s, to = a.size(), mid;
while (from != to) {
mid = (from + to) / 2;
if (a[mid]) to = mid;
else from = mid + 1;
}
return from;
}
If a
is in ascending order, you will need to find the position pos
of its first element that is greater than x
, i.e. substitute
a[mid] > x
for a[mid]
, where x
can be made an additional input parameter:
template <class A, class T>
size_t binary_search(const A& a, const T& x, size_t s = 0)
(in a more generic design, a
would really be a customizable function object to be called by a(mid)
).
Having found pos
:
- if
pos == 0
(all elements> x
), return0
(this includes the case ofa
being empty); - if
pos == a.size()
(not found; all elements<= x
), returna.size() - 1
(last element); - otherwise, return one of the two positions
pos - 1
andpos
depending on which element ofa
is closer tox
.
That is:
template <class A, class T>
size_t find_closest(const A& a, const T& x)
{
size_t pos = binary_search(a, x);
return pos == 0 ? 0 :
pos == a.size() ? a.size() - 1 :
x - a[pos - 1] < a[pos] - x ? pos - 1 : pos;
}
Note that due to known ordering, you don't need Math.abs
. Also note that in case of ambiguities (equalities), this approach returns the rightmost possible position of all equivalent ones.
This is clear, O(log n)
, and does not mess with Math.abs
or anything problem-specific during the critical search process. It separates the two problems.
The function object approach could be
size_t pos = binary_search([&](size_t p){return a[p] > x;});
using a lambda in C++11, and leaving binary_search
as originally defined, except for changing a[mid]
to a(mid)
.
-
\$\begingroup\$ I agree, this is much cleaner and clearer. The binary search algorithm should be abstracted away with the proper functor argument. (+1) However : template arguments are arguments and deserve proper names ! \$\endgroup\$Julien__– Julien__2015年04月03日 19:43:31 +00:00Commented Apr 3, 2015 at 19:43
A minor thing to consider is caching values you lookup, for example, in each step of your while
you calculate a.length - 1
, since this value does not change you could have create a int last = a.length - 1;
which would make the following fractionally faster and more readable:
// test upper case <- unfortunate choice of comment ? upper bound perhaps?
if (mid == last) {
return a[last];
}
I would also consider caching a[mid]
and a[mid+1]
into well named variables.
Finally, this:
if (a.length == 1) {
return a[0];
}
Belongs in front of the while loop.
You should avoid undefined behaviour: you say
Expects a sorted array, and if array is not sorted then results are unpredictable
but humans make errors all the time and giving a random result if the user forgets to enter a sorted input is not nice.
At the cost of some performance I would add (should be easy to translate into Java):
def is_sorted(array):
for index, item in enumerate(array):
if index != 0:
item_before = array[index-1]
if not item >= item_before:
return False
return True
and in your function:
if not is_sorted(array):
raise IllegalArgumentException("Array is not sorted.")
If you really care about speed you can add a flag insecure
so that when the user wants full speed and knows what he's doing the check can be skipped.
The same again and again...
Do not sum endpoints' indices:
int mid = (low + high) / 2;
Subtract them instead:
int mid = low + (high - low) / 2;
This will prevent an arithmetic overflow and possible getting mid < 0
in case your array's length exceeds a half of the int
type maximum value.
+1 for @Josay's answer. In case the array a
can contain equal entries, please consider this slight improvement:
public static int getClosestK(int[] a, int x) {
int low = 0;
int high = a.length - 1;
if ( high < 0 )
throw new IllegalArgumentException("The array cannot be empty");
OUTER:
while ( low < high ) {
int mid = (low + high) / 2;
assert ( mid < high );
int d1 = Math.abs(a[mid ] - x);
int d2 = Math.abs(a[mid+1] - x);
if ( d2 < d1 )
low = mid + 1;
else if ( d2 > d1 )
high = mid;
else { // --- handling "d1 == d2" ---
for ( int right=mid+2; right<=high; right++ ) {
d2 = Math.abs(a[right] - x);
if ( d2 < d1 ) {
low = right;
continue OUTER;
} else if ( d2 > d1 ) {
high = mid;
continue OUTER;
}
}
high = mid;
}
}
return a[high];
}
--- UPDATE ---
Why the hate (-1)? Handling d1 == d2
is important as the array is subset wrongly otherwise.
E.g., consider a=[1,2,2,3]
and x=0
. Then d1=d2=2
and Josay's answer sets low=mid+1
which subsets the array to [2,3]
. This subset does not contain the correct answer (1
) anymore.
-
\$\begingroup\$ You have presented an alternative solution, but haven't reviewed the code. Please explain your reasoning (how your solution works and how it improves upon the original) so that the author can learn from your thought process. \$\endgroup\$Ethan Bierlein– Ethan Bierlein2016年03月04日 15:57:44 +00:00Commented Mar 4, 2016 at 15:57
-
\$\begingroup\$ The slight improvement is
--- handling "d1 == d2" ---
. \$\endgroup\$dotwin– dotwin2016年03月04日 16:02:35 +00:00Commented Mar 4, 2016 at 16:02 -
\$\begingroup\$ Sure, but why is it an improvement? What advantages does it have over the original? You need to elaborate on your changes. \$\endgroup\$Ethan Bierlein– Ethan Bierlein2016年03月04日 16:03:52 +00:00Commented Mar 4, 2016 at 16:03
Math.Min(Math.Abs(x-el[i])
wherei
is all elements of the array? \$\endgroup\$