Given a list of n integers sorted in ascending order, find the closest number (value) in the list to a given target number. How quickly can you do it?
I'm interested to know if there's any feedback for my code.
def getClosestValue(arr, target):
n = len(arr)
left = 0
right = n - 1
mid = 0
# edge case
if (target >= arr[n - 1]):
return arr[n - 1]
# BSearch solution: Time & Space: Log(N)
while (left < right):
mid = (left + right) // 2 # find the mid
if (arr[mid] == target):
return arr[mid]
if (target < arr[mid]):
# If target is greater than previous
# to mid, return closest of two
if (mid > 0 and target > arr[mid - 1]):
return findClosest(arr[mid - 1], arr[mid], target)
# update right
right = mid
else:
if (mid < n - 1 and target < arr[mid + 1]):
return findClosest(arr[mid], arr[mid + 1], target)
# update i
left = mid + 1
return arr[mid]
# findClosest
# We find the closest by taking the difference
# between the target and both values. It assumes
# that val2 is greater than val1 and target lies
# between these two.
def findClosest(val1, val2, target):
if (target - val1 >= val2 - target):
return val2
else:
return val1
# Sample code
arr = [1, 2, 3, 4, 5, 5, 8, 10]
target = 7
print(getClosestValue(arr, target))
2 Answers 2
Conventions
The Style guide to follow in Python is PEP8. If you do so, your code blends right into the python community and it is immediately easier to follow. On that regard you'll see that the functions name's don't follow the recommended style.
getClosestValue
Should actually be:
get_closest_value
The relevant quote from PEP8 for this case
Use the function naming rules: lowercase with words separated by underscores as necessary to improve readability.
Still in this topic, all the parenthesis used in the ifs
are redundant and break the style.
if (target >= arr[n - 1]):
Should be turned into:
if target >= arr[n - 1]:
Although i suspect that this may be a habit brought from other languages.
Edge cases
You contemplated the case where the value is the highest or above, which gives you a quick way out:
# edge case
if (target >= arr[n - 1]):
return arr[n - 1]
However there is no corresponding edge case for when the value is the lowest or below. I believe you merely forgot it.
#edge case - first or below all
if target <= arr[0]:
return arr[0]
Logic
Now both edge cases are covered, so the while
can be simplified to only have the navigation to the corresponding item if there is one.
while (left < right):
mid = (left + right) // 2 # find the mid
if arr[mid] == target:
return arr[mid]
if target < arr[mid]:
right = mid
else:
left = mid + 1
After this last while
if there isn't an exact match, calculate the nearest to the one you ended on, and return it:
if target < arr[mid]:
return find_closest(arr[mid - 1], arr[mid], target)
else:
return find_closest(arr[mid], arr[mid + 1], target)
Note that i don't need to check if mid - 1
or mid + 1
is a valid index, because if it wasn't it would mean that the value was below or above all elements, and those cases were already covered in the beginning of the function.
This is not only simpler but also more efficient since it only checks for the closest on the very end and not at each iteration.
Taking @bipll suggestion you can restructure the while
a bit. Considering you are only getting if arr[mid] == target
at the very last iteration you can first check for <
or >
. This avoids making one extra check that will fail most of the times:
while left < right:
mid = (left + right) // 2 # find the mid
if target < arr[mid]:
right = mid
elif target > arr[mid]:
left = mid + 1
else:
return arr[mid]
When i have a simple condition with a return
on both cases i rather write them inline since its easy to read and a bit more concise:
def find_closest(val1, val2, target):
return val2 if target - val1 >= val2 - target else val1
But don't go over the top on this. It can easily be harder to read depending on the complexity of the condition.
Full Modified Version
For you to get a better picture of all the changes i mentioned, i'll leave you with the full modified version:
def get_closest_value(arr, target):
n = len(arr)
left = 0
right = n - 1
mid = 0
# edge case - last or above all
if target >= arr[n - 1]:
return arr[n - 1]
# edge case - first or below all
if target <= arr[0]:
return arr[0]
# BSearch solution: Time & Space: Log(N)
while left < right:
mid = (left + right) // 2 # find the mid
if target < arr[mid]:
right = mid
elif target > arr[mid]:
left = mid + 1
else:
return arr[mid]
if target < arr[mid]:
return find_closest(arr[mid - 1], arr[mid], target)
else:
return find_closest(arr[mid], arr[mid + 1], target)
# findClosest
# We find the closest by taking the difference
# between the target and both values. It assumes
# that val2 is greater than val1 and target lies
# between these two.
def find_closest(val1, val2, target):
return val2 if target - val1 >= val2 - target else val1
-
\$\begingroup\$ From how you move
right
it can be assumed thatarr[right]
is always greater than the key, so the loop's condition can be extended towhile left < right - 1:
, which can reduce the search by one iteration. And doesn't it make sense to first check for both inequalities, and only after that handle the exact match case? The probability to find an exact match is not that high for moderately sparse arrays. \$\endgroup\$bipll– bipll2018年03月22日 13:28:17 +00:00Commented Mar 22, 2018 at 13:28 -
\$\begingroup\$ I mean, inside the lookup loop, on each iteration first check
if arr[mid] < target:
, thenelif arr[mid] > target:
, and then in the lastelse:
clause handle the accidental exact match, as the latter seems the least expected case (even for sparse arrays). \$\endgroup\$bipll– bipll2018年03月22日 14:51:16 +00:00Commented Mar 22, 2018 at 14:51 -
\$\begingroup\$ @bipll On my final code the first check in the
while
is for equality withif arr[mid] == target:return arr[mid]
. After that i check if its below withif target < arr[mid]:
and the only remaining case is above which is the correspondingelse
. So i'm not sure i follow what you are trying to say. \$\endgroup\$Isac– Isac2018年03月22日 15:00:40 +00:00Commented Mar 22, 2018 at 15:00 -
\$\begingroup\$ I am trying to say that on your final code, the very first check is for the least probable case, so a) this operation is always performed; and b) most of the time it is performed for nothing. It would make more sense to first check if array element is not equal to target, and only if not, consider the (unlikely) exact match in an
else:
clause. \$\endgroup\$bipll– bipll2018年03月22日 15:02:57 +00:00Commented Mar 22, 2018 at 15:02 -
\$\begingroup\$ @bipll Oh i see now what you are trying to say and it makes sense. Thanks for the comment. \$\endgroup\$Isac– Isac2018年03月22日 15:05:57 +00:00Commented Mar 22, 2018 at 15:05
python has a binary search implementation: https://docs.python.org/2/library/bisect.html
so just:
import bisect
ind = bisect.bisect_left(arr, target)
if you get ind> len(arr)-2, then the solution is arr[-1]. Otherwise you just hceck arr[ind] and arr[ind+1] to find whichever is closest
Strictly speaking, if you implemented your own binary search it should be just as fast. However, having a two line implementation is easier to read than defining your own functions