I am trying to solve this problem:
Given an array of integers, find two numbers such that they add up to a specific target number.
This is my strategy:
For each i in array num, do a binary search in num for target - i.
class Solution:
    # @return a tuple, (index1, index2)
    def binary_search(self, array, target):
        def _binary_search(a, l, h, t):
            if l == h:
                return None
            elif a[(h + l) / 2] == t:
                return (h + l) / 2
            elif a[(h + l) / 2] > t:
                return _binary_search(a, l, (h + l) / 2, t)
            elif a[(h + l) / 2] < t:
                return _binary_search(a, (h + l) / 2 + 1, h, t)
            else:
                return None
        return _binary_search(array, 0, len(array) - 1, target)

    def twoSum(self, num, target):
        s_num = sorted(num)
        z_num = [(i, num.index(i)) for i in s_num]
        for tup in z_num:
            n = tup[0]
            i = tup[1]
            t = self.binary_search(s_num, target - n)
            if t != None:
                index_2 = num.index(target - n)
                if i == index_2:
                    index_2 = i + 1 + num[i:].index(target - n)
                return (i + 1, index_2 + 1)
Time complexity: \$O(n \log n)\$
This algorithm is not accepted because it exceeds the time limit. How can I improve it?
6 Answers
You are correct that the first step is to sort the array. The second step - the actual search for a solution - also happens to be \$O(n \log n)\$. What is important is that these are two very different \$n \log n\$'s: the sort is implemented in native code, so it is virtually instantaneous compared to Python loops. The key to the solution is to realize that the second step can be done in linear time:
Given a sorted array data, you may build another array target_minus_data in linear time. It is also sorted, but descending. Reverse it (linear time). Now there are two sorted arrays sharing a common value, and that value can also be found in linear time by a merge-like algorithm.
Of course, you don't have to physically reverse target_minus_data - just iterate it backwards. If you look closely, you do not even have to build it; everything can be done on the fly. Merge data with itself, from both ends, with the backward iterator returning target - data[bkwd].
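For illustration, here is a minimal sketch of that on-the-fly merge (the function name find_pair, and the choice to return the matching values rather than 1-based indices, are just for the sketch):

def find_pair(data, target):
    # data is assumed to be sorted ascending.
    # target_minus_data[j] == target - data[j] would be sorted descending,
    # so iterating it backwards is the same as walking data from the right.
    fwd, bkwd = 0, len(data) - 1
    while fwd < bkwd:
        if data[fwd] == target - data[bkwd]:
            return data[fwd], data[bkwd]      # the shared value: a matching pair
        elif data[fwd] < target - data[bkwd]:
            fwd += 1                          # advance the forward iterator
        else:
            bkwd -= 1                         # advance the backward iterator
    return None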
A few basic comments on the code rather than the algorithm.
z_num = [(i, num.index(i)) for i in s_num]
This line is doing essentially what enumerate() does, except that it allocates the entire list up front.
for tup in z_num:
n = tup[0]
i = tup[1]
Python can do the unpacking for you.
for n, i in z_num:
    # ...
This means the entire thing can be done in one line.
for i, n in enumerate(s_num):
    # ...
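For reference, enumerate() yields (index, value) pairs lazily, so nothing is allocated up front (the sample numbers below are arbitrary):

# enumerate() pairs each element with its position in the iterated list
s_num = sorted([15, 2, 11, 7])
print list(enumerate(s_num))   # [(0, 2), (1, 7), (2, 11), (3, 15)]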
As explained here and there, if you can use data structures and your goal is to reduce time complexity, then a dictionary works well for this kind of problem and the code is pretty straightforward:
def twoSum(numbers, target):
    """Find two numbers such that they add up to a specific number

    :param numbers: Input numbers to look for
    :type numbers: list(int)
    :param target: Target number that defines the sum of the numbers
    :type target: int
    :returns: Indexes for the numbers that solve the problem (if found)
    :rtype: tuple(int, int) | None
    """
    # Create a mapping from number to array index
    numbers = {number: index for index, number in enumerate(numbers)}
    for first_number in numbers:
        second_number = target - first_number
        if second_number in numbers:
            return numbers[first_number] + 1, numbers[second_number] + 1
print twoSum([2, 7, 11, 15], 32)
Note that in the average case this will find a solution in \$O(n)\$ because you don't really need to sort the array to find a solution.
A solution has been accepted, but let's give some additional comments anyway.
It might be worth storing the result of (h + l)/2: on top of making things potentially more efficient, it also makes things much easier to read and write.
def binary_search(self, array, target):
    def _binary_search(a, l, h, t):
        if l == h:
            return None
        mid = (h + l)/2
        if a[mid] == t:
            return mid
        elif a[mid] > t:
            return _binary_search(a, l, mid, t)
        elif a[mid] < t:
            return _binary_search(a, mid + 1, h, t)
        else:
            return None
    return _binary_search(array, 0, len(array) - 1, target)
Then, it becomes easier to see that the last branch cannot be reached.
def binary_search(self, array, target):
    def _binary_search(a, l, h, t):
        if l == h:
            return None
        mid = (h + l)/2
        if a[mid] == t:
            return mid
        elif a[mid] > t:
            return _binary_search(a, l, mid, t)
        else:
            assert a[mid] < t
            return _binary_search(a, mid + 1, h, t)
    return _binary_search(array, 0, len(array) - 1, target)
Also, recursion is slow in Python, but before falling into premature optimisation, let's check that things are working correctly.
- assert binary_search2([], 1) is None fails: you should be doing if l >= h instead of if l == h.
- Then assert binary_search2([1], 1) == 0 fails; let's replace l >= h by l > h.
- Then assert binary_search2([2], 1) is None fails; let's replace _binary_search(a, l, mid, t) by _binary_search(a, l, mid - 1, t) to ensure the range becomes smaller and smaller.
Now, things seem to be working slightly better:
def binary_search(self, array, target):
    def _binary_search(a, l, h, t):
        if l > h:
            return None
        mid = (h + l)/2
        if a[mid] == t:
            return mid
        elif a[mid] > t:
            return _binary_search(a, l, mid - 1, t)
        else:
            assert a[mid] < t
            return _binary_search(a, mid + 1, h, t)
    return _binary_search(array, 0, len(array) - 1, target)
Now, we can try to remove the recursion:
def binary_search(self, array, target):
    # if target is in the array, it must be in [low, high]
    low, high = 0, len(array) - 1
    while low <= high:
        mid = (high + low)/2
        if array[mid] == target:
            return mid
        elif array[mid] > target:
            high = mid - 1
        else:
            low = mid + 1
    return None
I haven't performed any benchmarks, but I reckon this function should be faster (and also more robust).
Something else you could change to make your code better is to stop writing classes. There is no need for a class here, and by using one you are just making things harder to understand and reuse for no reason.
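For instance, here is a rough module-level version of the iterative function (no class), which the earlier test cases can exercise directly (Python 2 division, matching the code above):

def binary_search(array, target):
    # Iterative version from above, as a plain function.
    low, high = 0, len(array) - 1
    while low <= high:
        mid = (high + low) / 2
        if array[mid] == target:
            return mid
        elif array[mid] > target:
            high = mid - 1
        else:
            low = mid + 1
    return None

assert binary_search([], 1) is None     # empty list
assert binary_search([1], 1) == 0       # single element, present
assert binary_search([2], 1) is None    # single element, absent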
I used Python's built-in binary search (via the bisect module) with your basic idea and it worked, with a few minor modifications. I went with the built-in search because I am too lazy to write my own, and because it is implemented in native code it is screaming fast compared to even the best Python solutions.
from bisect import bisect_left

def index(a, x):
    'Locate the leftmost value exactly equal to x'
    i = bisect_left(a, x)
    if i != len(a) and a[i] == x:
        return i
    raise ValueError

class Solution:
    # @return a tuple, (index1, index2)
    def twoSum(self, num, target):
        sorted_nums = sorted(num)
        for first_num in sorted_nums[:-1]:
            second_num = target - first_num
            first_index = index(num, first_num)
            second_index = 0
            try:
                if first_num == second_num:
                    second_index = index(num[first_index + 1:], second_num) + first_index + 1
                else:
                    second_index = num.index(second_num)
            except ValueError:
                continue
            return sorted((first_index + 1, second_index + 1))
Note that your code is terribly memory-inefficient, creating so many tuples that are left unused. You should only create one at the end, as in my code.
Some may question my use of exceptions, but from a performance standpoint it is much faster than iterating through the list searching for the desired element. It is also considered "Pythonic", as per the "it is easier to ask forgiveness than permission" motto.
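As a small illustration of that style with the index() helper above (the sample list is mine, and is already sorted, as bisect requires):

nums = [2, 7, 11, 15]
print index(nums, 11)        # 2
try:
    index(nums, 9)           # 9 is absent, so the helper raises ValueError
except ValueError:
    print 'not found'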
Also, it may just be my preference (and my Objective-C experience), but I prefer longer, more descriptive names, as they are easier to understand.
- index is a method that works for all arrays even if they are not sorted. This means it doesn't take advantage of binary search; it just traverses the array until it finds what it was looking for. If you want to use the standard library, have a look at the bisect module. – jcollado, Jul 17, 2014 at 8:28
- Updated code to take advantage of bisect. Thanks for the tip! The lists in the original problem were small, so .index() vs binary search performance was nearly irrelevant. – mleyfman, Jul 17, 2014 at 16:46
The binary-search strategy is great for finding just one number. However, in this case we're repeatedly finding numbers, and those numbers are related to each other - as we walk through the sorted list, i increases, and so target - i decreases.
We can make use of this property by walking from both ends of the list with two indexes (let's call them left and right). When s_num[left] + s_num[right] is bigger than target, decrement right; when it's smaller, increment left. If left becomes equal to right, then we have completed the search and found no match.
This part of the algorithm obviously scales as \$O(n)\$ - overall, the running time will be dominated by the sort, which is \$O(n \log n)\$.
def two_sum(num, target):
    s_num = sorted(num)
    left = 0
    right = len(s_num) - 1
    while left < right:
        sum = s_num[left] + s_num[right]
        if sum < target:
            left += 1
        elif sum > target:
            right -= 1
        else:  # sum == target
            return s_num[left], s_num[right]
    return None
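A quick usage check (note that this sketch returns the matching values from the sorted list, rather than the 1-based indices the original problem asks for; the sample input is arbitrary):

print two_sum([2, 7, 11, 15], 9)    # (2, 7)
print two_sum([2, 7, 11, 15], 32)   # None -- no pair adds up to 32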