A One-Pass Hash Table Solution to twoSum

Question 1

I read the guiding solution to twoSum in leetcodes ;

Given an array of integers, return indices of the two numbers such that they add up to a specific target.

You may assume that each input would have exactly one solution, and you may not use the same element twice.

Example:
Given nums = [2, 7, 11, 15], target = 9,
Because nums[0] + nums[1] = 2 + 7 = 9,
return [0, 1].
Approach 3: One-pass Hash Table
It turns out we can do it in one-pass. While we iterate and inserting elements into the table, we also look back to check if current element's complement already exists in the table. If it exists, we have found a solution and return immediately.

and mimic a python solution

class Solution:
 def twoSum(self, nums, target) -> List[int]:
 """
 :type nums: List[int]
 :type target: int
 """
 nums_d = {}
 for i in range(len(nums)):
 complement = target - nums[i]
 if nums_d.get(complement) != None: #Check None not True Value
 return [i, nums_d.get(complement)] 
 nums_d[nums[i]] = i #produce a map 
 return []

Unfortunately, my solution is only faster than 81%, which I thought was a best solution.

Runtime: 40 ms, faster than 81.00% of Python3 online submissions for Two Sum. Memory Usage: 14.3 MB, less than 5.08% of Python3 online submissions for Two Sum. Next challenges:

How could continue to improve the code, and I am curious about the approaches of the top 20%.

Question 2

Don't leave white space at the end of lines.
Use enumerate.
When comparing with None use is and is not.
It'd be cleaner to just use in rather than get.
If you want to use nums_d.get then you should use it outside the if so you don't have to use it a second time when you enter the if.
This however makes the code messy for not much of a benefit IMO.
Unless the site forces you to return lists returning a tuple would be more Pythonic.
Your comments aren't helpful, if anything they make the code harder to read for me.
The variable name nums is easier to read then nums_d, the _d is useless.
When returning it would be better to either:
- Raise an exception, as the lookup failed.
- Return a tuple where both values are None. This is so you can tuple unpack without errors.

Getting the code:

def test_peil(nums: List[int], target: int) -> Tuple[int, ...]:
 lookup = {}
 for i, v in enumerate(nums):
 if target - v in lookup:
 return i, lookup[target - v]
 lookup[v] = i
 raise ValueError('Two sums target not in list.')

You can further improve performance by including Alain T.'s change for small numbers.

def test_peil_alain(nums: List[int], target: int) -> Tuple[int, ...]:
 if len(nums) <= 100:
 for i, n in enumerate(nums):
 if target - n in nums:
 return i, nums.index(target - n)
 else:
 lookup = {}
 for i, v in enumerate(nums):
 if target - v in lookup:
 return i, lookup[target - v]
 lookup[v] = i
 raise ValueError('Two sums target not in list.')

And has a performance improvement: (the graph is volatile due to only using one input sample)

enter image description here

Question 3

if use ` if target - v in lookup`, is it a O(n) search through the list of lookup.keys()?

Question 4

@Alice No it's amortized \$O(1)\$.

Question 5

amazing, would you mind if I ask for the name of your visual tool to analyze time complexity?

Question 6

@JawSaw It's not on PyPI yet. But it's it's on CR and GitHub.

Question 7

Your code looks like Java. In order to get good at Python, you'll have to give up some of the Java-isms. This code didn't need to be in a class (unless there was some external requirement you didn't show us). A function would have been fine.

Let's start with the iteration. Ned Batchelder gave a talk called Loop Like a Native that contains a lot of great advice for writing looping structures in Python. You should watch that for a lot of pro-tips on basic stuff.

I'm also going to suggest that you either use type annotations, or don't use type annotations. Don't put in some annotations and then leave other type information in the docblock.

So with just those three suggestions, we can change your code to:

def two_sum(array: Sequence[int], target: int) -> List[int]:
 ''' Given an array of integers, return indices of the two numbers such
 that they add up to a specific target.
 '''
 nums_d = {}
 for i, val in enumerate(nums):
 ... as before ...

Next is your choice of how to check if the complement has been seen before. You are calling the .get() method twice, once for the check and if successful you call it again. This is minor, since you'll only succeed one time. But calling the .get() method and then comparing it will None is more expensive than you may think. Try using the in binary operator instead. And while you're at it, just use dict[key] notation if you know the key is in the dictionary:

 for i, val in enumerate(nums):
 complement = target - val
 if complement in nums_d:
 return [i, nums_d[complement]]
 return []

It's hard to predict performance improvements. But I think that "less is more", and in this case less opcodes is more fast. Here's what the dis.dis output looks like for your original code (starting after the for line):

 >> 18 FOR_ITER 56 (to 76)
 20 STORE_FAST 4 (i)
 10 22 LOAD_FAST 2 (target)
 24 LOAD_FAST 1 (nums)
 26 LOAD_FAST 4 (i)
 28 BINARY_SUBSCR
 30 BINARY_SUBTRACT
 32 STORE_FAST 5 (complement)
 12 34 LOAD_FAST 3 (nums_d)
 36 LOAD_METHOD 2 (get)
 38 LOAD_FAST 5 (complement)
 40 CALL_METHOD 1
 42 LOAD_CONST 1 (None)
 44 COMPARE_OP 3 (!=)
 46 POP_JUMP_IF_FALSE 62

And here's what the changed code produces:

 >> 14 FOR_ITER 44 (to 60)
 16 UNPACK_SEQUENCE 2
 18 STORE_FAST 3 (i)
 20 STORE_FAST 4 (val)
 21 22 LOAD_FAST 1 (target)
 24 LOAD_FAST 4 (val)
 26 BINARY_SUBTRACT
 28 STORE_FAST 5 (complement)
 23 30 LOAD_FAST 5 (complement)
 32 LOAD_FAST 2 (nums_d)
 34 COMPARE_OP 6 (in)
 36 POP_JUMP_IF_FALSE 50

The third column (lots of numbers) is byte offset from the start of the code, so you can see the new version takes 6 fewer bytes to run, out of 28 bytes. That's about 21% less.

Does this mean the code will run 21% faster? No. There's lots of other code in there that doesn't change. But it means it probably runs some faster -- you'll have to determine how much.

Question 8

What I've seen here hits the dict twice per iteration.
If I was into this, I guess I'd try to hit the dictionary just once a number:

How about using as a key neither the current number nor its complement, but their absolute difference?
Getting the index via found = seen.setdefault(diff, value), you'd still have to check whether

found was the current index
the element at found was the complement and not the current element (special casing target / 2).
(At second glance, one index for each difference is indeed enough...)

Drawing heavily on solutions presented here (my pythons don't even like annotated function definitions?!):

def two_sum(numbers: Sequence[number], target: number) -> Sequence[int]:
 ''' Given a sequence of numbers,
 return the first pair of indices of two numbers adding up to target.
 '''
 seen = dict() # keys will be absolute differences between value and complement
 for i, val in enumerate(numbers):
 complement = target - val
 diff = abs(complement - val)
 found = seen.setdefault(diff, i)
 # print(complement, i, found)
 if found != i:
 if complement == val or numbers[found] != val:
 return found, i
 return ()

(actually, the results don't always live up to how I read the doc string:
while I hope that it returns the lowest index of the summand found last, I can see it returning the highest (lower) index of the other one. Resolution left...)

Question 9

(Belatedly noticed a very similar question reading return indices of the two numbers as all indices - bummer.)

Question 10

The overhead of creating a dictionary makes the solution less efficient for small lists.

This sequential alternative:

for i,n in enumerate(nums):
 if target-n in nums:
 return (i,nums.index(target-n))

is about twice as fast in the best case scenarios (a small list with a match in the lower indexes) but it will lose out when there are more items in the list.

A more concise way to write the dictionary based approach (which happens to also run faster) would be this:

match = dict()
for i,n in enumerate(nums):
 if n in match: return (i,match[n])
 match[target-n] = i

You can get further acceleration (for large lists) by filtering on the maximum range of eligible numbers:

maxVal = target-min(nums)
minVal = target-max(nums)
match = dict()
for i,n in enumerate(nums):
 if n < minVal or n > maxVal: continue
 if n in match: return (i,match[n])
 match[target-n] = i

note that this is going to be data dependent but on average it should provide an improvement

Question 11

Your 'Pythonic' code is the complete opposite of what I'd call 'Pythonic'.

Question 12

The first part of your answer is correct in len(nums) < 100. I've added your code and a merge of both my answer and your answer together. I'm happy to remove my downvote if you fix the second half of your answer. :)

Peilonrayz ♦Peilonrayz 44.4k7 gold badges80 silver badges157 bronze badges · Accepted Answer · 2019-03-23 01:51:45Z

Don't leave white space at the end of lines.
Use enumerate.
When comparing with None use is and is not.
It'd be cleaner to just use in rather than get.
If you want to use nums_d.get then you should use it outside the if so you don't have to use it a second time when you enter the if.
This however makes the code messy for not much of a benefit IMO.
Unless the site forces you to return lists returning a tuple would be more Pythonic.
Your comments aren't helpful, if anything they make the code harder to read for me.
The variable name nums is easier to read then nums_d, the _d is useless.
When returning it would be better to either:
- Raise an exception, as the lookup failed.
- Return a tuple where both values are None. This is so you can tuple unpack without errors.

Getting the code:

def test_peil(nums: List[int], target: int) -> Tuple[int, ...]:
 lookup = {}
 for i, v in enumerate(nums):
 if target - v in lookup:
 return i, lookup[target - v]
 lookup[v] = i
 raise ValueError('Two sums target not in list.')

You can further improve performance by including Alain T.'s change for small numbers.

def test_peil_alain(nums: List[int], target: int) -> Tuple[int, ...]:
 if len(nums) <= 100:
 for i, n in enumerate(nums):
 if target - n in nums:
 return i, nums.index(target - n)
 else:
 lookup = {}
 for i, v in enumerate(nums):
 if target - v in lookup:
 return i, lookup[target - v]
 lookup[v] = i
 raise ValueError('Two sums target not in list.')

And has a performance improvement: (the graph is volatile due to only using one input sample)

enter image description here

if use ` if target - v in lookup`, is it a O(n) search through the list of lookup.keys()?
amazing, would you mind if I ask for the name of your visual tool to analyze time complexity?
@JawSaw It's not on PyPI yet. But it's it's on CR and GitHub.

Stack Exchange Network

A One-Pass Hash Table Solution to twoSum

4 Answers 4

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Hot Network Questions

A One-Pass Hash Table Solution to twoSum

4 Answers 4

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Related

Hot Network Questions