Leetcode two sum

Question 1

I'm currently learning c++ coming from a python background, so I'll include a solution in python and in c++ for the following problem statement:

Given an array of integers nums and an integer target, return indices of the two numbers such that they add up to target. You may assume that each input would have exactly one solution, and you may not use the same element twice. You can return the answer in any order.

Example 1:

Input: nums = [2,7,11,15], target = 9

Output: [0,1]

Example 2:

Input: nums = [3,2,4], target = 6

Output: [1,2]

I would like to hear your feedback / suggestions for performance improvements / other suggestions. Here's the link

two_sum.py

def two_sum(nums: list, target: int):
 for i, n in enumerate(nums):
 match = target - n
 if match in (rest := nums[i + 1:]):
 match_at = rest.index(match)
 return i, match_at + i + 1
if __name__ == '__main__':
 if result := two_sum([2, 7, 11, 15], 22):
 print(f'Indices:\n{result}')
 else:
 print('No matches found')

Leetcode stats:

Runtime: 772 ms, faster than 36.98% of Python online submissions for Two Sum. Memory Usage: 14.4 MB, less than 49.82% of Python online submissions for Two Sum.

two_sum.h

#ifndef LEETCODE_TWO_SUM_H
#define LEETCODE_TWO_SUM_H
#include <iostream>
#include <vector>
using std::vector;
using std::cout;
using std::endl;
vector<int> two_sum_solution(vector<int> &nums, int target) {
 vector <int> results;
 for (int i = 0; i < nums.size(); ++i) {
 int match = target - nums[i];
 for (int j = i + 1; j < nums.size(); ++j) {
 if (nums[j] == match) {
 for (int index_match : {
 i, j
 })
 results.push_back(index_match);
 }
 }
 }
 return results;
}
#endif //LEETCODE_TWO_SUM_H

main.cpp

#include <vector>
#include "two_sum.h"
using std::vector;
int main() {
 vector<int> v1{2, 7, 11, 15};
 vector<int> v = two_sum_solution(v1, 22);
 if (!v.empty()) {
 cout << "Indices:" << endl;
 for (auto i: v)
 cout << i << " ";
 }
 else (cout << "No matches found");
}

Leetcode stats:

Runtime: 384 ms, faster than 34.03% of C++ online submissions for Two Sum. Memory Usage: 9.3 MB, less than 12.99% of C++ online submissions for Two Sum.

Question 2

I am not an expert in C++ but I can give a feedback about the Python solution.

Your current solution runs in \$O(n^2)\$. Basically, for each number n of the input nums, find target - n in nums. How to improved it?

The second part of the algorithm can be improved from \$O(n)\$ to \$O(1)\$. Instead of looking up target - n in a list, you can use a dictionary:

def two_sum(nums: list, target: int):
 num_index = {}
 for i, n in enumerate(nums):
 match = target - n
 if match in num_index:
 return num_index[match], i
 num_index[n] = i
 return -1

Results:

Original: Runtime: 772 ms. Memory Usage: 14.4 MB
Improved: Runtime: 48 ms. Memory Usage: 15.5 MB

Question 3

If performance is of importance, I would suggest using get() to avoid duplicate dict-lookup. That is, replace match = target - n with match = num_index.get(target - n) , and then replace the if-statement with if match is not None: , and finally, return match, i. I believe there might be some gain here, if the nums param is a rather long list. Please let me know if you disagree or prove me wrong. Cheers.

Question 4

@magnus I think that with your suggestion there should be a little improvement but LeetCode gives similar results. Maybe with a proper benchmark would be possible to notice the difference.

Question 5

@magnus So you think making every check slower just to avoid one lookup will make it overall faster?

Question 6

@superbrain Yes, I believed so, at least for (very) large dicts. However, according to this SO post, it is not: stackoverflow.com/questions/39582115/… . My assumption was based on the accepted answer of this CR post: codereview.stackexchange.com/questions/231409

Question 7

@magnus Hmm, you said it's to avoid duplicate lookup, though. So the "for large list/dict" doesn't make sense, as the larger they get, the less that single avoided duplicate lookup matters.

Question 8

Only include the header files that you need

In your two_sum.h file, you don't need iostream, since you're not using any of its functionality. Remember that #include literally copy-pastes the file, so if you're including this header file in multiple files, it might potentially slow down your compilation times.

Split declarations and definitions

Typically, you would split your files into two parts: the header file (normally ending with *.h, *.hpp, *.hh) and the source file (normally ending with *.cpp, *.cc). The header file only consists of the declarations and the source file contains the implementation.

So in your case, your header file will look like this:

two_sum.h

#ifndef LEETCODE_TWO_SUM_H
#define LEETCODE_TWO_SUM_H
#include <vector>
std::vector<int> two_sum_solution(std::vector<int> &nums, int target);
#endif // LEETCODE_TWO_SUM_H

and your source file will look like this:

two_sum.cpp

#include "two_sum.h"
std::vector<int> two_sum_solution(std::vector<int> &nums, int target)
{
 ...
}

In fact, if you try to include your two_sum.h (with the implementation) into multiple files, you would be breaking the One-Definition Rule. Your source files would contain multiple definitions of the same function, and the linker will spit out an error. One way to get around is to mark the functions inline, but you most likely want to do the former.

No `using namespace` in the header files

Don't do using namespace or any of its variant in a header file. Since the header file is copy pasted throughout multiple source files, it has a potential to cause annoying errors. See here

Use const reference

Since two_sum_solution is not modifying the nums vector, pass it by const reference.

size_t vs int for array indices

Consider using size_t instead of int for array indices

Use `auto` as much as possible

There are a couple of instances in your code where you can use auto instead of specifying the type. Examples:

auto match = target - nums[i]; auto v = two_sum_solution(v1, 22);

The inner-most loop is pointless

Simply do

results.push_back(i);
results.push_back(j);

Also, once you've found the solution, you might want to return the result immediately.

Question 9

Thanks, that's very informative, i have a few questions: what if i'm going to include the header file that will only contain declarations, in which .cpp file should the definition be given that i need to include it in several files? And regarding using const, auto and size_t whenever relevant, does this improve performance? I'm asking because as you must know, in python these things do not exist so i'm learning the hows and whys

Question 10

Your header file should be included in all the source files that will want to your functions (for e.g. your main.cpp will include two_sum.h). Your functions should be defined in a separate source file (two_sum.cpp). During compilation, the linker will resolve all references. They don't improve performance. Since you're passing by reference, it makes sense to use const so you don't do anything that will accidentally modify the vector. auto is just a directive for the compiler to deduce the type, and size_t is the canonical type for array indices.

Question 11

You can perhaps improve the performance by creating a map of value -> index in the first iteration over the given array.

Currently, Your program does the following (time complexity):

iterate over all index, value pairs of the array (\$ O(n) \$)
search for target - value in the array (\$ O(n) \$)
lookup index of target - value (\$ O(n) \$)

And since these are all nested, you get to \$ O(n^2) \$ (it isn't \$ n^3 \$ because last lookup is not being done for each iteration).

My proposed solution:

Create a map/dict of {value: index} (\$ O(n) \$)
Iterate over index, value of array (\$ O(n) \$)
Lookup and return index from the map/dict (\$ O(1) \$)

def two_sum(numbers: list[int], target: int):
 lookup: dict = {
 value: index
 for index, value in enumerate(numbers)
 }
 for index, value in enumerate(numbers):
 match = target - value
 if search_index := lookup.get(match):
 return index, search_index
 return None

Question 12

That would fail the cases where the target is double the number ex: [1,3,4,2] and 6, will output (1, 1) which is incorrect

Question 13

Adding another if-clause to verify search_index != index is still faster than iterating over the array again.

Question 14

@HeapOverflow I was referring to the iteration in OP's code: match in (rest := nums[i + 1:]) or int j = i + 1; j < nums.size(); ++j

Question 15

@HeapOverflow Marc's rewrite is obviously better. Although it uses the same style as my proposed solution above. Mine was a verbatim iteration of the 3 points.

Question 16

You can combine the two loops into one, which has the additional advantage of terminating as soon as the second item is found.

Question 17

This is interesting to me because I come from a C background and started using Python the past few years for work, so I've had the reverse path as you. When I started Python, I greatly preferred solutions like yours because looping through lists is so explicit and clear.

However, I since learned that more proficient Python programmers at work understand my code better when I use the standard library. Once I began to invest in learning those tools, it had the double effect of 1) making my code more succinct and 2) being more efficient in time and/or space.

In this case, I would solve the problem with combinations from the itertools package:

from itertools import combinations
def two_sum(nums, target):
 pairs_with_indices = combinations(enumerate(nums), 2)
 # result is a generator comprehension.
 winning_pairs = ((index_i, index_j)
 for (index_i, i), (index_j, j) in pairs_with_indices
 if sum((i, j)) == target)
 # Insert as much error checking as you need...
 return next(winning_pairs)

There's probably an even better more succinct and clear solution using Numpy, which is effectively standard library in my line of work (data science) but that's not true everywhere.

One thing that's different than your code: there is no room for off-by-one-errors. In my experience, code like this

if match in (rest := nums[i + 1:]):
 match_at = rest.index(match)
 return i, match_at + i + 1

is easy for me to write, hard to read and maintainability spans the whole gambit from easy to impossible. In other words, managing indices manually in Python gives me just enough rope to hang myself with, and standard library functions have been a great alternative.

Question 18

I tried the very same solution at first and despite itertools being very efficient, the solution for some reason is rejected "time limit exceeded" but it's what i think is the most readable solution but may not scale well because it's O(N^2)

Question 19

@bullseye, Good point. I made the list comprehension a generator comprehension and only return the first value, so we don't continue calculating after we find something. Out of curiosity, could you share the input nums/target so I could try the time test?

Question 20

The input nums are already provided on the leetcode platform, if you want you may test there directly. The examples were Input: nums = [2,7,11,15], target = 9 and Input: nums = [3,2,4], target = 6

Question 21

This adds unnecessary complexity to the problem. A dictionary would be a much more useful tool here.

Question 22

Know your containers

std::unordered_map is your friend in this problem. Whenever you've never previously seen a number, simply use the operator[] or insert function to add the number and its index. When using find, it will return an iterator, which is a key-value pair.

eg: auto location = m.find(numToFind);

location->first is your key, and location->second is your value

When you return, don't use push_back

You can simply return an initializer list like: {i,j}.

Marc Marc 5,7342 gold badges15 silver badges35 bronze badges · Accepted Answer · 2020-11-01 07:07:26Z

11

\$\begingroup\$

I am not an expert in C++ but I can give a feedback about the Python solution.

Your current solution runs in \$O(n^2)\$. Basically, for each number n of the input nums, find target - n in nums. How to improved it?

The second part of the algorithm can be improved from \$O(n)\$ to \$O(1)\$. Instead of looking up target - n in a list, you can use a dictionary:

def two_sum(nums: list, target: int):
 num_index = {}
 for i, n in enumerate(nums):
 match = target - n
 if match in num_index:
 return num_index[match], i
 num_index[n] = i
 return -1

Results:

Original: Runtime: 772 ms. Memory Usage: 14.4 MB
Improved: Runtime: 48 ms. Memory Usage: 15.5 MB

Share

edited Nov 5, 2020 at 5:17

answered Nov 1, 2020 at 7:07

Marc's user avatar

Marc Marc

5,7342 gold badges15 silver badges35 bronze badges

\$\endgroup\$

9

\$\begingroup\$ If performance is of importance, I would suggest using get() to avoid duplicate dict-lookup. That is, replace match = target - n with match = num_index.get(target - n) , and then replace the if-statement with if match is not None: , and finally, return match, i. I believe there might be some gain here, if the nums param is a rather long list. Please let me know if you disagree or prove me wrong. Cheers. \$\endgroup\$

Wololo
– Wololo

2020年11月02日 12:58:35 +00:00
Commented Nov 2, 2020 at 12:58
\$\begingroup\$ @magnus I think that with your suggestion there should be a little improvement but LeetCode gives similar results. Maybe with a proper benchmark would be possible to notice the difference. \$\endgroup\$

Marc
– Marc

2020年11月02日 14:28:58 +00:00
Commented Nov 2, 2020 at 14:28
\$\begingroup\$ @magnus So you think making every check slower just to avoid one lookup will make it overall faster? \$\endgroup\$

superb rain
– superb rain

2020年11月05日 08:34:35 +00:00
Commented Nov 5, 2020 at 8:34
\$\begingroup\$ @superbrain Yes, I believed so, at least for (very) large dicts. However, according to this SO post, it is not: stackoverflow.com/questions/39582115/… . My assumption was based on the accepted answer of this CR post: codereview.stackexchange.com/questions/231409 \$\endgroup\$

Wololo
– Wololo

2020年11月05日 09:17:25 +00:00
Commented Nov 5, 2020 at 9:17
\$\begingroup\$ @magnus Hmm, you said it's to avoid duplicate lookup, though. So the "for large list/dict" doesn't make sense, as the larger they get, the less that single avoided duplicate lookup matters. \$\endgroup\$

superb rain
– superb rain

2020年11月05日 11:42:44 +00:00
Commented Nov 5, 2020 at 11:42

| Show 4 more comments

Stack Exchange Network

Leetcode two sum

5 Answers 5

Only include the header files that you need

Split declarations and definitions

two_sum.h

two_sum.cpp

No `using namespace` in the header files

Use const reference

size_t vs int for array indices

Use `auto` as much as possible

The inner-most loop is pointless

Know your containers

When you return, don't use push_back

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Hot Network Questions

Leetcode two sum

5 Answers 5

Only include the header files that you need

Split declarations and definitions

two_sum.h

two_sum.cpp

No using namespace in the header files

Use const reference

size_t vs int for array indices

Use auto as much as possible

The inner-most loop is pointless

Know your containers

When you return, don't use push_back

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Related

Hot Network Questions

No `using namespace` in the header files

Use `auto` as much as possible