Python set based solution to Subset Sum

Question 1

Working on the 2 element subset sum problem (finding two numbers in a list that sum to a given value), I came up with a relatively short solution code-wise.

The code could fit on a single line but I'm not super familiar with python so is this is a naive way to solve/implement it?

How efficient is creating a set and intersecting it with a list vs using a loop?

l = [10, 7, 11, 3, 5, 11] 
k = 22
#create a set of the differences between K and L including duplicate elements that sum to K
diff = {k-x for i, x in enumerate(l) if k != 2*x or x in l[i+1:len(l)]}
#Return true if an element of Diff is in L
print(bool(diff.intersection(l)))

Question 2

Usually Python is not terribly fast when it comes to "hand-written" loops. So my prediction (also supported by some preliminary profiling using timeit, see below) would be that a "hand-written" loop is slower than the C loop used in the implementation of set.intersection. The effect should/will become more significant for larger lists.

A quick non-performance related note: you can substitute l[i+1:len(l)] by l[i+1:]. This is because the end of a slice defaults to the end of the sequence.

import timeit
SETUP = """
l = [10, 7, 11, 3, 5, 11]
k = 22
diff = {k-x for i, x in enumerate(l) if k != 2*x or x in l[i+1:]}
"""
print(
 sum(timeit.repeat("bool(diff.intersection(l))", setup=SETUP,
 repeat=10, number=100000)) / 10
)
print(
 sum(timeit.repeat("bool([i for i in l if i in diff])", setup=SETUP,
 repeat=10, number=100000)) / 10
)

Question 3

#create a set of the differences between K and L including duplicate elements that sum to K
diff = {k-x for i, x in enumerate(l) if k != 2*x or x in l[i+1:len(l)]}

The slice can be simplified:

diff = {k-x for i, x in enumerate(l) if k != 2*x or x in l[i+1:]}

In my opinion, the or x in l[i+1:] is a bit too tricky to use without further explanation. My immediate reaction on seeing it was that it undermined the whole point of using a set by creating a quadratic worst case. On further reflection I see that (assuming Python is sensibly implemented, which I think is a safe assumption here) it doesn't, but it's better to preempt the doubt with a comment.

Alternatively, handle the special case separately by checking whether k is even and if so counting instances of k // 2.

As an optimisation, you can keep only the half of each pair in diff:

diff = {k-x for x in l if 2*x < k}

#Return true if an element of Diff is in L
print(bool(diff.intersection(l)))

This is a bit cryptic. I think it might be more pythonic to use any:

print(any(x in l if x in diff))

That's also potentially faster, because it can abort on finding the first solution.

Question 4

i is used in the second condition in the original set construction so it cannot be dropped so easily at this stage in the optimization.

AlexV AlexVAlexV 7,3532 gold badges24 silver badges47 bronze badges · Answer 1 · 2019-05-04 13:52:42Z

Usually Python is not terribly fast when it comes to "hand-written" loops. So my prediction (also supported by some preliminary profiling using timeit, see below) would be that a "hand-written" loop is slower than the C loop used in the implementation of set.intersection. The effect should/will become more significant for larger lists.

A quick non-performance related note: you can substitute l[i+1:len(l)] by l[i+1:]. This is because the end of a slice defaults to the end of the sequence.

import timeit
SETUP = """
l = [10, 7, 11, 3, 5, 11]
k = 22
diff = {k-x for i, x in enumerate(l) if k != 2*x or x in l[i+1:]}
"""
print(
 sum(timeit.repeat("bool(diff.intersection(l))", setup=SETUP,
 repeat=10, number=100000)) / 10
)
print(
 sum(timeit.repeat("bool([i for i in l if i in diff])", setup=SETUP,
 repeat=10, number=100000)) / 10
)

score 1 · Answer 2 · 2019-05-04 14:21:06Z

#create a set of the differences between K and L including duplicate elements that sum to K
diff = {k-x for i, x in enumerate(l) if k != 2*x or x in l[i+1:len(l)]}

The slice can be simplified:

diff = {k-x for i, x in enumerate(l) if k != 2*x or x in l[i+1:]}

In my opinion, the or x in l[i+1:] is a bit too tricky to use without further explanation. My immediate reaction on seeing it was that it undermined the whole point of using a set by creating a quadratic worst case. On further reflection I see that (assuming Python is sensibly implemented, which I think is a safe assumption here) it doesn't, but it's better to preempt the doubt with a comment.

Alternatively, handle the special case separately by checking whether k is even and if so counting instances of k // 2.

As an optimisation, you can keep only the half of each pair in diff:

diff = {k-x for x in l if 2*x < k}

#Return true if an element of Diff is in L
print(bool(diff.intersection(l)))

This is a bit cryptic. I think it might be more pythonic to use any:

print(any(x in l if x in diff))

That's also potentially faster, because it can abort on finding the first solution.

i is used in the second condition in the original set construction so it cannot be dropped so easily at this stage in the optimization.

Stack Exchange Network

Python set based solution to Subset Sum

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Python set based solution to Subset Sum

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions