Working on the 2 element subset sum problem (finding two numbers in a list that sum to a given value), I came up with a relatively short solution code-wise.
The code could fit on a single line but I'm not super familiar with python so is this is a naive way to solve/implement it?
How efficient is creating a set and intersecting it with a list vs using a loop?
l = [10, 7, 11, 3, 5, 11]
k = 22
#create a set of the differences between K and L including duplicate elements that sum to K
diff = {k-x for i, x in enumerate(l) if k != 2*x or x in l[i+1:len(l)]}
#Return true if an element of Diff is in L
print(bool(diff.intersection(l)))
2 Answers 2
Usually Python is not terribly fast when it comes to "hand-written" loops. So my prediction (also supported by some preliminary profiling using timeit, see below) would be that a "hand-written" loop is slower than the C loop used in the implementation of set.intersection
. The effect should/will become more significant for larger lists.
A quick non-performance related note: you can substitute l[i+1:len(l)]
by l[i+1:]
. This is because the end of a slice defaults to the end of the sequence.
import timeit
SETUP = """
l = [10, 7, 11, 3, 5, 11]
k = 22
diff = {k-x for i, x in enumerate(l) if k != 2*x or x in l[i+1:]}
"""
print(
sum(timeit.repeat("bool(diff.intersection(l))", setup=SETUP,
repeat=10, number=100000)) / 10
)
print(
sum(timeit.repeat("bool([i for i in l if i in diff])", setup=SETUP,
repeat=10, number=100000)) / 10
)
#create a set of the differences between K and L including duplicate elements that sum to K diff = {k-x for i, x in enumerate(l) if k != 2*x or x in l[i+1:len(l)]}
The slice can be simplified:
diff = {k-x for i, x in enumerate(l) if k != 2*x or x in l[i+1:]}
In my opinion, the or x in l[i+1:]
is a bit too tricky to use without further explanation. My immediate reaction on seeing it was that it undermined the whole point of using a set by creating a quadratic worst case. On further reflection I see that (assuming Python is sensibly implemented, which I think is a safe assumption here) it doesn't, but it's better to preempt the doubt with a comment.
Alternatively, handle the special case separately by checking whether k
is even and if so counting instances of k // 2
.
As an optimisation, you can keep only the half of each pair in diff
:
diff = {k-x for x in l if 2*x < k}
#Return true if an element of Diff is in L print(bool(diff.intersection(l)))
This is a bit cryptic. I think it might be more pythonic to use any
:
print(any(x in l if x in diff))
That's also potentially faster, because it can abort on finding the first solution.
-
\$\begingroup\$
i
is used in the second condition in the original set construction so it cannot be dropped so easily at this stage in the optimization. \$\endgroup\$AlexV– AlexV2019年05月04日 14:29:18 +00:00Commented May 4, 2019 at 14:29