6
\$\begingroup\$

The idea is to apply some function on each element in each list, then compare two lists by the value returned by the function.

My current solution works but is not fast enough. running "python -m cProfile" gives sth. like:

 ncalls tottime percall cumtime percall filename:lineno(function)
 2412505 13.335 0.000 23.633 0.000 common.py:38(<genexpr>)
 285000 5.434 0.000 29.067 0.000 common.py:37(to_dict)
 142500 3.392 0.000 35.948 0.000 common.py:3(compare_lists)

Here is my code, I would like to know how to optimize it to run faster.

import itertools
def compare_lists(li1, li2, value_func1=None, value_func2=None):
 """ Compare *li1* and *li2*, return the results as a list in the following form:
 [[data seen in both lists], [data only seen in li1], [data only seen in li2]]
 and [data seen in both lists] contains 2 tuple: [(actual items in li1), (actual items in li2)]
 * *value_func1* callback function to li1, applied to each item in the list, returning the **logical** value for comparison
 * *value_func2* callback function to li2, similarly
 If not supplied, lists will be compared as it is.
 Usage::
 >>> compare_lists([1, 2, 3], [1, 3, 5])
 >>> ([(1, 3), (1, 3)], [2], [5])
 Or with callback functions specified::
 >>> f = lambda x: x['v']
 >>>
 >>> li1 = [{'v': 1}, {'v': 2}, {'v': 3}]
 >>> li2 = [1, 3, 5]
 >>>
 >>> compare_lists(li1, li2, value_func1=f)
 >>> ([({'v': 1}, {'v': 3}), (1, 3)], [{'v': 2}], [5])
 """
 if not value_func1:
 value_func1 = (lambda x:x)
 if not value_func2:
 value_func2 = (lambda x:x)
 def to_dict(li, vfunc):
 return dict((k, list(g)) for k, g in itertools.groupby(li, vfunc))
 def flatten(li):
 return reduce(list.__add__, li) if li else []
 d1 = to_dict(li1, value_func1)
 d2 = to_dict(li2, value_func2)
 if d1 == d2:
 return (li1, li2), [], []
 k1 = set(d1.keys())
 k2 = set(d2.keys())
 elems_left = flatten([d1[k] for k in k1 - k2])
 elems_right = flatten([d2[k] for k in k2 - k1])
 common_keys = k1 & k2
 elems_both = flatten([d1[k] for k in common_keys]), flatten([d2[k] for k in common_keys])
 return elems_both, elems_left, elems_right

Edit:

zeekay suggests using set, which is also what I was doing, except that I make a dict for each list first, then compare the keys using set, finally return the original elements using the dict. I realized that the speed actually depends on which one will take more time -- the callback function, or the groupby. In my case, the possible callback functions are mostly dot access on objects, and the length of lists can be large causing groupby on lists takes more time.

In the improved version each callback function is executed more than once on every single element, which I considered is a waste and has been trying to avoid in the first place, but it's still much faster than my original approach, and much simpler.

def compare_lists(li1, li2, vf1=None, vf2=None):
 l1 = map(vf1, li1) if vf1 else li1
 l2 = map(vf2, li2) if vf2 else li2
 s1, s2 = set(l1), set(l2)
 both, left, right = s1 & s2, s1 - s2, s2 - s1
 orig_both = list((x for x in li1 if vf1(x) in both) if vf1 else both), list((x for x in li2 if vf2(x) in both) if vf2 else both)
 orig_left = list((x for x in li1 if vf1(x) in left) if vf1 else left)
 orig_right = list((x for x in li2 if vf2(x) in right) if vf2 else right)
 return orig_both, orig_left, orig_right
asked Sep 12, 2011 at 8:46
\$\endgroup\$
6
  • \$\begingroup\$ there are smarter ways to flatten a list, using sum, for other comparison approaches see stackoverflow.com/questions/1388818/… \$\endgroup\$ Commented Sep 12, 2011 at 8:58
  • \$\begingroup\$ @Fredrik Thanks. sum does not work for list of lists though... sum([1, 2], [3]) raises TypeError. \$\endgroup\$ Commented Sep 12, 2011 at 9:26
  • 1
    \$\begingroup\$ @Shaung it does if you use it like sum([[],[1],[2,3]], []), note the 2nd argument []. \$\endgroup\$ Commented Sep 12, 2011 at 9:39
  • \$\begingroup\$ he would be referring to sum([[1, 2], [3]], []) \$\endgroup\$ Commented Sep 12, 2011 at 9:39
  • \$\begingroup\$ @Dan D... snap! \$\endgroup\$ Commented Sep 12, 2011 at 9:40

3 Answers 3

3
\$\begingroup\$

So it seems you primarily want to use your callback functions to pull the value to compare out of an object, if you don't mind simplifying how compared items are returned, a faster and simpler approach would be:

def simple_compare(li1, li2, value_func1=None, value_func2=None):
 s1, s2 = set(map(value_func1, li1)), set(map(value_func2, li2))
 common = s1.intersection(s2)
 s1_diff, s2_diff = s1.difference(s2), s2.difference(s1)
 return common, s1_diff, s2_diff
>>> simple_compare(li1, li2, value_func1=f)
<<< (set([1, 3]), set([2]), set([5]))
>>> compare_lists(li1, li2, value_func1=f)
<<< (([{'v': 1}, {'v': 3}], [1, 3]), [{'v': 2}], [5])

Which depending on actual use case, might be something you could live with. It's definitely a lot faster:

>>> timeit x = simple_compare(xrange(10000), xrange(10000))
100 loops, best of 3: 2.3 ms per loop
>>> timeit x = compare_lists(xrange(10000), xrange(10000))
10 loops, best of 3: 53.1 ms per loop
answered Sep 12, 2011 at 9:06
\$\endgroup\$
4
  • \$\begingroup\$ You did not read my question, dude :) What I am trying to do is compare lists by the value after applied to some function, and return the original elements. \$\endgroup\$ Commented Sep 12, 2011 at 9:28
  • 1
    \$\begingroup\$ You can of course use my answer to modify your original function and support applying functions to values. As is, it merely demonstrates how much faster using the set methods would be. \$\endgroup\$ Commented Sep 12, 2011 at 9:42
  • 1
    \$\begingroup\$ yes, quite simply by for example using s1, s2 = set(map(func1, l1)), set(map(func2, l2)) \$\endgroup\$ Commented Sep 12, 2011 at 9:45
  • \$\begingroup\$ Yeah I suppose I'll update with an alternate approach using map, actually. Format of return is a bit different, but I think this makes more sense honestly. Might not work for you, but if you can make it work it'd be a lot simpler and faster. \$\endgroup\$ Commented Sep 12, 2011 at 10:18
1
\$\begingroup\$

If you are really trying to do sets, rather than using a list as a bag look at this code

a = [1,2,3]
b = [2,3,4]
print list (set(a) - set(b))
print list (set(b) - set(a))
print list (set(a) & set(b))
print list (set(a) | set(b))
print list (set(a) ^ set(b))

=========

Output

[1]
[4]
[2, 3]
[1, 2, 3, 4]
[1, 4]
answered Sep 12, 2011 at 10:23
\$\endgroup\$
1
\$\begingroup\$

If counts matter, then you need to use the Counter class from collections

from collections import Counter
a = [1,1,2,3,3]
b = [1,3,4,4]
print "a ",a
print "b ",b 
print "union ", list ((Counter(a) | Counter(b)).elements())
print "a not b", list ((Counter(a) - Counter(b)).elements())
print "b not a", list ((Counter(b) - Counter(a)).elements())
print "b and a", list ((Counter(b) & Counter(a)).elements())

which gives this answer

a [1, 1, 2, 3, 3]
b [1, 3, 4, 4]
union [1, 1, 2, 3, 3, 4, 4]
a not b [1, 2, 3]
b not a [4, 4]
b and a [1, 3]

Always better to use the inbuilt, rather than write your own

answered Sep 14, 2011 at 8:02
\$\endgroup\$

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.