I have been experimenting with generating all the k-combinations of an array. The code below quickly generates all 3-combinations of a given array. I am testing my own implementation because some coding competitions restrict the use of library functions (also, knowing the basics never hurts).
"""
Generates 3-combinations of an array.
A B C D E F G H
^ ^ ^
i j k
A B C D E F G H
^ ^ ^
i j k
A B C D E F G H
^ ^ ^
i j k
"""
from itertools import combinations
import time
def combinations_3(ary):
    l = len(ary)
    for i in xrange(l-2):
        for j in xrange(i+1, l-1):
            for k in xrange(j+1, l):
                yield (ary[i], ary[j], ary[k])
start = time.time()
for combination in combinations_3([1, 2, 3, 4]):
    print combination
print time.time() - start
print("===========Library Implementation============")
start = time.time()
for combination in combinations([1, 2, 3, 4], 3):
    print combination
print time.time() - start
Output
(1, 2, 3)
(1, 2, 4)
(1, 3, 4)
(2, 3, 4)
0.00043797492981
===========Library Implementation============
(1, 2, 3)
(1, 2, 4)
(1, 3, 4)
(2, 3, 4)
6.19888305664e-05
It seems that my implementation is a lot slower than the library code. How can I improve the performance of my code?
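For reference, the same advancing-index idea generalizes to an arbitrary k. The sketch below is only illustrative (the name combinations_k is mine, and it follows the pure-Python equivalent shown in the itertools.combinations documentation); the hard-coded version above is what I am actually benchmarking:

def combinations_k(ary, k):
    # Keep k advancing indices instead of the named variables i, j, k.
    n = len(ary)
    if k > n:
        return
    indices = list(range(k))                  # start at the first k positions
    yield tuple(ary[i] for i in indices)
    while True:
        # Find the rightmost index that can still move to the right.
        for pos in reversed(range(k)):
            if indices[pos] != pos + n - k:
                break
        else:
            return                            # every index is at its maximum
        indices[pos] += 1
        # Reset all indices to the right of pos to their smallest legal values.
        for later in range(pos + 1, k):
            indices[later] = indices[later - 1] + 1
        yield tuple(ary[i] for i in indices)

list(combinations_k([1, 2, 3, 4], 3)) should produce the same four tuples as combinations_3 above.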
- Python is open source; have you considered looking at the library code to see what the differences are (it's written in C, for a start)? – jonrsharpe, Oct 13, 2015 at 7:30
- Your only hope to make it faster is to write it in C, like the stdlib, or Cythonize your code. – Davidmh, Oct 13, 2015 at 12:14
- @Davidmh, fine. Then I think this code is fine as a quick, hard-coded way to write combinations. – CodeYogi, Oct 13, 2015 at 12:22
2 Answers
You probably want to use the timeit module with a setup like this:
run.py
from itertools import combinations

try:
    xrange
except NameError:
    # Python 3 removed xrange; range is already lazy there.
    xrange = range

def combinations_3(ary):
    l = len(ary)
    for i in xrange(l-2):
        for j in xrange(i+1, l-1):
            for k in xrange(j+1, l):
                yield (ary[i], ary[j], ary[k])

def _test(generator):
    for combination in generator:
        #: To measure only the time of sequence generation, we omit IO operations.
        pass

def test_lib(length=10000):
    _test(combinations(list(range(1, length)), 3))

def test_self_written(length=10000):
    _test(combinations_3(list(range(1, length))))
test.py
import timeit

#: Print the title row.
print(' library time self written time')

#: Loop over the number of elements in the tested sequence.
for i in list(range(10, 100, 10)):
    #: Measure execution time for the self-written combinations implementation.
    self_written = timeit.Timer(
        #: Tested expression.
        'test_lib({})'.format(i),
        #: Test setup; here we just import the tested function.
        'from run import test_self_written as test_lib'
    ).timeit(1000)  #: And here we just set the number of testing iterations.
    lib = timeit.Timer(
        'test_lib({})'.format(i),
        'from run import test_lib'
    ).timeit(1000)
    #: Print output in the format "<number of elements>: library time, self-written time"
    print('{:03} elements: {:10.6f} {:10.6f}'.format(i, lib, self_written))
And here is my output:
$ python2 test.py
library time self written time
010 elements: 0.007060 0.030495
020 elements: 0.058317 0.239376
030 elements: 0.211884 0.817189
040 elements: 0.522477 1.903142
050 elements: 1.047018 3.803112
060 elements: 1.969069 6.590189
070 elements: 2.960615 10.500072
080 elements: 4.468489 15.176092
090 elements: 6.402755 21.669669
$ python3 test.py
library time self written time
010 elements: 0.009168 0.038298
020 elements: 0.058049 0.285885
030 elements: 0.208938 1.031432
040 elements: 0.523407 2.450311
050 elements: 1.048463 4.796328
060 elements: 1.828277 7.832434
070 elements: 2.970929 13.067557
080 elements: 4.481792 18.496334
090 elements: 6.353836 26.277946
$ pypy test.py
library time self written time
010 elements: 0.011924 0.040272
020 elements: 0.056972 0.046638
030 elements: 0.185858 0.083793
040 elements: 0.456202 0.188640
050 elements: 0.913935 0.338211
060 elements: 1.588666 0.542124
070 elements: 2.537772 0.846942
080 elements: 3.816353 1.232982
090 elements: 5.471375 1.707760
And the Python versions:
$ python2 -V && python3 -V && pypy -V
Python 2.7.6
Python 3.4.3
[PyPy 2.2.1 with GCC 4.8.2]
As you can see, pypy is much faster than the cpython implementation.
Actually, if you are using cpython you do not really want to reinvent standard library functions, because they are well optimized by the C compiler while your code will be interpreted instead of compiled. In the case of pypy you probably want to do some research first, because the JIT interpreter has a lot of different corner cases.
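One measurement refinement worth considering, whichever interpreter you use (a sketch that relies only on the standard timeit API): taking the minimum over several repeats with timeit.repeat is usually more robust against background noise than a single .timeit() call:

import timeit

# Best-of-five timing for one input size; the setup string imports from run.py above.
best = min(timeit.repeat('test_lib(50)',
                         'from run import test_lib',
                         repeat=5, number=1000))
print('best of 5 runs: {:.6f}'.format(best))

The same idea applies to the self-written version via 'from run import test_self_written as test_lib'.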
- You do not really want to reinvent standard library functions because they are well optimized by the C compiler -> Well said. – JaDogg, Oct 13, 2015 at 9:07
- @JaDogg, reinventing and understanding are quite different things, I think. By that logic you shouldn't write any sorting algorithm at all, right? Because they are already implemented out there. – CodeYogi, Oct 13, 2015 at 9:31
- In fact this approach is quite useful when your coding environment doesn't allow external libraries or your use case is quite restricted. – CodeYogi, Oct 13, 2015 at 9:32
- I'd like to join a discussion titled "std:: or not std::", but we are talking about python, not c++. – outoftime, Oct 13, 2015 at 9:36
- I never wanted to emphasize that point; my bad if it got highlighted. I wanted to know if I can improve performance any further. – CodeYogi, Oct 13, 2015 at 10:08
I was able to shave off 20% of the time in outoftime's test by avoiding the indexed lookups in (ary[i], ary[j], ary[k]) and doing this instead:
def combinations_3(ary):
    enumerated = list(enumerate(ary))
    for i, x in enumerated[:-2]:
        for j, y in enumerated[i+1:-1]:
            for z in ary[j+1:]:
                yield (x, y, z)
Precomputing the slice of the innermost loop for each j takes another 10% off:
def combinations_3(ary):
    enumerated = [(i, value, ary[i+1:]) for i, value in enumerate(ary)]
    for i, x, _ in enumerated[:-2]:
        for _, y, tail in enumerated[i+1:-1]:
            for z in tail:
                yield (x, y, z)
It is worth noting that the first version requires O(n) additional space for the list slices, and the second version ups the requirement to O(n^2), where n is the size of ary.
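Whichever version you pick, a quick sanity check against the library is cheap; something along these lines (an illustrative snippet, assuming the combinations_3 definition above is in scope) confirms that the optimized generator still yields the same tuples:

from itertools import combinations

data = list(range(8))
# Both generators emit tuples in lexicographic index order,
# so the two sequences should match exactly.
assert list(combinations_3(data)) == list(combinations(data, 3))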
- Do these optimizations affect the Big-O complexity of the algorithm? – CodeYogi, Oct 13, 2015 at 16:48
- @CodeYogi Space complexity increases from O(1) to O(n), and to O(n^2) in the second version. Time complexity remains O(n^3). – Janne Karila, Oct 13, 2015 at 17:50