I have written the following function that gives the permuted combinations of n elements subject to a constraint/rule: it builds a DataFrame containing every combination of numbers from 0 to n that sums to n.
Function
import pandas as pd
import itertools
import numpy as np
def combinations_permuted(n_elm):
    items = list(range(0, n_elm+1, 1))
    combinations = pd.DataFrame(list(filter(lambda x: np.sum(x) == n_elm, list(itertools.combinations_with_replacement(items, n_elm)))))
    comb_permuted = pd.DataFrame()
    for index, combination in combinations.iterrows():
        comb_permuted = comb_permuted.append(list(filter(lambda x: np.sum(x) == n_elm, list(itertools.permutations(combination.tolist(), n_elm)))))
    return comb_permuted.drop_duplicates().as_matrix()
Example
array([[0, 0, 3],
       [0, 3, 0],
       [3, 0, 0],
       [0, 1, 2],
       [0, 2, 1],
       [1, 0, 2],
       [1, 2, 0],
       [2, 0, 1],
       [2, 1, 0],
       [1, 1, 1]], dtype=int64)
The problem is that it takes a long time when n_elm is "big", for example 9. I think this code could be improved in terms of time performance, maybe by replacing the for loop with a map function. Any help with this would be appreciated.
- What Python version are you targeting? – Mast, Mar 21, 2018 at 13:41
- try with stackoverflow.com/a/28969798/3308055 – juvian, Mar 21, 2018 at 14:32
1 Answer
def combinations_permuted(n_elm):
IMO n (as used in the description of the code in the question) is more readable than n_elm.
items = list(range(0,n_elm+1,1))
1 is the default step, and range isn't so obscure a function that the maintenance programmer would wonder what range(0, n+1) does.
    combinations = pd.DataFrame(list(filter(lambda x: np.sum(x) == n_elm, list(itertools.combinations_with_replacement(items, n_elm)))))
    comb_permuted = pd.DataFrame()
    for index, combination in combinations.iterrows():
        comb_permuted = comb_permuted.append(list(filter(lambda x: np.sum(x) == n_elm, list(itertools.permutations(combination.tolist(), n_elm)))))
    return comb_permuted.drop_duplicates().as_matrix()
There are two things here which sound warning bells:
- comb_permuted = comb_permuted.append(...). The documentation for DataFrame.append says: "Iteratively appending rows to a DataFrame can be more computationally intensive than a single concatenate. A better solution is to append those rows to a list and then concatenate the list with the original DataFrame all at once." (A sketch of that pattern follows this list.)
- drop_duplicates(). That says that the generation process is doing more work than it needs to.
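Applied to the question's loop, the accumulate-then-concatenate pattern might look like this minimal sketch (assuming the same combinations DataFrame and n_elm as in the original; the sum filter is dropped because permuting a combination cannot change its sum, and drop_duplicates would still be needed afterwards):

    frames = []
    for index, combination in combinations.iterrows():
        # Collect each batch of permutations in a plain Python list...
        frames.append(pd.DataFrame(list(itertools.permutations(combination.tolist(), n_elm))))
    # ...and pay the concatenation cost only once, at the end.
    comb_permuted = pd.concat(frames, ignore_index=True)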
A KISS approach would be to replace the combinations_with_replacement, permutations, drop_duplicates chain with itertools.product. This is much faster at n = 3, but already slower at n = 5 (because it's still doing more work than it needs to, generating every candidate and then filtering).
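For concreteness, that product-based version might look like the following sketch (the name combinations_product is mine, for illustration):

    def combinations_product(n):
        # Enumerate every length-n tuple over 0..n and keep the ones that
        # sum to n.  The candidate space is (n+1)**n, which is why this
        # stops being competitive as n grows.
        return pd.DataFrame([c for c in itertools.product(range(n + 1), repeat=n)
                             if sum(c) == n]).as_matrix()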
The efficient approach is to do only the work that's necessary. A quick and dirty implementation (i.e. please tidy it up before using) calculates n = 6 in less time than the original code takes for n = 3:
def combinations_recursive_inner(n, buf, gaps, sum, accum):
    if gaps == 0:
        accum.append(list(buf))
    else:
        for x in range(0, n+1):
            # Prune: even filling every remaining slot with n can't reach n.
            if sum + x + (gaps - 1) * n < n: continue
            # Prune: already past the target; larger x can only overshoot more.
            if sum + x > n: break
            combinations_recursive_inner(n, buf + [x], gaps - 1, sum + x, accum)

def combinations_recursive(n):
    accum = []
    combinations_recursive_inner(n, [], n, 0, accum)
    return pd.DataFrame(accum).as_matrix()
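As a quick sanity check, the recursive version reproduces the question's example output, just in lexicographic rather than generation order:

    >>> combinations_recursive(3)
    array([[0, 0, 3],
           [0, 1, 2],
           [0, 2, 1],
           [0, 3, 0],
           [1, 0, 2],
           [1, 1, 1],
           [1, 2, 0],
           [2, 0, 1],
           [2, 1, 0],
           [3, 0, 0]], dtype=int64)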