This is a follow up from this question. The goal is to make some function that can return the number of possible permutations with repetitions. As Gareth pointed out there is a nice formula for this problem:
$$ \frac{m!}{n_1!,円n_2! \ldots n_k!} $$
Where \$m\$ is the length of our permutations and \$n_i\$ is the number of times the \$i\$th letter appears in \$m\$. A simple example can be shown with the word Mississippi.
$$ P(\text{Mississippi}) = \frac{11!}{4!,4円!,2円!} $$
A simple implementation of this can be done in Python as follows:
from numpy import prod
def count_unique_perms(obj):
obj = str(obj)
f = [factorial(i) for i in Counter(obj).values()]
return factorial(len(obj))/prod(f)
I include this code as a basetest for the "improved" version below. Questions were raised whether this approach works for large factorials, and it was said that the running time was at worst \$O(n^3(log n)^2)\$
I used the ideas from my previous question. To try to improve the answer. First I find the number of factors in \$m!\$ then I subtract the factors from \$n_1!,円n_2! \ldots n_k!\$. A small optimization I did was avoiding calculating duplicates.
My question if this is a decent improvement. It seems much slower until really big string. For example, it is slightly faster for str(Mississippi)*500
. Any other tips or algorithms are also welcome =)
from collections import Counter
from math import factorial, log
from numpy import prod
from primesieve import generate_primes
def count_unique_perms(obj):
obj = str(obj)
f = [factorial(i) for i in Counter(obj).values()]
return factorial(len(obj))/prod(f)
def fast_unique_perms(obj):
obj = str(obj)
factors = factors_factorial(len(obj))
obj_count = Counter([i for i in Counter(obj).values() if i > 1])
for p in obj_count:
count_p = factors_factorial(p, obj_count[p])
for prime in count_p:
factors[prime] -= count_p[prime]
num_perms = 1
for p in factors:
num_perms *= p**factors[p]
return num_perms
def factors_factorial(n, multiplicity=1):
"Calculates all the facors of n!"
count = Counter()
for p in generate_primes(n+1):
mul = 0
for k in range(1, int(1+log(n)/log(p))):
mul += int(n/p**k)
count[p] = multiplicity*mul
return count
if __name__ == '__main__':
import timeit
lst = 1123
string = 'mississippi'
print count_unique_perms(string)
print fast_unique_perms(string)
times = 100
t1 = timeit.timeit("count_unique_perms('mississippi'*1000)",
setup="from __main__ import count_unique_perms", number=times)/float(times)
t2 = timeit.timeit("fast_unique_perms('mississippi'*1000)",
setup="from __main__ import fast_unique_perms", number=times)/float(times)
print "fast_unique_perms: ", 1000*t2, "ms"
print "count_unique_perms: ", 1000*t1, "ms"
print "The count_unique_perms solution was: ", t2/float(t1), "times faster than fast_unique_perms."
1 Answer 1
1. Review
There are a bunch of small problems here — I think that if you had given yourself some more time (or looked more carefully at the answers you were given to your previous question) then you would have been able to fix many of them yourself.
Missing docstrings — you've written one for
factors_factorial
but what do the other functions do?Vague variable names — what kind of object is
obj
? What kinds of objects are counted byobj_count
? The namei
is normally used for indexes but this code uses it for characters, soc
would be better. I would expect the namep
to refer to a prime, but here it refers to a count or multiplicity, son
orm
would be better.For portability to Python 3 it's a good plan to stick to
//
for integer division and/
for floating-point division. In Python 2 you can usefrom __future__ import division
to enforce the Python 3 semantics. (Note that PEP 238 is from 2001 — fifteen years later you shouldn't be writing new code using the old semantics.)In the expression:
Counter([i for i in Counter(obj).values() if i > 1])
there's no need to create a list which you are going to throw away again immediately — you can pass a generator expression directly to the
Counter
constructor.Instead of looping over the keys of a mapping and then having a separate lookup to find the value, use the
items
method to loop over the keys and values simultaneously.Instead of adjusting the values of
factors
one at a time:count_p = factors_factorial(n, m) for prime in count_p: factors[prime] -= count_p[prime]
make use of the subtraction operation on
Counter
objects:factors -= factors_factorial(n, m)
Each time
factors_factorial
is called, it has to generate the primes again. It would be better to generate them just once.Instead of
log(n)/log(p)
, writelog(n, p)
and save a function call. See themath.log
documentation.This loop requires two calls to
log
, and computes a power on every iteration:for k in range(1, int(1+log(n)/log(p))): mul += n // p**k
It's better to write the loop like this:
k = n while k: k //= p mul += k
Writing it like this is not only simpler and avoids the power, but
k
gets smaller on each iteration, and so the divisions get cheaper as you go along.
2. Use NumPy
The code in the post imports NumPy in order to use numpy.prod
. But if you're going to use NumPy at all, then there's no reason not to use it throughout the code:
import numpy as np
def factorial_factors_powers(primes, n):
"""Return an array containing the power of each prime in the
factorization of n!.
"""
p = primes
n = np.full_like(p, n)
result = r = np.zeros_like(p)
while True:
n //= p
if n[-1] == 0: # some primes not contributing any more?
l = n.argmin() # number of primes still contributing
if l == 0:
break
n, r, p = n[:l], r[:l], p[:l]
r += n
return result
def fast_unique_perms(s):
"""Return the number of unique permutations of the string s."""
s = str(s)
primes = generate_primes(len(s) + 1)
powers = factorial_factors_powers(primes, len(s))
for n, m in Counter(Counter(s).values()).items():
if n > 1:
powers -= factorial_factors_powers(primes, n) * m
return np.prod(np.array(primes, dtype=int) ** powers)
Note the use of dtype=int
to force NumPy to do the final set of multiplications using Python's bignums instead of NumPy's fixed-width numbers.
This is around five times as fast as the code in the post.
-
\$\begingroup\$ Keep running into overflow issues when the input is over 66 ('Mississippi'*6) charachters. A suggestion was to use
dtype='int64
, however this did not solve the problem. do you have any solutions? \$\endgroup\$N3buchadnezzar– N3buchadnezzar2016年06月23日 10:55:22 +00:00Commented Jun 23, 2016 at 10:55 -
\$\begingroup\$ See updated post \$\endgroup\$Gareth Rees– Gareth Rees2016年06月23日 11:01:37 +00:00Commented Jun 23, 2016 at 11:01
Explore related questions
See similar questions with these tags.