Permutations part two

Question 1

This is a follow up from this question. The goal is to make some function that can return the number of possible permutations with repetitions. As Gareth pointed out there is a nice formula for this problem:

$$ \frac{m!}{n_1!,円n_2! \ldots n_k!} $$

Where \$m\$ is the length of our permutations and \$n_i\$ is the number of times the \$i\$th letter appears in \$m\$. A simple example can be shown with the word Mississippi.

$$ P(\text{Mississippi}) = \frac{11!}{4!,4円!,2円!} $$

A simple implementation of this can be done in Python as follows:

from numpy import prod 
def count_unique_perms(obj):
 obj = str(obj)
 f = [factorial(i) for i in Counter(obj).values()]
 return factorial(len(obj))/prod(f)

I include this code as a basetest for the "improved" version below. Questions were raised whether this approach works for large factorials, and it was said that the running time was at worst \$O(n^3(log n)^2)\$

I used the ideas from my previous question. To try to improve the answer. First I find the number of factors in \$m!\$ then I subtract the factors from \$n_1!,円n_2! \ldots n_k!\$. A small optimization I did was avoiding calculating duplicates.

My question if this is a decent improvement. It seems much slower until really big string. For example, it is slightly faster for str(Mississippi)*500. Any other tips or algorithms are also welcome =)

from collections import Counter
from math import factorial, log
from numpy import prod
from primesieve import generate_primes
def count_unique_perms(obj):
 obj = str(obj)
 f = [factorial(i) for i in Counter(obj).values()]
 return factorial(len(obj))/prod(f)
def fast_unique_perms(obj):
 obj = str(obj)
 factors = factors_factorial(len(obj))
 obj_count = Counter([i for i in Counter(obj).values() if i > 1])
 for p in obj_count:
 count_p = factors_factorial(p, obj_count[p])
 for prime in count_p:
 factors[prime] -= count_p[prime]
 num_perms = 1
 for p in factors:
 num_perms *= p**factors[p]
 return num_perms
def factors_factorial(n, multiplicity=1):
 "Calculates all the facors of n!"
 count = Counter()
 for p in generate_primes(n+1):
 mul = 0
 for k in range(1, int(1+log(n)/log(p))):
 mul += int(n/p**k)
 count[p] = multiplicity*mul
 return count
if __name__ == '__main__':
 import timeit
 lst = 1123
 string = 'mississippi'
 print count_unique_perms(string)
 print fast_unique_perms(string)
 times = 100
 t1 = timeit.timeit("count_unique_perms('mississippi'*1000)",
 setup="from __main__ import count_unique_perms", number=times)/float(times)
 t2 = timeit.timeit("fast_unique_perms('mississippi'*1000)",
 setup="from __main__ import fast_unique_perms", number=times)/float(times)
 print "fast_unique_perms: ", 1000*t2, "ms"
 print "count_unique_perms: ", 1000*t1, "ms"
 print "The count_unique_perms solution was: ", t2/float(t1), "times faster than fast_unique_perms."

Question 2

1. Review

There are a bunch of small problems here — I think that if you had given yourself some more time (or looked more carefully at the answers you were given to your previous question) then you would have been able to fix many of them yourself.

Missing docstrings — you've written one for factors_factorial but what do the other functions do?
Vague variable names — what kind of object is obj? What kinds of objects are counted by obj_count? The name i is normally used for indexes but this code uses it for characters, so c would be better. I would expect the name p to refer to a prime, but here it refers to a count or multiplicity, so n or m would be better.
For portability to Python 3 it's a good plan to stick to // for integer division and / for floating-point division. In Python 2 you can use from __future__ import division to enforce the Python 3 semantics. (Note that PEP 238 is from 2001 — fifteen years later you shouldn't be writing new code using the old semantics.)
In the expression:
```
Counter([i for i in Counter(obj).values() if i > 1])
```
there's no need to create a list which you are going to throw away again immediately — you can pass a generator expression directly to the Counter constructor.
Instead of looping over the keys of a mapping and then having a separate lookup to find the value, use the items method to loop over the keys and values simultaneously.

Instead of adjusting the values of factors one at a time:

count_p = factors_factorial(n, m)
for prime in count_p:
 factors[prime] -= count_p[prime]

make use of the subtraction operation on Counter objects:

factors -= factors_factorial(n, m)

Each time factors_factorial is called, it has to generate the primes again. It would be better to generate them just once.
Instead of log(n)/log(p), write log(n, p) and save a function call. See the math.log documentation.
This loop requires two calls to log, and computes a power on every iteration:
```
for k in range(1, int(1+log(n)/log(p))):
 mul += n // p**k
```
It's better to write the loop like this:
```
k = n
while k:
 k //= p
 mul += k
```
Writing it like this is not only simpler and avoids the power, but k gets smaller on each iteration, and so the divisions get cheaper as you go along.

2. Use NumPy

The code in the post imports NumPy in order to use numpy.prod. But if you're going to use NumPy at all, then there's no reason not to use it throughout the code:

import numpy as np
def factorial_factors_powers(primes, n):
 """Return an array containing the power of each prime in the
 factorization of n!.
 """
 p = primes
 n = np.full_like(p, n)
 result = r = np.zeros_like(p)
 while True:
 n //= p
 if n[-1] == 0: # some primes not contributing any more?
 l = n.argmin() # number of primes still contributing
 if l == 0:
 break
 n, r, p = n[:l], r[:l], p[:l]
 r += n
 return result
def fast_unique_perms(s):
 """Return the number of unique permutations of the string s."""
 s = str(s)
 primes = generate_primes(len(s) + 1)
 powers = factorial_factors_powers(primes, len(s))
 for n, m in Counter(Counter(s).values()).items():
 if n > 1:
 powers -= factorial_factors_powers(primes, n) * m
 return np.prod(np.array(primes, dtype=int) ** powers)

Note the use of dtype=int to force NumPy to do the final set of multiplications using Python's bignums instead of NumPy's fixed-width numbers.

This is around five times as fast as the code in the post.

Question 3

Keep running into overflow issues when the input is over 66 ('Mississippi'*6) charachters. A suggestion was to use dtype='int64, however this did not solve the problem. do you have any solutions?

Question 4

See updated post

Gareth Rees Gareth Rees 50.1k3 gold badges130 silver badges210 bronze badges · Accepted Answer · 2016-06-23 10:07:47Z

1. Review

There are a bunch of small problems here — I think that if you had given yourself some more time (or looked more carefully at the answers you were given to your previous question) then you would have been able to fix many of them yourself.

Missing docstrings — you've written one for factors_factorial but what do the other functions do?
Vague variable names — what kind of object is obj? What kinds of objects are counted by obj_count? The name i is normally used for indexes but this code uses it for characters, so c would be better. I would expect the name p to refer to a prime, but here it refers to a count or multiplicity, so n or m would be better.
For portability to Python 3 it's a good plan to stick to // for integer division and / for floating-point division. In Python 2 you can use from __future__ import division to enforce the Python 3 semantics. (Note that PEP 238 is from 2001 — fifteen years later you shouldn't be writing new code using the old semantics.)
In the expression:
```
Counter([i for i in Counter(obj).values() if i > 1])
```
there's no need to create a list which you are going to throw away again immediately — you can pass a generator expression directly to the Counter constructor.
Instead of looping over the keys of a mapping and then having a separate lookup to find the value, use the items method to loop over the keys and values simultaneously.

Instead of adjusting the values of factors one at a time:

count_p = factors_factorial(n, m)
for prime in count_p:
 factors[prime] -= count_p[prime]

make use of the subtraction operation on Counter objects:

factors -= factors_factorial(n, m)

Each time factors_factorial is called, it has to generate the primes again. It would be better to generate them just once.
Instead of log(n)/log(p), write log(n, p) and save a function call. See the math.log documentation.
This loop requires two calls to log, and computes a power on every iteration:
```
for k in range(1, int(1+log(n)/log(p))):
 mul += n // p**k
```
It's better to write the loop like this:
```
k = n
while k:
 k //= p
 mul += k
```
Writing it like this is not only simpler and avoids the power, but k gets smaller on each iteration, and so the divisions get cheaper as you go along.

2. Use NumPy

The code in the post imports NumPy in order to use numpy.prod. But if you're going to use NumPy at all, then there's no reason not to use it throughout the code:

import numpy as np
def factorial_factors_powers(primes, n):
 """Return an array containing the power of each prime in the
 factorization of n!.
 """
 p = primes
 n = np.full_like(p, n)
 result = r = np.zeros_like(p)
 while True:
 n //= p
 if n[-1] == 0: # some primes not contributing any more?
 l = n.argmin() # number of primes still contributing
 if l == 0:
 break
 n, r, p = n[:l], r[:l], p[:l]
 r += n
 return result
def fast_unique_perms(s):
 """Return the number of unique permutations of the string s."""
 s = str(s)
 primes = generate_primes(len(s) + 1)
 powers = factorial_factors_powers(primes, len(s))
 for n, m in Counter(Counter(s).values()).items():
 if n > 1:
 powers -= factorial_factors_powers(primes, n) * m
 return np.prod(np.array(primes, dtype=int) ** powers)

Note the use of dtype=int to force NumPy to do the final set of multiplications using Python's bignums instead of NumPy's fixed-width numbers.

This is around five times as fast as the code in the post.

Keep running into overflow issues when the input is over 66 ('Mississippi'*6) charachters. A suggestion was to use dtype='int64, however this did not solve the problem. do you have any solutions?

Stack Exchange Network

Permutations part two

1 Answer 1

1. Review

2. Use NumPy

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Hot Network Questions

Permutations part two

1 Answer 1

1. Review

2. Use NumPy

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Related

Hot Network Questions