Efficiently calculating unique permutations in a set

Question 1

I'm currently calculating the unique permutations of an array of data. While the following code is working, it's not as efficient as I would like. Once I get over 6 or 8 items, it becomes very slow and I start running into memory issues.

Here is the code and an explanation

<?php
function permuteUnique($items, $count = false, $perms = [], &$return = []) {
 if ($count && count($return) == $count) return $return;
 if (empty($items)) {
 $duplicate = false;
 foreach ($return as $a) {
 if ($a === $perms) {
 $duplicate = true;
 break;
 }
 }
 if (!$duplicate) $return[] = $perms;
 } else {
 for ($i = count($items) - 1; $i >= 0; --$i) {
 $newitems = $items;
 $newperms = $perms;
 list($tmp) = array_splice($newitems, $i, 1);
 array_unshift($newperms, $tmp);
 permuteUnique($newitems, $count, $newperms, $return);
 }
 return $return;
 }
}
function factorial($n) {
 $f = 1;
 for ($i = 2; $i <= $n; $i++) $f *= $i;
 return $f;
}

Given the input [1, 1, 2] I receive the following output as expected

array (size=3)
 0 => 
 array (size=3)
 0 => int 1
 1 => int 1
 2 => int 2
 1 => 
 array (size=3)
 0 => int 1
 1 => int 2
 2 => int 1
 2 => 
 array (size=3)
 0 => int 2
 1 => int 1
 2 => int 1

The $count parameter is so I can pass the number of unique permutations I expect to the function and once it has found that many, it can stop calculating and return the data. This is calculated as the factorial of the total number of items divided by the product of the factorial of the count of all duplicates. I'm not sure I said that right so let me show you an example.

Given the set [1, 2, 2, 3, 4, 4, 4, 4] the count of unique permutations is calculated as 8! / (2!4!) = 840 because there are 8 total items, one of them is duplicated twice, and another is duplicated 4 times.

Now if I translate that to php code...

<?php
$set = [1, 2, 2, 3, 4, 4, 4, 4];
$divisor = 1;
foreach (array_count_values($set) as $v) {
 $divisor *= factorial($v);
}
$count = factorial(count($set)) / $divisor;
$permutations = permuteUnique($set, $count);

it's pretty slow. If I throw a counter into the permuteUnique function, it runs over 100k times before it finds the 840 unique permutations.

I would like to find a way to reduce this and find the shortest possible path to the unique permutations. I appreciate any help or advice you can give.

Question 2

Look at std::next_permutation for C++ and find or implement something like this for PHP.

Question 3

So I spent some more time thinking about this and here is what I came up with.

<?php
function permuteUnique($items, $perms = [], &$return = []) {
 if (empty($items)) {
 $return[] = $perms;
 } else {
 sort($items);
 $prev = false;
 for ($i = count($items) - 1; $i >= 0; --$i) {
 $newitems = $items;
 $tmp = array_splice($newitems, $i, 1)[0];
 if ($tmp != $prev) {
 $prev = $tmp;
 $newperms = $perms;
 array_unshift($newperms, $tmp);
 permuteUnique($newitems, $newperms, $return);
 }
 }
 return $return;
 }
}
$permutations = permuteUnique([1, 2, 2, 3, 4, 4, 4, 4]);

Previous stats

Uniques: 840
Calls to permuteUnique: 107,591
Duplicates found: 38737
Execution time (seconds): 4.898668050766

New stats

Uniques: 840
Calls to permuteUnique: 2647
Duplicates found: 0
Execution time (seconds): 0.0095300674438477

So all I really did was sort the data set, keep track of the previous item, and not calculate permutations if the current item matched the previous. I also no longer have to pre calculate the amount of uniques and iterate through the permutations to check for duplicates. That made a world of difference.

Question 4

In this line if ($tmp != $prev) you should use !== insetd of !=. For loose comparison it breaks when there is 0 in set, e.g. $permutations = permuteUnique([0, 1, 1]);

Question 5

What did you guys use for profiling and getting these stats

Question 6

I just tried the "Generation in lexicographic order" way on the wiki, and it generates the same result for your "1,2,2,3,4,4,4,4" sample, so I guess it is correct. Here is the code:

function &permuteUnique($items) {
 sort($items);
 $size = count($items);
 $return = [];
 while (true) {
 $return[] = $items;
 $invAt = $size - 2;
 for (;;$invAt--) {
 if ($invAt < 0) {
 break 2;
 }
 if ($items[$invAt] < $items[$invAt + 1]) {
 break;
 }
 }
 $swap1Num = $items[$invAt];
 $inv2At = $size - 1;
 while ($swap1Num >= $items[$inv2At]) {
 $inv2At--;
 }
 $items[$invAt] = $items[$inv2At];
 $items[$inv2At] = $swap1Num;
 $reverse1 = $invAt + 1;
 $reverse2 = $size - 1;
 while ($reverse1 < $reverse2) {
 $temp = $items[$reverse1];
 $items[$reverse1] = $items[$reverse2];
 $items[$reverse2] = $temp;
 $reverse1++;
 $reverse2--;
 }
 }
 return $return;
}

Profiling the time for your example input: the above method: 2600,3000,3000,2400,2400,3000; your "Calls to permuteUnique: 2647" method: 453425.6,454425.4,454625.8. In your this example input, it is around 500 times faster :) If you are processing the result one by one (I guess you will), using this non-recursive method, you can process one generated and then generate the next (instead of generating all and store all before processing).

Question 7

I'm not sure where you got 500 times faster. It's approximately 3-5 times faster from my tests, even with a larger set. Still a very good answer. Care to provide a link to the wiki you refer to?

Question 8

@Rob: Sure. It's en.wikipedia.org/wiki/… And I've found a way to say it is correct (Previously I just guess). The 500 times thing is from the profiling.

Question 9

Try this modificated iterative version. It does not have the recursive overhead.

Found on: http://docstore.mik.ua/orelly/webprog/pcook/ch04_26.htm

ORIGINAL:

function pc_next_permutation($p, $size) {
 // slide down the array looking for where we're smaller than the next guy
 for ($i = $size - 1; $p[$i] >= $p[$i+1]; --$i) { }
 // if this doesn't occur, we've finished our permutations
 // the array is reversed: (1, 2, 3, 4) => (4, 3, 2, 1)
 if ($i == -1) { return false; }
 // slide down the array looking for a bigger number than what we found before
 for ($j = $size; $p[$j] <= $p[$i]; --$j) { }
 // swap them
 $tmp = $p[$i]; $p[$i] = $p[$j]; $p[$j] = $tmp;
 // now reverse the elements in between by swapping the ends
 for (++$i, $j = $size; $i < $j; ++$i, --$j) {
 $tmp = $p[$i]; $p[$i] = $p[$j]; $p[$j] = $tmp;
 }
 return $p;
}
$set = split(' ', 'she sells seashells'); // like array('she', 'sells', 'seashells')
$size = count($set) - 1;
$perm = range(0, $size);
$j = 0;
do { 
 foreach ($perm as $i) { $perms[$j][] = $set[$i]; }
} while ($perm = pc_next_permutation($perm, $size) and ++$j);
foreach ($perms as $p) {
 print join(' ', $p) . "\n";
}

Here is one idea to modify it to distinct permutations, but I think there are faster solutions....

function pc_next_permutation($p, $size) {
 for ($i = $size - 1; $p[$i] >= $p[$i+1]; --$i) { }
 if ($i == -1) { return false; }
 for ($j = $size; $p[$j] <= $p[$i]; --$j) { }
 $tmp = $p[$i]; $p[$i] = $p[$j]; $p[$j] = $tmp;
 for (++$i, $j = $size; $i < $j; ++$i, --$j) {
 $tmp = $p[$i]; $p[$i] = $p[$j]; $p[$j] = $tmp;
 }
 return $p;
}
$uniqueMap=array();
$set = split(' ', '1 2 2 3 4 4 4 4');
$size = count($set) - 1;
$perm = range(0, $size);
$j=0;
do { 
 $uniqueSetString="";
 foreach ($perm as $i) 
 $uniqueSetString .= "|".$set[$i];
 if (!isset($uniqueMap[$uniqueSetString]))
 {
 foreach ($perm as $i)
 $perms[$j][] = $set[$i];
 $uniqueMap[$uniqueSetString]=1;
 }
} while ($perm = pc_next_permutation($perm, $size) and ++$j);
foreach ($perms as $p) {
 print join(' ', $p) . "\n";
}

Question 10

Undefined offset: -1 on line 3? :o

Question 11

What will be the fix? I cannot understand this 6 year old code anymore ^^

Question 12

What you need is the factoriadic, it allows you to generate the nth permutation without having to all the previous / following ones. I've coded it in PHP but I don't have it with me ATM, sorry.

EDIT: Here you go, it should get you started.

Rob 12.9k4 gold badges44 silver badges57 bronze badges · Accepted Answer · 2013-09-25 03:49:33Z

So I spent some more time thinking about this and here is what I came up with.

<?php
function permuteUnique($items, $perms = [], &$return = []) {
 if (empty($items)) {
 $return[] = $perms;
 } else {
 sort($items);
 $prev = false;
 for ($i = count($items) - 1; $i >= 0; --$i) {
 $newitems = $items;
 $tmp = array_splice($newitems, $i, 1)[0];
 if ($tmp != $prev) {
 $prev = $tmp;
 $newperms = $perms;
 array_unshift($newperms, $tmp);
 permuteUnique($newitems, $newperms, $return);
 }
 }
 return $return;
 }
}
$permutations = permuteUnique([1, 2, 2, 3, 4, 4, 4, 4]);

Previous stats

Uniques: 840
Calls to permuteUnique: 107,591
Duplicates found: 38737
Execution time (seconds): 4.898668050766

New stats

Uniques: 840
Calls to permuteUnique: 2647
Duplicates found: 0
Execution time (seconds): 0.0095300674438477

So all I really did was sort the data set, keep track of the previous item, and not calculate permutations if the current item matched the previous. I also no longer have to pre calculate the amount of uniques and iterate through the permutations to check for duplicates. That made a world of difference.

In this line if ($tmp != $prev) you should use !== insetd of !=. For loose comparison it breaks when there is 0 in set, e.g. $permutations = permuteUnique([0, 1, 1]);

CollectivesTM on Stack Overflow

Efficiently calculating unique permutations in a set

4 Answers 4

2 Comments

2 Comments

2 Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Hot Network Questions

CollectivesTM on Stack Overflow

4 Answers 4

2 Comments

2 Comments

2 Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Related