I'm working to understand Big O notation. As such I'm going through examples and the algorithm below generates the permutations for a given input string which contains only distinct characters. IE no duplicated or repeated characters in the input string.
My thought process was to take each character and place it at every position within any previously generated strings.
- The single character string "a" can only generate "a".
- With "ab" the first character generates "a", as above. The next character 'b' would then be added before and after it to generate "ba" and "ab".
- With "abc" this would effectively take the list generated above and repeat the same procedure placing c at position 0, 1, and 2 producing
- From "ba": "cba", "bca", "bac"
- From "ab": "cab", "acb", "abc"
- Etc...
The nested loops, foreach
and for
, is a big flag as to how inefficient my current implementation is, \$O(n * n * n) = O(n^3)\$ if I've understood the notation correctly. How can I improve the structure to remove this nesting?
public class StringPermutation
{
public List<string> Generate(string source)
{
var list = new List<string>();
foreach (var c in source) //O(source.Length)
{
if (list.Count == 0)
{
list.Add(c.ToString());
}
else
{
var beingBuilt = new List<string>();
foreach (var tempString in list) //O(list.Count)
{
for (int i = 0; i < tempString.Length + 1; i++) //O(tempString.Length)
{
var built = tempString.Insert(i, c.ToString());
beingBuilt.Add(built);
}
}
list = beingBuilt;
}
}
return list;
}
}
2 Answers 2
Before making some general remarks about the C# code, I would like to explain why the complexity of this algorithm is already optimal, even though it is not the standard approach to tackle this problem.
Complexity Analysis
As already stated in a comment, your complexity analysis is way off. The problem is your assumption that the loop over list
only multiplies the runtime by the length \$n\$ of the source. However, the list grows each iteration of the outer loop.
How complex is it really?
To determine the real complexity, we have to have a closer look at what happens inside the outer loop. By design, the list
will contain all permutations of the first \$(k-1)\$ elements when the \$k^{th}\$ iteration starts. Each element of list
has length \$(k-1)\$, accordingly, and there are \$(k-1)!\$ of them. So, in the \$k^{th}\$ iteration, we generate \$((k-1)+1)\cdot(k-1)! = k!\$ new permutations. Since we have to save a new string of length \$k\$ each time, which takes time \$k\$, up to a constant offset and multiplication with a constant, for each iteration we need \$k\cdot k!\$ character operations. This leaves us with a total runtime of \$\sum_{k=1}^nk\cdot k!\$.
It is not really straight forward that this is smaller than a multiple of \$n\cdot n!\$, i.e. that the complexity of the algorithm is \$O(n\cdot n!)\$. As a first step, we can reduce the problem a bit using the fact that \$k\leq n\$.
\$\sum_{k=1}^nk\cdot k! \leq n \cdot \sum_{k=1}^nk! = n \cdot n! \cdot \sum_{k=1}^n\frac{k!}{n!}\$
Now, let us make a convenient approximation for the items in the sum on the rhs, using that \$ \frac{1}{m} \leq \frac{1}{l} \$ for \$ m \geq l \$.
\$ \frac{k!}{n!} = \frac{1}{\prod_{m=k+1}^n m} \leq \frac{1}{\prod_{m=k+1}^n (m-k)} = \frac{1}{\prod_{l=1}^{n-k}l} = \frac{1}{(n-k)!} \$
This approximation allows us to get on the track of the argument used in the complexity analysis of the approach to generate permutations more often used in literature, namely generating all prefixes of permutations of the string of increasing length via recursion.
\$ \sum_{k=1}^n\frac{k!}{n!} \leq \sum_{k=1}^n\frac{1}{(n-k)!} = \sum_{m=0}^{n-1}\frac{1}{m!} \leq \sum_{m=0}^{\infty}\frac{1}{m!} = e\$
In the second equation, we used the the substitution \$m=n-k\$, which turns the summation order around. Moreover, for the last argument, one has to know that the exponential function is defined as function \$e^x = \sum_{n=0}^\infty \frac{x^n}{n!}\$.
The argument about convergence to a constant is similar in nature to that in the analysis of the average insert time into a dynamic array, which is often presented in CS classes. That argument uses a different series, though.
Why is this optimal?
To see that a \$O(n\cdot n!)\$ is optimal is not too hard. We know that there are \$n!\$ many permutations and that we have to generate a string of length \$n\$ for each, which each takes a number of character operations proportional to the length. So, we end up with at least \$n\cdot n!\$ character operations.
General Remarks
To be honest, I cannot find a lot to improve here regarding the coding style. One could separate the contents of the outer loop into a separate private function Permutations(List<string> permutationsOfPrefix, char nextCharacter)
, but I think it is equally as valid to keep the algorithm contained in one single method, especially since it is not very long. I also do not think that using LINQ here would improve the readability.
One point I think could be improved is there, though, the naming of variables. 'list' is a very generic name. I think permutationsOfPrefix
would be more fitting. Accordingly, tempString
could be permutation
or permutationOfPrefix
and beingBuilt
could be permutationsWithNextCharacter
. That would describe a bit better what is going on. Similarly, the i
in the for loop could be named insertionIndex
.
Another possible improvement is to explicitly check whether the input string is null
or empty at the start, to return an empty list in this case and, otherwise, to initialize list
with the first character contained. This would allow to remove the if-else-statement inside the outer loop. However, it would require to iterate over source.Drop(1)
in the outer loop.
M.Doerner has explained the time complexity thoroughly, so I won't dive into that again. Instead I have refactored your solution in a way that makes your rather unorthodox algorithm more efficient (cuts the time with more than a half - but with same time complexity). See my inline comment below.
// HH: Return an IEnumerable<string> to make it possible to yield return each permutation when ready
public IEnumerable<string> GenerateReview(string source)
{
// HH: Start with a list with an empty string in order to have a uniform loop for all elements in source
var permutations = new List<string> { "" };
int capacity = 1; // HH: The capacity (count of permutations) in the next list of permutations
// HH: Use a for loop (which is often faster than foreach) and because the index is used below
for (int j = 0; j < source.Length; j++)
{
// HH: Make the current char a string once
string currentChar = source[j].ToString();
// HH: Next permutation list is initialized with its number of permutations as capacity
var nextPermutations = new List<string>(capacity *= (j + 1));
foreach (var permutation in permutations)
{
for (int i = 0; i < permutation.Length + 1; i++)
{
var nextPermutation = permutation.Insert(i, currentChar);
nextPermutations.Add(nextPermutation);
// HH: If at the last char in source - then yield return the full permutation
if (j == source.Length - 1)
yield return nextPermutation;
}
}
permutations = nextPermutations;
}
}
It has still the same huge memory consumption for larger inputs. For instance a string like "ABCDEFGHIJK"
will build a list of 11! = 39,916,800
strings before the method finally returns.
aa
,AaA
,a a
? In other words do you need to have distinct result set? \$\endgroup\$list
in the kth iteration is (k-1)! and the insert operation generates a copy of the string with the new character inserted, which takes O(k) time. \$\endgroup\$