Method to generate a List<string> containing permutation of an input string

Question 1

I'm working to understand Big O notation. As such I'm going through examples and the algorithm below generates the permutations for a given input string which contains only distinct characters. IE no duplicated or repeated characters in the input string.

My thought process was to take each character and place it at every position within any previously generated strings.

The single character string "a" can only generate "a".
With "ab" the first character generates "a", as above. The next character 'b' would then be added before and after it to generate "ba" and "ab".
With "abc" this would effectively take the list generated above and repeat the same procedure placing c at position 0, 1, and 2 producing
- From "ba": "cba", "bca", "bac"
- From "ab": "cab", "acb", "abc"
Etc...

The nested loops, foreach and for, is a big flag as to how inefficient my current implementation is, \$O(n * n * n) = O(n^3)\$ if I've understood the notation correctly. How can I improve the structure to remove this nesting?

public class StringPermutation
{
 public List<string> Generate(string source)
 {
 var list = new List<string>();
 foreach (var c in source) //O(source.Length)
 {
 if (list.Count == 0)
 {
 list.Add(c.ToString());
 }
 else
 {
 var beingBuilt = new List<string>();
 foreach (var tempString in list) //O(list.Count)
 {
 for (int i = 0; i < tempString.Length + 1; i++) //O(tempString.Length)
 {
 var built = tempString.Insert(i, c.ToString());
 beingBuilt.Add(built);
 }
 }
 list = beingBuilt;
 }
 }
 return list;
 }
}

Question 2

How should your solution handle the following inputs: aa, AaA, a a? In other words do you need to have distinct result set?

Question 3

I have only tested with lowercase distinct input of "a", "ab", "abc", "abcde", etc appending with the next letter. I’m staying with the simpler input string for now to get more experience.

Question 4

@PeterCsala "But as far as I know here we should not suggest alternative solutions" you are allowed to do this you just have to explain how it's better than the OP's code. Which is the only requirement when posting an answer, to teach (not give) the OP something better. For example contrast "Use this <code dump>" and "You can improve the above by using <algorithm name> as it reduced Big O from \$O(n^3)\$ to \$O(n^2)\$."

Question 5

Note that the complexity of this algorithm is not O(n^3) but O(n*n!). The length of list in the kth iteration is (k-1)! and the insert operation generates a copy of the string with the new character inserted, which takes O(k) time.

Question 6

Sorry, in my previous comment, I have missed the innermost loop, which adds another factor of k to the execution time of the kth iteration of the main loop, probably making the algorithm take O(n^2*n!) only.

Question 7

Before making some general remarks about the C# code, I would like to explain why the complexity of this algorithm is already optimal, even though it is not the standard approach to tackle this problem.

Complexity Analysis

As already stated in a comment, your complexity analysis is way off. The problem is your assumption that the loop over list only multiplies the runtime by the length \$n\$ of the source. However, the list grows each iteration of the outer loop.

How complex is it really?

To determine the real complexity, we have to have a closer look at what happens inside the outer loop. By design, the list will contain all permutations of the first \$(k-1)\$ elements when the \$k^{th}\$ iteration starts. Each element of list has length \$(k-1)\$, accordingly, and there are \$(k-1)!\$ of them. So, in the \$k^{th}\$ iteration, we generate \$((k-1)+1)\cdot(k-1)! = k!\$ new permutations. Since we have to save a new string of length \$k\$ each time, which takes time \$k\$, up to a constant offset and multiplication with a constant, for each iteration we need \$k\cdot k!\$ character operations. This leaves us with a total runtime of \$\sum_{k=1}^nk\cdot k!\$.

It is not really straight forward that this is smaller than a multiple of \$n\cdot n!\$, i.e. that the complexity of the algorithm is \$O(n\cdot n!)\$. As a first step, we can reduce the problem a bit using the fact that \$k\leq n\$.

\$\sum_{k=1}^nk\cdot k! \leq n \cdot \sum_{k=1}^nk! = n \cdot n! \cdot \sum_{k=1}^n\frac{k!}{n!}\$

Now, let us make a convenient approximation for the items in the sum on the rhs, using that \$ \frac{1}{m} \leq \frac{1}{l} \$ for \$ m \geq l \$.

\$ \frac{k!}{n!} = \frac{1}{\prod_{m=k+1}^n m} \leq \frac{1}{\prod_{m=k+1}^n (m-k)} = \frac{1}{\prod_{l=1}^{n-k}l} = \frac{1}{(n-k)!} \$

This approximation allows us to get on the track of the argument used in the complexity analysis of the approach to generate permutations more often used in literature, namely generating all prefixes of permutations of the string of increasing length via recursion.

\$ \sum_{k=1}^n\frac{k!}{n!} \leq \sum_{k=1}^n\frac{1}{(n-k)!} = \sum_{m=0}^{n-1}\frac{1}{m!} \leq \sum_{m=0}^{\infty}\frac{1}{m!} = e\$

In the second equation, we used the the substitution \$m=n-k\$, which turns the summation order around. Moreover, for the last argument, one has to know that the exponential function is defined as function \$e^x = \sum_{n=0}^\infty \frac{x^n}{n!}\$.

The argument about convergence to a constant is similar in nature to that in the analysis of the average insert time into a dynamic array, which is often presented in CS classes. That argument uses a different series, though.

Why is this optimal?

To see that a \$O(n\cdot n!)\$ is optimal is not too hard. We know that there are \$n!\$ many permutations and that we have to generate a string of length \$n\$ for each, which each takes a number of character operations proportional to the length. So, we end up with at least \$n\cdot n!\$ character operations.

General Remarks

To be honest, I cannot find a lot to improve here regarding the coding style. One could separate the contents of the outer loop into a separate private function Permutations(List<string> permutationsOfPrefix, char nextCharacter), but I think it is equally as valid to keep the algorithm contained in one single method, especially since it is not very long. I also do not think that using LINQ here would improve the readability.

One point I think could be improved is there, though, the naming of variables. 'list' is a very generic name. I think permutationsOfPrefix would be more fitting. Accordingly, tempString could be permutation or permutationOfPrefix and beingBuilt could be permutationsWithNextCharacter. That would describe a bit better what is going on. Similarly, the i in the for loop could be named insertionIndex.

Another possible improvement is to explicitly check whether the input string is null or empty at the start, to return an empty list in this case and, otherwise, to initialize list with the first character contained. This would allow to remove the if-else-statement inside the outer loop. However, it would require to iterate over source.Drop(1) in the outer loop.

Question 8

M.Doerner has explained the time complexity thoroughly, so I won't dive into that again. Instead I have refactored your solution in a way that makes your rather unorthodox algorithm more efficient (cuts the time with more than a half - but with same time complexity). See my inline comment below.

// HH: Return an IEnumerable<string> to make it possible to yield return each permutation when ready
public IEnumerable<string> GenerateReview(string source)
{
 // HH: Start with a list with an empty string in order to have a uniform loop for all elements in source
 var permutations = new List<string> { "" };
 int capacity = 1; // HH: The capacity (count of permutations) in the next list of permutations
 // HH: Use a for loop (which is often faster than foreach) and because the index is used below
 for (int j = 0; j < source.Length; j++)
 {
 // HH: Make the current char a string once
 string currentChar = source[j].ToString();
 // HH: Next permutation list is initialized with its number of permutations as capacity
 var nextPermutations = new List<string>(capacity *= (j + 1));
 foreach (var permutation in permutations)
 {
 for (int i = 0; i < permutation.Length + 1; i++)
 {
 var nextPermutation = permutation.Insert(i, currentChar);
 nextPermutations.Add(nextPermutation);
 // HH: If at the last char in source - then yield return the full permutation
 if (j == source.Length - 1)
 yield return nextPermutation;
 }
 }
 permutations = nextPermutations;
 }
}

It has still the same huge memory consumption for larger inputs. For instance a string like "ABCDEFGHIJK" will build a list of 11! = 39,916,800 strings before the method finally returns.

M.Doerner M.DoernerM.Doerner 1,53010 silver badges12 bronze badges · Answer 1 · 2020-06-05 19:55:54Z