Several hours ago I posted this question about Generating character permutations in Python.
User rolfl said that my main problem was not using the right tool for the job, as Python is interpreted it will be much slower than a compiled solution. So I tried to write this in C.
Objective
Generate a file containing all fixed length l
permutations of a given set of chars, in this case upper-case alphabet
.
i.e.: With l = 5
the output file should look like:
AAAAA
AAAAB
AAAAC
...
ZZZZZ
My solution
I used a recursive for
loop. It's my first program in C (after a hello world) so I don't know what it's worth. It runs in about 2.621s
on my 2.9 GHz Intel Core i7
#include <stdio.h>
FILE *fp;
char alphabet[26] = { 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z'};
int len = 5;
void recloop(char *pwd, int repeat) {
int i;
for (i = 0; i < 26; i++) {
pwd[len-repeat] = alphabet[i];
if (repeat == 1) {
fprintf(fp, "%s\n", pwd);
}
else {
recloop(pwd, repeat - 1);
}
}
}
int main(void) {
char pwd[len];
fp = fopen("output", "w");
recloop(pwd, len);
}
-
\$\begingroup\$ "I thought it would belong to a new post as I'm dealing with another language." Any time that you revise code, you should make a new post. See here for more info. \$\endgroup\$Brythan– Brythan2014年10月21日 21:02:31 +00:00Commented Oct 21, 2014 at 21:02
4 Answers 4
I have spent a couple of hours struggling with a bit of a performance mystery, but, the long and short of it is that I have essentially produced a manual stack in C, and made it generic for the use-case you are investigating.
The mystery is that compiling with:
gcc -o pwgen -Wall -D_GNU_SOURCE -std=c99 pwgen.c
on my computer is 3 times faster than:
gcc -o pwgen -Wall -D_GNU_SOURCE -std=c99 -O3 pwgen.c
... and that makes no sense...
Moving on....
There are some things which are significant here.
- Don't output to a file. Output to stdout and redirect the results.
- similar to what vnp has, you create a logical stack, and 'increment' it.
- don't use recursion, the additional stack management is not necessary.
I put the code together in a way that makes it more maintainable. Note that there are no global variables, and you can pass the alphabet and generated size in to the function and it will 'just work'.
Writing maintainable code is something you will thank yourself for in the future.
Note the performance of the code:
Java: 1.204u 0.148s 0:01.14 117.5%
C: 0.392u 0.012s 0:00.40 100.0%
So, for a short run, the Java code is 3 times slower. Note that a lot of that time is probably startup overhead for Java.
When I run it with 6 characters, it runs as:
Java: 20.517u 0.524s 0:20.44 102.8%
C: 9.980u 0.092s 0:10.07 100.0%
The C version is almost exactly 26 times slower... which is good.
The Java version is half the speed of the C version for 6 char passwords.... It's in the ballpark of what I expect...
The C code I ran (5 chars) is:
#include <stdio.h>
#include <string.h>
void recloop(const int len, const char* alphabet) {
char pwd[len + 1];
int cnt[len];
const int chars = strlen(alphabet);
unsigned long long count = 1;
for (int i = 0; i < len; i++)
{
// initialize the arrays, and count the number of permutations
pwd[i] = alphabet[0];
cnt[i] = 0;
count *= chars;
}
// set the null terminator.
pwd[len] = 0;
//printf("There are %d chars with %llu permutations and %s first value\n", chars, count, pwd);
while(count--)
{
printf("%s\n", pwd);
int pos = len - 1;
while (++cnt[pos] == chars)
{
cnt[pos] = 0;
pwd[pos] = alphabet[0];
pos--;
}
pwd[pos] = alphabet[cnt[pos]];
}
}
int main(void) {
char *alpha = strdup("ABCDEFGHIJKLMNOPQRSTUVWXYZ");
recloop(5, alpha);
return 0;
}
Note that the 6-char password generated 2GB of data in 10 seconds, or 200MB/second. You will be limited by your storage system probably, not the CPU.
Update: Detail on the inner loops
Consider a small alphabet of "ABC", and a small output of only 2 characters.
The outputs are:
AA, AB, AC, BA, BB, BC, CA, CB, CC
We produce these by setting up two arrays, the actual char pwd array, and an index array:
We initialize the arrays as:
cnt -> 0,0
pwd -> A,A,0円
Then, we loop. In the outer loop We know exactly how many loops we will do because it is \3ドル^2\,ドル 9 times. That's how many passwords we generate.
The interesting part is the inner loop. On start up, cnt
is [0,0]
. We treat this like an index in to the characters, and we increment the right-most one. Let me lay out all the increments for you:
cnt pwd
--- ---
0,0 -> AA
0,1 -> AB
0,2 -> AC
1,0 -> BA
1,1 -> BB
1,2 -> BC
2,0 -> CA
2,1 -> CB
2,2 -> CC
That's the progression we want to follow. We increment the right-most digit. If it becomes 3 (the number of characters), we set it back to 0, and move 1 digit to the left.
If these were decimal numbers, it would be the same as there being 10 characters in the alphabet. You can think of the logic you would go through to roll 999 to 1000. Add 1 to 9, it becomes 0 carry 10. Add that 10 to 90, it becomes 100 carry 1000, and that's the result.
That's what the inner loop does... it starts at the right, and adds one, and carries it back if there's an overflow. The overflow is configurable though, by the number of characters in the alphabet, and the number of digits is how wide the value is. Each time it changes a value in the cnt
array, it copies the corresponding char from the alphabet back in to the pwd
.
So, if the last password was ABCZZ
we will add 1 to the last char, which 'wraps' it back to ABCZA
but we also then loop again with the next char, making it ABCAA
with another loop making it ABDAA
. The D
is not an overflow, so that's the last of the inner loops.
Note that, because count is strictly calculated, the inner loop will never have a negative overflow....
-
\$\begingroup\$ Thanks it's definitely lightning fast and learnt some good practice. I understand the whole code apart from the nested while loop. I understand what it does but I'm not sure exactly how. What's the logic behind those four lines of code if I may ask? \$\endgroup\$Juicy– Juicy2014年10月22日 01:52:50 +00:00Commented Oct 22, 2014 at 1:52
-
1\$\begingroup\$ updated with detail on inner loops \$\endgroup\$rolfl– rolfl2014年10月22日 02:18:16 +00:00Commented Oct 22, 2014 at 2:18
A few notes:
You aren't declaring
alphabet
in a very readable and maintainable form. Here's how I would do it:static const char alphabet[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
As you can see, I also made this variable
static
andconst
.static
renders variable local to the file, andconst
makes it so that we can't screw with the array somewhere along the line (and also makes things a tidbit faster). This leads me to my next point...Also, you don't have to specify the number of elements in the array when you declare it like this, since the compiler can determine that itself (because it is constant).
fp
andlen
are global variables when they shouldn't be. Arguablyalphabet
shouldn't be one either, but I think it is find as long as you have thestatic
in front of it. Anyways, they should be declared as local variables and then passed to the functions that need them.I'm not sure your algorithm is the most optimal. I used a method known as "backtracking" myself, which seems to be the most efficient that I could find in practice.
fprintf()
isn't going to be as efficient compared tofwrite()
. I would consider rewriting your code if you still want to generate a permutation table.
Time test (without compiler optimizations)
mycode.c
:
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
static const char alphabet[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
static const int alphabetSize = sizeof(alphabet) - 1;
static void bruteImpl(char* str, size_t index, size_t maxDepth)
{
for (size_t i = 0; i < alphabetSize; ++i)
{
str[index] = alphabet[i];
if (index == maxDepth - 1); //printf("%s\n", str);
else bruteImpl(str, index + 1, maxDepth);
}
}
void bruteSequential(size_t maxLen)
{
char* buf = calloc(maxLen + 1, sizeof(char));
for (size_t len = 1; len <= maxLen; ++len)
{
bruteImpl(buf, 0, len);
}
free(buf);
}
int main(void)
{
bruteSequential(5);
}
$ gcc opCode.c # I took out your fprintf call so that it wouldn't be a factor $ time ./a.out real 0m0.170s user 0m0.139s sys 0m0.009s $ gcc mycode.c $ time ./a.out real 0m0.043s user 0m0.039s sys 0m0.002s
Think of the string you are generating as numbers written in 26-base. Then obtaining a next string is a matter of a simple increment:
increment_string(char * s)
{
while(*s) {
*s += 1;
if (*s > 'Z') {
*s++ = 'A';
} else {
return true;
}
}
return false;
}
Similar to @vnp's answer, this answer uses a base-N index to pull out all the permutations from a letters
string
double fac( int n )
{
if( !n ) return 1;
else return n*fac(n-1);
}
int numPerms( int n, int k )
{
return fac(n)/(fac(n-k));
}
string getPerm( int index, int k, string letters )
{
int n = letters.size();
string ans;
for( int i = 0; i < k; i++ )
{
int letterIndex = index % n;
index /= n;
ans = letters[letterIndex] + ans;
}
return ans;
}
void printPerms()
{
string letters = "abcdx";
int n = letters.size();
int k = 2;
int perms = numPerms(n,k);
for( int i = 0; i < perms; i++ )
{
puts( getPerm( i, k, letters ).c_str() );
}
}
int main()
{
printPerms();
}
-
1\$\begingroup\$ Note that your factorial computation is not the right formula for permutations, but for combinations. \$\endgroup\$rolfl– rolfl2014年10月22日 01:21:16 +00:00Commented Oct 22, 2014 at 1:21