Here is my implementation of a simple 3-rotor Enigma machine in C++:
#include <iostream>
#include <cstring>
#include <string>
using namespace std;

char alpha[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
char rotors[3][27] =
{
    "EKMFLGDQVZNTOWYHXUSPAIBRCJ",
    "AJDKSIRUXBLHWTMCQGZNPYFVOE",
    "BDFHJLCPRTXVZNYEIWGAKMUSQO"
};
char reflector[] = "YRUHQSLDPXNGOKMIEBFZCWVJAT";
char key[] = "ABC";

long mod26(long a)
{
    return (a % 26 + 26) % 26;
}

int li(char l)
{
    // Letter index
    return l - 'A';
}

int indexof(char* array, int find)
{
    return strchr(array, find) - array;
}

string crypt(const char *ct)
{
    // Sets initial permutation
    int L = li(key[0]);
    int M = li(key[1]);
    int R = li(key[2]);
    string output;
    for (int x = 0; x < strlen(ct); x++) {
        int ct_letter = li(ct[x]);
        // Step right rotor on every iteration
        R = mod26(R + 1);
        // Pass through rotors
        char a = rotors[2][mod26(R + ct_letter)];
        char b = rotors[1][mod26(M + li(a) - R)];
        char c = rotors[0][mod26(L + li(b) - M)];
        // Pass through reflector
        char ref = reflector[mod26(li(c) - L)];
        // Inverse rotor pass
        int d = mod26(indexof(rotors[0], alpha[mod26(li(ref) + L)]) - L);
        int e = mod26(indexof(rotors[1], alpha[mod26(d + M)]) - M);
        char f = alpha[mod26(indexof(rotors[2], alpha[mod26(e + R)]) - R)];
        output += f;
    }
    return output;
}

int main()
{
    for (int i = 0; i < 1000000; i++) {
        crypt("PZUFWDSASJGQGNRMAEODZJXQQKHSYGVUSGSU");
    }
    return 0;
}
Benchmark ran on a 2.66 GHz Core 2 Duo for 1 million decrypts:
11.64s user 0.02s system 97% cpu 11.963 total
I'm not a C++ programmer; I first wrote this in JavaScript and then translated it into C++.
Whilst 1 million in ~10 seconds is OK, I feel that my implementation isn't optimal. Are there any optimisations or micro-optimisations you could recommend for my C++ code to make it faster? Maybe I should try a different language? Any suggestions are welcome.
user138042 (May 4, 2017): Which is faster: int, long, boolean, or double? Make your constants constant; it saves time.
3 Answers
Let's profile this a little bit (this is on a dual-core i7, so a bit quicker):
Running Time Self Symbol Name
3984.0ms 100.0% 0.0 Main Thread 0x2856c
3982.0ms 99.9% 0.0 start
3982.0ms 99.9% 1.0 main
3936.0ms 98.7% 3081.0 crypt(char const*)
640.0ms 16.0% 640.0 _platform_strchr
126.0ms 3.1% 102.0 std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>::push_back(char)
89.0ms 2.2% 89.0 strlen
14.0ms 0.3% 14.0 DYLD-STUB$$strchr
14.0ms 0.3% 4.0 free
8.0ms 0.2% 1.0 szone_free_definite_size
7.0ms 0.1% 7.0 DYLD-STUB$$strlen
2.0ms 0.0% 2.0 operator delete(void*)
2.0ms 0.0% 0.0 _dyld_start
Ok, so more than 98% of the time is spent in the crypt function (which isn't surprising, really). Almost 16% of the time is spent in strchr, with another 3.1% spent in push_back. There's also a few percent (2.2%) spent doing strlen.
Ok, so how do we cut down on these?
Firstly, we'll switch over to using std::array<char, N> instead of char *. These are both stack-allocated (so we don't incur the dynamic allocation overhead of std::string or std::vector), but std::array knows its size, which will allow us to remove the strlen call in the crypt function:
#include <array>

const std::array<char, 27> alpha = {"ABCDEFGHIJKLMNOPQRSTUVWXYZ"};
const std::array<std::array<char, 27>, 3> rotors
{
    {{"EKMFLGDQVZNTOWYHXUSPAIBRCJ"},
     {"AJDKSIRUXBLHWTMCQGZNPYFVOE"},
     {"BDFHJLCPRTXVZNYEIWGAKMUSQO"}}
};
const std::array<char, 27> reflectors = {"YRUHQSLDPXNGOKMIEBFZCWVJAT"};
const std::array<char, 4> key = {"ABC"};
The indexof function then needs to change as well (and I'm going to rename it to index_of, which is easier to read):
template <size_t N>
std::size_t index_of(const std::array<char, N>& str, int find)
{
    for (std::size_t i = 0; i < N; ++i) {
        if (str[i] == find) return i;
    }
    return -1;
}
crypt now looks like:
template <size_t N>
std::array<char, N> crypt(const std::array<char, N>& ct)
{
    // Sets initial permutation
    int L = li(key[0]);
    int M = li(key[1]);
    int R = li(key[2]);
    std::array<char, N> output;
    for (unsigned x = 0; x < N; x++) {
        int ct_letter = li(ct[x]);
        // Step right rotor on every iteration
        R = static_cast<int>(mod26(R + 1));
        // Pass through rotors
        char a = rotors[2][mod26(R + ct_letter)];
        char b = rotors[1][mod26(M + li(a) - R)];
        char c = rotors[0][mod26(L + li(b) - M)];
        // Pass through reflector
        char ref = reflectors[mod26(li(c) - L)];
        // Inverse rotor pass
        long d = mod26(index_of(rotors[0], alpha[mod26(li(ref) + L)]) - L);
        long e = mod26(index_of(rotors[1], alpha[mod26(d + M)]) - M);
        char f = mod26(index_of(rotors[2], alpha[mod26(e + R)]) - R);
        output[x] = alpha[f];
    }
    return output;
}
which is close to what it was, but with a few minor changes (no += on a string; since we already know how large the output will be, we simply assign directly into that index).
And we now call it like so:
int main()
{
    for (int i = 0; i < 1000000; i++) {
        crypt(std::array<char, 37>{"PZUFWDSASJGQGNRMAEODZJXQQKHSYGVUSGSU"});
    }
    return 0;
}
Let's profile this again:
Running Time Self Symbol Name
2873.0ms 100.0% 0.0 Main Thread 0x2c7ff
2872.0ms 99.9% 0.0 start
2872.0ms 99.9% 2.0 main
2870.0ms 99.8% 2870.0 std::__1::array<char, 37ul> crypt<37ul>(std::__1::array<char, 37ul> const&)
1.0ms 0.0% 0.0 _dyld_start
Ok, so everything except crypt has now been inlined. If we apply the modification from @MrSmith42, we shave even more time off:
Running Time Self Symbol Name
1855.0ms 100.0% 0.0 Main Thread 0x2ca5c
1854.0ms 99.9% 0.0 start
1854.0ms 99.9% 1.0 main
1853.0ms 99.8% 1853.0 std::__1::array<char, 37ul> crypt<37ul>(std::__1::array<char, 37ul> const&)
1.0ms 0.0% 0.0 _dyld_start
We have imposed a fairly large constraint, however: we need to know the size of the string to be encrypted at compile-time (so no reading it from a file or stdin or the like). If this will always be the case, then this provides a fairly significant speed-up on my hardware.
StuR (Mar 13, 2014): How are you compiling this? On your version (g++ -std=c++11 enigma.cpp), runtime is 20.85s. On my version (g++ enigma.cpp) runtime is 11.63s?
Yuushi (Mar 13, 2014): This is under clang. Are you compiling with -O2 or -O3?
Yuushi (Mar 14, 2014): @StuR I suggest trying the two on an online compiler like coliru.stacked-crooked.com and doing timings with that. I get a decrease from ~9.5s to about ~5.5s using the above changes.
You provide the timing of the 4 different versions, but that is useless without the code (if you wrote the Perl version as badly as the C++ version, then it's not surprising you get bad results). Before the times are useful for a comparison, we need to make sure the tests are comparable. So we really need the code for all four versions. Then we can get criticism of all four code bases and work to align them with the best idioms of the appropriate languages. Once we have done that, we can do realistic timings.
Note 1:
Stop using C
string crypt (const char *ct)
Should be:
string crypt (std::string const& ct)
Note 2:
This allows a much-needed speed-up here:
for ( int x = 0; x < strlen(ct) ; x++ ) {
Should be:
for (char c : ct)
// or if using C++03
for (std::string::const_iterator c = ct.begin(); c != ct.end(); ++c)
This should improve performance considerably, as you are no longer re-calculating the string length on every iteration.
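As an illustration of the iteration pattern only (this is a hypothetical Caesar-style stand-in, not the Enigma logic), a range-for walks each character without re-computing the length, and reserve() sizes the output once instead of growing it append by append:

```cpp
#include <string>

// shift_by_one is an illustrative name, not from the original code.
// It shows the loop shape Note 2 suggests: range-for over the string,
// with the output pre-sized via reserve().
std::string shift_by_one(const std::string& ct)
{
    std::string output;
    output.reserve(ct.size());  // length computed once, up front
    for (char letter : ct)
        output += static_cast<char>('A' + (letter - 'A' + 1) % 26);
    return output;
}
// e.g. shift_by_one("HELLO") yields "IFMMP"
```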
Note 3:
Always do your timing after compiling with optimizations turned on:
g++ -O3 <stuff>
Note 4:
These values are const
int L = li(key[0]);
int M = li(key[1]);
int R = li(key[2]);
Try:
int const L = li(key[0]);
int const M = li(key[1]);
int const R = li(key[2]);
Note 5:
This looks decidedly inefficient:
int d = mod26(indexof(rotors[0], alpha[mod26(li(ref) + L)]) - L);
Looking at:
indexof(rotors[0], alpha[mod26(li(ref) + L)])
// indexof(<C-String> , <char>)
// Does a linear search of the string.
// That is very inefficient. Why not use std::unordered_map<char, int>.
// If you don't want to waste space use std::map<char, int>
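A sketch of the suggested map-based lookup (make_index is an illustrative name, not part of the reviewed code): build a letter-to-index map once per rotor, then each lookup is O(log n) instead of a linear scan.

```cpp
#include <map>

// Build letter -> position map for one rotor string (done once at startup).
std::map<char, int> make_index(const char* rotor)
{
    std::map<char, int> m;
    for (int i = 0; rotor[i] != '\0'; ++i)
        m[rotor[i]] = i;  // the rotor maps position i to letter rotor[i]
    return m;
}
// usage sketch: const auto rotor0 = make_index(rotors[0]);
//               int d = mod26(rotor0.at(alpha[mod26(li(ref) + L)]) - L);
```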
Note 6:
You may think this is C++, but it's not. This is a C implementation, and not a good one.
The JavaScript version relies on an engine written in C++ that is highly optimized. If you don't apply the same techniques that were used by the JavaScript engine, then I am not surprised that you get similar results. I would expect the C++ version to run 100x faster (not 4x) than JavaScript.
Yuushi (Mar 13, 2014): Linear searches on very short strings such as these will (most probably) outperform map lookups.
StuR (Mar 13, 2014): Source code for the other versions is in my question, if you click on where it says, for example, "Javascript Version:".
Loki Astari (Mar 13, 2014): @Yuushi: Maybe (if they were shorter). But if you are going to make that claim, you had better be able to back it up. Did you time it? In my world O(ln(n)) is better than O(n) unless proved otherwise, and with a bit of work you could probably make it O(1).
Yuushi (Mar 14, 2014): Here you go: coliru.stacked-crooked.com/a/87669fd0d562983f. With just an upper-case alphabet, std::array is significantly faster.
Loki Astari (Mar 14, 2014): Used your code. One change I made (removed the srand()), as that is not useful when doing timing tests. On a MacBook Pro, 2.3 GHz Core i7, g++ --version == clang-500.2.79, built using g++ -std=c++11 -O3 time.cpp. Ran each test three times: Array (75/78/77), unordered_set (74/78/78), set (75/74/78). All look about the same to me.
I think you can improve the 'Inverse rotor pass' a lot by hard-coding the inverted rotors instead of searching for the inverse mapping every time.
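A sketch of what that precomputation could look like (the names here are illustrative, not MrSmith42's code): invert each rotor once up front, and the inverse pass becomes a direct table lookup instead of a strchr scan.

```cpp
#include <array>
#include <cstddef>

const std::array<const char*, 3> rotor_strings = {
    "EKMFLGDQVZNTOWYHXUSPAIBRCJ",
    "AJDKSIRUXBLHWTMCQGZNPYFVOE",
    "BDFHJLCPRTXVZNYEIWGAKMUSQO"
};

// Compute the inverse permutation of every rotor once.
std::array<std::array<int, 26>, 3> make_inverses()
{
    std::array<std::array<int, 26>, 3> inv{};
    for (std::size_t r = 0; r < 3; ++r)
        for (int i = 0; i < 26; ++i)
            // rotor r sends position i to letter rotor_strings[r][i];
            // record the reverse mapping: letter -> i
            inv[r][rotor_strings[r][i] - 'A'] = i;
    return inv;
}

const auto inverse_rotors = make_inverses();
// indexof(rotors[r], c) can then be replaced by inverse_rotors[r][c - 'A'].
```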
Your a is never below -26, so you could try to replace
long mod26(long a)
{
    return (a % 26 + 26) % 26;
}
by
long mod26(long a)
{
    return (a + 26) % 26;
}
But only profiling will tell you if, and how much, speed improvement that brings. You should always profile before trying to optimize, to see where the bottleneck is.
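As a quick illustration of why that precondition matters (the _old/_new names are mine), the simplified version agrees with the original for every a >= -26 but diverges below that, because C++'s % truncates toward zero:

```cpp
// Original double-mod version: correct for any a.
long mod26_old(long a) { return (a % 26 + 26) % 26; }
// Simplified version: correct only while a >= -26.
long mod26_new(long a) { return (a + 26) % 26; }
// mod26_old(-27) is 25, but mod26_new(-27) is -1.
```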