Here is my implementation of a simple 3-rotor Enigma machine in C++:
#include <iostream>
#include <cstring>
#include <string>
using namespace std;

char alpha[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
char rotors[3][27] =
{
    "EKMFLGDQVZNTOWYHXUSPAIBRCJ",
    "AJDKSIRUXBLHWTMCQGZNPYFVOE",
    "BDFHJLCPRTXVZNYEIWGAKMUSQO"
};
char reflector[] = "YRUHQSLDPXNGOKMIEBFZCWVJAT";
char key[] = "ABC";

long mod26(long a)
{
    return (a % 26 + 26) % 26;
}

int li(char l)
{
    // Letter index
    return l - 'A';
}

int indexof(char* array, int find)
{
    return strchr(array, find) - array;
}

string crypt(const char *ct)
{
    // Sets initial permutation
    int L = li(key[0]);
    int M = li(key[1]);
    int R = li(key[2]);
    string output;
    for (int x = 0; x < strlen(ct); x++) {
        int ct_letter = li(ct[x]);
        // Step right rotor on every iteration
        R = mod26(R + 1);
        // Pass through rotors
        char a = rotors[2][mod26(R + ct_letter)];
        char b = rotors[1][mod26(M + li(a) - R)];
        char c = rotors[0][mod26(L + li(b) - M)];
        // Pass through reflector
        char ref = reflector[mod26(li(c) - L)];
        // Inverse rotor pass
        int d = mod26(indexof(rotors[0], alpha[mod26(li(ref) + L)]) - L);
        int e = mod26(indexof(rotors[1], alpha[mod26(d + M)]) - M);
        char f = alpha[mod26(indexof(rotors[2], alpha[mod26(e + R)]) - R)];
        output += f;
    }
    return output;
}

int main()
{
    for (int i = 0; i < 1000000; i++) {
        crypt("PZUFWDSASJGQGNRMAEODZJXQQKHSYGVUSGSU");
    }
    return 0;
}
Benchmark ran on a 2.66 GHz Core 2 Duo for 1 million decrypts:
11.64s user 0.02s system 97% cpu 11.963 total
I'm not a C++ programmer; I first wrote this in JavaScript and then translated it into C++.
Whilst 1 million in ~10 seconds is OK, I feel that my implementation isn't optimal. Are there any optimisations or micro-optimisations you could recommend for my C++ code to make it faster? Maybe I should try a different language? Any suggestions are welcome.
user138042 (May 4, 2017): Which is faster: int, long, boolean, or double? Make your constants constant; it saves time.
3 Answers
Let's profile this a little bit (this is on a dual-core i7, so a bit quicker):
Running Time Self Symbol Name
3984.0ms 100.0% 0.0 Main Thread 0x2856c
3982.0ms 99.9% 0.0 start
3982.0ms 99.9% 1.0 main
3936.0ms 98.7% 3081.0 crypt(char const*)
640.0ms 16.0% 640.0 _platform_strchr
126.0ms 3.1% 102.0 std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>::push_back(char)
89.0ms 2.2% 89.0 strlen
14.0ms 0.3% 14.0 DYLD-STUB$$strchr
14.0ms 0.3% 4.0 free
8.0ms 0.2% 1.0 szone_free_definite_size
7.0ms 0.1% 7.0 DYLD-STUB$$strlen
2.0ms 0.0% 2.0 operator delete(void*)
2.0ms 0.0% 0.0 _dyld_start
Ok, so more than 98% of the time is spent in the crypt function (which isn't surprising, really). Almost 16% of the time is spent in strchr, with another 3.1% spent in push_back. There's also a few percent (2.2%) spent doing strlen.
Ok, so how do we cut down on these?
Firstly, we'll switch over to using std::array<char, N> instead of char *. These are both stack-allocated (so we don't incur the dynamic allocation overhead of std::string or std::vector), but std::array knows its size, which will allow us to remove the strlen call in the crypt function:
#include <array>

const std::array<char, 27> alpha = {"ABCDEFGHIJKLMNOPQRSTUVWXYZ"};
const std::array<std::array<char, 27>, 3> rotors
{
    {{"EKMFLGDQVZNTOWYHXUSPAIBRCJ"},
     {"AJDKSIRUXBLHWTMCQGZNPYFVOE"},
     {"BDFHJLCPRTXVZNYEIWGAKMUSQO"}}
};
const std::array<char, 27> reflectors = {"YRUHQSLDPXNGOKMIEBFZCWVJAT"};
const std::array<char, 4> key = {"ABC"};
The indexof function then needs to change as well (and I'm going to rename it to index_of, which is easier to read):
template <size_t N>
std::size_t index_of(const std::array<char, N>& str, int find)
{
    for (std::size_t i = 0; i < N; ++i) {
        if (str[i] == find) return i;
    }
    return -1;
}
crypt now looks like:
template <size_t N>
std::array<char, N> crypt(const std::array<char, N>& ct)
{
    // Sets initial permutation
    int L = li(key[0]);
    int M = li(key[1]);
    int R = li(key[2]);
    std::array<char, N> output;
    for (unsigned x = 0; x < N; x++) {
        int ct_letter = li(ct[x]);
        // Step right rotor on every iteration
        R = static_cast<int>(mod26(R + 1));
        // Pass through rotors
        char a = rotors[2][mod26(R + ct_letter)];
        char b = rotors[1][mod26(M + li(a) - R)];
        char c = rotors[0][mod26(L + li(b) - M)];
        // Pass through reflector
        char ref = reflectors[mod26(li(c) - L)];
        // Inverse rotor pass
        long d = mod26(index_of(rotors[0], alpha[mod26(li(ref) + L)]) - L);
        long e = mod26(index_of(rotors[1], alpha[mod26(d + M)]) - M);
        char f = mod26(index_of(rotors[2], alpha[mod26(e + R)]) - R);
        output[x] = alpha[f];
    }
    return output;
}
which is close to what it was, but with a few minor changes (no += on a string; since we already know how large the output will be, we simply assign directly into that index).
And we now call it like so:
int main()
{
    for (int i = 0; i < 1000000; i++) {
        crypt(std::array<char, 37>{"PZUFWDSASJGQGNRMAEODZJXQQKHSYGVUSGSU"});
    }
    return 0;
}
Let's profile this again:
Running Time Self Symbol Name
2873.0ms 100.0% 0.0 Main Thread 0x2c7ff
2872.0ms 99.9% 0.0 start
2872.0ms 99.9% 2.0 main
2870.0ms 99.8% 2870.0 std::__1::array<char, 37ul> crypt<37ul>(std::__1::array<char, 37ul> const&)
1.0ms 0.0% 0.0 _dyld_start
Ok, so everything except crypt has now been inlined. If we apply the modification from @MrSmith42, we shave even more time off:
Running Time Self Symbol Name
1855.0ms 100.0% 0.0 Main Thread 0x2ca5c
1854.0ms 99.9% 0.0 start
1854.0ms 99.9% 1.0 main
1853.0ms 99.8% 1853.0 std::__1::array<char, 37ul> crypt<37ul>(std::__1::array<char, 37ul> const&)
1.0ms 0.0% 0.0 _dyld_start
We have imposed a fairly large constraint, however: we need to know the size of the string to be encrypted at compile-time (so no reading it from a file or stdin or the like). If this will always be the case, then this provides a fairly significant speed-up on my hardware.
StuR (Mar 13, 2014): How are you compiling this? On your version (g++ -std=c++11 enigma.cpp), runtime is 20.85s. On my version (g++ enigma.cpp) runtime is 11.63s?
Yuushi (Mar 13, 2014): This is under clang. Are you compiling with -O2 or -O3?
Yuushi (Mar 14, 2014): @StuR I suggest trying the two on an online compiler like coliru.stacked-crooked.com and doing timings with that. I get a decrease from ~9.5s to about ~5.5s using the above changes.
You provide the timing of the 4 different versions, but that is useless without the code (if you wrote the Perl version as badly as the C++ version, then it's not surprising you get bad results). Before the times are useful for a comparison, we need to make sure the tests are comparable. So we really need the code for all four versions. Then we can get criticism of all four code bases and work to align them with the best idioms of the appropriate languages. Once we have done that, we can do realistic timings.
Note 1:
Stop using C
string crypt (const char *ct)
Should be:
string crypt (std::string const& ct)
Note 2:
This allows a much-needed speed-up here:
for ( int x = 0; x < strlen(ct) ; x++ ) {
Should be:
for (char c : ct)
// or if using C++03
for (std::string::const_iterator c = ct.begin(); c != ct.end(); ++c)
This should improve performance considerably, as you are no longer re-calculating the string length on every iteration.
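As an illustration of the iteration pattern only (this is a hypothetical Caesar-style stand-in, not the Enigma logic), a range-for walks each character without re-computing the length, and reserve() sizes the output once instead of growing it append by append:

```cpp
#include <string>

// shift_by_one is an illustrative name, not from the original code.
// It shows the loop shape Note 2 suggests: range-for over the string,
// with the output pre-sized via reserve().
std::string shift_by_one(const std::string& ct)
{
    std::string output;
    output.reserve(ct.size());  // length computed once, up front
    for (char letter : ct)
        output += static_cast<char>('A' + (letter - 'A' + 1) % 26);
    return output;
}
// e.g. shift_by_one("HELLO") yields "IFMMP"
```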
Note 3:
Always do your timing after compiling with optimizations turned on:
g++ -O3 <stuff>
Note 4:
These values are const
int L = li(key[0]);
int M = li(key[1]);
int R = li(key[2]);
Try:
int const L = li(key[0]);
int const M = li(key[1]);
int const R = li(key[2]);
Note 5:
This looks decidedly inefficient:
int d = mod26(indexof(rotors[0], alpha[mod26(li(ref) + L)]) - L);
Looking at:
indexof(rotors[0], alpha[mod26(li(ref) + L)])
// indexof(<C-String> , <char>)
// Does a linear search of the string.
// That is very inefficient. Why not use std::unordered_map<char, int>.
// If you don't want to waste space use std::map<char, int>
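A sketch of the suggested map-based lookup (make_index is an illustrative name, not part of the reviewed code): build a letter-to-index map once per rotor, then each lookup is O(log n) instead of a linear scan.

```cpp
#include <map>

// Build letter -> position map for one rotor string (done once at startup).
std::map<char, int> make_index(const char* rotor)
{
    std::map<char, int> m;
    for (int i = 0; rotor[i] != '\0'; ++i)
        m[rotor[i]] = i;  // the rotor maps position i to letter rotor[i]
    return m;
}
// usage sketch: const auto rotor0 = make_index(rotors[0]);
//               int d = mod26(rotor0.at(alpha[mod26(li(ref) + L)]) - L);
```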
Note 6:
You may think this is C++, but it's not. This is a C implementation, and not a good one.
The JavaScript version relies on an engine written in C++ that is highly optimized. If you don't apply the same techniques that were used by the JavaScript engine, then I am not surprised that you get similar results. I would expect the C++ version to run 100x faster (not 4x) than JavaScript.
Yuushi (Mar 13, 2014): Linear searches on very short strings such as these will (most probably) outperform map lookups.
StuR (Mar 13, 2014): Source code for the other versions is in my question, if you click on where it says, for example, "Javascript Version:".
Loki Astari (Mar 13, 2014): @Yuushi: Maybe (if they were shorter). But if you are going to make that claim, you had better be able to back it up. Did you time it? In my world O(ln(n)) is better than O(n) unless proved otherwise, and with a bit of work you could probably make it O(1).
Yuushi (Mar 14, 2014): Here you go: coliru.stacked-crooked.com/a/87669fd0d562983f. With just an upper-case alphabet, std::array is significantly faster.
Loki Astari (Mar 14, 2014): Used your code. One change I made (removed the srand()), as that is not useful when doing timing tests. On a MacBook Pro, 2.3 GHz Core i7, g++ --version == clang-500.2.79, built using g++ -std=c++11 -O3 time.cpp. Ran each test three times: Array (75/78/77), unordered_set (74/78/78), set (75/74/78). All look about the same to me.
I think you can improve the 'Inverse rotor pass' a lot by hard-coding the inverted rotors instead of searching for the inverse mapping every time.
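A sketch of what that precomputation could look like (the names here are illustrative, not MrSmith42's code): invert each rotor once up front, and the inverse pass becomes a direct table lookup instead of a strchr scan.

```cpp
#include <array>
#include <cstddef>

const std::array<const char*, 3> rotor_strings = {
    "EKMFLGDQVZNTOWYHXUSPAIBRCJ",
    "AJDKSIRUXBLHWTMCQGZNPYFVOE",
    "BDFHJLCPRTXVZNYEIWGAKMUSQO"
};

// Compute the inverse permutation of every rotor once.
std::array<std::array<int, 26>, 3> make_inverses()
{
    std::array<std::array<int, 26>, 3> inv{};
    for (std::size_t r = 0; r < 3; ++r)
        for (int i = 0; i < 26; ++i)
            // rotor r sends position i to letter rotor_strings[r][i];
            // record the reverse mapping: letter -> i
            inv[r][rotor_strings[r][i] - 'A'] = i;
    return inv;
}

const auto inverse_rotors = make_inverses();
// indexof(rotors[r], c) can then be replaced by inverse_rotors[r][c - 'A'].
```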
Your a is never below -26, so you could try to replace
long mod26(long a)
{
    return (a % 26 + 26) % 26;
}
by
long mod26(long a)
{
    return (a + 26) % 26;
}
But only profiling will tell you if, and how much, speed improvement that brings. You should always profile before trying to optimize, to see where the bottleneck is.
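As a quick illustration of why that precondition matters (the _old/_new names are mine), the simplified version agrees with the original for every a >= -26 but diverges below that, because C++'s % truncates toward zero:

```cpp
// Original double-mod version: correct for any a.
long mod26_old(long a) { return (a % 26 + 26) % 26; }
// Simplified version: correct only while a >= -26.
long mod26_new(long a) { return (a + 26) % 26; }
// mod26_old(-27) is 25, but mod26_new(-27) is -1.
```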