Enigma machine simulator

Question 1

As my first (keep that in mind!) Haskall program (full code at end) I've written a simple Enigma machine and would like feedback on the core code related to its stepping and encoding — stripped of comments to see how it fares without them. (I expect to post a follow on where a more complete package is reviewed; but this should stand on its own.)

This code allows for the creation of a machine from a simple specification:

ghci> let cfg = EnigmaConfig ["AE.BF.CM.DQ.HU.JN.LX.PR.SZ.VW","VIII","VI","V","β","c"] [1,25,16,15,25,1] [1,12,5,16,5,1]

which can then be represented in one of the conventional ways:

ghci> windows cfg
"CDTJ"

Examined more deeply:

ghci> putStr $ unlines $ stageMappingList cfg
EFMQABGUINKXCJORDPZTHWVLYS
IXHMSJVNZQEDLURFBTCOGYPKWA
COVLDIWTNEHUARGZFXQJBMPYSK
XJMIYVCARQOWHLNDSUFKGBEPZT
QUNGALXEPKZYRDSOFTVCMBIHWJ
RDOBJNTKVEHMLFCWZAXGYIPSUQ
EVTNHQDXWZJFUCPIAMORBSYGLK
HVGPWSUMDBTNCOKXJIQZRFLAEY
MUAEJQOKFTZDVIBWSNYHLCGRXP
ZQSLKPUCAFXMDHTWJOERNGYBVI
EFMQABGUINKXCJORDPZTHWVLYS

and used to encode messages:

ghci> let msg = "FOLGENDESISTSOFORTBEKANNTZUGEBEN"
ghci> let msg' = enigmaEncoding cfg msg
ghci> msg'
"RBBFPMHPHGCZXTDYGAHGUFXGEWKBLKGJ"
ghci> msg == enigmaEncoding cfg msg'
True

I'm especially interested in review of any errors or missed opportunities to exploit Haskell features or idioms. I'm also curious how clear the code — on its own, with out comments — is.

I'm also particularly interested review of following aspects:

Overall, this is a bit (perhaps considerably?) less efficient that many alternative approaches might be, because rather determining encoding based only on a minimal state specification, I determine the complete mapping of each component of the machine as part of my encoding calculations. Effectively, I determine what the encoding of every letter at every stage would be, and then use that to figure out how a given (single) letter is encoded. This also lets me compute encodings by rotating mappings, which corresponds directly to what the Enigma machine is doing physically.

Since my goal is to be educational, and not to create a cryptographic tool, this is acceptable to me; but comments about other approaches are welcome.

My type for the configuration, or state, of an Enigma machine, EmigmaConfig, will be unfamiliar to people who haven't though about the internals of how an Enigma works, and focuses on the minimal physical set of values that fully define it's state, rather than conventionally exposed things like the letters at the "windows".

This is deliberate. In the context of a larger package, this would serve as an "internal" representation (that would have an inaccessible value constructor) and I would provide a more user-friendly "safe" constructor that parsed conventional specification strings.

The expression used in componentMapping to determine the mapping preformed by a component c that is in a given position p, is a bit more verbose than it needs to be:

rotMap (1-p) letters !! (numA0 ch)) (rotMap (p-1) (wiring c))

but I was aiming to have something close the physical process by which rotating a component changes its mapping. This is (fortunately) close to the typical mathematical formulation, in which mappings are represented as a composition linear operators that permute the alphabet:

$$\mu_{c}(p)=\rho^{p-1}\omega_{c}\rho^{1-p}$$

where \$\mu_{c}(p)\$ is the mapping for the component \$c\$ in position \$p\,ドル \$\omega_{c}\$ is the mapping performed by the wiring of component \$c\$ (in position 1), and \$\rho^i\,ドル is the \$i\$-th cyclic permutation of the alphabet.

I'm not sure I succeeded at this.

My function for determining whether a component steps is a bit less concise than it could be, because I wanted to explicitly call out the different cases (which are often confused in explanations of how "stepping" works):

steppedPosition :: Stage -> Position
steppedPosition i = (mod (positions ec !! i + di - 1) 26) + 1
 where
 di | i == 0 = 0 
 | i > 3 = 0 
 | i == 1 = 1 
 | i == 2 && isTurn 2 = 1 
 | isTurn (i-1) = 1 
 | otherwise = 0
 isTurn :: Stage -> Bool
 isTurn j = elem (windowLetter ec j) (turnovers $ component (components ec !! j))

I'm mostly satisfied with this, but would be happy with something a bit more idiomatic.

I have several functions:

stageMappingList:: EnigmaConfig -> [Mapping]
enigmaMappingList :: EnigmaConfig -> [Mapping]

and:

enigmaMapping :: EnigmaConfig -> Mapping

that could be dispensed with if I were only concerned with encoding a message, but since a goal is to be able to examine the internal state of the machine, including the mapping performed by each component in every position, I need these, or something like them.

I think I've also made good use of these by exploiting them to build one another in ways that help illuminate how encoding happens, and to actually perform the encoding in enigmaEncoding.

I like the "punch like" set up by all these functions defining the function that computes the mapping performed by a machine configuration:

enigmaEncoding :: EnigmaConfig -> Message -> String
enigmaEncoding ec msg = zipWith encode (enigmaMapping <$> (iterate step (step ec))) msg

This maps nicely, in my view, to how the machine works.

(Note I've removed the assertion here, it isn't working anyway, a separate issue.)

Complete code of core functionality:

import Control.Arrow
import Control.Exception (assert)
import Data.Monoid
import Data.List
import Data.List.Split (splitOn)
import qualified Data.Map as M
import Data.Maybe
import Data.Char (chr, ord)
import Data.List (sort)
-- some generic things to get out of the way
letters :: String
letters = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
numA0 :: Char -> Int
numA0 ch = ord ch - ord 'A'
chrA0 :: Int -> Char
chrA0 i = chr (i + ord 'A')
ordering :: Ord a => [a] -> [Int]
ordering xs = snd <$> sort (zip xs [0..])
encode :: String -> Char -> Char
encode e ' ' = ' '
encode e ch = e !! (numA0 ch)
encode' :: String -> String -> String
encode' e s = (encode e) <$> s
type Name = String
type Wiring = Mapping
type Turnovers = String
data Component = Component {name :: !Name, wiring :: !Wiring, turnovers :: !Turnovers}
comps = M.fromList $ (name &&& id) <$> [
 Component "I" "EKMFLGDQVZNTOWYHXUSPAIBRCJ" "Q",
 Component "II" "AJDKSIRUXBLHWTMCQGZNPYFVOE" "E",
 Component "III" "BDFHJLCPRTXVZNYEIWGAKMUSQO" "V",
 Component "IV" "ESOVPZJAYQUIRHXLNFTGKDCMWB" "J",
 Component "V" "VZBRGITYUPSDNHLXAWMJQOFECK" "Z",
 Component "VI" "JPGVOUMFYQBENHZRDKASXLICTW" "ZM",
 Component "VII" "NZJHGRCXMYSWBOUFAIVLPEKQDT" "ZM",
 Component "VIII" "FKQHTLXOCBJSPDZRAMEWNIUYGV" "ZM",
 Component "β" "LEYJVCNIXWPBQMDRTAKZGFUHOS" "",
 Component "γ" "FSOKANUERHMBTIYCWLQPZXVGJD" "",
 Component "A" "EJMZALYXVBWFCRQUONTSPIKHGD" "",
 Component "B" "YRUHQSLDPXNGOKMIEBFZCWVJAT" "",
 Component "C" "FVPJIAOYEDRZXWGCTKUQSBNMHL" "",
 Component "b" "ENKQAUYWJICOPBLMDXZVFTHRGS" "",
 Component "c" "RDOBJNTKVEHMLFCWZAXGYIPSUQ" "",
 Component "" "ABCDEFGHIJKLMNOPQRSTUVWXYZ" ""]
component :: Name -> Component
component n = fromMaybe (Component n (foldr plug letters (splitOn "." n)) "") (M.lookup n comps)
 where
 c = find ((== n).name) comps
 plug [p1,p2] = map (\ch -> if ch == p1 then p2 else if ch == p2 then p1 else ch)
type Stage = Int
type Position = Int
data EnigmaConfig = EnigmaConfig {
 components :: ![Name],
 positions :: ![Position],
 rings :: ![Int]
 } deriving (Eq, Show)
stages :: EnigmaConfig -> [Stage]
stages ec = [0..(length $ components ec)-1]
windowLetter :: EnigmaConfig -> Stage -> Char
windowLetter ec st = chrA0 $ mod (positions ec !! st + rings ec !! st - 2) 26
step :: EnigmaConfig -> EnigmaConfig
step ec = EnigmaConfig {
 components = components ec,
 positions = steppedPosition <$> stages ec, 
 rings = rings ec
 }
 where
 -- not as concise as it could be in order to make cases explicit
 steppedPosition :: Stage -> Position
 steppedPosition i = (mod (positions ec !! i + di - 1) 26) + 1
 where
 di | i == 0 = 0 
 | i > 3 = 0 
 | i == 1 = 1 
 | i == 2 && isTurn 2 = 1 
 | isTurn (i-1) = 1 
 | otherwise = 0
 isTurn :: Stage -> Bool
 isTurn j = elem (windowLetter ec j) (turnovers $ component (components ec !! j))
windows :: EnigmaConfig -> String
windows ec = reverse $ tail.init $ windowLetter ec <$> (stages ec)
type Mapping = String
data Direction = Fwd | Rev
-- less compact than it could be in order to map closely to the math
componentMapping:: Direction -> Component -> Position -> Mapping
componentMapping d c p = case d of
 Fwd -> map (\ch -> rotMap (1-p) letters !! (numA0 ch)) (rotMap (p-1) (wiring c))
 Rev -> chrA0 <$> (ordering $ componentMapping Fwd c p)
 where
 rotMap :: Int -> Wiring -> Mapping
 rotMap o w = take 26 . drop (mod o 26) . cycle $ w
-- several functions here that could be dispensed with or combined, if just encoding was the goal
stageMappingList:: EnigmaConfig -> [Mapping]
stageMappingList ec = ((stageMapping Fwd) <$>) <> ((stageMapping Rev) <$>).tail.reverse $ stages ec
 where
 stageMapping :: Direction -> Stage -> Mapping
 stageMapping d sn = componentMapping d (component $ components ec !! sn) (positions ec !! sn)
enigmaMappingList :: EnigmaConfig -> [Mapping]
enigmaMappingList ec = scanl1 (flip encode') (stageMappingList ec)
enigmaMapping :: EnigmaConfig -> Mapping
enigmaMapping ec = last $ enigmaMappingList ec
type Message = String
enigmaEncoding :: EnigmaConfig -> Message -> String
enigmaEncoding ec msg = assert (and $ (`elem` letters) <$> msg) $
 zipWith encode (enigmaMapping <$> (iterate step (step ec))) msg

Question 2

step

This code:

step ec = EnigmaConfig {
 components = components ec,
 positions = steppedPosition <$> stages ec, 
 rings = rings ec
 }
 ...

may be written:

step ec = ec { positions = steppedPosition <$> stages ec }

Not only is it shorter, but it is also clearer that the returned value is the same same as the input ec except that the positions field has changed.

enigmaEncoding

This expression:

(and $ (`elem` letters) <$> msg)

may be more simply written as:

all (`elem` letters) msg

and is a lot more readable. In general I avoid using <$> on lists - just write map. To me, using fmap or <$> is a signal that a more complex structure is involved, like a monad. If it is just a list, it's better to write plainer code - it'll make it a lot easier for someone else to understand.

26

The number 26 appears as a magic constant. It should be given a symbolic name. Or, find a way to structure your code so that you don't need it.

subexpressions

I would try to give good names to sub-expressions in your program.

For instance, in this expression:

zipWith encode (enigmaMapping <$> (iterate step (step ec))) msg

it's hard to see what the two arguments are. Perhaps write it as:

zipWith encode states msg
 where states = ...

Another example - these expressions are similar:

windowLetter ec st = chrA0 $ mod (positions ec !! st + rings ec !! st - 2) 26
...
 steppedPosition i = (mod (positions ec !! i + di - 1) 26) + 1

I'm sure you can define sub-expressions which make it clearer what's going on.

architecture

This site has a nice picture of how an Enigma machine encodes a letter:

http://enigma.louisedade.co.uk/howitworks.html

enter image description here

Would it be possible to write your code to mimic this diagram? E.g.:

encodeLetter :: EnigmaConfig -> Char -> Char
encodeLetter ec = plug' . right' . middle' . left' . reflect .
 left . middle . right . static . plug
 where plug = ...
 static = ...
 ...

Someone who is reading your code would be looking for where these parts of the encoding process appear in your code, and structuring the code like this would make it obvious what's going on.

Question 3

Great suggestions. Regarding the final one, I'd have to change approaches: my enigmaEncoding (where letters are encoded) has already collapsed the encoding of the whole machine into a single mapping. It's through the functions that precede it that I examine/expose the stages. But I'll see how such an approach might work. Great improvement for step!

Question 4

Note also that roughly what yo propose for encodeLetter happens in stageMappingList with ((stageMapping Fwd) <$>) <> ((stageMapping Rev) <$>).tail.reverse $ stages ec. So there would be the place to make the correspondence to the physical operation clearer.

Question 5

honestly that is a really difficult expression to grok! see the encodeLetter function in my gist in my other answer for another approach.

Question 6

That's funny. I've been wrapping my head around Functors and Monoids and Applicatives and that expression (now) seems super clear to me, and maps directly the math. I bet in a week when I forget what I've learned it will make no sense to me at all (we'll see)!

Question 7

I'm adding another answer because I've found more to say.

Use `_` for field names

Prefix field names with an underscore. It makes the code easier to read because it's a signal to the reader that a field is being accessed instead of general function being called.

Other languages use . for field accessors, and that makes expressions like self.count += 1 more readable.

limited set of wheels

Your implementation is limited to the wheels which are listed in comps. To add a new component you have to change the definition of comps.

A better approach is to create a Wheel data structure, and add a function which creates an EnigmaConfig from a collection of wheels:

data Wheel = Wheel ...
makeEnigma :: [Wheel] -> EnigmaConfig
compI = makeWheel "EKMFLGDQVZNTOWYHXUSPAIBRCJ" "Q"
compII = makeWheel "AJDKSIRUXBLHWTMCQGZNPYFVOE" "E"
...
compVIII = makeWheel ...
cfg = makeEnigma [ compVIII, compVI, compV ]

Now you can add more wheels without modifying existing code - you just add new definitions.

It's also more type safe. If wheels are just strings, the compiler can't catch misspelled component names, e.g. in:

cfg = EnigmaConfig [ ..., "compi", ... ]

the string "compi" will be interpreted as a plug-board wiring.

too much !!

For one thing !! is inefficient on lists, but additionally you are iterating over indices k and writing expressions like:

positions ec !! k
components ec !! k
rings ec !! k

Instead, iterate over the actual values themselves, e.g.:

map go $ zip (components ec) (positions ec)
 where go :: (Component,Int) -> ...
 go (comp,pos) = ...

Now it is clear that the go function only depends on the Component and position.

think about testing

Think about what is important to test, and expose those functions at the top level so you can tet them individually.

You should be building up the functionality in layers so that you can throughly test one layer before using it to build the next layer.

My Approach

I've writen up my own implementation to this problem at:

https://gist.github.com/erantapaa/f071bc3f58d017f9280a

It's a Literate Haskell file so you can use it directly with ghc and ghci.

Note - it is currently untested.

Question 8

Can you say more about what you mean by "field names"? In my view, Haskell has data and functions; the OO concepts of field and method aren't needed. Whether a function is implemented as a field or not is an implementation detail. This is true generally, and here is very much the case (since, e.g., I could have had, say windows be a record field and computed, either rings or positions, given one or the other without changing anything else).

Question 9

Good point about the !!. I need to fix that for sure.

Question 10

Haskell has records, and the components of records are called fields - i.e. see (link). Almost every type in Haskell - except for the basic ones like Int, Char, etc. - are records. While it is true that the accessors are just functions, having the underscore in front of it helps the reader understand what's going on, and that's why you should use it.

Question 11

Sometimes, but not in this case, imv. There's a lot of OO cruft that I feel like Haskell has the potential to wash away. Sometimes that isn't desirable; but what matters here is happening a level up. Again: I could decide to rework EnigmaConfig so that windows was a field, allowing me to dispense with either rings or positions as a field. There's no reason I should have to change a name when I do that.

ErikR ErikR 1,4068 silver badges6 bronze badges · Answer 1 · 2015-09-26 22:06:50Z

step

This code:

step ec = EnigmaConfig {
 components = components ec,
 positions = steppedPosition <$> stages ec, 
 rings = rings ec
 }
 ...

may be written:

step ec = ec { positions = steppedPosition <$> stages ec }

Not only is it shorter, but it is also clearer that the returned value is the same same as the input ec except that the positions field has changed.

enigmaEncoding

This expression:

(and $ (`elem` letters) <$> msg)

may be more simply written as:

all (`elem` letters) msg

and is a lot more readable. In general I avoid using <$> on lists - just write map. To me, using fmap or <$> is a signal that a more complex structure is involved, like a monad. If it is just a list, it's better to write plainer code - it'll make it a lot easier for someone else to understand.

26

The number 26 appears as a magic constant. It should be given a symbolic name. Or, find a way to structure your code so that you don't need it.

subexpressions

I would try to give good names to sub-expressions in your program.

For instance, in this expression:

zipWith encode (enigmaMapping <$> (iterate step (step ec))) msg

it's hard to see what the two arguments are. Perhaps write it as:

zipWith encode states msg
 where states = ...

Another example - these expressions are similar:

windowLetter ec st = chrA0 $ mod (positions ec !! st + rings ec !! st - 2) 26
...
 steppedPosition i = (mod (positions ec !! i + di - 1) 26) + 1

I'm sure you can define sub-expressions which make it clearer what's going on.

architecture

This site has a nice picture of how an Enigma machine encodes a letter:

http://enigma.louisedade.co.uk/howitworks.html

enter image description here

Would it be possible to write your code to mimic this diagram? E.g.:

encodeLetter :: EnigmaConfig -> Char -> Char
encodeLetter ec = plug' . right' . middle' . left' . reflect .
 left . middle . right . static . plug
 where plug = ...
 static = ...
 ...

Someone who is reading your code would be looking for where these parts of the encoding process appear in your code, and structuring the code like this would make it obvious what's going on.

Great suggestions. Regarding the final one, I'd have to change approaches: my enigmaEncoding (where letters are encoded) has already collapsed the encoding of the whole machine into a single mapping. It's through the functions that precede it that I examine/expose the stages. But I'll see how such an approach might work. Great improvement for step!
Note also that roughly what yo propose for encodeLetter happens in stageMappingList with ((stageMapping Fwd) <$>) <> ((stageMapping Rev) <$>).tail.reverse $ stages ec. So there would be the place to make the correspondence to the physical operation clearer.
honestly that is a really difficult expression to grok! see the encodeLetter function in my gist in my other answer for another approach.
That's funny. I've been wrapping my head around Functors and Monoids and Applicatives and that expression (now) seems super clear to me, and maps directly the math. I bet in a week when I forget what I've learned it will make no sense to me at all (we'll see)!

ErikR ErikR 1,4068 silver badges6 bronze badges · Answer 2 · 2015-09-27 21:12:17Z

I'm adding another answer because I've found more to say.

Use `_` for field names

Prefix field names with an underscore. It makes the code easier to read because it's a signal to the reader that a field is being accessed instead of general function being called.

Other languages use . for field accessors, and that makes expressions like self.count += 1 more readable.

limited set of wheels

Your implementation is limited to the wheels which are listed in comps. To add a new component you have to change the definition of comps.

A better approach is to create a Wheel data structure, and add a function which creates an EnigmaConfig from a collection of wheels:

data Wheel = Wheel ...
makeEnigma :: [Wheel] -> EnigmaConfig
compI = makeWheel "EKMFLGDQVZNTOWYHXUSPAIBRCJ" "Q"
compII = makeWheel "AJDKSIRUXBLHWTMCQGZNPYFVOE" "E"
...
compVIII = makeWheel ...
cfg = makeEnigma [ compVIII, compVI, compV ]

Now you can add more wheels without modifying existing code - you just add new definitions.

It's also more type safe. If wheels are just strings, the compiler can't catch misspelled component names, e.g. in:

cfg = EnigmaConfig [ ..., "compi", ... ]

the string "compi" will be interpreted as a plug-board wiring.

too much !!

For one thing !! is inefficient on lists, but additionally you are iterating over indices k and writing expressions like:

positions ec !! k
components ec !! k
rings ec !! k

Instead, iterate over the actual values themselves, e.g.:

map go $ zip (components ec) (positions ec)
 where go :: (Component,Int) -> ...
 go (comp,pos) = ...

Now it is clear that the go function only depends on the Component and position.

think about testing

Think about what is important to test, and expose those functions at the top level so you can tet them individually.

You should be building up the functionality in layers so that you can throughly test one layer before using it to build the next layer.

My Approach

I've writen up my own implementation to this problem at:

https://gist.github.com/erantapaa/f071bc3f58d017f9280a

It's a Literate Haskell file so you can use it directly with ghc and ghci.

Note - it is currently untested.

Can you say more about what you mean by "field names"? In my view, Haskell has data and functions; the OO concepts of field and method aren't needed. Whether a function is implemented as a field or not is an implementation detail. This is true generally, and here is very much the case (since, e.g., I could have had, say windows be a record field and computed, either rings or positions, given one or the other without changing anything else).
Haskell has records, and the components of records are called fields - i.e. see (link). Almost every type in Haskell - except for the basic ones like Int, Char, etc. - are records. While it is true that the accessors are just functions, having the underscore in front of it helps the reader understand what's going on, and that's why you should use it.
Sometimes, but not in this case, imv. There's a lot of OO cruft that I feel like Haskell has the potential to wash away. Sometimes that isn't desirable; but what matters here is happening a level up. Again: I could decide to rework EnigmaConfig so that windows was a field, allowing me to dispense with either rings or positions as a field. There's no reason I should have to change a name when I do that.

Stack Exchange Network

Enigma machine simulator

2 Answers 2

step

enigmaEncoding

26

subexpressions

architecture

Use `_` for field names

limited set of wheels

too much !!

think about testing

My Approach

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Enigma machine simulator

2 Answers 2

step

enigmaEncoding

26

subexpressions

architecture

Use _ for field names

limited set of wheels

too much !!

think about testing

My Approach

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions

Use `_` for field names