As my first (keep that in mind!) Haskall program (full code at end) I've written a simple Enigma machine and would like feedback on the core code related to its stepping and encoding — stripped of comments to see how it fares without them. (I expect to post a follow on where a more complete package is reviewed; but this should stand on its own.)
This code allows for the creation of a machine from a simple specification:
ghci> let cfg = EnigmaConfig ["AE.BF.CM.DQ.HU.JN.LX.PR.SZ.VW","VIII","VI","V","β","c"] [1,25,16,15,25,1] [1,12,5,16,5,1]
which can then be represented in one of the conventional ways:
ghci> windows cfg "CDTJ"
Examined more deeply:
ghci> putStr $ unlines $ stageMappingList cfg EFMQABGUINKXCJORDPZTHWVLYS IXHMSJVNZQEDLURFBTCOGYPKWA COVLDIWTNEHUARGZFXQJBMPYSK XJMIYVCARQOWHLNDSUFKGBEPZT QUNGALXEPKZYRDSOFTVCMBIHWJ RDOBJNTKVEHMLFCWZAXGYIPSUQ EVTNHQDXWZJFUCPIAMORBSYGLK HVGPWSUMDBTNCOKXJIQZRFLAEY MUAEJQOKFTZDVIBWSNYHLCGRXP ZQSLKPUCAFXMDHTWJOERNGYBVI EFMQABGUINKXCJORDPZTHWVLYS
and used to encode messages:
ghci> let msg = "FOLGENDESISTSOFORTBEKANNTZUGEBEN" ghci> let msg' = enigmaEncoding cfg msg ghci> msg' "RBBFPMHPHGCZXTDYGAHGUFXGEWKBLKGJ" ghci> msg == enigmaEncoding cfg msg' True
I'm especially interested in review of any errors or missed opportunities to exploit Haskell features or idioms. I'm also curious how clear the code — on its own, with out comments — is.
I'm also particularly interested review of following aspects:
Overall, this is a bit (perhaps considerably?) less efficient that many alternative approaches might be, because rather determining encoding based only on a minimal state specification, I determine the complete mapping of each component of the machine as part of my encoding calculations. Effectively, I determine what the encoding of every letter at every stage would be, and then use that to figure out how a given (single) letter is encoded. This also lets me compute encodings by rotating mappings, which corresponds directly to what the Enigma machine is doing physically.
Since my goal is to be educational, and not to create a cryptographic tool, this is acceptable to me; but comments about other approaches are welcome.
My type for the configuration, or state, of an Enigma machine, EmigmaConfig
, will be unfamiliar to people who haven't though about the internals of how an Enigma works, and focuses on the minimal physical set of values that fully define it's state, rather than conventionally exposed things like the letters at the "windows".
This is deliberate. In the context of a larger package, this would serve as an "internal" representation (that would have an inaccessible value constructor) and I would provide a more user-friendly "safe" constructor that parsed conventional specification strings.
The expression used in componentMapping
to determine the mapping preformed by a component c
that is in a given position p
, is a bit more verbose than it needs to be:
rotMap (1-p) letters !! (numA0 ch)) (rotMap (p-1) (wiring c))
but I was aiming to have something close the physical process by which rotating a component changes its mapping. This is (fortunately) close to the typical mathematical formulation, in which mappings are represented as a composition linear operators that permute the alphabet:
$$\mu_{c}(p)=\rho^{p-1}\omega_{c}\rho^{1-p}$$
where \$\mu_{c}(p)\$ is the mapping for the component \$c\$ in position \$p\,ドル \$\omega_{c}\$ is the mapping performed by the wiring of component \$c\$ (in position 1), and \$\rho^i\,ドル is the \$i\$-th cyclic permutation of the alphabet.
I'm not sure I succeeded at this.
My function for determining whether a component steps is a bit less concise than it could be, because I wanted to explicitly call out the different cases (which are often confused in explanations of how "stepping" works):
steppedPosition :: Stage -> Position
steppedPosition i = (mod (positions ec !! i + di - 1) 26) + 1
where
di | i == 0 = 0
| i > 3 = 0
| i == 1 = 1
| i == 2 && isTurn 2 = 1
| isTurn (i-1) = 1
| otherwise = 0
isTurn :: Stage -> Bool
isTurn j = elem (windowLetter ec j) (turnovers $ component (components ec !! j))
I'm mostly satisfied with this, but would be happy with something a bit more idiomatic.
I have several functions:
stageMappingList:: EnigmaConfig -> [Mapping]
enigmaMappingList :: EnigmaConfig -> [Mapping]
and:
enigmaMapping :: EnigmaConfig -> Mapping
that could be dispensed with if I were only concerned with encoding a message, but since a goal is to be able to examine the internal state of the machine, including the mapping performed by each component in every position, I need these, or something like them.
I think I've also made good use of these by exploiting them to build one another in ways that help illuminate how encoding happens, and to actually perform the encoding in enigmaEncoding
.
I like the "punch like" set up by all these functions defining the function that computes the mapping performed by a machine configuration:
enigmaEncoding :: EnigmaConfig -> Message -> String
enigmaEncoding ec msg = zipWith encode (enigmaMapping <$> (iterate step (step ec))) msg
This maps nicely, in my view, to how the machine works.
(Note I've removed the assertion here, it isn't working anyway, a separate issue.)
Complete code of core functionality:
import Control.Arrow
import Control.Exception (assert)
import Data.Monoid
import Data.List
import Data.List.Split (splitOn)
import qualified Data.Map as M
import Data.Maybe
import Data.Char (chr, ord)
import Data.List (sort)
-- some generic things to get out of the way
letters :: String
letters = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
numA0 :: Char -> Int
numA0 ch = ord ch - ord 'A'
chrA0 :: Int -> Char
chrA0 i = chr (i + ord 'A')
ordering :: Ord a => [a] -> [Int]
ordering xs = snd <$> sort (zip xs [0..])
encode :: String -> Char -> Char
encode e ' ' = ' '
encode e ch = e !! (numA0 ch)
encode' :: String -> String -> String
encode' e s = (encode e) <$> s
type Name = String
type Wiring = Mapping
type Turnovers = String
data Component = Component {name :: !Name, wiring :: !Wiring, turnovers :: !Turnovers}
comps = M.fromList $ (name &&& id) <$> [
Component "I" "EKMFLGDQVZNTOWYHXUSPAIBRCJ" "Q",
Component "II" "AJDKSIRUXBLHWTMCQGZNPYFVOE" "E",
Component "III" "BDFHJLCPRTXVZNYEIWGAKMUSQO" "V",
Component "IV" "ESOVPZJAYQUIRHXLNFTGKDCMWB" "J",
Component "V" "VZBRGITYUPSDNHLXAWMJQOFECK" "Z",
Component "VI" "JPGVOUMFYQBENHZRDKASXLICTW" "ZM",
Component "VII" "NZJHGRCXMYSWBOUFAIVLPEKQDT" "ZM",
Component "VIII" "FKQHTLXOCBJSPDZRAMEWNIUYGV" "ZM",
Component "β" "LEYJVCNIXWPBQMDRTAKZGFUHOS" "",
Component "γ" "FSOKANUERHMBTIYCWLQPZXVGJD" "",
Component "A" "EJMZALYXVBWFCRQUONTSPIKHGD" "",
Component "B" "YRUHQSLDPXNGOKMIEBFZCWVJAT" "",
Component "C" "FVPJIAOYEDRZXWGCTKUQSBNMHL" "",
Component "b" "ENKQAUYWJICOPBLMDXZVFTHRGS" "",
Component "c" "RDOBJNTKVEHMLFCWZAXGYIPSUQ" "",
Component "" "ABCDEFGHIJKLMNOPQRSTUVWXYZ" ""]
component :: Name -> Component
component n = fromMaybe (Component n (foldr plug letters (splitOn "." n)) "") (M.lookup n comps)
where
c = find ((== n).name) comps
plug [p1,p2] = map (\ch -> if ch == p1 then p2 else if ch == p2 then p1 else ch)
type Stage = Int
type Position = Int
data EnigmaConfig = EnigmaConfig {
components :: ![Name],
positions :: ![Position],
rings :: ![Int]
} deriving (Eq, Show)
stages :: EnigmaConfig -> [Stage]
stages ec = [0..(length $ components ec)-1]
windowLetter :: EnigmaConfig -> Stage -> Char
windowLetter ec st = chrA0 $ mod (positions ec !! st + rings ec !! st - 2) 26
step :: EnigmaConfig -> EnigmaConfig
step ec = EnigmaConfig {
components = components ec,
positions = steppedPosition <$> stages ec,
rings = rings ec
}
where
-- not as concise as it could be in order to make cases explicit
steppedPosition :: Stage -> Position
steppedPosition i = (mod (positions ec !! i + di - 1) 26) + 1
where
di | i == 0 = 0
| i > 3 = 0
| i == 1 = 1
| i == 2 && isTurn 2 = 1
| isTurn (i-1) = 1
| otherwise = 0
isTurn :: Stage -> Bool
isTurn j = elem (windowLetter ec j) (turnovers $ component (components ec !! j))
windows :: EnigmaConfig -> String
windows ec = reverse $ tail.init $ windowLetter ec <$> (stages ec)
type Mapping = String
data Direction = Fwd | Rev
-- less compact than it could be in order to map closely to the math
componentMapping:: Direction -> Component -> Position -> Mapping
componentMapping d c p = case d of
Fwd -> map (\ch -> rotMap (1-p) letters !! (numA0 ch)) (rotMap (p-1) (wiring c))
Rev -> chrA0 <$> (ordering $ componentMapping Fwd c p)
where
rotMap :: Int -> Wiring -> Mapping
rotMap o w = take 26 . drop (mod o 26) . cycle $ w
-- several functions here that could be dispensed with or combined, if just encoding was the goal
stageMappingList:: EnigmaConfig -> [Mapping]
stageMappingList ec = ((stageMapping Fwd) <$>) <> ((stageMapping Rev) <$>).tail.reverse $ stages ec
where
stageMapping :: Direction -> Stage -> Mapping
stageMapping d sn = componentMapping d (component $ components ec !! sn) (positions ec !! sn)
enigmaMappingList :: EnigmaConfig -> [Mapping]
enigmaMappingList ec = scanl1 (flip encode') (stageMappingList ec)
enigmaMapping :: EnigmaConfig -> Mapping
enigmaMapping ec = last $ enigmaMappingList ec
type Message = String
enigmaEncoding :: EnigmaConfig -> Message -> String
enigmaEncoding ec msg = assert (and $ (`elem` letters) <$> msg) $
zipWith encode (enigmaMapping <$> (iterate step (step ec))) msg
2 Answers 2
step
This code:
step ec = EnigmaConfig {
components = components ec,
positions = steppedPosition <$> stages ec,
rings = rings ec
}
...
may be written:
step ec = ec { positions = steppedPosition <$> stages ec }
Not only is it shorter, but it is also clearer that the returned value is
the same same as the input ec
except that the positions field has changed.
enigmaEncoding
This expression:
(and $ (`elem` letters) <$> msg)
may be more simply written as:
all (`elem` letters) msg
and is a lot more readable. In general I avoid using <$>
on lists - just write map
. To me, using fmap
or <$>
is a signal that
a more complex structure is involved, like a monad. If it is just a list, it's better to write plainer code - it'll make it a lot easier for someone else to understand.
26
The number 26 appears as a magic constant. It should be given a symbolic name. Or, find a way to structure your code so that you don't need it.
subexpressions
I would try to give good names to sub-expressions in your program.
For instance, in this expression:
zipWith encode (enigmaMapping <$> (iterate step (step ec))) msg
it's hard to see what the two arguments are. Perhaps write it as:
zipWith encode states msg
where states = ...
Another example - these expressions are similar:
windowLetter ec st = chrA0 $ mod (positions ec !! st + rings ec !! st - 2) 26
...
steppedPosition i = (mod (positions ec !! i + di - 1) 26) + 1
I'm sure you can define sub-expressions which make it clearer what's going on.
architecture
This site has a nice picture of how an Enigma machine encodes a letter:
http://enigma.louisedade.co.uk/howitworks.html
Would it be possible to write your code to mimic this diagram? E.g.:
encodeLetter :: EnigmaConfig -> Char -> Char
encodeLetter ec = plug' . right' . middle' . left' . reflect .
left . middle . right . static . plug
where plug = ...
static = ...
...
Someone who is reading your code would be looking for where these parts of the encoding process appear in your code, and structuring the code like this would make it obvious what's going on.
-
\$\begingroup\$ Great suggestions. Regarding the final one, I'd have to change approaches: my
enigmaEncoding
(where letters are encoded) has already collapsed the encoding of the whole machine into a single mapping. It's through the functions that precede it that I examine/expose the stages. But I'll see how such an approach might work. Great improvement forstep
! \$\endgroup\$orome– orome2015年09月26日 22:12:00 +00:00Commented Sep 26, 2015 at 22:12 -
\$\begingroup\$ Note also that roughly what yo propose for
encodeLetter
happens instageMappingList
with((stageMapping Fwd) <$>) <> ((stageMapping Rev) <$>).tail.reverse $ stages ec
. So there would be the place to make the correspondence to the physical operation clearer. \$\endgroup\$orome– orome2015年09月27日 11:45:35 +00:00Commented Sep 27, 2015 at 11:45 -
\$\begingroup\$ honestly that is a really difficult expression to grok! see the
encodeLetter
function in my gist in my other answer for another approach. \$\endgroup\$ErikR– ErikR2015年09月27日 21:28:54 +00:00Commented Sep 27, 2015 at 21:28 -
\$\begingroup\$ That's funny. I've been wrapping my head around Functors and Monoids and Applicatives and that expression (now) seems super clear to me, and maps directly the math. I bet in a week when I forget what I've learned it will make no sense to me at all (we'll see)! \$\endgroup\$orome– orome2015年09月28日 20:14:24 +00:00Commented Sep 28, 2015 at 20:14
I'm adding another answer because I've found more to say.
Use _
for field names
Prefix field names with an underscore. It makes the code easier to read because it's a signal to the reader that a field is being accessed instead of general function being called.
Other languages use .
for field accessors, and that makes
expressions like self.count += 1
more readable.
limited set of wheels
Your implementation is limited to the wheels which are listed
in comps
. To add a new component you have to change the
definition of comps
.
A better approach is to create a Wheel
data structure,
and add a function which creates an EnigmaConfig from a
collection of wheels:
data Wheel = Wheel ...
makeEnigma :: [Wheel] -> EnigmaConfig
compI = makeWheel "EKMFLGDQVZNTOWYHXUSPAIBRCJ" "Q"
compII = makeWheel "AJDKSIRUXBLHWTMCQGZNPYFVOE" "E"
...
compVIII = makeWheel ...
cfg = makeEnigma [ compVIII, compVI, compV ]
Now you can add more wheels without modifying existing code - you just add new definitions.
It's also more type safe. If wheels are just strings, the compiler can't catch misspelled component names, e.g. in:
cfg = EnigmaConfig [ ..., "compi", ... ]
the string "compi" will be interpreted as a plug-board wiring.
too much !!
For one thing !!
is inefficient on lists, but additionally
you are iterating over indices k
and writing expressions
like:
positions ec !! k
components ec !! k
rings ec !! k
Instead, iterate over the actual values themselves, e.g.:
map go $ zip (components ec) (positions ec)
where go :: (Component,Int) -> ...
go (comp,pos) = ...
Now it is clear that the go
function only depends on the Component and position.
think about testing
Think about what is important to test, and expose those functions at the top level so you can tet them individually.
You should be building up the functionality in layers so that you can throughly test one layer before using it to build the next layer.
My Approach
I've writen up my own implementation to this problem at:
https://gist.github.com/erantapaa/f071bc3f58d017f9280a
It's a Literate Haskell file so you can use it directly with ghc and ghci.
Note - it is currently untested.
-
\$\begingroup\$ Can you say more about what you mean by "field names"? In my view, Haskell has data and functions; the OO concepts of field and method aren't needed. Whether a function is implemented as a field or not is an implementation detail. This is true generally, and here is very much the case (since, e.g., I could have had, say
windows
be a record field and computed, eitherrings
orpositions
, given one or the other without changing anything else). \$\endgroup\$orome– orome2015年09月28日 20:06:51 +00:00Commented Sep 28, 2015 at 20:06 -
\$\begingroup\$ Good point about the
!!
. I need to fix that for sure. \$\endgroup\$orome– orome2015年09月28日 20:11:14 +00:00Commented Sep 28, 2015 at 20:11 -
\$\begingroup\$ Haskell has records, and the components of records are called fields - i.e. see (link). Almost every type in Haskell - except for the basic ones like Int, Char, etc. - are records. While it is true that the accessors are just functions, having the underscore in front of it helps the reader understand what's going on, and that's why you should use it. \$\endgroup\$ErikR– ErikR2015年09月28日 20:15:28 +00:00Commented Sep 28, 2015 at 20:15
-
\$\begingroup\$ Sometimes, but not in this case, imv. There's a lot of OO cruft that I feel like Haskell has the potential to wash away. Sometimes that isn't desirable; but what matters here is happening a level up. Again: I could decide to rework
EnigmaConfig
so thatwindows
was a field, allowing me to dispense with eitherrings
orpositions
as a field. There's no reason I should have to change a name when I do that. \$\endgroup\$orome– orome2015年09月28日 20:27:16 +00:00Commented Sep 28, 2015 at 20:27