I'm trying to broaden my horizon by learning functional programming (coming from OO, specficially C#). For this, I'm implementing some small problems to get a feeling for the language. One of my first is the Caesar cipher.
Formal problem statement:
Implement a Caesar cipher, both encoding and decoding. The key is an integer from 1 to 25. This cipher rotates the letters of the alphabet (A to Z). The encoding replaces each letter with the 1st to 25th next letter in the alphabet (wrapping Z to A). So key 2 encrypts "HI" to "JK", but key 20 encrypts "HI" to "BC". This simple "monoalphabetic substitution cipher" provides almost no security, because an attacker who has the encoded message can either use frequency analysis to guess the key, or just try all 25 keys.
One change I did implement was that the shift is not constrained to 1 - 25, but can be any (positive) number. Note also that I'm okay with converting the input/output to upper case.
What I'm most interested in is whether what I wrote is idiomatic F# or if there are better (more functional) ways of doing things.
(*
Caesar cipher -
Implement a Caesar cipher, both encoding and decoding.
The key is an integer from 1 to 25. This cipher rotates the letters of the alphabet (A to Z).
The encoding replaces each letter with the 1st to 25th next letter in the alphabet (wrapping Z to A).
So key 2 encrypts "HI" to "JK", but key 20 encrypts "HI" to "BC".
This simple "monoalphabetic substitution cipher" provides almost no security, because an attacker who has the encoded message
can either use frequency analysis to guess the key, or just try all 25 keys.
*)
open System
let alphabet = [| 'A' .. 'Z' |]
let tryGetIndex c =
let idx = alphabet
|> Array.tryFindIndex (fun t -> t = (Char.ToUpper c))
(c, idx)
// own modulo operation because f# % doesn't work right for us if
// we're handling negative numbers
// f#: -3 % 26 -> -3
// 'ours': modulo -3 26 -> 23
// taken from here:
// http://gettingsharper.de/2012/02/28/how-to-implement-a-mathematically-correct-modulus-operator-in-f/
let modulo n m = ((n % m) + m) % m
let encryptIdx idx shift =
(idx + shift) % alphabet.Length
let decryptIdx idx shift =
let newIdx = idx - shift
modulo newIdx alphabet.Length
let getShifted shift shiftOperation charIdx =
match charIdx with
| _, Some idx -> alphabet.[shiftOperation idx shift]
| c, None -> c
let operateCaesar operation shift text =
if shift < 0 then
raise (ArgumentException "only positive shifts are supported")
(Seq.map tryGetIndex
>> Seq.map (getShifted shift operation)
>> Seq.map string) text
|> String.concat ""
let encryptcaesar shift text = operateCaesar encryptIdx shift text
let decryptcaesar shift text = operateCaesar decryptIdx shift text
let enc = "hello world" |> encryptcaesar 500
let dec = enc |> decryptcaesar 500
2 Answers 2
Disclaimer: I know almost no F#. But I know Haskell. So I think I should be able to give some insight. An F# expert can give you more, of course.
Re-use of functions
One thing that strikes me as odd is that you have two shift functions, encryptIdx
and decryptIdx
. Well, the number of shift functions isn't what I'm concerned with, to be honest, but the different requirements on the shift
value.
Where decryptIdx
can take any shift
(even a negative one), encryptIdx
will return a value out of [| 0 .. 25 |]
if you supply a negative one due to the property of %
you've displayed nicely in the comment. That's not immediately obvious for someone, so you should mention in the documentation/comments that encryptIdx
may only use a non-negative shift.
However, the bigger problem is that both implementations can get out of sync. Let's say that at some point you decide to use another alphabet, e.g.
let alphabet2 = [| '0' .. 'z' |]
You change your encryptIdx
, but you forget to change it in decryptIdx
. Ouch. So instead, let us write one in terms of the other, or rather in terms of another function:
let shiftIdx shift idx =
let newIdx = idx + shift
modulo newIdx alphabet.Length
let encryptIdx shift idx =
shiftIdx shift idx
let decryptIdx shift idx =
shiftIdx (-shift) idx
Note that I've changed the order of arguments. Why? Because it enables us to use currying:
let encryptIdx shift =
shiftIdx shift
let decryptIdx shift =
shiftIdx (-shift)
Combining functions
Either way, let us now look at getShifted
. I'm kidding. getShifted
is perfectly fine with your current combination of tryGetIndex
returning a pair.[remark 1]
But operateCaesar
could use a little polish. First of all, we can get rid of the shift < 0
exception, since now both shifts can handle negative numbers. Next, instead of repeatedly mapping over our text, we can map once:
let operateCaesar operation shift text =
text
|> Seq.map (tryGetIndex >> getShifted shift operation >> string)
|> String.concat ""
That's because Seq.map f >> Seq.map g
is the same as Seq.map (f >> g)
, so we have only one Seq.map
call.[remark 2]
Before we leave that function, let us think about types. operation
will be an int -> int
. shift
will be an int
. And text
will be a seq<char>
. The result will be a String
. However, if we restrict text
to be a String
, we can use String.map
instead of Seq.map
and make the function easier again:
let operateCaesar operation shift text =
text
|> String.map (tryGetIndex >> getShifted shift operation)
Now for a short test. Remember my remark about currying? That
let f bar baz = g magic bar baz
is the same as
let f = g magic
With that in mind, how can you rewrite operateCaesar
above or the next two lines?
let encryptcaesar shift text = operateCaesar encryptIdx shift text
let decryptcaesar shift text = operateCaesar decryptIdx shift text
The solution will be in the code below, so make sure to think about it before you scroll further.
All in all, we end up with:
open System
let alphabet = [| 'A' .. 'Z' |]
let tryGetIndex c =
let idx = alphabet
|> Array.tryFindIndex ((=) (Char.ToUpper c ))
(c, idx)
let modulo n m = ((n % m) + m) % m
let shiftIdx shift idx =
let newIdx = idx + shift
modulo newIdx alphabet.Length
let encryptIdx shift = shiftIdx shift
let decryptIdx shift = shiftIdx (-shift)
let getShifted shiftOperation shift charIdx =
match charIdx with
| _, Some idx -> alphabet.[shiftOperation shift idx]
| c, None -> c
let operateCaesar operation shift = String.map (tryGetIndex >> getShifted operation shift)
let encryptcaesar = operateCaesar encryptIdx
let decryptcaesar = operateCaesar decryptIdx
I took the courtesy to change the order of arguments in getShifted
so that it reflects the order of arguments in operateCaesar
better.
Remarks on your original code in a nutshell
So what were my major critique points? Function re-use. encryptIdx
and decryptIdx
are well suited to be implemented in terms of each other.
Another approach
Let us change tryGetIndex
slightly and make it true to its name:
let tryGetIndex c =
alphabet
|> Array.tryFindIndex ((=) (Char.ToUpper c))
It now returns an Option<int>
, so just the index. Now we need another function, fromIndex
:
let fromIndex idx = alphabet.[idx]
Now we can write a combined variant of tryGetIndex >> getShifted
:
let characterShift shiftFunc c =
tryGetIndex c |> Option.fold (fun _ v -> fromIndex (shiftFunc v)) c
The Option.fold
uses the given function if the Option
is Some
and returns the second argument otherwise. So we either had an index, shift it and return the corresponding character, or we didn't, and therefore return the original character. Note that shiftFunc
here is a function that shifts the character accordingly, e.g. encryptIdx 10
. That's all in the article linked above, though.
We end up with the following operateCaesar
:
let operateCaesar op shift =
String.map (characterShift (op shift))
Remark 1: We will revisit that soon enough.
Remark 2: I sincerely hope that F#'s compiler does this optimization by default.
-
\$\begingroup\$ Thank you very much for the thorough analysis! Especially currying is something that I have to make more use of (coming from C# I'm used to supplying all arguments all the time ). Regarding the different
encryptIdx
anddecryptIdx
: You're totally correct, I should have changedencryptIdx
as well; don't really know why I didn't. I'll have to take a closer look at yourcharacterShift
to fully comprehend it. \$\endgroup\$germi– germi2017年04月30日 11:46:25 +00:00Commented Apr 30, 2017 at 11:46 -
1\$\begingroup\$ @germi Keep in mind that I'm not a F#-person, so the review should be taken with a grain of salt. Now to the
Option.fold
.Option.fold
takes three arguments: a combining function, a "state", and anOption
. If theOption
isNone
, the state (the original characterc
) is returned. If theOption
isSome
the combining function gets both the state and theOption
's value. The state isn't interesting for us, so we ignore it (the_
) and just shift the value and return the correct character from the now shifted index. \$\endgroup\$Zeta– Zeta2017年04月30日 12:02:12 +00:00Commented Apr 30, 2017 at 12:02 -
1\$\begingroup\$ @germi also, just in case you come across that term: the process of going from
let foo bar = goo bar
tolet foo = goo
is called eta reduction or eta conversion. \$\endgroup\$Zeta– Zeta2017年04月30日 12:03:47 +00:00Commented Apr 30, 2017 at 12:03
Zeta gives a thorough answer; I'll just add the following.
The operateCaesar function could be written as:
let operateCaesar operation shift text =
if shift < 0 then
raise (ArgumentException "only positive shifts are supported")
snd (text |> Seq.mapFold (fun str ch -> ch, (sprintf "%s%c" str (getShifted shift operation (tryGetIndex ch)))) "")
Which may not be clearer but just another way to do the same thing, and in one operation.
Shifting 500 positions is essentially the same as shifting 500 % alphabet.Length = 6 (signed)
, so you could consider any shift to be done in two consecutive alphabets (A - Z A - Z):
let alphabet = [ 'A'..'Z'] @ [ 'A'..'Z']
The position of a char can be found as:
let pos = ((int)ch - (int)alphabet.[0])
While encrypting the new position is pos + shift
.
While decrypting the new position is alphatbet.Length + pos - shift
All in all my solution would be something like this:
let alphabet = [ 'A'..'Z']
let shiftText shift text =
let alphabetLength = alphabet |> Seq.length
let actualShift = shift % alphabetLength
let prePos = if shift >= 0 then 0 else alphabetLength
let offset ch =
let pos = ((int)ch - (int)alphabet.[0])
match pos with
| x when ch >= alphabet.[0] && ch <= alphabet.[alphabetLength-1] -> alphabet.[(prePos + pos + actualShift) % alphabetLength]
| _ -> ch
snd (text |> Seq.mapFold (fun str ch -> ch, (sprintf "%s%c" str (offset ch))) "")
let encrypt shift text = shiftText shift text
let decrypt shift text = shiftText -shift text
-
\$\begingroup\$ Why
Seq.mapFold
instead ofString.map
? Why notSeq.fold
? After all, you don't use the result of themap
part at all. Also, I'm not sure whether the inlineisValid
(akach >= ... && ch <= ...
is good practice. But then again, I'm not a F# developer. \$\endgroup\$Zeta– Zeta2017年05月03日 07:12:17 +00:00Commented May 3, 2017 at 7:12 -
\$\begingroup\$ Also, given that
Seq.mapFold
's documentation is currently missing from the official (online) documents, I'd add some remarks there. Other than that, I think all your points are valid. \$\endgroup\$Zeta– Zeta2017年05月03日 07:12:58 +00:00Commented May 3, 2017 at 7:12 -
\$\begingroup\$ @Zeta, Of cause you should use fold instead of mapFold. That'll make the line much more readable. \$\endgroup\$user73941– user739412017年05月03日 16:02:14 +00:00Commented May 3, 2017 at 16:02