7
\$\begingroup\$

I'm very new to Haskell, I've read most of learnyouahaskell.com and played around with some simple things, but this is probably the most 'complicated' bit of Haskell I've written so far. I have an implementation in PHP that does basically the same thing, but it is much much faster.. I'm guessing my bottleneck here is that randomRIO takes a long time to return a value, is there any way to increase the speed, or get an array of values instead of grabbing them one at a time?

That aside, any other tips and/or suggestions about improving my implementation would be very helpful!

module Main where
import System.Environment
import System.Exit
import System.Random (randomRIO)
import Control.Monad (replicateM)
import qualified Data.Map as M
import Text.Regex.Posix
main :: IO ()
main = do
 args <- getArgs
 let helpOnly = "-h" `elem` args
 if helpOnly
 then putStrLn usage
 else do
 password <- generatePassword ("-w" `elem` args)
 putStrLn password
 exitSuccess
usage :: String
usage = unlines [
 "",
 "Usage: [ -w, --with-symbols ] [ -h, --help ]",
 "",
 "By default, generates a 16 character password that does not include symbols.",
 "",
 "-w, --with-symbols include symbols",
 "-h, --help print a brief help message"
 ]
symbols :: String
symbols = "!$%^&*()-_=+[{]};:@#~,<.>/?"
randomReplace :: String -> String -> IO String
randomReplace [] subject = return subject
randomReplace (replacement:rs) subject = do
 randomIndex <- randomRIO (0, length subject - 1) :: IO Int
 let hash = zip [0 .. length subject - 1] subject
 (randomReplace rs . map snd . M.toList . M.insert randomIndex replacement . M.fromList) hash
generatePassword :: Bool -> IO String
generatePassword withSymbols = do
 let passwordLength = 500 -- obviously you wouldn't use 500 as a default here, but I'm just benchmarking
 numDigits <- randomRIO (1, passwordLength) :: IO Int
 digits <- replicateM numDigits (randomRIO (1, 9) :: IO Int)
 numUppercase <- randomRIO (1, passwordLength) :: IO Int
 uppercaseLetters <- replicateM numUppercase (randomRIO ('A', 'Z') :: IO Char)
 p1 <- replicateM passwordLength (randomRIO ('a', 'z') :: IO Char)
 p2 <- randomReplace (concatMap show digits) p1
 p3 <- randomReplace uppercaseLetters p2
 password <- if withSymbols
 then do
 numSymbols <- randomRIO (1, passwordLength) :: IO Int
 symbolsToReplace <- replicateM numSymbols ((randomRIO (0, length symbols - 1) :: IO Int) >>= (\x -> return $ symbols !! x))
 randomReplace symbolsToReplace p3
 else return p3
 if (password =~ "[a-z]" :: Bool) && (password =~ "[A-Z]" :: Bool) && (password =~ "[0-9]" :: Bool) -- we knows symbols are in there since it went last
 then return password
 else generatePassword withSymbols
200_success
145k22 gold badges190 silver badges478 bronze badges
asked Jan 17, 2016 at 3:24
\$\endgroup\$
2
  • \$\begingroup\$ Totally unrelated to your question (which seems good to me!) learnyouahaskell.com seems super cool. \$\endgroup\$ Commented Jan 17, 2016 at 3:54
  • \$\begingroup\$ @QPaysTaxes good! it was definitely a great resource \$\endgroup\$ Commented Jan 17, 2016 at 3:59

2 Answers 2

4
\$\begingroup\$
  • Too much IO.
  • Text.Regex.Posix is too much for those simple tests. Why not any isUpper, any isLower, any isDigit?
  • Your program ignores --help and --with-symbols (have a look at OptParse-Applicative)
  • randomReplace has too many temporary structures (see below)
  • You use length too often. Unlike in PHP, it's quite slow (O(n) vs O(1)).

A short search for bottlenecks

I'm guessing my bottleneck here ...

Don't guess, measure. Enable profiling to check where you're actually losing time:

$ stack install --profile random
$ stack install --profile regex-posix
$ stack exec -- ghc -O2 -prof -auto-all PWGen.hs
$ ./PWGen+RTS -s -p

If you don't use stack, make sure to enable profiling when you install the packages:

$ cabal sandbox init
$ cabal install -p random regex-posix
$ cabal exec -- ghc -O2 -prof -auto-all PWGen.hs
$ ./PWGen +RTS -s -p

This is actually fast, but not fast enough:

9,420,841,024 bytes allocated in the heap
 6,425,011,808 bytes copied during GC
 1,021,088 bytes maximum residency (2915 sample(s))
 59,672 bytes maximum slop
 4 MB total memory in use (0 MB lost due to fragmentation)
 Tot time (elapsed) Avg pause Max pause
 Gen 0 15199 colls, 0 par 1.844s 1.971s 0.0001s 0.0011s
 Gen 1 2915 colls, 0 par 1.188s 1.076s 0.0004s 0.0009s
 INIT time 0.000s ( 0.001s elapsed)
 MUT time 1.953s ( 2.001s elapsed)
 GC time 3.031s ( 3.048s elapsed)
 RP time 0.000s ( 0.000s elapsed)
 PROF time 0.000s ( 0.000s elapsed)
 EXIT time 0.000s ( 0.000s elapsed)
 Total time 5.000s ( 5.050s elapsed)
 %GC time 60.6% (60.4% elapsed)
 Alloc rate 4,823,470,604 bytes per MUT second
 Productivity 39.4% of total user, 39.0% of total elapsed

Note: I've changed passwordLength to 5000, since it was too quick with 500. However, the times above don't give enough information where you actually lose that time. That's what -p was for. PWGen.prof contains the following data:

COST CENTRE MODULE no. entries %time %alloc %time %alloc
MAIN MAIN 59 0 0.0 0.0 100.0 100.0
 main Main 119 0 0.8 0.0 99.9 100.0
 generatePassword Main 121 1 0.4 0.1 99.1 100.0
 randomReplace Main 124 5730 69.5 62.9 98.7 99.9
 randomReplace.hash Main 125 5728 29.2 36.9 29.2 36.9
 main.helpOnly Main 120 1 0.0 0.0 0.0 0.0

Almost all memory is allocated in randomReplace. After all, you break both the map and the list apart for every character in replacement. That's quite costly. Instead, let's try to stay on a single map as long as possible:

-- This is still not an idiomatic version, but better
randomReplace :: String -> String -> IO String
randomReplace rs subject = fmap (map snd . M.toList) $ go rs $ M.fromList $ zip [0..] subject
 where
 l = length subject
 go [] m = return m
 go (r:rs) m = do
 randomIndex <- randomRIO (0, l - 1) :: IO Int
 go rs $ M.insert randomIndex r m

What's the big difference? Well, we're not using length repeatedly, which is a big plus. length needs to traverse the whole list to get its result. Also, we're not switching between lists and maps all the time, which gets rid of all intermediate lists.

Running the benchmark again, we gain the following result:

 INIT time 0.000s ( 0.001s elapsed)
 MUT time 0.016s ( 0.021s elapsed)
 GC time 0.016s ( 0.011s elapsed)
 RP time 0.000s ( 0.000s elapsed)
 PROF time 0.000s ( 0.000s elapsed)
 EXIT time 0.000s ( 0.000s elapsed)
 Total time 0.031s ( 0.033s elapsed)
 %GC time 50.0% (32.6% elapsed)
 Alloc rate 2,105,396,736 bytes per MUT second
 Productivity 50.0% of total user, 47.7% of total elapsed
 individual inherited
COST CENTRE MODULE no. entries %time %alloc %time %alloc
MAIN MAIN 59 0 5.6 0.0 100.0 100.0
 main Main 119 0 5.6 0.1 94.4 97.8
 generatePassword Main 121 1 55.6 44.7 88.9 97.7
 randomReplace Main 124 2 0.0 16.4 33.3 53.0
 randomReplace.l Main 126 2 0.0 0.0 0.0 0.0
 randomReplace.go Main 125 5521 33.3 36.6 33.3 36.6
 main.helpOnly Main 120 1 0.0 0.0 0.0 0.0

Runtime is down to 0.03s from 5s. Keep in mind that this is for passwordLength = 5000. This is 0.6% of the original runtime. I can even crank passwordLength up to 100000 and it still takes only 0.7s total.

The reason here is (somewhat) simple. randomReplace had a bad asymptoticl complexity. If n is the length of subject and k is the length of rs, you get roughly:

k -- for every character in rs
 * ( n -- get the length of subject (every time!)
 + n * log(n) -- create the map
 + log (n) -- insert a character at a random position
 + 2 * n -- zip the list and zip it back
 )

Compare this to the new version:

k * (log (n)) -- for every character insert a character into a map
+ n -- get the length once(!)
+ n * log (n) -- create a map once(!)
+ 2 * n -- zip the map and zip it back

So this would be a way to improve your runtime tremendously.

Further places for improvement

generatePassword is too opaque, it's not clear what you're doing. Try to split it into several sections. For example, all those lists of symbols can be abstracted into

randomListOf :: Random g => Int -> (g, g) -> IO [g]
randomListOf n r = do
 l <- randomRIO (1, n)
 replicateM l $ randomRIO r
-- ...
generatePassword = do
 -- ...
 digits <- randomListOf passwordLength ('0','9')
 uppercase <- randomListOf passwordLength ('A','Z')

However, you should try to make all your functions pure. That way, you can easily test them later.

Also, you can try to generate passwords that contain at least one digit. It's not possible with random by default, but you can probably achieve something similar to a modified Test.QuickCheck.frequency.

answered Jan 17, 2016 at 16:46
\$\endgroup\$
1
  • \$\begingroup\$ wow, incredible review -- thanks so much for taking the time! I didn't know about the profiling options available, that's incredibly useful -- I'll also definitely be using optparse-applicative from now on, work on simpler abstractions, and making my Haskell code more idiomatic by example. \$\endgroup\$ Commented Jan 17, 2016 at 17:19
0
\$\begingroup\$

I'm guessing my bottleneck here is that randomRIO takes a long time to return a value, is there any way to increase the speed, or get an array of values instead of grabbing them one at a time?

Yes there is a way, and not only it is way faster, it is also way simpler:

randomsUpTo :: R.RandomGen g => g -> Int -> [Int]
randomsUpTo seed max = map (`mod` max) $ R.randoms seed

This just gives an infinite list of random numbers up to a max given an initial seed.

Generating a password is now trivial:

randomPassword seed len charset = map (charset !!) randomIndexes
 where
 randomIndexes = take len (randomsUpTo seed (length charset))

Benchmarking with:

allAscii = map C.chr [32..127]
main = do
 seed <- R.newStdGen
 let x = randomPassword seed 100000 allAscii
 print $ x 

Takes \0ドル.27\$ seconds to run.

Running your code for 10000 (that is 10 times less what I ran my code with) took \95ドル\$ seconds.

answered Jan 17, 2016 at 12:04
\$\endgroup\$

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.