I am trying to solve a Hacker Rank problem about string suffixes:
For two strings A and B, we define the similarity of the strings to be the length of the longest prefix common to both strings. For example, the similarity of strings "abc" and "abd" is 2, while the similarity of strings "aaa" and "aaab" is 3.
Calculate the sum of similarities of a string S with each of its suffixes.
I have the following code:
import System.IO
import Control.Monad
import Data.Maybe
import Data.List
import qualified Data.ByteString.Char8 as BSC
main = do
n <- liftM getIntFromBS BSC.getLine
replicateM_ n $ do
s <- BSC.getLine
putStrLn . show $ sum $ map (cntPrefix s) $ tail $ sort $ BSC.tails s
where
getIntFromBS = fst.fromJust.BSC.readInt
cntPrefix str pref = length $ takeWhile (\t -> fst t == snd t) $ BSC.zip pref str
But performance is really bad. I am doing this challenges to learn Haskell and thus my skills aren't good enough to optimize this program. Any help is appreciated.
-
\$\begingroup\$ What you may and may not do after receiving answers \$\endgroup\$200_success– 200_success2014年11月13日 17:25:41 +00:00Commented Nov 13, 2014 at 17:25
-
\$\begingroup\$ Normally, I would recommend the Z Algorithm, but I'm not sure how to translate it into Haskell. \$\endgroup\$200_success– 200_success2014年11月13日 17:40:40 +00:00Commented Nov 13, 2014 at 17:40
1 Answer 1
As far as I can tell your solution is not only slow but incorrect because of your usage of sort
. Dropping that single function from your pipeline nets an exponential improvement, and it stopped including the similarity of s
to itself in the answer.
Make sure also you're compiling with -O2
, otherwise sum
won't be optimized into a strict fold and you'll blow up the stack with thunks.
I also came up with a version targeted toward readability, but it's ~3x slower than your (corrected) ByteString
based version so I'll just include it here as a curiosity.
import Control.Monad (replicateM_)
import Data.List (tails)
main :: IO ()
main = do
n <- readLn
replicateM_ n $ do
s <- getLine
print $ sum $ map (similarity s) (suffixes s)
similarity :: (Eq a) => [a] -> [a] -> Int
similarity s = length . takeWhile id . zipWith (==) s
suffixes :: [a] -> [[a]]
suffixes = tail . tails
-
\$\begingroup\$ The sort isn't there for fun, is necessary to give the correct answer. Your answer gives wrong output. \$\endgroup\$OneEyeQuestion– OneEyeQuestion2014年11月13日 14:21:46 +00:00Commented Nov 13, 2014 at 14:21
-
1\$\begingroup\$ Only if you consider
xs :: [a]
to be a suffix ofxs :: [a]
, which depends on how pedantic you're willing to be. It also is completely unnecessary, because the only thingtail . sort . tails
will ever do is drop an instance of the empty list which in your case would have a similarity of 0 anyway and not affect the sum. Removetail $ sort $
from your version and you'll see the answer you expect along with the speedup. \$\endgroup\$bisserlis– bisserlis2014年11月13日 14:50:38 +00:00Commented Nov 13, 2014 at 14:50 -
\$\begingroup\$ That's different indeed to what you posted since with your answer tail will drop an arbitrary suffix and not the empty list, that as you said is redundant to drop since it equals zero. Unfortunately tried with the suggested modifications but still too slow. I'll update the original post. \$\endgroup\$OneEyeQuestion– OneEyeQuestion2014年11月13日 17:15:09 +00:00Commented Nov 13, 2014 at 17:15
Explore related questions
See similar questions with these tags.