This program implements a sorted grep, that is, a specialized version of grep
for sorted files. It uses binary search for the lines of a file that begin with a certain string.
You can copy and paste the code in a file and run it as:
$> runhaskell sgrep.hs "string to find" sorted_file
I'm looking for suggestions about style, efficiency, and correctness.
module Main where
import Data.List (isPrefixOf)
import Data.Maybe (isNothing, fromJust)
import System.Environment (getArgs)
import System.IO
-- Chunk of a file
data Chunk = Chunk Handle Integer Integer
-- Is char newline?
isNL :: Char -> Bool
isNL c = c == '\n'
-- Are we at the beginning of file?
isBOF :: Handle -> IO Bool
isBOF = (fmap (== 0)) . hTell
-- Go to beginning of line
goToBOL :: Handle -> IO ()
goToBOL h = do
bof <- isBOF h
if bof
then return ()
else do
eof <- hIsEOF h
if eof
then do
hSeek h RelativeSeek (-2)
goToBOL h
else do
c <- hGetChar h
if isNL c
then return ()
else do
hSeek h RelativeSeek (-2)
goToBOL h
getCurrentLine :: Handle -> IO String
getCurrentLine h = goToBOL h >> hGetLine h
getPrevLine :: Handle -> IO (Maybe String)
getPrevLine h = do
goToBOL h
bof <- isBOF h
if bof
then return Nothing
else do
hSeek h RelativeSeek (-2)
goToBOL h
bof <- isBOF h
if bof
then return Nothing
else do
hSeek h RelativeSeek (-2)
goToBOL h
line <- hGetLine h
return $ Just line
goTo :: Handle -> Integer -> IO ()
goTo h i = do
hSeek h AbsoluteSeek i
search :: Chunk -> String -> IO (Maybe String)
search (Chunk h start end) str
| start >= end = return Nothing
| otherwise = do
if mid == (end - 1)
then return Nothing
else do
goTo h mid
midLine <- getCurrentLine h
prevLine <- getPrevLine h
-- putStrLn $ "*** " ++ show start ++ " " ++ show end ++ " " ++ show mid ++ " " ++ midLine ++ ", " ++ show prevLine
if str `isPrefixOf` midLine && ((isNothing prevLine) || not (str `isPrefixOf` (fromJust prevLine)))
then return $ Just midLine
else if str < midLine
then search (Chunk h start mid) str
else search (Chunk h mid end) str
where mid = (start + end) `div` 2
sgrep :: Handle -> String -> IO ()
sgrep h s = do
len <- hFileSize h
match <- search (Chunk h 0 len) s
-- putStrLn $ show match
c <- hGetContents h
putStrLn . unlines $ takeWhile (isPrefixOf s) (lines c)
main :: IO ()
main = do
args <- getArgs
let s = head args
putStrLn s
let fname = head $ tail args
withFile fname ReadMode (\h -> sgrep h s)
-
\$\begingroup\$ @Dan I've flagged it for migration. The only thing left now is waiting for someone to see the flag. \$\endgroup\$FUZxxl– FUZxxl2011年11月26日 19:57:29 +00:00Commented Nov 26, 2011 at 19:57
-
1\$\begingroup\$ You should run this through HLint. It gives many good suggestions for improvement. \$\endgroup\$hammar– hammar2011年11月26日 20:25:15 +00:00Commented Nov 26, 2011 at 20:25
-
\$\begingroup\$ Thank all for your suggestions. You can find the result of my efforts on github. \$\endgroup\$lbolla– lbolla2011年11月27日 16:12:38 +00:00Commented Nov 27, 2011 at 16:12
-
\$\begingroup\$ For who's interested, I've implemented a more efficient versionn using bytestrings. Code and benchmarks are available here. \$\endgroup\$lbolla– lbolla2011年12月01日 07:31:32 +00:00Commented Dec 1, 2011 at 7:31
2 Answers 2
Use your monads! Your code exhibits the walking-right antipattern. You can avoid it with when
and guard
. Consider goToBOL
. This is how I would write it:
-- Go to beginning of line
goToBOL :: Handle -> IO ()
goToBOL h = do
bof <- isBOF h
when (not bof) $ do
eof <- hIsEOF h
if eof then do hSeek h RelativeSeek (-2)
goToBOL h
else do c <- hGetChar h
when (not $ isNL c) $ do
hSeek h RelativeSeek (-2)
goToBOL h
In your other functions, namely getPrevLine
and search
, you'd better use MaybeT IO x
instead of IO (Maybe x)
as you can use the monadic combinators better when you do so.
In addition to @FUZxxl's points:
You call sgrep
only from the last line of main
, and the parameters are the wrong way round. Change to
sgrep :: String -> Handle -> IO ()
sgrep s h = do
...
and
...
withFile fname ReadMode (sgrep s)
And I'd pattern match the command line arguments (assuming you don't need the benefits of System.Console.GetOpt):
main :: IO ()
main = do
(s : fname : _) <- getArgs
putStrLn s
withFile fname ReadMode (sgrep s)