6
\$\begingroup\$

This program implements a sorted grep, that is, a specialized version of grep for sorted files. It uses binary search for the lines of a file that begin with a certain string.

You can copy and paste the code in a file and run it as:

$> runhaskell sgrep.hs "string to find" sorted_file

I'm looking for suggestions about style, efficiency, and correctness.

module Main where
import Data.List (isPrefixOf)
import Data.Maybe (isNothing, fromJust)
import System.Environment (getArgs)
import System.IO
-- Chunk of a file
data Chunk = Chunk Handle Integer Integer
-- Is char newline?
isNL :: Char -> Bool
isNL c = c == '\n'
-- Are we at the beginning of file?
isBOF :: Handle -> IO Bool
isBOF = (fmap (== 0)) . hTell
-- Go to beginning of line
goToBOL :: Handle -> IO ()
goToBOL h = do
 bof <- isBOF h
 if bof 
 then return ()
 else do 
 eof <- hIsEOF h
 if eof
 then do 
 hSeek h RelativeSeek (-2)
 goToBOL h
 else do 
 c <- hGetChar h
 if isNL c
 then return ()
 else do
 hSeek h RelativeSeek (-2)
 goToBOL h
getCurrentLine :: Handle -> IO String
getCurrentLine h = goToBOL h >> hGetLine h
getPrevLine :: Handle -> IO (Maybe String)
getPrevLine h = do
 goToBOL h
 bof <- isBOF h
 if bof
 then return Nothing
 else do
 hSeek h RelativeSeek (-2)
 goToBOL h
 bof <- isBOF h
 if bof
 then return Nothing
 else do
 hSeek h RelativeSeek (-2)
 goToBOL h
 line <- hGetLine h
 return $ Just line
goTo :: Handle -> Integer -> IO ()
goTo h i = do
 hSeek h AbsoluteSeek i
search :: Chunk -> String -> IO (Maybe String)
search (Chunk h start end) str
 | start >= end = return Nothing
 | otherwise = do
 if mid == (end - 1)
 then return Nothing
 else do
 goTo h mid
 midLine <- getCurrentLine h
 prevLine <- getPrevLine h
 -- putStrLn $ "*** " ++ show start ++ " " ++ show end ++ " " ++ show mid ++ " " ++ midLine ++ ", " ++ show prevLine
 if str `isPrefixOf` midLine && ((isNothing prevLine) || not (str `isPrefixOf` (fromJust prevLine)))
 then return $ Just midLine
 else if str < midLine
 then search (Chunk h start mid) str
 else search (Chunk h mid end) str
 where mid = (start + end) `div` 2
sgrep :: Handle -> String -> IO ()
sgrep h s = do
 len <- hFileSize h
 match <- search (Chunk h 0 len) s
 -- putStrLn $ show match
 c <- hGetContents h
 putStrLn . unlines $ takeWhile (isPrefixOf s) (lines c)
main :: IO ()
main = do
 args <- getArgs
 let s = head args
 putStrLn s
 let fname = head $ tail args
 withFile fname ReadMode (\h -> sgrep h s)
200_success
145k22 gold badges190 silver badges478 bronze badges
asked Nov 26, 2011 at 18:24
\$\endgroup\$
4
  • \$\begingroup\$ @Dan I've flagged it for migration. The only thing left now is waiting for someone to see the flag. \$\endgroup\$ Commented Nov 26, 2011 at 19:57
  • 1
    \$\begingroup\$ You should run this through HLint. It gives many good suggestions for improvement. \$\endgroup\$ Commented Nov 26, 2011 at 20:25
  • \$\begingroup\$ Thank all for your suggestions. You can find the result of my efforts on github. \$\endgroup\$ Commented Nov 27, 2011 at 16:12
  • \$\begingroup\$ For who's interested, I've implemented a more efficient versionn using bytestrings. Code and benchmarks are available here. \$\endgroup\$ Commented Dec 1, 2011 at 7:31

2 Answers 2

6
\$\begingroup\$

Use your monads! Your code exhibits the walking-right antipattern. You can avoid it with when and guard. Consider goToBOL. This is how I would write it:

-- Go to beginning of line
goToBOL :: Handle -> IO ()
goToBOL h = do
 bof <- isBOF h
 when (not bof) $ do 
 eof <- hIsEOF h
 if eof then do hSeek h RelativeSeek (-2)
 goToBOL h
 else do c <- hGetChar h
 when (not $ isNL c) $ do
 hSeek h RelativeSeek (-2)
 goToBOL h

In your other functions, namely getPrevLine and search, you'd better use MaybeT IO x instead of IO (Maybe x) as you can use the monadic combinators better when you do so.

answered Nov 26, 2011 at 21:57
\$\endgroup\$
2
\$\begingroup\$

In addition to @FUZxxl's points:

You call sgrep only from the last line of main, and the parameters are the wrong way round. Change to

sgrep :: String -> Handle -> IO ()
sgrep s h = do
 ...

and

 ...
 withFile fname ReadMode (sgrep s)

And I'd pattern match the command line arguments (assuming you don't need the benefits of System.Console.GetOpt):

main :: IO ()
main = do
 (s : fname : _) <- getArgs
 putStrLn s
 withFile fname ReadMode (sgrep s)
answered Nov 27, 2011 at 14:27
\$\endgroup\$

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.