I am new to Haskell and, for a start, I chose to write a simple grep. Now I want to ask whether there is a simpler/shorter way to write it, for example a way to avoid the explicit recursion.
parseLines :: String -> [String] -> Int -> IO ()
parseLines _ [] _ = return ()
parseLines pattern (x:xs) line = do
    when (isInfixOf pattern x) $ putStrLn $ (show line) ++ ": " ++ x
    parseLines pattern xs (line+1)

processFile :: String -> String -> IO ()
processFile _ [] = return ()
processFile pattern file = do
    exists <- doesFileExist file
    if not exists
        then putStrLn $ file ++ ": file does not exists"
        else do
            putStrLn file
            content <- readFile file
            parseLines pattern (lines content) 0

processFiles :: String -> [String] -> IO ()
processFiles _ [] = return ()
processFiles pattern (x:xs) = do
    processFile pattern x
    processFiles pattern xs

main = do
    args <- getArgs
    processFiles (head args) (tail args)
- Since you're actually not doing any regexps, I believe this can hardly be called a "grep". This is just some searching utility. – Nikita Volkov, Sep 25, 2013 at 17:07
- regexps are next level – FuF, Sep 25, 2013 at 18:00
2 Answers
I see these areas where your code could be improved:
1. processFiles can be expressed very simply using mapM_ from Control.Monad:

   processFiles :: String -> [String] -> IO ()
   processFiles pattern = mapM_ (processFile pattern)

2. All your functions are in the IO monad. This goes a bit against Haskell's philosophy of keeping side effects to a minimum.
3. parseLines requires the whole file to be read into memory. This could be solved by using lazy IO, but I'd strongly discourage you from doing so.
One possibility to solve 2. and 3. is to use conduits. This may seem like a somewhat complex subject, but the idea is actually very intuitive. A conduit is something that reads input and produces output, using some particular monad. This allows you to break your program into very small, reusable components, each doing a single particular task, which makes it easier to debug, test and maintain.
For example, your code could be refactored as follows. (First some required imports.)
import Control.Monad
import Control.Monad.IO.Class
import Data.ByteString (unpack)
import Data.Conduit
import qualified Data.Conduit.Binary as C
import qualified Data.Conduit.List as C
import Data.List (isInfixOf)
import System.Environment (getArgs)
import System.Directory (doesFileExist)
import System.IO
sourceFileLines :: (MonadResource m) => FilePath -> Source m String
sourceFileLines file = bracketP (openFile file ReadMode) hClose loop
  where
    loop h = do
        eof <- liftIO (hIsEOF h)
        unless eof (liftIO (hGetLine h) >>= yield >> loop h)
This function takes a file name and creates a Source, a conduit that takes no input but produces output. It reads a file line by line and sends each line down the pipeline using yield. Using bracketP we ensure that the file will get closed no matter what happens to the pipeline.
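For comparison, the same "open, read line by line, always close" pattern can be sketched without conduit, using bracket from Control.Exception in base. This is only an illustration under that assumption, not part of the conduit solution; foldFileLines is a hypothetical helper name.

```haskell
import Control.Exception (bracket)
import System.IO

-- Hypothetical helper (not from the answer): fold a pure step function over
-- the lines of a file. bracket closes the handle even if an exception is
-- thrown while reading.
foldFileLines :: (acc -> String -> acc) -> acc -> FilePath -> IO acc
foldFileLines step start file =
    bracket (openFile file ReadMode) hClose (loop start)
  where
    loop acc h = do
        eof <- hIsEOF h
        if eof
            then return acc
            else do
                line <- hGetLine h
                let acc' = step acc line
                -- Force the accumulator so thunks do not pile up per line.
                acc' `seq` loop acc' h
```

For instance, foldFileLines (\n _ -> n + 1) 0 file would count the lines of a file while holding only one line in memory at a time.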
numLines :: (Monad m) => Conduit a m (Int, a)
numLines = C.scanl step 1
  where
    step x n = (n + 1, (n, x))
This component, built using scanl, is very simple. It just sends its input to the output and keeps the count along the way. Notice that this conduit doesn't need any IO; it works with any monad.
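Because the counting logic is pure, it can be tried out on ordinary lists first. The following is a minimal sketch using mapAccumL from Data.List; numberFrom1 is a hypothetical name, and note that mapAccumL passes the accumulator before the element, the reverse of the argument order in step as written above.

```haskell
import Data.List (mapAccumL)

-- Hypothetical list counterpart of numLines: pair every element with a
-- running count that starts at 1, as the conduit version does.
numberFrom1 :: [a] -> [(Int, a)]
numberFrom1 = snd . mapAccumL step 1
  where
    -- Accumulator first, then the element; returns (new count, output).
    step n x = (n + 1, (n, x))
```

For example, numberFrom1 "abc" evaluates to [(1,'a'),(2,'b'),(3,'c')].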
Now it's easy to filter a stream of numbered lines with a pattern:
parseLines :: (Monad m) => String -> Conduit String m (Int, String)
parseLines pattern = numLines =$= C.filter f
  where
    f (_, x) = isInfixOf pattern x
This function fuses two conduits together. The first one numbers lines, the second filters them according to the pattern.
printMatch :: (MonadIO m) => Sink (Int, String) m ()
printMatch = C.mapM_ (\(n, x) -> liftIO $ putStrLn $ show n ++ ": " ++ x)
In printMatch we separate out the logic that prints the output. For each pair it receives, it prints the line number and its content.
Combining and running these conduits is then easy:
runResourceT $ sourceFileLines file $= parseLines pattern $$ printMatch
(runResourceT is needed because of bracketP.) So the rest of the program would look like
processFile :: String -> String -> IO ()
processFile _ [] = return ()
processFile pattern file = do
    exists <- doesFileExist file
    if not exists
        then putStrLn $ file ++ ": file does not exists"
        else do
            putStrLn file
            runResourceT $ sourceFileLines file $= parseLines pattern $$ printMatch

processFiles :: String -> [String] -> IO ()
processFiles pattern = mapM_ (processFile pattern)

main = do
    args <- getArgs
    processFiles (head args) (tail args)
Here's the updated code. The notes follow after it.
import Control.Monad
import Data.List
import System.Directory
import System.Environment
search :: String -> String -> [(Int, String)]
search searchString content = do
    (lineNumber, lineText) <- zip [0..] $ lines content
    if isInfixOf searchString lineText
        then return (lineNumber, lineText)
        else mzero

processFile :: String -> String -> IO ()
processFile searchString file = do
    exists <- doesFileExist file
    if not exists
        then error $ file ++ ": file does not exist"
        else do
            putStrLn file
            content <- readFile file
            forM_ (search searchString content) $ \(lineNumber, lineText) -> do
                putStrLn $ show lineNumber ++ ": " ++ lineText

main :: IO ()
main = do
    args <- getArgs
    case args of
        searchString : fileNames | not $ null fileNames -> do
            forM_ fileNames $ processFile searchString
        _ -> error "Not enough arguments"
Notes:
1. In Haskell it's idiomatic to isolate pure code from IO interactions as much as possible. So first we isolate the search function by accumulating most of the non-IO logic in it. In its implementation I'm utilizing the Monad and MonadPlus instances for lists, so don't be surprised by the do-notation used in a non-IO context. Alternatively, list comprehension syntax could be used, but I'm just not a fan of it. This could also be solved using map and filter and whatnot.
2. forM_ helps us loop in monads without recursion.
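The two alternatives mentioned in the first note could look roughly like this. It is a sketch only: searchLC and searchFilter are hypothetical names, and both are meant to return the same result as search above.

```haskell
import Data.List (isInfixOf)

-- List-comprehension variant: keep the numbered lines containing the string.
searchLC :: String -> String -> [(Int, String)]
searchLC searchString content =
    [ (lineNumber, lineText)
    | (lineNumber, lineText) <- zip [0 ..] (lines content)
    , searchString `isInfixOf` lineText
    ]

-- filter variant: number the lines, then filter on the text component.
searchFilter :: String -> String -> [(Int, String)]
searchFilter searchString =
    filter (isInfixOf searchString . snd) . zip [0 ..] . lines
```

For example, searchLC "ab" "xabx\nno\nab" evaluates to [(0,"xabx"),(2,"ab")], and searchFilter gives the same list.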