I just started to learn Haskell, and I wrote a function that walks the directory tree recursively and passes the content of each directory to a callback function:
- The content of the directory is a tuple containing a list of all the sub-directories and a list of file names.
- If there is an error, the error is passed to the callback function instead of the directory content (it uses Either).
So here is the type of the callback function:
callback :: FilePath -> Either IOError ([FilePath], [FilePath]) -> IO ()
And here is the type of my walk function:
walk :: FilePath -> (FilePath -> Either IOError ([FilePath], [FilePath]) -> IO ()) -> IO ()
And here is my complete code:
module Test where

import System.FilePath ((</>))
import Control.Monad (filterM, forM_, return)
import System.IO.Error (tryIOError, IOError)
import System.Directory (getDirectoryContents, doesFileExist, doesDirectoryExist)

myVisitor :: FilePath -> Either IOError ([FilePath], [FilePath]) -> IO ()
myVisitor path result = do
    case result of
        Left error -> do
            putStrLn $ "I've tryed to look in " ++ path ++ "."
            putStrLn $ "\tThere was an error: "
            putStrLn $ "\t\t" ++ (show error)
        Right (dirs, files) -> do
            putStrLn $ "I've looked in " ++ path ++ "."
            putStrLn $ "\tThere was " ++ (show $ length dirs) ++ " directorie(s) and " ++ (show $ length files) ++ " file(s):"
            forM_ (dirs ++ files) (\x -> putStrLn $ "\t\t- " ++ x)
    putStrLn ""

walk :: FilePath -> (FilePath -> Either IOError ([FilePath], [FilePath]) -> IO ()) -> IO ()
walk path visitor = do
    result <- tryIOError listdir
    case result of
        Left error -> do
            visitor path result
        Right (dirs, files) -> do
            visitor path result
            forM_
                (map (\x -> path </> x) dirs)
                (\x -> walk x visitor)
  where
    listdir = do
        entries <- (getDirectoryContents path) >>= filterHidden
        subdirs <- filterM isDir entries
        files <- filterM isFile entries
        return (subdirs, files)
      where
        isFile entry = doesFileExist (path </> entry)
        isDir entry = doesDirectoryExist (path </> entry)
        filterHidden paths = do
            return $ filter (\path -> head path /= '.') paths

main :: IO ()
main = do
    walk "/tmp" myVisitor
    walk "foo bar" myVisitor
    putStrLn "Done :-/"
As you can see, there is a lot of code and I import a lot of things...
I'm sure there is a better way to do this, and I'm interested in any {idea,hint,tip} to improve this piece of code.
Thanks in advance :-)
1 Answer
You might be interested in the concept of iteratees/pipes that can be used to solve this problem. It allows you to separate producing the tree and consuming it somewhere else without direct callback functions: You create a producer that enumerates the directory tree, perhaps some filters that modify the data and a separate consumer that works on the data. And then compose and run the whole pipeline. See also Streaming recursive descent of a directory in Haskell.
There are also specialized Haskell packages for that such as directory-tree, which you can use or study.
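The producer / filter / consumer split described above can be sketched without any library. The following is only an illustration under stated assumptions: the `Stream` type and the names `fromList`, `keep`, `foldStream` and `countVisible` are made up for this example and are not the API of conduit, pipes or any iteratee package; a plain list stands in for the directory walker.

```haskell
-- A hand-rolled effectful stream; NOT conduit's API, only an illustration.
-- Each step either ends the stream or hands over one value plus the
-- action that computes the rest.
data Stream a = Done | Yield a (IO (Stream a))

-- Producer: turns a list into a stream (stands in for a directory walker).
fromList :: [a] -> IO (Stream a)
fromList []       = return Done
fromList (x : xs) = return (Yield x (fromList xs))

-- Filter stage: passes on only the values that satisfy the predicate.
keep :: (a -> Bool) -> IO (Stream a) -> IO (Stream a)
keep ok src = do
  step <- src
  case step of
    Done -> return Done
    Yield x rest
      | ok x      -> return (Yield x (keep ok rest))
      | otherwise -> keep ok rest

-- Consumer: folds over the stream, running an IO action per element.
foldStream :: (b -> a -> IO b) -> b -> IO (Stream a) -> IO b
foldStream f acc src = do
  step <- src
  case step of
    Done         -> return acc
    Yield x rest -> f acc x >>= \acc' -> foldStream f acc' rest

-- Compose the pipeline: producer, then filter, then consumer.
countVisible :: [FilePath] -> IO Int
countVisible names =
  foldStream (\n _ -> return (n + 1)) 0 (keep notHidden (fromList names))
  where
    notHidden name = take 1 name /= "."
```

None of the three stages knows about the others; they only agree on the element type. The libraries mentioned above offer the same separation, plus resource handling and a richer set of combinators.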
Edit: As an example I reworked your code using conduit. This library (and others based on the same principle) has several advantages, namely:
- It separates the producer of the data from its consumer.
- In a source you just call yield when you want to send a piece of data to the pipe.
- In a sink you call await whenever you want to receive a piece of data (the sink here uses plain await; a more specialized awaitForever also exists).
- You can have conduits that sit in the middle and consume and produce values at the same time. They can do whatever processing on the stream, mixing calls to yield and await as they wish.
- This allows you to create complex computations where the behavior of your components depends on data sent/received earlier. We use this in our source (traversing a directory tree), and I also added it to the sink (visitor): it keeps track of how many directories it has been passed so far.
- Both source and sink (and intermediate conduits, if any) can have finalizers.
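As a small illustration of a conduit that sits in the middle, here is a sketch of a stage that mixes await and yield and carries state between elements. It assumes a newer conduit release (1.3 or later) and therefore uses `ConduitT`, `.|` and `runConduitPure` instead of the `Source`/`Sink`/`$$` names in the reworked code; the names `numbered` and `demo` are made up for this example.

```haskell
import Data.Conduit (ConduitT, await, yield, runConduitPure, (.|))
import qualified Data.Conduit.List as CL

-- Middle stage: tags every incoming value with a running index.
-- The index is state that depends on everything received earlier.
numbered :: Monad m => ConduitT a (Int, a) m ()
numbered = loop 1
  where
    loop n = do
      mx <- await
      case mx of
        Nothing -> return ()                     -- upstream is exhausted
        Just x  -> yield (n, x) >> loop (n + 1)  -- pass it on, keep counting

-- Source, middle stage and sink composed into one pure pipeline.
demo :: [(Int, FilePath)]
demo = runConduitPure (CL.sourceList ["/tmp", "/tmp/a"] .| numbered .| CL.consume)
-- demo == [(1,"/tmp"),(2,"/tmp/a")]
```

The same stage could be placed between a directory-walking source and a printing sink, so the sink would no longer need to keep its own counter.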
Some suggestions:
- Create your own data types instead of using combinations of Either and (,). It makes your code shorter and easier to understand.
- Sometimes it's worth declaring new functions instead of using complex case ... of expressions. It can make code easier to read.
- hlint can suggest how to (syntactically) improve a piece of code.
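To make the second suggestion concrete, here is a minimal sketch of replacing a case ... of on an Either with two named functions and the Prelude's `either`. The helper names (`reportError`, `reportListing`, `describe`, `listAndDescribe`) are hypothetical and only serve this example; the listing is simplified to a flat list of entries.

```haskell
import System.Directory (getDirectoryContents)
import System.IO.Error (tryIOError)

-- One small function per outcome (hypothetical names).
reportError :: FilePath -> IOError -> String
reportError path err = "Could not read " ++ path ++ ": " ++ show err

reportListing :: FilePath -> [FilePath] -> String
reportListing path entries =
  path ++ " has " ++ show (length entries) ++ " entries"

-- `either` selects the handler, so no explicit case expression is needed.
describe :: FilePath -> Either IOError [FilePath] -> String
describe path = either (reportError path) (reportListing path)

-- List a directory and describe the outcome, whether it succeeded or not.
listAndDescribe :: FilePath -> IO String
listAndDescribe path = describe path <$> tryIOError (getDirectoryContents path)
```

Each handler can now be read and tested on its own, and `describe` stays a one-liner.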
import System.FilePath ((</>))
import Control.Monad (filterM, forM_, return)
import System.IO.Error (tryIOError, IOError)
import System.Directory (getDirectoryContents, doesFileExist, doesDirectoryExist)
import Control.Monad.Trans.Class (lift)
import Data.Conduit

-- Either the listing of a directory (sub-directories, files) or the error
-- that occurred while reading it.
data DirContent = DirList [FilePath] [FilePath]
                | DirError IOError

-- A directory path together with its content.
data DirData = DirData FilePath DirContent

-- Produces directory data
walk :: FilePath -> Source IO DirData
walk path = do
    result <- lift $ tryIOError listdir
    case result of
        Right dl@(DirList subdirs files)
            -> do
                yield (DirData path dl)
                forM_ subdirs (walk . (path </>))
        Left error
            -> yield (DirData path (DirError error))
  where
    listdir = do
        entries <- getDirectoryContents path >>= filterHidden
        subdirs <- filterM isDir entries
        files <- filterM isFile entries
        return $ DirList subdirs files
      where
        isFile entry = doesFileExist (path </> entry)
        isDir entry = doesDirectoryExist (path </> entry)
        filterHidden paths = return $ filter (\path -> head path /= '.') paths

-- Consume directories
myVisitor :: Sink DirData IO ()
myVisitor = addCleanup (\_ -> putStrLn "Finished.") $ loop 1
  where
    -- n counts the directories received so far.
    loop n = do
        r <- await
        case r of
            Nothing -> return ()
            Just d -> do
                -- Print the header only once a directory has actually arrived.
                lift $ putStrLn $ ">> " ++ show n ++ ". directory visited:"
                lift (process d)
                loop (n + 1)
    process (DirData path (DirError error)) = do
        putStrLn $ "I've tried to look in " ++ path ++ "."
        putStrLn $ "\tThere was an error: "
        putStrLn $ "\t\t" ++ show error
    process (DirData path (DirList dirs files)) = do
        putStrLn $ "I've looked in " ++ path ++ "."
        putStrLn $ "\tThere was " ++ show (length dirs) ++ " directorie(s) and " ++ show (length files) ++ " file(s):"
        forM_ (dirs ++ files) (putStrLn . ("\t\t- " ++))

main :: IO ()
main = do
    walk "/tmp" $$ myVisitor
- I don't want to add too many dependencies to my software, so I won't use directory-tree. However, conduit seems great and can probably be used in several parts of my software, so I think I will use it. Anyway, your code is cleaner than mine, especially the second function. Thanks for your response :-)
  – m-r-r, Mar 16, 2013 at 19:06