Is there a better way to walk a directory tree?

Question 1

I just started to learn Haskell, and I wrote a function that walks the directory tree recursively and passes the content of each directory to a callback function.

The content of a directory is a tuple containing a list of all the sub-directories and a list of file names.
If there is an error, the error is passed to the callback function instead of the directory content (it uses Either).

So here is the type of the callback function:

callback :: FilePath -> Either IOError ([FilePath], [FilePath]) -> IO ()

And here is the type of my walk function:

walk :: FilePath -> (FilePath -> Either IOError ([FilePath], [FilePath]) -> IO ()) -> IO ()

And here is my complete code:

module Test where

import System.FilePath ((</>))
import Control.Monad (filterM, forM_, return)
import System.IO.Error (tryIOError, IOError)
import System.Directory (getDirectoryContents, doesFileExist, doesDirectoryExist)

-- Example callback: prints what was found in a directory, or the error.
myVisitor :: FilePath -> Either IOError ([FilePath], [FilePath]) -> IO ()
myVisitor path result = do
    case result of
        Left error -> do
            putStrLn $ "I've tried to look in " ++ path ++ "."
            putStrLn $ "\tThere was an error: "
            putStrLn $ "\t\t" ++ (show error)
        Right (dirs, files) -> do
            putStrLn $ "I've looked in " ++ path ++ "."
            putStrLn $ "\tThere was " ++ (show $ length dirs) ++ " directorie(s) and " ++ (show $ length files) ++ " file(s):"
            forM_ (dirs ++ files) (\x -> putStrLn $ "\t\t- " ++ x)
    putStrLn ""

-- Calls the visitor on a directory, then recurses into its sub-directories.
walk :: FilePath -> (FilePath -> Either IOError ([FilePath], [FilePath]) -> IO ()) -> IO ()
walk path visitor = do
    result <- tryIOError listdir
    case result of
        Left error -> do
            visitor path result
        Right (dirs, files) -> do
            visitor path result
            forM_
                (map (\x -> path </> x) dirs)
                (\x -> walk x visitor)
  where
    listdir = do
        entries <- (getDirectoryContents path) >>= filterHidden
        subdirs <- filterM isDir entries
        files <- filterM isFile entries
        return (subdirs, files)
      where
        isFile entry = doesFileExist (path </> entry)
        isDir entry = doesDirectoryExist (path </> entry)
        -- Dropping dot-names also removes the "." and ".." entries.
        filterHidden paths = do
            return $ filter (\path -> head path /= '.') paths

main :: IO ()
main = do
    walk "/tmp" myVisitor
    walk "foo bar" myVisitor
    putStrLn "Done :-/"

As you can see, there is a lot of code and I import a lot of things... I'm sure there is a better way to do that, and I'm interested in any {idea,hint,tip} to improve this piece of code.

Thanks in advance :-)

Question 2

You might be interested in the concept of iteratees/pipes, which can be used to solve this problem. They let you separate producing the tree from consuming it somewhere else, without direct callback functions: you create a producer that enumerates the directory tree, perhaps some filters that modify the data, and a separate consumer that works on the data. You then compose the pieces and run the whole pipeline. See also Streaming recursive descent of a directory in Haskell.
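To make the producer / filter / consumer split concrete, here is a minimal sketch. It assumes conduit 1.3 or later, where the pieces are spelled ConduitT, .| and runConduitPure (the program further down uses the older Source, Sink and $$ names). The names entries, visible and visibleEntries are made up for illustration, and the producer emits a fixed list instead of reading the disk so that the example stays pure.

```haskell
import Data.Conduit (ConduitT, runConduitPure, yield, (.|))
import qualified Data.Conduit.List as CL

-- Hypothetical producer: yields a fixed list of entry names
-- (a real one would read them from a directory).
entries :: Monad m => ConduitT i FilePath m ()
entries = mapM_ yield ["a.txt", ".cache", "b.hs"]

-- Filter stage: lets through only names that do not start with a dot.
visible :: Monad m => ConduitT FilePath FilePath m ()
visible = CL.filter (not . isHidden)
  where
    isHidden ('.' : _) = True
    isHidden _         = False

-- Compose producer, filter and consumer, then run the pipeline.
-- CL.consume is the consumer: it collects everything it receives.
visibleEntries :: [FilePath]
visibleEntries = runConduitPure (entries .| visible .| CL.consume)
```

Each stage can be swapped independently: replacing entries with a real directory walker, or CL.consume with a printing sink, leaves the other stages untouched.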

There are also specialized Haskell packages for that such as directory-tree, which you can use or study.

Edit: As an example I reworked your code using conduit. This library (and others based on the same principle) has several advantages, namely:

It separates the producer of the data from its consumer.
In a source you just call yield when you want to send a piece of data to the pipe.
In a sink you call await whenever you want to receive a piece of data (the more specialized awaitForever covers the common case of handling every input the same way; the sink below calls await directly so that it can carry a counter).
You can have conduits that sit in the middle and consume and produce values at the same time. They can do whatever processing on the stream, mixing calls to yield and await as they wish.
This allows you to create complex computations where the behavior of your components depends on data sent/received earlier. We use this in our source (traversing a directory tree), and I also added it to the sink (visitor): it keeps track of how many directories it has been passed so far.
Both source and sink (and intermediate conduits, if any) can have finalizers.
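As a small illustration of a conduit that sits in the middle and depends on what it has seen so far, the sketch below numbers every value passing through it by mixing await and yield. It assumes the conduit 1.3+ names (ConduitT, .|, runConduitPure); numbered and numberedDirs are hypothetical names, and the input paths are made up.

```haskell
import Data.Conduit (ConduitT, await, runConduitPure, yield, (.|))
import qualified Data.Conduit.List as CL

-- Middle stage: pairs each incoming value with a running counter.
-- The counter is state carried from one await to the next.
numbered :: Monad m => ConduitT a (Int, a) m ()
numbered = loop 1
  where
    loop n = do
      next <- await
      case next of
        Nothing -> return ()                     -- upstream is finished
        Just x  -> yield (n, x) >> loop (n + 1)  -- emit, then continue

-- Feed three made-up paths through the stage and collect the output.
numberedDirs :: [(Int, FilePath)]
numberedDirs =
  runConduitPure (CL.sourceList ["/tmp", "/tmp/a", "/tmp/b"] .| numbered .| CL.consume)
```

Here numberedDirs pairs the three paths with 1, 2 and 3 in order. The same explicit loop is what the visitor in the program below uses to count directories.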

Some suggestions:

Create your own data types instead of using combinations of Either and (,). It makes your code shorter and easier to understand.
Sometimes it's worth declaring new functions instead of using complex case ... of expressions. It can make code easier to read.
hlint can suggest how to (syntactically) improve a piece of code.
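The first two suggestions can be tried in isolation, using only base. In this sketch DirContent mirrors the type used in the program below, while fromEither and summary are hypothetical helpers added for illustration.

```haskell
-- A named result type instead of Either IOError ([FilePath], [FilePath]).
data DirContent
  = DirList [FilePath] [FilePath]  -- sub-directories, then files
  | DirError IOError

-- Converts the Either/tuple shape from the question into the named type.
fromEither :: Either IOError ([FilePath], [FilePath]) -> DirContent
fromEither (Left err)            = DirError err
fromEither (Right (dirs, files)) = DirList dirs files

-- One equation per constructor instead of a nested case ... of.
summary :: DirContent -> String
summary (DirError err) = "error: " ++ show err
summary (DirList dirs files) =
  show (length dirs) ++ " directories, " ++ show (length files) ++ " files"
```

With the constructors named, a signature such as DirContent -> String says what the function expects without the reader having to decode which list of the tuple holds the directories.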

-- Uses the conduit API as it was when this was written
-- (Source, Sink, $$, addCleanup); newer conduit releases may need adjustments.
import System.FilePath ((</>))
import Control.Monad (filterM, forM_, return)
import System.IO.Error (tryIOError, IOError)
import System.Directory (getDirectoryContents, doesFileExist, doesDirectoryExist)
import Control.Monad.Trans.Class (lift)
import Data.Conduit

data DirContent = DirList [FilePath] [FilePath]
                | DirError IOError
data DirData = DirData FilePath DirContent

-- Produces directory data
walk :: FilePath -> Source IO DirData
walk path = do
    result <- lift $ tryIOError listdir
    case result of
        Right dl@(DirList subdirs files)
            -> do
                yield (DirData path dl)
                forM_ subdirs (walk . (path </>))
        Left error
            -> yield (DirData path (DirError error))
  where
    listdir = do
        entries <- getDirectoryContents path >>= filterHidden
        subdirs <- filterM isDir entries
        files <- filterM isFile entries
        return $ DirList subdirs files
      where
        isFile entry = doesFileExist (path </> entry)
        isDir entry = doesDirectoryExist (path </> entry)
        filterHidden paths = return $ filter (\path -> head path /= '.') paths

-- Consume directories
myVisitor :: Sink DirData IO ()
myVisitor = addCleanup (\_ -> putStrLn "Finished.") $ loop 1
  where
    -- n counts the directories received so far
    loop n = do
        lift $ putStrLn $ ">> " ++ show n ++ ". directory visited:"
        r <- await
        case r of
            Nothing -> return ()
            Just r -> lift (process r) >> loop (n + 1)
    process (DirData path (DirError error)) = do
        putStrLn $ "I've tried to look in " ++ path ++ "."
        putStrLn $ "\tThere was an error: "
        putStrLn $ "\t\t" ++ show error
    process (DirData path (DirList dirs files)) = do
        putStrLn $ "I've looked in " ++ path ++ "."
        putStrLn $ "\tThere was " ++ show (length dirs) ++ " directorie(s) and " ++ show (length files) ++ " file(s):"
        forM_ (dirs ++ files) (putStrLn . ("\t\t- " ++))

main :: IO ()
main = do
    walk "/tmp" $$ myVisitor

Question 3

I don't want to add too many dependencies to my software, so I won't use directory-tree. However, conduit seems great and can probably be used in several parts of my software, so I think I will use it. Anyway, your code is cleaner than mine, especially the 2nd function. Thanks for your response :-)

Petr · Accepted Answer · 2013-02-27 20:10:49Z

