You might be interested in iteratees/pipes, which can be used to solve this problem. They let you separate producing the tree from consuming it elsewhere, without direct callback functions: you create a producer that enumerates the directory tree, perhaps some filters that modify the data, and a separate consumer that works on the data. Then you compose and run the whole pipeline. See also Streaming recursive descent of a directory in Haskell.

Edit: As an example I reworked your code using conduit. This library (and others based on the same principle) has several advantages, namely:

- It separates the producer of the data from its consumer.
- In a source you just call `yield` when you want to send a piece of data to the pipe.
- In a sink you call `await` whenever you want to receive a piece of data (a more specialized `awaitForever` is also available).
- You can have conduits that sit in the middle and consume and produce values at the same time. They can do any processing on the stream, mixing calls to `yield` and `await` as they wish.
- This allows you to create complex computations where the behavior of your components depends on data sent/received earlier. We use this in our source (traversing a directory tree), and I also added it to the sink (visitor): it keeps track of how many directories it has been passed so far.
- Both source and sink (and intermediate conduits, if any) can have finalizers.
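
The point about conduits that sit in the middle can be sketched as follows. This is only an illustration and not part of the reworked code: `numberVisible` and `numbered` are hypothetical names, and the sketch assumes the same older conduit operators used in this answer (`Conduit`, `=$=`, `$$`) together with `Data.Conduit.List` from the same package.

```haskell
import Data.Conduit
import qualified Data.Conduit.List as CL
import Data.Functor.Identity (runIdentity)
import Data.List (isPrefixOf)

-- Hypothetical middle stage: drops hidden names (leading '.') and numbers
-- the ones it lets through. It both awaits input and yields output, and the
-- counter n is state that depends on what has been passed along so far.
numberVisible :: Monad m => Conduit FilePath m (Int, FilePath)
numberVisible = loop 1
  where
    loop n = do
      next <- await
      case next of
        Nothing -> return ()                      -- upstream is exhausted
        Just name
          | "." `isPrefixOf` name -> loop n       -- skip, counter unchanged
          | otherwise             -> yield (n, name) >> loop (n + 1)

-- Hypothetical driver: feed a list through the stage and collect the result.
numbered :: [FilePath] -> [(Int, FilePath)]
numbered names =
  runIdentity (CL.sourceList names $$ numberVisible =$= CL.consume)

main :: IO ()
main = print (numbered [".git", "src", ".cache", "docs"])
```

Because the stage is polymorphic in the monad, the same `numberVisible` could presumably be placed between an `IO` source and an `IO` sink with `=$=`, without either end knowing about it.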

Some suggestions:

- Create your own data types instead of using combinations of `Either` and `(,)`. It makes your code shorter and easier to understand.
- Sometimes it's worth declaring new functions instead of using complex `case ... of` expressions. It can make code easier to read.
- hlint can suggest how to (syntactically) improve a piece of code.
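
To make the first two suggestions concrete, here is a small sketch in plain Haskell (no conduit needed). The names `RawEntry`, `Listing`, `Entry`, `fromRaw` and `describe` are made up for illustration; the tuple/`Either` encoding is only an assumption about what such generic code might look like.

```haskell
-- Generic encoding: the reader has to remember what each slot means.
type RawEntry = (FilePath, Either String ([FilePath], [FilePath]))

-- Named encoding: the constructors document themselves.
data Listing
  = Listing [FilePath] [FilePath]   -- subdirectories, then files
  | Unreadable String               -- error message
  deriving (Show, Eq)

data Entry = Entry FilePath Listing
  deriving (Show, Eq)

-- Convert the generic shape into the named one.
fromRaw :: RawEntry -> Entry
fromRaw (path, Left msg)       = Entry path (Unreadable msg)
fromRaw (path, Right (ds, fs)) = Entry path (Listing ds fs)

-- A separate function with one equation per case, instead of a nested
-- `case ... of` at every use site.
describe :: Entry -> String
describe (Entry path (Unreadable msg)) = path ++ ": error: " ++ msg
describe (Entry path (Listing ds fs))  =
  path ++ ": " ++ show (length ds) ++ " dirs, " ++ show (length fs) ++ " files"

main :: IO ()
main = mapM_ (putStrLn . describe . fromRaw)
  [ ("/tmp",  Right (["a", "b"], ["x.txt"]))
  , ("/root", Left "permission denied")
  ]
```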

The reworked code:

    -- Note: Source, Sink, ($$) and addCleanup are the conduit names in use
    -- when this was written; later conduit releases changed this API.
    import System.FilePath ((</>))
    import Control.Monad (filterM, forM_)
    import System.IO.Error (tryIOError)
    import System.Directory (getDirectoryContents, doesFileExist, doesDirectoryExist)
    import Control.Monad.Trans.Class (lift)
    import Data.Conduit

    -- Either the subdirectories and files of a directory, or the error
    -- raised while trying to list it.
    data DirContent = DirList [FilePath] [FilePath]
                    | DirError IOError
    data DirData = DirData FilePath DirContent

    -- Produces directory data
    walk :: FilePath -> Source IO DirData
    walk path = do
        result <- lift $ tryIOError listdir
        case result of
            -- listdir only ever returns a DirList on success
            Right dl@(DirList subdirs _)
                -> do
                    yield (DirData path dl)
                    forM_ subdirs (walk . (path </>))
            Left err
                -> yield (DirData path (DirError err))
      where
        listdir = do
            entries <- getDirectoryContents path >>= filterHidden
            subdirs <- filterM isDir entries
            files <- filterM isFile entries
            return $ DirList subdirs files
          where
            isFile entry = doesFileExist (path </> entry)
            isDir entry = doesDirectoryExist (path </> entry)
            -- Drops dot-entries, which also removes "." and ".."
            filterHidden = return . filter (not . isHidden)
            isHidden name = take 1 name == "."

    -- Consume directories
    myVisitor :: Sink DirData IO ()
    myVisitor = addCleanup (\_ -> putStrLn "Finished.") $ loop 1
      where
        -- n counts the directories received so far; the header is printed
        -- only once an element has actually arrived
        loop n = do
            r <- await
            case r of
                Nothing -> return ()
                Just d  -> do
                    lift $ putStrLn $ ">> " ++ show n ++ ". directory visited:"
                    lift (process d)
                    loop (n + 1)
        process (DirData path (DirError err)) = do
            putStrLn $ "I've tried to look in " ++ path ++ "."
            putStrLn "\tThere was an error: "
            putStrLn $ "\t\t" ++ show err
        process (DirData path (DirList dirs files)) = do
            putStrLn $ "I've looked in " ++ path ++ "."
            putStrLn $ "\tThere was " ++ show (length dirs) ++ " directorie(s) and " ++ show (length files) ++ " file(s):"
            forM_ (dirs ++ files) (putStrLn . ("\t\t- " ++))

    main :: IO ()
    main = walk "/tmp" $$ myVisitor

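
The visitor above loops over `await` by hand because it carries a counter. For a stage that keeps no state between elements, `awaitForever` is a shorter way to write the same loop. The sketch below compares the two forms; `labelAll`, `labelAllLoop` and `runStage` are hypothetical names, and it assumes the same older conduit operators as the code above plus `Data.Conduit.List`.

```haskell
import Data.Conduit
import qualified Data.Conduit.List as CL
import Data.Functor.Identity (Identity, runIdentity)

-- Stateless stage written with awaitForever: the handler runs once per input.
labelAll :: Monad m => Conduit FilePath m String
labelAll = awaitForever (\path -> yield ("visited " ++ path))

-- The same stage written as an explicit await loop, for comparison.
labelAllLoop :: Monad m => Conduit FilePath m String
labelAllLoop = do
  next <- await
  case next of
    Nothing   -> return ()
    Just path -> yield ("visited " ++ path) >> labelAllLoop

-- Hypothetical helper: run a stage over a list and collect its output.
runStage :: Conduit FilePath Identity String -> [FilePath] -> [String]
runStage stage paths =
  runIdentity (CL.sourceList paths $$ stage =$= CL.consume)

main :: IO ()
main = do
  print (runStage labelAll     ["/tmp", "/tmp/a"])
  print (runStage labelAllLoop ["/tmp", "/tmp/a"])
```

Both calls should print the same list; the explicit loop only becomes necessary once the stage has to remember something, as the counter in `myVisitor` does.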
There are also specialized Haskell packages for this, such as directory-tree, which you can use or study.