replaced http://stackoverflow.com/ with https://stackoverflow.com/

edited May 23, 2017 at 12:40

You might be interested in the concept of iteratees/pipes, which can be used to solve this problem. They allow you to separate producing the tree from consuming it somewhere else, without direct callback functions: you create a producer that enumerates the directory tree, perhaps some filters that modify the data, and a separate consumer that works on the data. You then compose and run the whole pipeline. See also Streaming recursive descent of a directory in Haskell.
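To make the producer / filter / consumer split concrete, here is a minimal sketch. It assumes the conduit package in version 1.3 or later, where stages are composed with .| and a pure pipeline can be run with runConduitPure. The names fakeTree, visibleOnly, collect and visiblePaths are hypothetical, and a hard-coded list stands in for a real directory walk.

```haskell
import Data.Conduit (ConduitT, runConduitPure, (.|))
import qualified Data.Conduit.List as CL
import Data.List (isInfixOf)
import Data.Void (Void)

-- Producer: emits paths one by one.
-- Assumption: a fixed list replaces a real traversal of the file system.
fakeTree :: Monad m => ConduitT () FilePath m ()
fakeTree = CL.sourceList ["/tmp", "/tmp/.cache", "/tmp/logs", "/tmp/logs/.old"]

-- Filter: drops every path that contains a hidden component ("/.").
visibleOnly :: Monad m => ConduitT FilePath FilePath m ()
visibleOnly = CL.filter (not . isInfixOf "/.")

-- Consumer: collects everything that reaches it into a list.
collect :: Monad m => ConduitT FilePath Void m [FilePath]
collect = CL.consume

-- Compose the three stages and run the pipeline.
visiblePaths :: [FilePath]
visiblePaths = runConduitPure (fakeTree .| visibleOnly .| collect)

main :: IO ()
main = mapM_ putStrLn visiblePaths
```

Each stage knows nothing about the others, so the filter could be dropped or replaced without touching the producer or the consumer.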

added 3780 characters in body

edited Feb 28, 2013 at 20:29

Petr

Edit: As an example, I reworked your code using conduit. This library (and others based on the same principle) has several advantages, namely:

It separates the producer of the data from its consumer.
In a source, you just call yield when you want to send a piece of data down the pipe.
In a sink, you call await whenever you want to receive a piece of data (a more specialized awaitForever is also available).
You can have conduits that sit in the middle and consume and produce values at the same time. They can do arbitrary processing on the stream, mixing calls to yield and await as they wish.
This allows you to create complex computations where the behavior of your components depends on data sent/received earlier. We use this in our source (traversing a directory tree), and I also added it to the sink (visitor): it keeps track of how many directories have been passed to it so far.
Both source and sink (and intermediate conduits, if any) can have finalizers.
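The point about intermediate conduits whose behavior depends on earlier data can be sketched as follows. This is only an illustration, assuming conduit 1.3 or later (ConduitT, .|, runConduitPure); the names numbered and numberedDirs are hypothetical and the input list is made up.

```haskell
import Data.Conduit (ConduitT, await, yield, runConduitPure, (.|))
import qualified Data.Conduit.List as CL

-- Intermediate conduit: awaits a value, yields it tagged with a running
-- count, and recurses with the updated count until upstream is exhausted.
-- The counter is the state carried from one element to the next.
numbered :: Monad m => ConduitT a (Int, a) m ()
numbered = go 1
  where
    go n = do
        next <- await
        case next of
            Nothing -> return ()                  -- upstream is done
            Just x  -> yield (n, x) >> go (n + 1) -- pass on and keep counting

-- Assumption: a fixed list of names stands in for a stream of directories.
numberedDirs :: [(Int, String)]
numberedDirs =
    runConduitPure (CL.sourceList ["bin", "etc", "usr"] .| numbered .| CL.consume)

main :: IO ()
main = print numberedDirs
```

The same await/yield loop works in a sink as well; there it simply never calls yield.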

Some suggestions:

Create your own data types instead of using combinations of Either and (,). It makes your code shorter and easier to understand.
Sometimes it's worth declaring new functions instead of using complex case ... of expressions. It can make code easier to read.
hlint can suggest how to (syntactically) improve a piece of code.

-- Note: written against the conduit API as of early 2013
-- (Source, Sink, $$, addCleanup).
import System.FilePath ((</>))
import Control.Monad (filterM, forM_, return)
import System.IO.Error (tryIOError, IOError)
import System.Directory (getDirectoryContents, doesFileExist, doesDirectoryExist)
import Control.Monad.Trans.Class (lift)
import Data.Conduit

-- Either the subdirectories and files of a directory, or the error
-- raised while trying to list it.
data DirContent = DirList [FilePath] [FilePath]
                | DirError IOError
data DirData = DirData FilePath DirContent

-- Produces directory data
walk :: FilePath -> Source IO DirData
walk path = do
    result <- lift $ tryIOError listdir
    case result of
        Right dl@(DirList subdirs files)
            -> do
                yield (DirData path dl)
                -- Recurse into each subdirectory, yielding as we go.
                forM_ subdirs (walk . (path </>))
        Left err
            -> yield (DirData path (DirError err))
  where
    listdir = do
        entries <- getDirectoryContents path >>= filterHidden
        subdirs <- filterM isDir entries
        files <- filterM isFile entries
        return $ DirList subdirs files
      where
        isFile entry = doesFileExist (path </> entry)
        isDir entry = doesDirectoryExist (path </> entry)
        -- Dropping names that start with '.' also removes "." and "..".
        filterHidden paths = return $ filter (\p -> head p /= '.') paths

-- Consume directories
myVisitor :: Sink DirData IO ()
myVisitor = addCleanup (\_ -> putStrLn "Finished.") $ loop 1
  where
    -- n counts how many directories have been passed to the sink so far.
    loop n = do
        lift $ putStrLn $ ">> " ++ show n ++ ". directory visited:"
        r <- await
        case r of
            Nothing -> return ()
            Just d  -> lift (process d) >> loop (n + 1)
    process (DirData path (DirError err)) = do
        putStrLn $ "I've tried to look in " ++ path ++ "."
        putStrLn $ "\tThere was an error: "
        putStrLn $ "\t\t" ++ show err
    process (DirData path (DirList dirs files)) = do
        putStrLn $ "I've looked in " ++ path ++ "."
        putStrLn $ "\tThere was " ++ show (length dirs) ++ " directory(ies) and " ++ show (length files) ++ " file(s):"
        forM_ (dirs ++ files) (putStrLn . ("\t\t- " ++))

main :: IO ()
main = walk "/tmp" $$ myVisitor

answered Feb 27, 2013 at 20:10

Petr

There are also specialized Haskell packages for that such as directory-tree, which you can use or study.
