added 7 characters in body; edited title

edited Jun 17, 2014 at 5:44

Counting occurrences of Char8s in a file

To learn some Data.Map and Control.Monad.State, I have written the following code, which should count the occurrences of Char8 in a given file.

Edit

Just to mention:

I want to keep the list of chars and their counts in the order in which each char first occurs. Also, when several chars tie for the most or the least frequent, the one that occurred first should "win".
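To make the ordering and tie-breaking rule concrete, here is a minimal sketch on a plain `String`. It is only one reading of the requirement, applied to the final counts. The names `countInOrder`, `firstMax` and `firstMin` are made up for illustration and are not part of the program in question.

```haskell
import Data.List (foldl', nub)
import qualified Data.Map.Strict as M

-- Pair every distinct character with its count, listed in the order
-- in which the characters first appear (nub keeps first occurrences).
countInOrder :: String -> [(Char, Int)]
countInOrder s = [ (c, counts M.! c) | c <- nub s ]
  where
    counts = foldl' (\m c -> M.insertWith (+) c 1 m) M.empty s

-- Most and least frequent entries. The strict comparisons (> and <)
-- keep the earlier entry on a tie, so the first occurrence "wins".
-- Both are partial: they fail on an empty list.
firstMax, firstMin :: [(Char, Int)] -> (Char, Int)
firstMax = foldl1 (\best x -> if snd x > snd best then x else best)
firstMin = foldl1 (\best x -> if snd x < snd best then x else best)

main :: IO ()
main = do
  let stats = countInOrder "abba c"
  print stats             -- [('a',2),('b',2),(' ',1),('c',1)]
  print (firstMax stats)  -- ('a',2): 'a' and 'b' tie, 'a' came first
  print (firstMin stats)  -- (' ',1): ' ' and 'c' tie, ' ' came first
```

`nub` is quadratic in the number of distinct characters, which is acceptable here because a byte can take at most 256 values.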

I am also aware that the file claims to be UTF-8 encoded while my program can only "see" ASCII. I just took a random book from gutenberg.org to have real text rather than "random" bytes from huge binaries or something comparable.
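As a small illustration of that limitation: `Data.ByteString.Char8` exposes every byte as a `Char` in the range 0 to 255, so a multi-byte UTF-8 character shows up as several separate "chars". The sketch below uses the two-byte UTF-8 encoding of 'ä' (U+00E4); the name `aUmlaut` is only illustrative.

```haskell
import qualified Data.ByteString as BS
import qualified Data.ByteString.Char8 as B
import Data.Char (isPrint)

-- The UTF-8 encoding of 'ä' (U+00E4) is the two bytes 0xC3 0xA4.
aUmlaut :: BS.ByteString
aUmlaut = BS.pack [0xC3, 0xA4]

main :: IO ()
main = do
  print (B.length aUmlaut)                -- 2: bytes, not characters
  print (B.unpack aUmlaut)                -- "\195\164": one Char per byte
  print (map isPrint (B.unpack aUmlaut))  -- whether each byte passes the isPrint filter
```

So a counter built on `Char8` tallies the individual bytes of such a character rather than the character itself.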

added 262 characters in body

edited Jun 16, 2014 at 20:36

NobbZ

asked Jun 16, 2014 at 19:00

NobbZ

Counting occurrences of `Char8`s in a file

To learn some Data.Map and Control.Monad.State, I have written the following code, which should count the occurrences of Char8 in a given file.

    module Main where

    -- Imported explicitly: newer mtl versions no longer re-export Control.Monad.
    import Control.Monad (liftM, unless)
    import Control.Monad.State
    import qualified Data.ByteString.Char8 as B
    import Data.Char
    import Data.Map (Map)
    import qualified Data.Map as M
    import Data.Maybe
    import System.Environment

    -- (chars in order of first occurrence, least frequent, most frequent, counts)
    type Counter = ([Char], (Char, Integer), (Char, Integer), Map Char Integer)

    main :: IO ()
    main = do
      file <- liftM head getArgs
      string <- B.readFile file
      let (chars, seldom, often, counter) = execState (countChars string) ([], (' ', 10^12), (' ', 0), M.empty)
      mapM_ (\k -> putStrLn ('\'':k:"' " ++ show (fromJust $ M.lookup k counter))) chars
      putStrLn $ "minimum occurrence: " ++ show (fst seldom) ++ ", " ++ show (snd seldom)
      putStrLn $ "maximum occurrence: " ++ show (fst often) ++ ", " ++ show (snd often)
      putStrLn $ "overall chars : " ++ show (B.length string)

    -- Walk the input one char at a time, updating the state for printable chars.
    countChars :: B.ByteString -> State Counter ()
    countChars bs
      | B.null bs = return ()
      | otherwise = do
          let c = B.head bs
          let cs = B.tail bs
          unless (not . isPrint $ c) $ do
            (pre, minOcc, maxOcc, counter) <- get
            newPre <- return $ if (c `elem` pre)
                                 then pre
                                 else pre ++ [c]
            newCounter <- return (M.insertWith (+) c 1 counter)
            let newMinOcc = (\ (k1, v1) (k2, v2) -> if v2 < v1 then (k2, v2) else (k1, v1)) minOcc (c, newCounter M.! c)
            let newMaxOcc = (\ (k1, v1) (k2, v2) -> if v2 > v1 then (k2, v2) else (k1, v1)) maxOcc (c, newCounter M.! c)
            put (newPre, newMinOcc, newMaxOcc, newCounter)
          countChars cs

This code works as I want for small files, counting and displaying the stats of the file, but it fails when the file gets bigger:

    $ ls -l 46000.txt.utf-8
    -rw-r--r-- 1 nobbz nobbz 156315 Jun 16 20:33 46000.txt.utf-8
    $ ./Main 46000.txt.utf-8
    Stack space overflow: current size 8388608 bytes.
    Use `+RTS -Ksize -RTS' to increase it.

I don't understand why a file of about 150 KiB can overflow an 8 MiB stack. If I increase the stack to 10 MiB (./Main 46000.txt.utf-8 +RTS -k10M) it works, but adding -sstderr shows that the program consumes 107 MiB in total.

How can I decrease memory usage so that it works with large files and the default stack size?
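One direction that might reduce the footprint, sketched under assumptions rather than as a verified fix for this exact program: `Control.Monad.State` is the lazy State monad and `Data.Map` is lazy in its values, so the counts can pile up as unevaluated thunks until the very end. The sketch below swaps in a strict left fold (`B.foldl'`) together with `Data.Map.Strict`. The name `countStrict` is hypothetical, and the sketch deliberately leaves out the first-occurrence ordering and the min/max tracking.

```haskell
module Main where

import qualified Data.ByteString.Char8 as B
import           Data.Char             (isPrint)
import qualified Data.Map.Strict       as M
import           System.Environment    (getArgs)

-- Count printable chars with a strict left fold. B.foldl' forces the
-- accumulator at every step and Data.Map.Strict forces each count, so
-- no chain of suspended (+1) applications is built up.
countStrict :: B.ByteString -> M.Map Char Int
countStrict = B.foldl' step M.empty
  where
    step m c
      | isPrint c = M.insertWith (+) c 1 m
      | otherwise = m

main :: IO ()
main = do
  args <- getArgs
  case args of
    [file] -> do
      counts <- fmap countStrict (B.readFile file)
      mapM_ print (M.toList counts)  -- sorted by char, not by first occurrence
    _ -> putStrLn "usage: Main FILE"
```

Because the fold is a loop with a forced accumulator, it needs no deep recursion, which is the property the default stack size cares about.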
