Counting occurrences of `Char8`s in a file

To learn some `Data.Map` and `Control.Monad.State`, I have written the following code, which should count the occurrences of each `Char8` in a given file.
Edit

Just to mention: I want to keep the list of chars and their counts in the order in which each char first occurs. Also, when several chars tie for most or least frequent, the one that occurred first should "win".

I am also aware that the file claims to be UTF-8 encoded while my program can only "see" ASCII, but I just took a random book from gutenberg.org to have a real text and not some "random" bytes from huge binaries or something comparable.
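To illustrate the encoding caveat, here is a small sketch of what `Data.ByteString.Char8` does with a non-ASCII character. The names `aUmlautUtf8` and `byteView` are made up for this illustration and are not part of the program below. Each byte is turned into its own `Char`, so a two-byte UTF-8 sequence would be counted as two separate characters.

```haskell
import qualified Data.ByteString.Char8 as B
import Data.Char (isPrint)

-- The UTF-8 encoding of U+00E4 (a with diaeresis) is the two bytes 0xC3 0xA4,
-- written here as decimal escapes.
aUmlautUtf8 :: B.ByteString
aUmlautUtf8 = B.pack "\195\164"

-- Hypothetical helper: every byte as Char8 sees it, paired with whether
-- isPrint would let it through the filter used in the counting code.
byteView :: B.ByteString -> [(Char, Bool)]
byteView = map (\c -> (c, isPrint c)) . B.unpack
```

Both bytes pass `isPrint`, so they would show up in the statistics as two unrelated Latin-1 characters rather than as one letter.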
module Main where

import Control.Monad.State
import qualified Data.ByteString.Char8 as B
import Data.Char
import Data.Map (Map)
import qualified Data.Map as M
import Data.Maybe
import System.Environment

type Counter = ([Char], (Char, Integer), (Char, Integer), Map Char Integer)

main :: IO ()
main = do
  file <- liftM head getArgs
  string <- B.readFile file
  let (chars, seldom, often, counter) = execState (countChars string) ([], (' ', 10^12), (' ', 0), M.empty)
  mapM_ (\k -> putStrLn ('\'':k:"' " ++ show (fromJust $ M.lookup k counter))) chars
  putStrLn $ "minimum occurence: " ++ show (fst seldom) ++ ", " ++ show (snd seldom)
  putStrLn $ "maximum occurence: " ++ show (fst often) ++ ", " ++ show (snd often)
  putStrLn $ "overal chars : " ++ show (B.length string)

countChars :: B.ByteString -> State Counter ()
countChars bs | B.null bs = return ()
              | otherwise = do
                  let c = B.head bs
                  let cs = B.tail bs
                  unless (not . isPrint $ c) $ do
                    (pre, minOcc, maxOcc, counter) <- get
                    newPre <- return $ if (c `elem` pre)
                                         then pre
                                         else pre ++ [c]
                    newCounter <- return (M.insertWith (+) c 1 counter)
                    let newMinOcc = (\ (k1, v1) (k2, v2) -> if v2 < v1 then (k2, v2) else (k1, v1)) minOcc (c, newCounter M.! c)
                    let newMaxOcc = (\ (k1, v1) (k2, v2) -> if v2 > v1 then (k2, v2) else (k1, v1)) maxOcc (c, newCounter M.! c)
                    put (newPre, newMinOcc, newMaxOcc, newCounter)
                  countChars cs
This code works as I want it for small files, counting and displaying the stats of the file, but it fails when the file gets bigger:
$ ls -l 46000.txt.utf-8
-rw-r--r-- 1 nobbz nobbz 156315 Jun 16 20:33 46000.txt.utf-8
$ ./Main 46000.txt.utf-8
Stack space overflow: current size 8388608 bytes.
Use `+RTS -Ksize -RTS' to increase it.
I don't grasp why a file of about 150 KiB can overflow an 8 MiB stack. If I increase it to 10 MiB (`./Main 46000.txt.utf-8 +RTS -k10M`) it works, but also passing `-sstderr` shows that the program consumes 107 MiB in total.

How could I decrease memory usage, so that it works with large files and the default stack size?
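One direction that might help, sketched under assumptions rather than measured: replace the lazy `State` tuple with a strict left fold over the `ByteString` and a `Data.Map.Strict` map, so that no chain of unevaluated thunks builds up. The names `Stats`, `inOrder`, `rarest` and `commonest` are hypothetical helpers for this sketch. It also differs from the code above in two ways: `countChars` here is a pure function, and minimum and maximum are taken from the final counts instead of being tracked during the pass.

```haskell
import qualified Data.ByteString.Char8 as B
import Data.Char (isPrint)
import Data.List (foldl', sortOn)
import qualified Data.Map.Strict as M

-- Per character: (byte index of its first occurrence, number of occurrences).
type Stats = M.Map Char (Int, Integer)

-- Strict left fold over the input. Both the running index and the map are
-- forced on every step, so no thunks pile up across the whole file.
countChars :: B.ByteString -> Stats
countChars = snd . B.foldl' step (0, M.empty)
  where
    step (i, m) c =
      let i' = i + 1
          m' = if isPrint c then M.insertWith bump c (i, 1) m else m
      in i' `seq` m' `seq` (i', m')
    -- Keep the stored first index; force the new count before storing it.
    bump _ (firstIx, n) = let n' = n + 1 in n' `seq` (firstIx, n')

-- Characters with their counts, ordered by first occurrence.
inOrder :: Stats -> [(Char, Integer)]
inOrder m = [ (c, n) | (c, (_, n)) <- sortOn (fst . snd) (M.toList m) ]

-- Least and most frequent entry of a list in first-occurrence order.
-- The strict comparisons mean that on a tie the earlier character wins.
rarest, commonest :: [(Char, Integer)] -> Maybe (Char, Integer)
rarest []        = Nothing
rarest (x : xs)  = Just (foldl' (\a b -> if snd b < snd a then b else a) x xs)
commonest []       = Nothing
commonest (x : xs) = Just (foldl' (\a b -> if snd b > snd a then b else a) x xs)
```

In `main`, this would be used roughly as `inOrder . countChars <$> B.readFile file`, followed by `rarest` and `commonest` on the resulting list. Storing the first index in the map also avoids the `elem` check and the `pre ++ [c]` append on every character.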