Using Unboxed Vectors
I've found another important optimization - you want to use unboxed vectors instead of the regular vectors. Here is an alternate version of genVec:
import Control.Monad (forM_)
import Control.Monad.ST (runST)
import qualified Data.Vector.Unboxed as UnboxedV
import qualified Data.Vector.Unboxed.Mutable as UnboxedM

-- imgSize and computeCIndex come from the original program
genVec :: [Complex] -> UnboxedV.Vector Int
genVec xs = runST $ do
  -- one mutable counter per pixel, all starting at zero
  mv <- UnboxedM.replicate (imgSize*imgSize) (0::Int)
  forM_ xs $ \c -> do
    let x = computeCIndex c
    count <- UnboxedM.unsafeRead mv x
    UnboxedM.unsafeWrite mv x (count+1)
  UnboxedV.freeze mv
This will cut down the run time by another couple of seconds (which is now significant since the whole pipeline now only takes about 4 secs to run.)
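If you want to see the pattern in isolation, here is a self-contained toy version you can compile and run. The histogram name, the index list, and the use of modify/unsafeFreeze are my own illustration, not part of the original program (unsafeFreeze skips the copy that freeze makes, and is safe here because the mutable vector is never touched after the loop):

import Control.Monad (forM_)
import Control.Monad.ST (runST)
import qualified Data.Vector.Unboxed as UnboxedV
import qualified Data.Vector.Unboxed.Mutable as UnboxedM

-- Toy stand-in for genVec: count occurrences of bucket indices.
-- The real code first maps each Complex point to an index.
histogram :: Int -> [Int] -> UnboxedV.Vector Int
histogram size ixs = runST $ do
  mv <- UnboxedM.replicate size (0 :: Int)
  forM_ ixs $ \i -> UnboxedM.modify mv (+1) i
  -- safe: mv is not reused after this, so no copy is needed
  UnboxedV.unsafeFreeze mv

main :: IO ()
main = print (histogram 4 [0,1,1,3,3,3])  -- prints [1,2,0,3]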
Update:
I think you'll find that improving the Double parsing is about the best you can do for a single-threaded program. To scale to 600M points you are going to have to use multiple threads / machines. Fortunately this is a classic map-reduce problem, so there are a lot of tools and libraries (not necessarily in Haskell) that you can draw upon.
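To make the Double-parsing point concrete, here is one hedged sketch of what a faster parser could look like, using attoparsec over strict ByteString instead of read on String. The points.txt file name and the "re im" line format are assumptions for illustration; any ByteString-based Double parser would do:

import qualified Data.ByteString.Char8 as BS
import Data.Attoparsec.ByteString.Char8 (parseOnly, double, char)

-- Parse one line like "0.25 -1.5" into a pair of Doubles.
-- parseOnly over a strict ByteString avoids the slow
-- String-based `read :: String -> Double`.
parsePoint :: BS.ByteString -> Either String (Double, Double)
parsePoint = parseOnly $ do
  re <- double
  _  <- char ' '
  im <- double
  return (re, im)

main :: IO ()
main = do
  contents <- BS.readFile "points.txt"
  print (length [p | Right p <- map parsePoint (BS.lines contents)])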
If you just want to scale on a single machine using multiple threads, you can put together your own solution using a module like Control.Monad.Par. For a clustered solution you'll probably have to use a third-party framework like Hadoop, in which case you might be interested in the hadron package - there is also a video describing it here: https://vimeo.com/90189610
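As a sketch of the single-machine route (assuming the genVec from above and a non-empty point list; chunksOf comes from the split package, and the chunk size is just a tuning knob picked for illustration), the map step histograms each chunk in parallel and the reduce step sums the per-chunk counts:

import Control.Monad.Par (runPar, parMap)
import qualified Data.Vector.Unboxed as UnboxedV
import Data.List.Split (chunksOf)  -- from the `split` package

-- Map: build a partial histogram for each chunk in parallel.
-- Reduce: sum the partial histograms element-wise.
-- Assumes xs is non-empty (foldr1) and genVec is defined above.
genVecPar :: Int -> [Complex] -> UnboxedV.Vector Int
genVecPar chunkSize xs =
  foldr1 (UnboxedV.zipWith (+))
         (runPar (parMap genVec (chunksOf chunkSize xs)))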