Using Unboxed Vectors

I've found another important optimization: use unboxed vectors instead of the regular (boxed) vectors. Here is an alternate version of genVec:

import Control.Monad (forM_)
import Control.Monad.ST (runST)
import qualified Data.Vector.Unboxed as UnboxedV
import qualified Data.Vector.Unboxed.Mutable as UnboxedM

-- Complex, imgSize and computeCIndex come from the original program
genVec :: [Complex] -> UnboxedV.Vector Int
genVec xs = runST $ do
  -- one mutable counter per pixel, all starting at 0
  mv <- UnboxedM.replicate (imgSize*imgSize) (0::Int)
  forM_ xs $ \c -> do
    let x = computeCIndex c
    count <- UnboxedM.unsafeRead mv x
    UnboxedM.unsafeWrite mv x (count+1)
  UnboxedV.freeze mv    -- freeze into an immutable unboxed vector

This will cut down the run time by another couple of seconds (which is now significant, since the whole pipeline only takes about 4 seconds to run).
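
As a side note (my addition, not part of the original answer): with recent versions of the vector package the read/increment/write pair can be collapsed into a single bounds-checked call to UnboxedM.modify. It gives up a little of the speed of unsafeRead/unsafeWrite, but it will catch an out-of-range result from computeCIndex instead of silently corrupting memory. A minimal sketch, reusing the imports and helpers above:

genVecChecked :: [Complex] -> UnboxedV.Vector Int
genVecChecked xs = runST $ do
  mv <- UnboxedM.replicate (imgSize*imgSize) (0::Int)
  forM_ xs $ \c ->
    -- bounds-checked increment of the bucket this point maps to
    UnboxedM.modify mv (+1) (computeCIndex c)
  UnboxedV.freeze mv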

Update:

To scale to 600M points you are going to have to use multiple threads or machines. Fortunately this is a classic map-reduce problem, so there are a lot of tools and libraries (not necessarily in Haskell) that you can draw upon.

If you just want to scale on a single machine using multiple threads, you can put together your own solution using a module like Control.Monad.Par. For a clustered solution you'll probably have to use a third-party framework like Hadoop, in which case you might be interested in the hadron package; there is also a video describing it here: https://vimeo.com/90189610
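
To make the single-machine option concrete, here is a minimal sketch (my own illustration, not from the original answer) of a map-reduce over cores using Control.Monad.Par from the monad-par package: each chunk of points is counted independently with the genVec defined above (the "map" step), and the per-chunk histograms are summed element-wise (the "reduce" step). The chunk size and the chunksOf helper (from the split package) are arbitrary choices here; compile with -threaded and run with +RTS -N to actually use multiple cores.

import Control.Monad.Par (runPar, parMap)
import Data.List (foldl')
import Data.List.Split (chunksOf)            -- from the split package
import qualified Data.Vector.Unboxed as UnboxedV

-- genVec and imgSize are the definitions from above
genVecPar :: [Complex] -> UnboxedV.Vector Int
genVecPar xs =
    -- reduce: sum the per-chunk count vectors element-wise
    foldl' (UnboxedV.zipWith (+)) (UnboxedV.replicate (imgSize*imgSize) 0)
           (runPar (parMap genVec chunks))    -- map: one genVec per chunk, in parallel
  where
    chunks = chunksOf 100000 xs               -- arbitrary chunk size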
