I try to find the center of all the red pixels contained in an image using Haskell
and repa
.
My problem is, that the code is not fast enough, probably because I read the image with repa-io
which returns an Array Z DIM3 Word8
, but I was not able to figure out how to filter the repa array by how red the given pixels are.
That's why I convert the array to a normal list, using R.toList
. The rest of the calculations are then done using lists.
Here is my code:
import Data.Array.Repa.Repr.ForeignPtr (F, fromForeignPtr, toForeignPtr)
import Data.Array.Repa (Z(..), (:.)(..), DIM0, DIM1, DIM2, DIM3, Array(..),
Source, Shape)
import qualified Data.Array.Repa as R
import Data.Array.Repa.IO.DevIL
import Data.List
import System.Environment (getArgs)
main :: IO ()
main = do
[path] <- getArgs
(RGBA arr) <- runIL $ readImage path
putStrLn $ "red center " ++ (show . findRedDot $ arr)
-- | find the red dot
findRedDot :: (Ord a, Num a, Source r a) => Array r DIM3 a -> (Int, Int)
findRedDot img = (median . fst $ splitted, median . snd $ splitted)
where -- function returning the coordinates as a tuple, when the pixel
-- is red enough, otherwise return (-1, -1)
convert f (Z :. i :. j) = let r = f (Z :. i :. j :. 0)
g = f (Z :. i :. j :. 1)
b = f (Z :. i :. j :. 2) in
if r > 237 && (g + b) < 50
then (i, j)
else (-1, -1)
-- turn the 3rd dimension into tuples containing the coordinates
-- (see convert function)
packed = R.traverse img
(\(Z :. w :. h :. _) -> Z :. w :. h)
convert
-- turns the DIM2 repa array into a default haskell list
lst = R.toList packed
p (-1, -1) = False
p _ = True
-- split up the list into x and y component
splitted = splitUp $ filter p lst
-- | split up a list of tuples into two lists
splitUp :: [(Int, Int)] -> ([Int], [Int])
splitUp [] = ([], [])
splitUp ((x, y):rest) = (x:(fst next), y:(snd next))
where next = splitUp rest
-- | find the median of a numeric list
median :: (Num a, Ord a) => [a] -> a
median [] = -1
median l = sorted !! mid
where len = length l
mid = len `quot` 2
sorted = sort l
I did compile with ghc -O2 -threaded -prof -fprof-auto Main.hs
.
Running this with ./Main.hs example.png +RTS -p
gives me the correct result after some time, when using a 1920*1080 image which is completely red (This is the worst case, because finding the median is very time consuming).
Content of Main.prof
Mon May 8 21:26 2017 Time and Allocation Profiling Report (Final)
test5 +RTS -p -RTS fully-red-extreme.png
total time = 0.81 secs (809 ticks @ 1000 us, 1 processor)
total alloc = 2,248,665,408 bytes (excludes profiling overheads)
COST CENTRE MODULE %time %alloc
median.sorted Main 48.2 44.7
splitUp Main 10.5 13.3
findRedDot.convert Main 10.0 2.2
findRedDot.lst Main 7.0 19.2
splitUp.next Main 6.4 2.2
findRedDot.splitted Main 3.5 5.2
findRedDot.convert.r Main 3.2 4.4
main Main 2.8 0.0
findRedDot.convert.b Main 2.6 4.4
findRedDot.convert.g Main 1.4 4.4
median.len Main 1.4 0.0
individual inherited
COST CENTRE MODULE no. entries %time %alloc %time %alloc
MAIN MAIN 58 0 0.1 0.0 100.0 100.0
main Main 117 0 2.8 0.0 99.9 100.0
findRedDot Main 118 1 0.0 0.0 97.0 100.0
findRedDot.splitted Main 123 1 3.5 5.2 21.3 20.7
splitUp Main 130 2073601 10.5 13.3 16.9 15.5
splitUp.next Main 133 2073600 6.4 2.2 6.4 2.2
findRedDot.p Main 124 2073600 0.9 0.0 0.9 0.0
findRedDot.lst Main 122 1 7.0 19.2 25.2 34.7
findRedDot.packed Main 125 0 1.0 0.0 18.2 15.5
findRedDot.convert Main 126 2073600 10.0 2.2 17.2 15.5
findRedDot.convert.b Main 129 2073600 2.6 4.4 2.6 4.4
findRedDot.convert.g Main 128 2073600 1.4 4.4 1.4 4.4
findRedDot.convert.r Main 127 2073600 3.2 4.4 3.2 4.4
findRedDot.packed Main 120 1 0.0 0.0 0.0 0.0
findRedDot.packed.\ Main 121 1 0.0 0.0 0.0 0.0
median Main 119 2 1.0 0.0 50.6 44.7
median.sorted Main 134 2 48.2 44.7 48.2 44.7
median.len Main 132 2 1.4 0.0 1.4 0.0
median.mid Main 131 2 0.0 0.0 0.0 0.0
CAF Main 115 0 0.0 0.0 0.0 0.0
main Main 116 1 0.0 0.0 0.0 0.0
CAF GHC.IO.Encoding 102 0 0.0 0.0 0.0 0.0
CAF GHC.IO.Handle.FD 100 0 0.0 0.0 0.0 0.0
CAF GHC.Event.Thread 98 0 0.0 0.0 0.0 0.0
CAF GHC.IO.Encoding.Iconv 94 0 0.0 0.0 0.0 0.0
CAF GHC.Conc.Signal 84 0 0.0 0.0 0.0 0.0
CAF GHC.IO.Handle.Text 79 0 0.0 0.0 0.0 0.0
The execution time is definitely longer than the one given in the .prof
file and I don't know whether the sorting of the list can be accelerated in some way.
The most important goal is to spare the conversion from a repa
array into a list and to do the whole computation with repa
arrays.
1 Answer 1
Small things I noticed:
splitUp
is already defined in the prelude asunzip
.median
working on a list makes it slow and your implementation is not semantically correct.
Generally the median is defined as the "center" element for datasets of odd magnitude, or the arithmetic mean of the two center elements for datasets of even magnitude.If you made it use an array, you'd also get the vast benefit of not needing the
len
traverse the full list, nor to traverse it when accessing a specific index, which should cut execution time for that function roughly in half.According to the profiling, that will also vastly reduce overall execution time.
The argument to
convert f
infindRedDot
is repeated unnecessarily a lot, it should be possible to do the following:convert f a@(Z:. i :. j) = let r = f (a :. 0) g = f (a :. 1) b = f (a :. 2) in if r > 237 && (g+b) < 50 then (i,j) else (-1, -1)
This makes it a tad easier to understand what exactly you do here, IMO
You speak of not being able to figure out a way to filter the repa array. A bit of hoogling reveals
selectP
, which should help with that. They explicitly mention:Produce an array by applying a predicate to a range of integers. If the predicate matches, then use the second function to generate the element.
- This is a low-level function helpful for writing filtering operations on arrays.
- Use the integer as the index into the array you're filtering.
-
\$\begingroup\$
selectP
does only work for integers, but I have to filter the array by looking at the last dimension in a whole. What is your suggestion concerning themedian
problem, I don't really see, what you are trying to say. \$\endgroup\$Erich– Erich2017年10月10日 16:31:18 +00:00Commented Oct 10, 2017 at 16:31 -
\$\begingroup\$ @Erich your median implementation iterates the list thrice. If you had an array, that'd more likely be around 1.7 times for sorting and then a constant-time access to the median element(s). Re:
selectP
: Read the quoted section again. This is what they use for filtering arrays. Reread if you don't understand it yet... \$\endgroup\$Vogel612– Vogel6122017年10月10日 16:43:46 +00:00Commented Oct 10, 2017 at 16:43
Data.Repa.Array.Auto
,Data.Repa.Array.Generic
andData.Repa.Array.Material
each definefilter
. \$\endgroup\$repa-array
is not an official variant (yet). \$\endgroup\$Array
type has completely changed. Could you give me a hint where to look for instructions? Or could you give me a short example? \$\endgroup\$