I wrote this code in Haskell (instead of Python) for the educational benefit. Can anyone suggest ways to improve this code?
I'm guessing that I'm using fromIntegral
inefficiently.
It takes two commandline arguments. The first is a path to a symmetric distance matrix. The second is a threshold. The program interprets vertices to be adjacent if their distance is less than the threshold. Then the program counts the number of connected components and the number of vertices in each connected component and prints this information.
import System.Environment
import Data.Matrix hiding (flatten)
import qualified Data.Vector as V
import Data.Graph
import Data.Tree
-- Turns a distance matrix to an adjacency matrix using a threshold, then prints the number
-- and size of the connected components.
-- Usage: run `stack run location_of_distance_matrix threshold`
-- Output is in the form (number of bins, [number of vertices in each bin]).
main :: IO ()
main = do
args <- getArgs
contents <- readFile $ args !! 0
let dmat = fromLists $ (map ((map (read :: String -> Float)) . words) (lines contents))
amat = amatFromDmat dmat $ read (args !! 1)
(g,_,_) = graphFromEdges (map (\n -> (n, n, neighbours n amat)) [(1 :: Integer)..(fromIntegral $ ncols amat)])
comp = components g
putStrLn $ show $ (length comp, map (length . flatten) comp)
-- Transforms a distance matrix into an adjacency matrix using a threshold.
amatFromDmat :: Matrix Float -> Float -> Matrix Bool
amatFromDmat m e = matrix (nrows m) (ncols m) threshold
where threshold (i,j)
| i == j = False
| m ! (i,j) < e = True
| otherwise = False
-- Outputs the list of neighbours of a vertex in a graph, taking an adjacency
-- matrix.
-- The addition and subtraction of 1 are here because vectors are 0-indexed but
-- I made my graph vertices 1-indexed.
neighbours :: Integer -> Matrix Bool -> [Integer]
neighbours n mat = map (fromIntegral . (1+)) $ filter (\m -> row V.! m) [0..(ncols mat)-1]
where row = getRow (fromIntegral n) mat
Edit: I found a bug and improved the code a little bit.
1 Answer 1
I haven't done a detailed review of Haskell code in a while, so I suspect my advice could structured better. Anyway, here's a mix of general and specific advice:
- "Functional core, imperative shell": Move more code out of
main
(and out ofIO
) into separate (pure) functions. The type signatures on the extracted functions will help with readability. - Use types to model your domain. Haskell makes it easy to define expressive types, you should make use of that feature! :) For example, you could define
type AdjacencyMatrix = Matrix Float
. - The
Int <-> Integer
conversions look unnecessary to me. Just stick toInt
since theData.Matrix
API forces you to use it anyway. - In general, it's a good idea to use as few partial functions as possible. (I see
(!!)
,(Data.Vector.!)
,read
,getRow
andfromInteger
) Since this is a script, usingread
for parsing is acceptable. Instead of indexing with(Data.Vector.!)
andgetRow
, I'd try to map, fold or zip instead, which usually are total operations. Instead of extracting the command line arguments with (!!
), you could write[filename, threshold] <- getArgs
. amatFromDmat
smells functorial to me, mostly because the input and output matrices have the same dimensions. Maybe try to implement it in terms offmap
. (Hint: If the input is a true distance matrix, the elements on the diagonal are the only ones that are0
.)- Use qualified imports or import lists to make it more clear, where functions are coming from. (I personally prefer qualified imports)
Tree
has aFoldable
instance andlength
is a method ofFoldable
. That means you can simply uselength
to get the size of the connected components. You don't needflatten
.
-
\$\begingroup\$ What is it about the
[filename, threshold] <- getArgs
option that avoids a total operation? And I'm very excited by your functoriality suggestion--I'm studying category theory on the math side and I want to see how to bring it to bear on my programming! \$\endgroup\$Curran McConnell– Curran McConnell2019年05月21日 22:25:32 +00:00Commented May 21, 2019 at 22:25 -
1\$\begingroup\$ Yeah, that's not a great example. You could argue that replacing
(!)
with a partial pattern in IO (which would triggerfail
) you move the partiality out of pure code, and into aMonadFail
where failures happen anyway. In these circumstances it doesn't really matter. \$\endgroup\$sjakobi– sjakobi2019年05月21日 22:52:55 +00:00Commented May 21, 2019 at 22:52 -
1\$\begingroup\$ Just implemented
amatFromDmat
usingfmap
. My mind is like: media.giphy.com/media/THFoDqDi4M92w/giphy.gif \$\endgroup\$Curran McConnell– Curran McConnell2019年05月22日 01:08:38 +00:00Commented May 22, 2019 at 1:08