In learning haskell, I'm writing a fuzzy menu. At the moment, my executable reads in a 'dictionary' from stdin, and ranks each word according to how well it fuzzily matches a search pattern given in the first CLI arg. The idea of the fuzzy matching algorithm is to split a pattern by its delimiters, and then match each character with a prefix of a token, accumulating a score to represent the quality of the match.
My main module looks like this:
module Main where
import Data.List
import Fuzzy
import System.Environment
main :: IO ()
main = do
contents <- getContents
let dict = lines contents
args <- getArgs
let pattern = splitWord (head args)
let scored = map (\x -> (score (x, pattern), x)) dict
print (sort scored)
I'm not sure whether or not I'm misusing the do block and/or some I/O primitives here: overall, I think it could be better but I don't know how to change it.
The Util module looks like this:
module Util
( splitWord
, boolToFloat
, nextChar
) where
splitWord :: String -> [String]
splitWord (l : '_' : r ) = splitWord ([l, '-'] ++ r)
splitWord (l : '.' : r ) = splitWord ([l, '-'] ++ r)
splitWord (l : ':' : r ) = splitWord ([l, '-'] ++ r)
splitWord (l : '-' : r ) = [[l]] ++ splitWord r
splitWord (c : []) = [[c]]
splitWord [] = []
splitWord s = do
let rest = splitWord (tail s)
let first = (head s) : (head rest)
return first ++ tail rest
boolToFloat :: Bool -> Float
boolToFloat True = 1.0
boolToFloat False = 0.0
nextChar :: [String] -> [String]
nextChar s = case tail (head s) of
[] -> tail s
n -> [n] ++ tail s
Especially in splitWord
, I think the code here is somewhat repetitive, and again I don't really know how to make it simpler.
And finally, the Fuzzy module is as follows:
module Fuzzy
( score
) where
import Util
score :: (String, [String]) -> Float
score ([], _ ) = 0
score (_ , []) = 0
score (s , t ) = boolToFloat (head s == head (head t))
+ max (score (tail s, t) * 0.8) (score (tail s, nextChar t))
This module (and function) is the one I have the least concerns about - most of the problems in my code (as I perceive them) are about IO and redundancy in splitWord's pattern matching. Thanks for any advice!
1 Answer 1
- I think your use of do notation in the splitWord function is bad and I think your code is even correct by accident. Let me add parentheses to highlight it:
splitWord s = do
let rest = splitWord (tail s)
let first = (head s) : (head rest)
(return first) ++ tail rest
Most people, including me, will assume the last return statement in a do block to be applied last, like this:
splitWord s = do
let rest = splitWord (tail s)
let first = (head s) : (head rest)
return (first ++ tail rest)
Which is incorrect in this case! I would advise to just not use do notation here:
splitWord s =
let rest = splitWord (tail s)
first = (head s) : (head rest)
in [first] ++ tail rest
- Use pattern matching instead of
head
andtail
:
(arg:_) <- getArgs
let pattern = splitWord arg
splitWord (l:r) =
let (l':r') = splitWord r
first = l : l'
in [first] ++ r'
nextChar :: [String] -> [String]
nextChar ((_:[]):s) = s
nextChar ((_:n):s) = [n] ++ s
score ((hs:s, t@((hht:_):_)) = boolToFloat (hs == hht)
+ max (score (s, t) * 0.8) (score (s, nextChar t))
This also makes it more clear to me that your score
function is partial: what should the result of score ("x", [""])
be?
- It is also in general more common to use curried functions instead of tuples as arguments:
score :: String -> [String] -> Float
score [] _ = 0
score _ [] = 0
score (hs:s) t@((hht:_):_) = boolToFloat (hs == hht)
+ max (score s t * 0.8) (score s (nextChar t))
- You can make the
main
function a bit nicer withfmap
:
main :: IO ()
main = do
dict <- fmap lines getContents
args <- fmap (splitWord . head) getArgs
let scored = map (\x -> (score (x, pattern), x)) dict
print (sort scored)
But it looks decent without that too, so don't worry to much about that.
- You can deal with the repetitiveness in
splitWord
by using guards and theelem
function, change:
splitWord (l : '_' : r ) = splitWord ([l, '-'] ++ r)
splitWord (l : '.' : r ) = splitWord ([l, '-'] ++ r)
splitWord (l : ':' : r ) = splitWord ([l, '-'] ++ r)
splitWord (l : '-' : r ) = [[l]] ++ splitWord r
to:
splitWord (l : x : r )
| x `elem` "_.:-" = [l] : splitWord r
Note that the elem
function can be slower than pattern matching, so alternatively you can split out the pattern matching in its own function:
isSep :: Char -> Bool
isSep '_' = True
isSep '.' = True
isSep ':' = True
isSep '-' = True
isSep _ = False
...
splitWord (l : x : r )
| isSep x = [l] : r
- Use library functions like
break
insplitWord
:
splitWord :: String -> [String]
splitWord s = case break isSep word of
([], x:r) -> x : splitWord r -- *
(l, []) -> l
(l, _:r) -> l : splitWord r
* This first case is a bit strange and maybe you don't need it, but I think it is needed to make this function match your splitWord function exactly.