Fuzzy Finding in Haskell

Question 1

In learning haskell, I'm writing a fuzzy menu. At the moment, my executable reads in a 'dictionary' from stdin, and ranks each word according to how well it fuzzily matches a search pattern given in the first CLI arg. The idea of the fuzzy matching algorithm is to split a pattern by its delimiters, and then match each character with a prefix of a token, accumulating a score to represent the quality of the match.

My main module looks like this:

module Main where
import Data.List
import Fuzzy
import System.Environment
main :: IO ()
main = do
 contents <- getContents
 let dict = lines contents
 args <- getArgs
 let pattern = splitWord (head args)
 let scored = map (\x -> (score (x, pattern), x)) dict
 print (sort scored)

I'm not sure whether or not I'm misusing the do block and/or some I/O primitives here: overall, I think it could be better but I don't know how to change it.

The Util module looks like this:

module Util
 ( splitWord
 , boolToFloat
 , nextChar
 ) where
splitWord :: String -> [String]
splitWord (l : '_' : r ) = splitWord ([l, '-'] ++ r)
splitWord (l : '.' : r ) = splitWord ([l, '-'] ++ r)
splitWord (l : ':' : r ) = splitWord ([l, '-'] ++ r)
splitWord (l : '-' : r ) = [[l]] ++ splitWord r
splitWord (c : []) = [[c]]
splitWord [] = []
splitWord s = do
 let rest = splitWord (tail s)
 let first = (head s) : (head rest)
 return first ++ tail rest
boolToFloat :: Bool -> Float
boolToFloat True = 1.0
boolToFloat False = 0.0
nextChar :: [String] -> [String]
nextChar s = case tail (head s) of
 [] -> tail s
 n -> [n] ++ tail s

Especially in splitWord, I think the code here is somewhat repetitive, and again I don't really know how to make it simpler.

And finally, the Fuzzy module is as follows:

module Fuzzy
 ( score
 ) where
import Util
score :: (String, [String]) -> Float
score ([], _ ) = 0
score (_ , []) = 0
score (s , t ) = boolToFloat (head s == head (head t))
 + max (score (tail s, t) * 0.8) (score (tail s, nextChar t))

This module (and function) is the one I have the least concerns about - most of the problems in my code (as I perceive them) are about IO and redundancy in splitWord's pattern matching. Thanks for any advice!

Question 2

I think your use of do notation in the splitWord function is bad and I think your code is even correct by accident. Let me add parentheses to highlight it:

splitWord s = do
 let rest = splitWord (tail s)
 let first = (head s) : (head rest)
 (return first) ++ tail rest

Most people, including me, will assume the last return statement in a do block to be applied last, like this:

splitWord s = do
 let rest = splitWord (tail s)
 let first = (head s) : (head rest)
 return (first ++ tail rest)

Which is incorrect in this case! I would advise to just not use do notation here:

splitWord s =
 let rest = splitWord (tail s)
 first = (head s) : (head rest)
 in [first] ++ tail rest

Use pattern matching instead of head and tail:

 (arg:_) <- getArgs
 let pattern = splitWord arg

splitWord (l:r) =
 let (l':r') = splitWord r
 first = l : l'
 in [first] ++ r'

nextChar :: [String] -> [String]
nextChar ((_:[]):s) = s
nextChar ((_:n):s) = [n] ++ s

score ((hs:s, t@((hht:_):_)) = boolToFloat (hs == hht)
 + max (score (s, t) * 0.8) (score (s, nextChar t))

This also makes it more clear to me that your score function is partial: what should the result of score ("x", [""]) be?

It is also in general more common to use curried functions instead of tuples as arguments:

score :: String -> [String] -> Float
score [] _ = 0
score _ [] = 0
score (hs:s) t@((hht:_):_) = boolToFloat (hs == hht)
 + max (score s t * 0.8) (score s (nextChar t))

You can make the main function a bit nicer with fmap:

main :: IO ()
main = do
 dict <- fmap lines getContents
 args <- fmap (splitWord . head) getArgs
 let scored = map (\x -> (score (x, pattern), x)) dict
 print (sort scored)

But it looks decent without that too, so don't worry to much about that.

You can deal with the repetitiveness in splitWord by using guards and the elem function, change:

splitWord (l : '_' : r ) = splitWord ([l, '-'] ++ r)
splitWord (l : '.' : r ) = splitWord ([l, '-'] ++ r)
splitWord (l : ':' : r ) = splitWord ([l, '-'] ++ r)
splitWord (l : '-' : r ) = [[l]] ++ splitWord r

to:

splitWord (l : x : r )
 | x `elem` "_.:-" = [l] : splitWord r

Note that the elem function can be slower than pattern matching, so alternatively you can split out the pattern matching in its own function:

isSep :: Char -> Bool
isSep '_' = True
isSep '.' = True
isSep ':' = True
isSep '-' = True
isSep _ = False
...
splitWord (l : x : r )
 | isSep x = [l] : r

Use library functions like break in splitWord:

splitWord :: String -> [String]
splitWord s = case break isSep word of
 ([], x:r) -> x : splitWord r -- *
 (l, []) -> l
 (l, _:r) -> l : splitWord r

* This first case is a bit strange and maybe you don't need it, but I think it is needed to make this function match your splitWord function exactly.

Noughtmare Noughtmare 2511 silver badge4 bronze badges · Accepted Answer · 2021-06-06 10:04:25Z

I think your use of do notation in the splitWord function is bad and I think your code is even correct by accident. Let me add parentheses to highlight it:

splitWord s = do
 let rest = splitWord (tail s)
 let first = (head s) : (head rest)
 (return first) ++ tail rest

Most people, including me, will assume the last return statement in a do block to be applied last, like this:

splitWord s = do
 let rest = splitWord (tail s)
 let first = (head s) : (head rest)
 return (first ++ tail rest)

Which is incorrect in this case! I would advise to just not use do notation here:

splitWord s =
 let rest = splitWord (tail s)
 first = (head s) : (head rest)
 in [first] ++ tail rest

Use pattern matching instead of head and tail:

 (arg:_) <- getArgs
 let pattern = splitWord arg

splitWord (l:r) =
 let (l':r') = splitWord r
 first = l : l'
 in [first] ++ r'

nextChar :: [String] -> [String]
nextChar ((_:[]):s) = s
nextChar ((_:n):s) = [n] ++ s

score ((hs:s, t@((hht:_):_)) = boolToFloat (hs == hht)
 + max (score (s, t) * 0.8) (score (s, nextChar t))

This also makes it more clear to me that your score function is partial: what should the result of score ("x", [""]) be?

It is also in general more common to use curried functions instead of tuples as arguments:

score :: String -> [String] -> Float
score [] _ = 0
score _ [] = 0
score (hs:s) t@((hht:_):_) = boolToFloat (hs == hht)
 + max (score s t * 0.8) (score s (nextChar t))

You can make the main function a bit nicer with fmap:

main :: IO ()
main = do
 dict <- fmap lines getContents
 args <- fmap (splitWord . head) getArgs
 let scored = map (\x -> (score (x, pattern), x)) dict
 print (sort scored)

But it looks decent without that too, so don't worry to much about that.

You can deal with the repetitiveness in splitWord by using guards and the elem function, change:

splitWord (l : '_' : r ) = splitWord ([l, '-'] ++ r)
splitWord (l : '.' : r ) = splitWord ([l, '-'] ++ r)
splitWord (l : ':' : r ) = splitWord ([l, '-'] ++ r)
splitWord (l : '-' : r ) = [[l]] ++ splitWord r

to:

splitWord (l : x : r )
 | x `elem` "_.:-" = [l] : splitWord r

Note that the elem function can be slower than pattern matching, so alternatively you can split out the pattern matching in its own function:

isSep :: Char -> Bool
isSep '_' = True
isSep '.' = True
isSep ':' = True
isSep '-' = True
isSep _ = False
...
splitWord (l : x : r )
 | isSep x = [l] : r

Use library functions like break in splitWord:

splitWord :: String -> [String]
splitWord s = case break isSep word of
 ([], x:r) -> x : splitWord r -- *
 (l, []) -> l
 (l, _:r) -> l : splitWord r

* This first case is a bit strange and maybe you don't need it, but I think it is needed to make this function match your splitWord function exactly.

Stack Exchange Network

Fuzzy Finding in Haskell

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Fuzzy Finding in Haskell

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions