I'd like to hear some thoughts on my simple anagram finder, written in Scala. I mainly wrote this as a first go at writing some serious Scala, and I wanted to try out the Scalaz library.
I'm coming from a Java background, with minimal experience with functional languages. Things I'd ideally like feedback on:
- Any improvements to make it less imperative
- Any other uses of library code, like Scalaz
I know that I've not covered all test cases, but this was enough to get me up and running.
General algorithm:
- Parse a list of words. Turn each word into lower case, sort alphabetically and treat that as the 'signature' of the word.
- Insert signature and word into a map, with the key being the signature, and the value being a set (or list) of words with the same signature
- Retrieval is simply looking up a word based on its signature
Signature class
case class Stem(word: String) {
val stem: List[Char] = word.toLowerCase.trim.toList.sortWith((e1, e2) => (e1 < e2))
override def equals(p1: Any) = p1 match {
case s: Stem => s.stem == stem
case _ => false
}
override def hashCode = stem.hashCode
override def toString = stem.mkString("Stem(", "", ")")
}
Notes:
I'm sure I'd be able to take the word as a parameter and manipulate it to be in alphabetical order, rather than having a stem
field. I'm sure I'd have been able to omit the equals
and hashCode
methods then, but I couldn't work out how to do that. I think maybe because I'm using a case class
here?
Anagram Finder class
import scalaz._
import Scalaz._
class AnagramFinder(words: List[String]) {
implicit def string2Stem(s: String): Stem = Stem(s)
val mapping = createMapping(words)
def createMapping(words: List[String]) = {
var anagrams: Map[Stem, Set[String]] = Map()
words.foreach(s => anagrams = anagrams |+| Map(Stem(s) -> Set(s)))
anagrams
}
def find(word: String): Option[Set[String]] = mapping.get(word)
}
Notes:
I really wanted to just have anagrams |+| Map(Stem(s) -> Set(s))
in my foreach
, but that required (a) using a mutable map instead, which I'd be fine with, but (b) having to write my own semigroup implementation for the mutable Map, which I struggled with.
Also, anagrams |+| Map(Stem(s) -> Set(s))
isn't picking up my implicit String to Stem method. Why?
Anagrams object
object Anagrams {
def apply(wordListFile: String, word: String) = {
val finder = new AnagramFinder(scala.io.Source.fromFile(wordListFile).mkString.split('\n').toList)
finder.find(word)
}
}
Notes: I know I should initialise the word list into a val
field on the object which can be reused between calls, but this was really just to get me started.
Appendix: Tests
class StemTest extends org.scalatest.FunSuite {
test("Stem of noel is e,l,n,o") {
assert(Stem("noel").stem == List('e','l','n','o'))
}
test("Stem of Hello is H,e,l,l,o") {
assert(Stem("Hello").stem == List('H', 'e', 'l', 'l', 'o'))
}
test("Stems of abc and cba are equal") {
assert(Stem("abc") == Stem("cba"))
}
}
class AnagramFinderTest extends FunSuite {
test("List of words should create appropriate map") {
val testMap:Map[Stem, Set[String]] = Map(Stem("abc") -> Set("cba"), Stem("def") -> Set("def"), Stem("ggz") -> Set("zgg"))
val givenList = List("cba", "def", "zgg")
val mapping = new AnagramFinder(givenList).mapping
assert(mapping == testMap)
}
test("Same stemmed words should both exist in the map") {
val testMap:Map[Stem, Set[String]] = Map(Stem("abc") -> Set("abc", "cba"))
val givenList = List("abc", "cba")
val mapping = new AnagramFinder(givenList).mapping
assert(mapping == testMap)
}
test("Finding words") {
val givenList = List("abc", "cba", "bob", "bca")
val anagramFinder = new AnagramFinder(givenList)
val foundAnagrams = anagramFinder.find("abc").get
val expected = Set("abc", "bca", "cba")
assert(foundAnagrams.forall(s => expected contains s))
assert(expected.forall(s => foundAnagrams contains s))
}
test("Full test") {
val expected = Some(Set("act", "cat", "Cat"))
val actual = Anagrams("/usr/share/dict/words", "cat")
assert(expected == actual)
}
}
1 Answer 1
Some suggestions to improve your code:
xs.sortWith((e1, e2) => (e1 < e2))
is identical to xs.sorted
If you are not interested in Stem.word
you can place it into companions' apply-method:
object Stem {
def apply(word: String): Stem = {
val stem: List[Char] = word.toLowerCase.trim.toList.sorted
new Stem(stem)
}
}
case class Stem private(stem: List[Char])
source.mkString.split('\n')
is identical to source.getLines
Whenever you want to sum-up something or use an accumulator, fold
is what you are searching for:
(Map.empty[Stem, Set[String]] /: words) {
case (map, word) => map + (Stem(word) -> Set(word))
}
With help of scalaz:
words.map(word => Map(Stem(word) -> Set(word))) reduceLeft { _|+|_ }
reduce
is a fold
: xs.tail.fold(xs.head)(f)
/:
is a synonym for foldLeft
: (init /: xs)(f) == (xs foldLeft init)(f)
Your implicit conversion String => Stem
doesn't work in string -> set
because ->
already needs an implicit conversion and you are not allowed to apply multiple ones at once. When you use tuple notation (string, set)
, your implicit conversion is applied.