Simple anagram finder using Scala and some Scalaz

Question 1

I'd like to hear some thoughts on my simple anagram finder, written in Scala. I mainly wrote this as a first go at writing some serious Scala, and I wanted to try out the Scalaz library.

I'm coming from a Java background, with minimal experience with functional languages. Things I'd ideally like feedback on:

Any improvements to make it less imperative
Any other uses of library code, like Scalaz

I know that I've not covered all test cases, but this was enough to get me up and running.

General algorithm:

Parse a list of words. Turn each word into lower case, sort alphabetically and treat that as the 'signature' of the word.
Insert signature and word into a map, with the key being the signature, and the value being a set (or list) of words with the same signature
Retrieval is simply looking up a word based on its signature

Signature class

case class Stem(word: String) {
 val stem: List[Char] = word.toLowerCase.trim.toList.sortWith((e1, e2) => (e1 < e2))
 override def equals(p1: Any) = p1 match {
 case s: Stem => s.stem == stem
 case _ => false
 }
 override def hashCode = stem.hashCode
 override def toString = stem.mkString("Stem(", "", ")")
}

Notes:

I'm sure I'd be able to take the word as a parameter and manipulate it to be in alphabetical order, rather than having a stem field. I'm sure I'd have been able to omit the equals and hashCode methods then, but I couldn't work out how to do that. I think maybe because I'm using a case class here?

Anagram Finder class

import scalaz._
import Scalaz._
class AnagramFinder(words: List[String]) {
 
 implicit def string2Stem(s: String): Stem = Stem(s)
 
 val mapping = createMapping(words)
 
 def createMapping(words: List[String]) = {
 var anagrams: Map[Stem, Set[String]] = Map()
 
 words.foreach(s => anagrams = anagrams |+| Map(Stem(s) -> Set(s)))
 
 anagrams
 }
 
 def find(word: String): Option[Set[String]] = mapping.get(word)
}

Notes:

I really wanted to just have anagrams |+| Map(Stem(s) -> Set(s)) in my foreach, but that required (a) using a mutable map instead, which I'd be fine with, but (b) having to write my own semigroup implementation for the mutable Map, which I struggled with.

Also, anagrams |+| Map(Stem(s) -> Set(s)) isn't picking up my implicit String to Stem method. Why?

Anagrams object

object Anagrams {
 
 def apply(wordListFile: String, word: String) = {
 val finder = new AnagramFinder(scala.io.Source.fromFile(wordListFile).mkString.split('\n').toList)
 finder.find(word)
 }
}

Notes: I know I should initialise the word list into a val field on the object which can be reused between calls, but this was really just to get me started.

Appendix: Tests

class StemTest extends org.scalatest.FunSuite {
 
 test("Stem of noel is e,l,n,o") {
 assert(Stem("noel").stem == List('e','l','n','o'))
 }
 
 test("Stem of Hello is H,e,l,l,o") {
 assert(Stem("Hello").stem == List('H', 'e', 'l', 'l', 'o'))
 }
 
 test("Stems of abc and cba are equal") {
 assert(Stem("abc") == Stem("cba"))
 }
}
class AnagramFinderTest extends FunSuite {
 
 test("List of words should create appropriate map") {
 val testMap:Map[Stem, Set[String]] = Map(Stem("abc") -> Set("cba"), Stem("def") -> Set("def"), Stem("ggz") -> Set("zgg"))
 val givenList = List("cba", "def", "zgg")
 
 val mapping = new AnagramFinder(givenList).mapping
 assert(mapping == testMap)
 }
 
 test("Same stemmed words should both exist in the map") {
 val testMap:Map[Stem, Set[String]] = Map(Stem("abc") -> Set("abc", "cba"))
 val givenList = List("abc", "cba")
 
 val mapping = new AnagramFinder(givenList).mapping
 assert(mapping == testMap)
 }
 
 test("Finding words") {
 val givenList = List("abc", "cba", "bob", "bca")
 
 val anagramFinder = new AnagramFinder(givenList)
 
 val foundAnagrams = anagramFinder.find("abc").get
 
 val expected = Set("abc", "bca", "cba")
 
 assert(foundAnagrams.forall(s => expected contains s))
 assert(expected.forall(s => foundAnagrams contains s))
 }
 
 test("Full test") {
 
 val expected = Some(Set("act", "cat", "Cat"))
 val actual = Anagrams("/usr/share/dict/words", "cat")
 
 assert(expected == actual)
 }
}

Question 2

Some suggestions to improve your code:

xs.sortWith((e1, e2) => (e1 < e2)) is identical to xs.sorted

If you are not interested in Stem.word you can place it into companions' apply-method:

object Stem {
 def apply(word: String): Stem = {
 val stem: List[Char] = word.toLowerCase.trim.toList.sorted
 new Stem(stem)
 }
}
case class Stem private(stem: List[Char])

source.mkString.split('\n') is identical to source.getLines

Whenever you want to sum-up something or use an accumulator, fold is what you are searching for:

(Map.empty[Stem, Set[String]] /: words) {
 case (map, word) => map + (Stem(word) -> Set(word))
}

With help of scalaz:

words.map(word => Map(Stem(word) -> Set(word))) reduceLeft { _|+|_ }

reduce is a fold: xs.tail.fold(xs.head)(f)

/: is a synonym for foldLeft: (init /: xs)(f) == (xs foldLeft init)(f)

Your implicit conversion String => Stem doesn't work in string -> set because -> already needs an implicit conversion and you are not allowed to apply multiple ones at once. When you use tuple notation (string, set), your implicit conversion is applied.

kiritsuku kiritsuku 1,5831 gold badge12 silver badges14 bronze badges · Accepted Answer · 2012-09-09 11:02:57Z

Some suggestions to improve your code:

xs.sortWith((e1, e2) => (e1 < e2)) is identical to xs.sorted

If you are not interested in Stem.word you can place it into companions' apply-method:

object Stem {
 def apply(word: String): Stem = {
 val stem: List[Char] = word.toLowerCase.trim.toList.sorted
 new Stem(stem)
 }
}
case class Stem private(stem: List[Char])

source.mkString.split('\n') is identical to source.getLines

Whenever you want to sum-up something or use an accumulator, fold is what you are searching for:

(Map.empty[Stem, Set[String]] /: words) {
 case (map, word) => map + (Stem(word) -> Set(word))
}

With help of scalaz:

words.map(word => Map(Stem(word) -> Set(word))) reduceLeft { _|+|_ }

reduce is a fold: xs.tail.fold(xs.head)(f)

/: is a synonym for foldLeft: (init /: xs)(f) == (xs foldLeft init)(f)

Your implicit conversion String => Stem doesn't work in string -> set because -> already needs an implicit conversion and you are not allowed to apply multiple ones at once. When you use tuple notation (string, set), your implicit conversion is applied.

Stack Exchange Network

Simple anagram finder using Scala and some Scalaz

General algorithm:

Signature class

Anagram Finder class

Anagrams object

Appendix: Tests

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Simple anagram finder using Scala and some Scalaz

General algorithm:

Signature class

Anagram Finder class

Anagrams object

Appendix: Tests

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions