How to Write a Spelling Corrector in Ruby

:: programming, ruby

Update 10/16/2015: Please see the Racket Version also.

Peter Norvig wrote a simple spelling corrector in 20 lines of Python 2.5, so I thought I’d see what it looks like in Ruby. Here are some areas I’m not pleased with:

  1. List comprehensions in Python made the edits1 function more elegant IMO.
  2. The boolean expression in the correct function evaluates empty sets/arrays as false in Python but not in Ruby, so I had to add the "result.empty? ? nil : result" expression to several functions. I expect there’s a better way to handle this also.

Otherwise, the translation was pretty straightforward.

Here’s a link to Norvig’s page: http://www.norvig.com/spell-correct.html

That page includes a link to a text file that I saved locally as holmes.txt: http://www.norvig.com/holmes.txt

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
def words text
 text.downcase.scan(/[a-z]+/)
end
def train features
 model = Hash.new(1)
 features.each {|f| model[f] += 1 }
 return model
end
NWORDS = train(words(File.new('holmes.txt').read))
LETTERS = ("a".."z").to_a.join
def edits1 word
 n = word.length
 deletion = (0...n).collect {|i| word[0...i]+word[i+1..-1] }
 transposition = (0...n-1).collect {|i| word[0...i]+word[i+1,1]+word[i,1]+word[i+2..-1] }
 alteration = []
 n.times {|i| LETTERS.each_byte {|l| alteration << word[0...i]+l.chr+word[i+1..-1] } }
 insertion = []
 (n+1).times {|i| LETTERS.each_byte {|l| insertion << word[0...i]+l.chr+word[i..-1] } }
 result = deletion + transposition + alteration + insertion
 result.empty? ? nil : result
end
def known_edits2 word
 result = []
 edits1(word).each {|e1| edits1(e1).each {|e2| result << e2 if NWORDS.has_key?(e2) }}
 result.empty? ? nil : result
end
def known words
 result = words.find_all {|w| NWORDS.has_key?(w) }
 result.empty? ? nil : result
end
def correct word
 (known([word]) or known(edits1(word)) or known_edits2(word) or
 [word]).max {|a,b| NWORDS[a] <=> NWORDS[b] }
end

After you’ve saved the holmes.txt file, load the code into irb and call the correct function with a string as follows:

1
2
3
4
5
badkins:~/sync/code/ruby$ irb
irb(main):001:0> require 'spelling_corrector.rb'
=> true
irb(main):002:0> correct "whree"
=> "where"

AltStyle によって変換されたページ (->オリジナル) /