I've written a phrase-matching method to return the longest matching phrase from two Strings in Ruby. My method looks like this and works as expected:
class String
def phrase_in_common(other)
n = [size, other.size].min
while n > 0 do
other_words = other.downcase.split.each_cons(n)
string_words = split.each_cons(n)
matching_words = string_words.find { |cons_words| other_words.include?(cons_words.map(&:downcase)) }
return matching_words&.join(' ') if matching_words
n -= 1
end
end
end
>> string = "Hello world, please come to my house next week"
>> other = "Hello friends, please come to my party next week"
>> string.phrase_in_common(other)
=> "please come to my"
My question is, can (and should) this be accomplished in a more Ruby way, perhaps by replacing the while
with an Enumerable
method of some kind?
To clarify, the phrase-matching should be based on words, so for example:
>> "this phrase".phrase_in_common("his phrase")
=> "phrase"
Also, note that the method is case-insensitive in matching and uses the case of the subject String for the return value:
>> "Greatest Show On The Earth".phrase_in_common("Silliest Show on the Earth")
=> "Show On The Earth"
1 Answer 1
It seems that you are trying to get the longest common subsequence. There are several implementation of this algorithm and you can find an example on Rosetta Code.
A working recursive implementation implementation could be:
# Split your sentence into words and manipulate the content. E.g.
# - downcase all chars
# - split by some chars such as space, punctuation and so on
def split_into_words(string)
string.downcase.split(/[\s\.,;\'\"]/)
end
def longest_subsequence(xstr, ystr)
return '' if xstr.empty? || ystr.empty?
x, *xs = xstr
y, *ys = ystr
if x == y
x + " " + longest_subsequence(xs, ys)
else
[longest_subsequence(xstr, ys), longest_subsequence(xs, ystr)].max_by {|x| x.size}
end
end
def phrase_in_common(string1, string2)
xstr = split_into_words(string1)
ystr = split_into_words(string2)
longest_subsequence(xstr, ystr).strip
end
This would pass all your specs, but it returns ''
instead of nil
when the longest substring is empty.
You can slightly change this implementation to include int inside class String
eventually
Edit 2017年10月04日
As remarked by moveson
, in Rails we can easily make this method to return nil
instead of an empty string:
def phrase_in_common(string1, string2)
xstr = split_into_words(string1)
ystr = split_into_words(string2)
longest_subsequence(xstr, ystr).strip.presence
end
-
\$\begingroup\$ I'm using Rails, so the last part is easily remedied by adding
.presence
after.strip
. \$\endgroup\$moveson– moveson2017年10月03日 23:57:51 +00:00Commented Oct 3, 2017 at 23:57 -
\$\begingroup\$ I think it also needs
.squeeze(' ')
after the.strip
. Otherwise it sometimes returns phrases with more than one space between words. \$\endgroup\$moveson– moveson2017年10月04日 00:05:13 +00:00Commented Oct 4, 2017 at 0:05 -
\$\begingroup\$ I haven't use squeeze it may have some side effects \$\endgroup\$mabe02– mabe022017年10月04日 10:16:49 +00:00Commented Oct 4, 2017 at 10:16
-
\$\begingroup\$ About the
.presence
after.strip
and I will edit my answer accordingly \$\endgroup\$mabe02– mabe022017年10月04日 10:18:17 +00:00Commented Oct 4, 2017 at 10:18
\w
characters? \$\endgroup\$