I'm working on a "Do you mean ..." kinda system similar to Google! The speller part is trivial (with PHP's pspell library) but what I can't solve is the case problem.
Let's say the misspelled word is "GoVeNMeNt"; then the corrected word should be "GoVerNMeNt" (as Google does), but the pspell library returns suggestions in a single case only (usually lower case).
So how do I write a function transformCase which takes in the actual string ($string) and the suggestion string ($subject)? I have written the following implementation which doesn't handle all cases:
    function transformCase($string, $subject) {
        for ($i = 0, $marker = 0; $i < strlen($string); ++$i) {
            if (strcasecmp($string[$i], $subject[$marker]) == 0) {
                $subject[$marker] = $string[$i];
                $marker += 1;
            }
            elseif (strlen($string) == strlen($subject))
                $marker += 1;
        }
        return $subject;
    }
    echo transformCase("AbSaNcE", 'absence')."\n";      # AbSeNcE :)
    echo transformCase("StRioNG", 'string')."\n";       # StRiNG :)
    echo transformCase("GOVERMENt", 'government')."\n"; # GOVERNment :<
In the last case the output should be GOVERnMENt. The algorithm also doesn't work on various other queries.
So I'd be happy if someone helps me with the algorithm :)
- Why does the case matter? – Barmar, Jan 5, 2020 at 17:44
- Don't use exclamation points, you're not yelling at us (and if you are, this is not the place for those kind of posts). Rather than answer your question, a counter-question: why do you need to match case? If someone searched for GOVORnMENt, your autosuggester saying "did you mean government?" is fine. Why is it important to preserve case, when your search backend is going to do case insensitive matching anyway? – Mike 'Pomax' Kamermans, Jan 5, 2020 at 17:45
- The case matters because I want to make it very similar to Google! Try searching GoVERMENt in google and it'd say "Did you mean GoVERNMENt"! So that's why the case matters – Neer, Jan 6, 2020 at 4:49
1 Answer
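Try the following modification to your algorithm:

    function transformCase($string, $subject) {
        $n = strlen($string);
        $m = strlen($subject);
        for ($i = 0, $marker = 0; $i < $n && $marker < $m; ++$i) {
            if (strcasecmp($string[$i], $subject[$marker]) == 0) {
                // Same letter at the current position: copy its case
                $subject[$marker] = $string[$i];
                $marker += 1;
                continue;
            }
            // Look for the next same character in $subject (case-insensitive)
            $next = stripos($subject, $string[$i], $marker);
            if ($next !== false) {
                $subject[$next] = $string[$i];
                $marker = $next + 1;
            }
            // Otherwise $string[$i] is an extra letter and is skipped
        }
        return $subject;
    }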
The comparison elseif (strlen($string)==strlen($subject)) does not guarantee that the function works as you need. Beyond that, you can make further modifications for better performance.
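If a left-to-right scan turns out to be too fragile for some queries, one alternative is to align the two words first and only then copy the case. The sketch below is not the approach from the answer above; it swaps in a longest-common-subsequence (LCS) table to decide which letters of the typed word correspond to which letters of the suggestion. The name transformCaseLcs is made up for illustration, and the code assumes single-byte (ASCII) words.

```php
<?php
// Sketch: copy letter case from the typed word onto the suggestion by
// aligning both words with an LCS table. transformCaseLcs is a hypothetical
// helper name; it is not part of pspell. Assumes single-byte characters.
function transformCaseLcs(string $typed, string $suggestion): string
{
    $a = strtolower($typed);
    $b = strtolower($suggestion);
    $n = strlen($a);
    $m = strlen($b);

    // $lcs[$i][$j] = LCS length of the suffixes of $a from $i and $b from $j
    $lcs = array_fill(0, $n + 1, array_fill(0, $m + 1, 0));
    for ($i = $n - 1; $i >= 0; --$i) {
        for ($j = $m - 1; $j >= 0; --$j) {
            $lcs[$i][$j] = ($a[$i] === $b[$j])
                ? $lcs[$i + 1][$j + 1] + 1
                : max($lcs[$i + 1][$j], $lcs[$i][$j + 1]);
        }
    }

    // Walk both words from the start; copy case only for aligned letters.
    $result = $suggestion;
    $i = 0;
    $j = 0;
    while ($i < $n && $j < $m) {
        if ($a[$i] === $b[$j]) {
            $result[$j] = $typed[$i];
            ++$i;
            ++$j;
        } elseif ($lcs[$i + 1][$j] >= $lcs[$i][$j + 1]) {
            ++$i; // letter that appears only in the typed word
        } else {
            ++$j; // letter that appears only in the suggestion
        }
    }
    return $result;
}

echo transformCaseLcs("AbSaNcE", 'absence')."\n";      # AbSeNcE
echo transformCaseLcs("StRioNG", 'string')."\n";       # StRiNG
echo transformCaseLcs("GOVERMENt", 'government')."\n"; # GOVERnMENt
```

Letters that exist only in the suggestion (the inserted "n" in the last example) keep whatever case the speller returned, since the typed word gives no hint for them. The table costs O(n·m) time and memory, which should be negligible for single words.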