I'm trying be able to compare two Strings and identify duplicate words. For example;
String1 = "Hello, my name is John."
String2 = "Can you tell me your name please?"
Comparing String1 and String2 would return the word; "name".
I know it is possible to split these two strings into an array of words, and then iterate over each word of each String in a 2-D array. However this is computationally expensive at O(n^2) and I was wondering if there is a faster way of doing this?
Thanks.
EDIT: Changed the example for clarity.
-
So, you also want punctuation removed, then?fge– fge2013年01月08日 16:05:30 +00:00Commented Jan 8, 2013 at 16:05
-
@fge Sorry, failed to notice that example wouldn't work. I have changed it now.Dan_Dan_Man– Dan_Dan_Man2013年01月08日 16:08:52 +00:00Commented Jan 8, 2013 at 16:08
2 Answers 2
After getting the strings to word arrays:
You can add all the elements in the first array to a hashmap and then scan the second array to see if each of the elements exists in the hashmap. Since access time to a hashmap is O(1), this will be O(n+m) time complexity.
If you do not want to use extra space, you can sort both of the arrays in O(nlogn) and then compare the items in O(n+m) which would give you O(nlogn) in total.
-
Okay I'll give this a go and report back. ThanksDan_Dan_Man– Dan_Dan_Man2013年01月08日 16:12:03 +00:00Commented Jan 8, 2013 at 16:12
-
The hashmap solution is probably the best, just bear in mind that the speed difference can be much more significant for longer texts.bjedrzejewski– bjedrzejewski2013年01月08日 16:15:26 +00:00Commented Jan 8, 2013 at 16:15
-
@jedrus07 Yes, that is absolutely right, I just wanted to present another option better than O(n^2)Faruk Sahin– Faruk Sahin2013年01月08日 16:17:20 +00:00Commented Jan 8, 2013 at 16:17
-
Okay this works well, thank you. I'm going to try the other answer below as well and see if it runs any faster before I accept this answerDan_Dan_Man– Dan_Dan_Man2013年01月08日 16:52:01 +00:00Commented Jan 8, 2013 at 16:52
One simple solution is to use the Sets.intersection
method of Guava's Sets
. It is pretty easy:
String s1 = "Hello, my name is John.";
String s2 = "Can you tell me your name?";
Splitter splitter = Splitter.onPattern("\\W").trimResults().omitEmptyStrings();
Set<String> intersection = Sets.intersection(//
Sets.newHashSet(splitter.split(s1)), //
Sets.newHashSet(splitter.split(s2)));
System.out.println(intersection);
Output:
[name]
You can also find more information on algorithms to detect Set intersection on this thread.
-
Should the Splitter object be a StringSplitter? Splitter isn't being recognised.Dan_Dan_Man– Dan_Dan_Man2013年01月08日 16:50:20 +00:00Commented Jan 8, 2013 at 16:50
-
BTW I'm using
Guava 13.0.1
for this.Alex– Alex2013年01月08日 17:24:39 +00:00Commented Jan 8, 2013 at 17:24 -
Okay. I'm just gonna go with the answer above as it doesn't require any external libraries and I was able to get it working at a fairly fast speed. Thanks for your help anyway.Dan_Dan_Man– Dan_Dan_Man2013年01月08日 17:26:53 +00:00Commented Jan 8, 2013 at 17:26