7

I'm trying be able to compare two Strings and identify duplicate words. For example;

String1 = "Hello, my name is John."
String2 = "Can you tell me your name please?"

Comparing String1 and String2 would return the word; "name".

I know it is possible to split these two strings into an array of words, and then iterate over each word of each String in a 2-D array. However this is computationally expensive at O(n^2) and I was wondering if there is a faster way of doing this?

Thanks.

EDIT: Changed the example for clarity.

Faruk Sahin
8,8265 gold badges31 silver badges34 bronze badges
asked Jan 8, 2013 at 16:02
2
  • So, you also want punctuation removed, then? Commented Jan 8, 2013 at 16:05
  • @fge Sorry, failed to notice that example wouldn't work. I have changed it now. Commented Jan 8, 2013 at 16:08

2 Answers 2

13

After getting the strings to word arrays:

You can add all the elements in the first array to a hashmap and then scan the second array to see if each of the elements exists in the hashmap. Since access time to a hashmap is O(1), this will be O(n+m) time complexity.

If you do not want to use extra space, you can sort both of the arrays in O(nlogn) and then compare the items in O(n+m) which would give you O(nlogn) in total.

answered Jan 8, 2013 at 16:10
4
  • Okay I'll give this a go and report back. Thanks Commented Jan 8, 2013 at 16:12
  • The hashmap solution is probably the best, just bear in mind that the speed difference can be much more significant for longer texts. Commented Jan 8, 2013 at 16:15
  • @jedrus07 Yes, that is absolutely right, I just wanted to present another option better than O(n^2) Commented Jan 8, 2013 at 16:17
  • Okay this works well, thank you. I'm going to try the other answer below as well and see if it runs any faster before I accept this answer Commented Jan 8, 2013 at 16:52
6

One simple solution is to use the Sets.intersection method of Guava's Sets. It is pretty easy:

String s1 = "Hello, my name is John.";
String s2 = "Can you tell me your name?";
Splitter splitter = Splitter.onPattern("\\W").trimResults().omitEmptyStrings();
Set<String> intersection = Sets.intersection(//
 Sets.newHashSet(splitter.split(s1)), //
 Sets.newHashSet(splitter.split(s2)));
System.out.println(intersection);

Output:

[name]

You can also find more information on algorithms to detect Set intersection on this thread.

answered Jan 8, 2013 at 16:16
3
  • Should the Splitter object be a StringSplitter? Splitter isn't being recognised. Commented Jan 8, 2013 at 16:50
  • BTW I'm using Guava 13.0.1 for this. Commented Jan 8, 2013 at 17:24
  • Okay. I'm just gonna go with the answer above as it doesn't require any external libraries and I was able to get it working at a fairly fast speed. Thanks for your help anyway. Commented Jan 8, 2013 at 17:26

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.