An anagram is like a mix-up of the letters in a string:
pots is an anagram of stop
Wilma is an anagram of ilWma
I am going through the book Cracking the Coding Interview and in the basic string manipulation there's the problem:
write a method to check if two strings are anagrams of each other.
My method uses StringBuffer
instead of String because you can .deleteCharAt(index)
with StringBuffer/StringBuilder.
public boolean areAnagrams(StringBuffer s1b, StringBuffer s2b) {
for (int i=0; i<s1b.length(); ++i) {
for (int j=0; j<s2b.length(); ++j) {
if (s1b.charAt(i) == s2b.charAt(j)) {
s1b.deleteCharAt(i);
s2b.deleteCharAt(j);
i=0;
j=0;
}
}
}
if (s1b.equals(s2b)) {
return true;
} else
return false;
}
I iterate over every character in s1b and if I find a matching char in s2b I delete them both from each string and restart the loop (set i
and j
to zero) because the length of the StringBuffer
objects changes when you .deleteCharAt(index)
.
I have two questions:
- Should I use
StringBuilder
overStringBuffer
(in Java)? - How can I make this faster?
In regards to fasterness:
This method is good in that it doesn't require any additional space, but it sorta destroys the data as you're working on it. Are there any alternatives that I have overlooked that could potentially preserve the strings but still see if they are anagrams without using too much external storage (i.e. copies of the strings are not allowed -- as a challenge)?
And, if you can use any sort of storage space in addition to this, can one lower the time complexity to \$O(n)\$ (technically \$O(2n)\$) instead of \$O(n^2)\$?
Also, the above code might not compile because I just wrote it from scratch in here; sorry if it's bugg'd.
10 Answers 10
Start with a simple, easy to understand version. Try to reuse API functions.
import java.util.Arrays;
...
public boolean areAnagrams(String s1, String s2) {
char[] ch1 = s1.toCharArray();
char[] ch2 = s2.toCharArray();
Arrays.sort(ch1);
Arrays.sort(ch2);
return Arrays.equals(ch1,ch2);
}
Of course this is not the fastest way, but in 99% it is "good enough", and you can see what's going on. I would not even consider to juggle with things like deleted chars in StringBuilder if there were no serious performance problem.
-
1\$\begingroup\$ In particular since this version is asymptotically faster than the original version. \$\endgroup\$Konrad Rudolph– Konrad Rudolph2011年04月07日 10:48:10 +00:00Commented Apr 7, 2011 at 10:48
-
\$\begingroup\$ Thank you for the awesome answer. On the vein of reusing API functions, do you know of any good resources for an overview of the Java library methods? Maybe just like a list of ones that are most commonly used would be cool. \$\endgroup\$sova– sova2011年04月07日 12:45:24 +00:00Commented Apr 7, 2011 at 12:45
-
\$\begingroup\$ It's hard to give a good advise here, as Java's libs are so huge. Generally it's a good idea to look at least through the APIs ( download.oracle.com/javase/6/docs/api ) of
java.lang
,java.util
andjava.math
. For Swing there are Oracle's "How to..." tutorials. \$\endgroup\$Landei– Landei2011年04月07日 12:57:43 +00:00Commented Apr 7, 2011 at 12:57 -
\$\begingroup\$ There is a problem with the answer. It is possible for the strings to not be of equal lengths but can still be considered as anagrams e.g. An anagram for "anagram" is "nag a ram". Thus, it would be wise to replace all special characters, spaces, and whatnot in each string before turning them into char Arrays. \$\endgroup\$Rocket Pingu– Rocket Pingu2017年10月13日 04:35:14 +00:00Commented Oct 13, 2017 at 4:35
-
1\$\begingroup\$ @Rocket Pingu It depends on how you define your requirements. Nothing stops you from using my code and preprocess the input strings. On the other hand, if I include your suggestions, the code wouldn't be usable for stricter definitions of anagrams. \$\endgroup\$Landei– Landei2018年02月13日 12:27:48 +00:00Commented Feb 13, 2018 at 12:27
This is essentially asking you to compare if two sets are equivalent, thinking of the strings as a set of characters where the order doesn't matter.
For an O(n) runtime algorithm, you could iterate through the first string, and count the number of instances of each letter. Then iterate over the second string and do the same. Afterwards, make sure the counts match.
To save a little bit of storage, use the same array for both counts; while iterating the first string, increment the count once for each letter. When iterating the second, decrement. Afterwards, make sure each letter count is zero.
When running this algorithm over generic sets of objects, one might store the counts in a dictionary keyed off the object's hashcode. However, because we're using a relatively small alphabet, a simple array would do. Either way, storage cost is O(1).
Cosider the following pseudo-code:
function are_anagrams(string1, string2)
let counts = new int[26];
for each char c in lower_case(string1)
counts[(int)c]++
for each char c in lower_case(string2)
counts[(int)c]--
for each int count in counts
if count != 0
return false
return true
how can I make this faster?
Perform a preliminary check if both strings have equal lengths. But that doesn't change the Big-O complexity of the algorithm though.
Your algorithm is O(a * b) * (a+b)
, where a
and b
are the lengths of your input strings. The last (a+b)
are the two deleteCharAt(...)
operations, making it in all a O(n^3) algorithm. You could bring that down to O(n)
(linear) by creating a frequency map of your strings, and comparing those maps.
A demo:
import java.util.HashMap;
import java.util.Map;
public class Main {
public static boolean anagram(String s, String t) {
// Strings of unequal lengths can't be anagrams
if(s.length() != t.length()) {
return false;
}
// They're anagrams if both produce the same 'frequency map'
return frequencyMap(s).equals(frequencyMap(t));
}
// For example, returns `{b=3, c=1, a=2}` for the string "aabcbb"
private static Map<Character, Integer> frequencyMap(String str) {
Map<Character, Integer> map = new HashMap<Character, Integer>();
for(char c : str.toLowerCase().toCharArray()) {
Integer frequency = map.get(c);
map.put(c, frequency == null ? 1 : frequency+1);
}
return map;
}
public static void main(String[] args) {
String s = "Mary";
String t = "Army";
System.out.println(anagram(s, t));
System.out.println(anagram("Aarmy", t));
}
}
which prints:
true
false
-
\$\begingroup\$ PS. silly me, I didn't notice Scott already suggested the exact same thing... I'll leave my answer though, since it contains some code that can be tested without making any modifications. \$\endgroup\$user3008– user30082011年04月07日 07:47:32 +00:00Commented Apr 7, 2011 at 7:47
-
1\$\begingroup\$
deleteCharAt
is O(n) ... I believe the worst-case runtime of OP’s algorithm is O(n^3), not O(n^2). \$\endgroup\$Konrad Rudolph– Konrad Rudolph2011年04月07日 10:50:17 +00:00Commented Apr 7, 2011 at 10:50 -
\$\begingroup\$ Konrad, didn't even notice the
deleteCharAt(...)
! :) You're right:deleteCharAt(...)
is another n-operations in the 2nd for-statement, making itO(n^3)
. \$\endgroup\$user3008– user30082011年04月07日 11:01:17 +00:00Commented Apr 7, 2011 at 11:01 -
\$\begingroup\$ The frequency map is a cool idea! I wasn't sure how to increment the values in the hashmap but it's clear now, thanks =)! \$\endgroup\$sova– sova2011年04月07日 12:53:00 +00:00Commented Apr 7, 2011 at 12:53
-
\$\begingroup\$ "Strings of unequal lengths can't be anagrams" They can be anagrams! Definition of anagrams here. The check for lengths is unnecessary. @BartKiers \$\endgroup\$Rocket Pingu– Rocket Pingu2017年10月13日 04:27:56 +00:00Commented Oct 13, 2017 at 4:27
I'm going to answer this a bit more meta:
Seeing this question I ask myself: What does the interviewer want to know from me?
Does he expect
- a low-level, highly optimized algorithm (as you are attempting)
- or does he want to see, that I can apply the standard Java API to the problem (as Landei suggests)
- or maybe something in between, like the optimized implementation and use of a multi-set/counted set/bag (as Bart and Scott do)
Possibly the interviewer just expects you to discuss exactly that with him, so that he sees that you know that these alternatives exist and which advantages and disadvantages they have.
Using Guava's MultiSet, you can get some very readable and concise code:
public static boolean areAnagrams(String s1, String s2) {
Multiset<Character> word1 = HashMultiset.create(Lists.charactersOf(s1));
Multiset<Character> word2 = HashMultiset.create(Lists.charactersOf(s2));
return word1.equals(word2);
}
deleteCharAt(index)
needs to shuffle the characters after index across, so you might be better off iterating over the text starting from the end.
If you are comparing many strings against each other, I would probably start by creating (and caching) copies of the strings but with their characters sorted. This would allow you to identify anagrams with a standard string comparison. As an added benefit, you can use the string with sorted characters as a hash table key making it quick and easy to find anagrams in a long list.
// in Java this would be an unusual signature
public boolean areAnagrams(StringBuffer s1b, StringBuffer s2b) {
for (int i=0; i<s1b.length(); ++i) {
for (int j=0; j<s2b.length(); ++j) {
// what could you do if the condition evaluates fo false here?
if (s1b.charAt(i) == s2b.charAt(j)) {
// ouch. ask yourself what might be happening inside this JDK method
// best case it bumps an offset, worst case it reallocs the backing array?
s1b.deleteCharAt(i);
s2b.deleteCharAt(j);
// double ouch. one failure mode for fiddling with the loop var would be infinite loop
// avoid this
i=0;
j=0;
}
}
}
// if you have reached this line of code, what else do you know about s1b and s2b?
if (s1b.equals(s2b)) {
return true;
} else
return false;
}
-
\$\begingroup\$ what would you write as the signature instead? It's not obvious to me that it's a weird Java signature \$\endgroup\$sova– sova2011年04月14日 13:58:11 +00:00Commented Apr 14, 2011 at 13:58
-
\$\begingroup\$ It would be better to accept two Strings or CharSequences. My main beef with StringBuffer/StringBuilder here is they are mutable and this function should not alter the inputs \$\endgroup\$Ron– Ron2011年04月14日 14:11:02 +00:00Commented Apr 14, 2011 at 14:11
-
\$\begingroup\$ That's a reasonable guideline. By mutable do you mean that if I edit s1b or s2b in this method body then the original values (passed to this function) are changed? \$\endgroup\$sova– sova2011年04月14日 15:53:42 +00:00Commented Apr 14, 2011 at 15:53
Assume that we want to check Str1 and Str2 if they are anagrams or not
public boolean checkAnagram(String s1 , String s2 ) {
int i=0;
int j=0;
// no need to check if that j < s2.length()
// because the two strings must be that the same length.
while(i < s1.length()) {
if(s1.indexOf(s2.charAt(j) < 0 || s2.indexOf(s1.charAt(i)) < 0 )
return false;
i++;
j++;
} // end of While
return true;
}// end of Check Anagram Method. the Big-Oh is O(n log n)
-
2\$\begingroup\$ Something looks wrong here. Checking only if characters are contained in the other string will make "aab" anagram of "abb". Also i and j will always be equals with this code so maybe something is missing. \$\endgroup\$Damien– Damien2012年02月01日 06:17:37 +00:00Commented Feb 1, 2012 at 6:17
I like the guava based response but here is a histogram based answer for completion sake. It builds a single histogram to capture differences in two words for the given character set (in this case, assumes 8-bit) and uses the histogram later to identify whether two words are Anagrams.
public boolean areAnagrams(String word1, String word2) {
int[] counts = new int[256]; /*assuming 8-bit character set */
if(word1.length() != word2.length())
return false;
int inputLength = word1.length();
for(int index=0; (int)word1.charAt(index) > 0 && (int)word2.charAt(index) > 0; index++ ) {
counts[(int)word1.charAt(index)] ++;
counts[(int)word2.charAt(index)] --;
}
for(int index=0; index < inputLength; index++ )
if(counts[index] > 0)
return false;
return true;
}
I think this works with a complexity O(2n).
This can be done with an array level operations.
The solution uses a buffer array of size 256 (assuming string contains chars with ASCII value less than 256), ASCII value of each char is used as array index.
First loop: increments the array position of each characters.
Second loop: Check any of the positions are zero, that means the character is not present in the first string or it presents more number of times than the first one.
public static boolean isAnagram(String str1, String str2){
if(str1.length() != str2.length()){ return false;}
int[] buffer = new int[256];
for(char ch : str1.toCharArray()){
buffer[ch]++;
}
for(char ch : str2.toCharArray()){
if(buffer[ch]==0) return false;
buffer[ch] = (buffer[ch] > 0)?(buffer[ch] - 1 ): -1 ;
}
return true;
}
-
\$\begingroup\$ Welcome to Code Review! You have presented an alternative solution, but haven't reviewed the code. Please explain your reasoning (how your solution works and how it improves upon the original) so that the author can learn from your thought process. \$\endgroup\$SuperBiasedMan– SuperBiasedMan2016年01月22日 09:49:01 +00:00Commented Jan 22, 2016 at 9:49
-
\$\begingroup\$ Added a description, please see \$\endgroup\$Vishnu Vasudevan– Vishnu Vasudevan2016年01月22日 10:06:59 +00:00Commented Jan 22, 2016 at 10:06
StringBuilder
because it doesn't use synchronization here. It will be faster and you aren't using multiple threads to access the same buffer, makingStringBuffer
slower with no payoff. As for your code, no need to reseti
since you don't want to start over, and ifi
gets to the end of the first string they aren't anagrams. \$\endgroup\$