Group Anagrams Together approach in Java

Question 1

Given an array of words, print the count of all anagrams together in sorted order (increasing order of counts). For example, if the given array is {"cat", "dog", "tac", "god", "act"}, then grouped anagrams are "(dog, god) (cat, tac, act)". So the output will be 2 3.

Input:

First line consists of T test case. First line of every test case consists of N, denoting the number of words in array. Second line of every test case consists of words of array.

Output:

Single line output, print the counts of anagrams in increasing order.

Constraints:

1<=T<=100 1<=N<=50

Example:

Input:

2

5

act cat tac god dog

3 act cat tac

Output:

2 3

3

My approach:

/*package whatever //do not write package name here */
import java.util.Scanner;
import java.io.IOException;
import java.util.HashMap;
import java.util.Set;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Collections;
import java.lang.StringBuffer;
class GFG {
 private static void getAnagramCt (String[] words) {
 HashMap <String, Integer> anag = new HashMap<>();
 List<Integer> counts = new ArrayList<> ();
 Set<String> keys;
 String[] keyArr;
 anag.put(words[0],0);
 keys = anag.keySet();
 keyArr = keys.toArray(new String[keys.size()]);
 for (int i = 1; i < words.length; i++) {
 String elem = words[i];
 for (int j = 0; j < keyArr.length; j++) { 
 if (isAnagram(elem,keyArr[j])) {
 int ct = anag.get(keyArr[j]);
 anag.put(keyArr[j],ct+1);
 }
 else {
 if (j == keyArr.length - 1) {
 anag.put(elem, 0);
 }
 else {
 continue; 
 }
 }
 }
 keys = anag.keySet();
 keyArr = keys.toArray(new String[keys.size()]);
 }
 for (Map.Entry<String,Integer> entry : anag.entrySet()) {
 //System.out.println(entry.getKey() + " " + entry.getValue());
 counts.add(entry.getValue() + 1);
 }
 Collections.sort(counts);
 for (Integer elem : counts) {
 System.out.print(elem + " ");
 }
 System.out.println();
 }
 private static boolean isAnagram (String word2, String word1) {
 if (word1.length() != word2.length()) {
 return false;
 }
 StringBuffer word = new StringBuffer(word2);
 for (int i = 0; i < word1.length(); i++) {
 if (word.length() == 0) {
 return false;
 }
 else if (word.indexOf((String.valueOf(word1.charAt(i)))) == -1) {
 return false;
 } 
 else {
 int ind = word.indexOf((String.valueOf(word1.charAt(i))));
 word.deleteCharAt(ind);
 }
 }
 return true;
 }
 public static void main (String[] args) throws IOException {
 Scanner sc = new Scanner(System.in);
 int numTests = sc.nextInt();
 for (int i = 0; i < numTests; i++) {
 int numWords = sc.nextInt();
 String[] words = new String[numWords];
 for (int j = 0; j < numWords; j++) {
 words[j] = sc.next();
 }
 getAnagramCt(words);
 }
 }
}

I have the following questions with regards to the above code:

How can I further improve my approach?
Is there a better way to solve this question?
Are there any grave code violations that I have committed?
Can space and time complexity be further improved?

Reference

Question 2

You shouldn't use isAnagram() function, instead sorted characters: "god"->"dgo", "dog"->"dgo" and count that. See: codereview.stackexchange.com/questions/119904/…

Question 3

Thanks, this was the similar to the approach mentioned in the Editorial

Question 4

 for (int i = 0; i < word1.length(); i++) {

Since you never use i except to calculate word1.charAt(i), you could more easily do

 for (char current : word1.toCharArray()) {

Then you could replace word1.charAt(i) with current.

But as noted in a comment, it would actually be more efficient to use a different algorithm. The issue is that indexOf is \$\mathcal{O}(n)\$ and of course the for loop is \$\mathcal{O}(n)\$. Nested, they become \$\mathcal{O}(n^2)\$. Worse, isAnagram is itself inside nested for loops. So the overall approach is \$\mathcal{O}(m^2n^2)\,ドル where \$m\$ is the number of words and \$n\$ is the maximum length of each word.

If you instead normalize each string by sorting it (\$\mathcal{O}(n\log n)\$), you can then put the normalized strings in a HashMap of ArrayList (\$\mathcal{O}(1)\$ each or \$\mathcal{O}(m)\$ total). This is only \$\mathcal{O}(m\cdot n\log{n})\,ドル which is much better.

Or skip the ArrayList and just use a counter. The overall algorithm would look something like

 public static Map<String, Integer> countAnagrams(String[] words) {
 Map<String, Integer> anagramCounts = new HashMap<>();
 for (String word : words) {
 String sorted = sort(word);
 Integer count = anagramCounts.get(sorted);
 if (count == null) {
 count = 0;
 }
 anagramCounts.put(sorted, count + 1);
 }
 return anagramCounts;
 }

This expects the caller to handle the output, which is better practice (Single Responsibility Principle, as applied to methods; see also SOLID, Separation of concerns, and Don't Repeat Yourself).

The basic idea here is that the method should either generate data or display it. That way, if you want to reuse the data generation for something else, you can.

In the original, you generate the data and then display it in the same method. If you wanted to do something else with the data, you'd have to refactor. This is more flexible and easier to test with automation.

Consider

public static void displayAnagramCounts(String[] words) {
 Map<String, List<String>> groups = groupAnagrams(words);
 List<Integer> counts = countValues(groups);
 Collections.sort(counts);
 displayNumbers(counts);
}

Now countValues and displayNumbers can be written in generic ways. The former takes a Map of List and flattens it into a List of the sizes of the List elements. The latter displays a list of numbers with spaces between them. Both those methods can be reused and tested separately.

 for (Map.Entry<String,Integer> entry : anag.entrySet()) {
 //System.out.println(entry.getKey() + " " + entry.getValue());
 counts.add(entry.getValue() + 1);
 }

There is an argument that you shouldn't include commented out code in a code review. Commented out code is code not ready for source control (it should either be deleted or commented in).

You don't actually need an entrySet here, as values is enough. E.g.

 for (int value : anag.values()) {
 counts.add(value + 1);
 }

With my alternative algorithm, you wouldn't even have to add one here.

Question 5

Thanks, @mdfst13 for giving out your suggestions. Out of curiosity, what is the Single Responsibility Principle?

Question 6

Thanks a lot for providing such valuable information. As someone still at college, these principles were unbeknownst to me. Is a fresher expected to have knowledge of these principles or these are some things that we can pick up in the industry?

Question 7

These are concepts that I would expect most college graduates to have. They may come up in junior/senior classes though, e.g. software engineering.

Question 8

thanks, sir. I will try to revise them before I enter the industry.

mdfst13 mdfst13 22.4k6 gold badges34 silver badges70 bronze badges · Answer 1 · 2018-07-28 04:15:02Z

 for (int i = 0; i < word1.length(); i++) {

Since you never use i except to calculate word1.charAt(i), you could more easily do

 for (char current : word1.toCharArray()) {

Then you could replace word1.charAt(i) with current.

But as noted in a comment, it would actually be more efficient to use a different algorithm. The issue is that indexOf is \$\mathcal{O}(n)\$ and of course the for loop is \$\mathcal{O}(n)\$. Nested, they become \$\mathcal{O}(n^2)\$. Worse, isAnagram is itself inside nested for loops. So the overall approach is \$\mathcal{O}(m^2n^2)\,ドル where \$m\$ is the number of words and \$n\$ is the maximum length of each word.

If you instead normalize each string by sorting it (\$\mathcal{O}(n\log n)\$), you can then put the normalized strings in a HashMap of ArrayList (\$\mathcal{O}(1)\$ each or \$\mathcal{O}(m)\$ total). This is only \$\mathcal{O}(m\cdot n\log{n})\,ドル which is much better.

Or skip the ArrayList and just use a counter. The overall algorithm would look something like

 public static Map<String, Integer> countAnagrams(String[] words) {
 Map<String, Integer> anagramCounts = new HashMap<>();
 for (String word : words) {
 String sorted = sort(word);
 Integer count = anagramCounts.get(sorted);
 if (count == null) {
 count = 0;
 }
 anagramCounts.put(sorted, count + 1);
 }
 return anagramCounts;
 }

This expects the caller to handle the output, which is better practice (Single Responsibility Principle, as applied to methods; see also SOLID, Separation of concerns, and Don't Repeat Yourself).

The basic idea here is that the method should either generate data or display it. That way, if you want to reuse the data generation for something else, you can.

In the original, you generate the data and then display it in the same method. If you wanted to do something else with the data, you'd have to refactor. This is more flexible and easier to test with automation.

Consider

public static void displayAnagramCounts(String[] words) {
 Map<String, List<String>> groups = groupAnagrams(words);
 List<Integer> counts = countValues(groups);
 Collections.sort(counts);
 displayNumbers(counts);
}

Now countValues and displayNumbers can be written in generic ways. The former takes a Map of List and flattens it into a List of the sizes of the List elements. The latter displays a list of numbers with spaces between them. Both those methods can be reused and tested separately.

 for (Map.Entry<String,Integer> entry : anag.entrySet()) {
 //System.out.println(entry.getKey() + " " + entry.getValue());
 counts.add(entry.getValue() + 1);
 }

There is an argument that you shouldn't include commented out code in a code review. Commented out code is code not ready for source control (it should either be deleted or commented in).

You don't actually need an entrySet here, as values is enough. E.g.

 for (int value : anag.values()) {
 counts.add(value + 1);
 }

With my alternative algorithm, you wouldn't even have to add one here.

Thanks, @mdfst13 for giving out your suggestions. Out of curiosity, what is the Single Responsibility Principle?
Thanks a lot for providing such valuable information. As someone still at college, these principles were unbeknownst to me. Is a fresher expected to have knowledge of these principles or these are some things that we can pick up in the industry?
These are concepts that I would expect most college graduates to have. They may come up in junior/senior classes though, e.g. software engineering.
thanks, sir. I will try to revise them before I enter the industry.

Stack Exchange Network

Group Anagrams Together approach in Java

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Hot Network Questions

Group Anagrams Together approach in Java

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Related

Hot Network Questions