Given two strings, a and b, determine the minimum number of character deletions required to make a and b anagrams

Question 1

I'm trying to figure out how my code could be improved. I solved a HackerRank problem from the Cracking the Coding Interview section, and am thinking that it should be able to be solved a simpler than what I did.

The problem is as follows:

Strings: Making Anagrams

Alice is taking a cryptography class and finding anagrams to be very useful. We consider two strings to be anagrams of each other if the first string's letters can be rearranged to form the second string. In other words, both strings must contain the same exact letters in the same exact frequency For example, bacdc and dcbac are anagrams, but bacdc and dcbad are not.

Alice decides on an encryption scheme involving two large strings where encryption is dependent on the minimum number of character deletions required to make the two strings anagrams. Can you help her find this number?

Given two strings, a and b, that may or may not be of the same length, determine the minimum number of character deletions required to make a and b anagrams. Any characters can be deleted from either of the strings.

Input Format

The first line contains a single string, a. The second line contains a single string, b.

Constraints

\1ドル \le |a|,|b| \le 10^4\$

It is guaranteed that a and b consist of lowercase English alphabetic letters (i.e., a through z).

Output Format

Pring a single integer denoting the number of characters you must delete to make the two strings anagrams of each other.

Sample Input
cde
abc
Sample Output
4
Explanation

We delete the following characters from our two strings to turn them into anagrams of each other:

Remove d and e from cde to get c.

Remove a and b from abc to get c.

We must delete 4 characters to make both strings anagrams, so we print 4 on a new line.

My Implementation

I am using C++, and since strings are immutable, what I decided to do was to create two int arrays (vectors, actually) that hold the ASCII value of the chars in each string. Then I would sort the vectors. Then I would iterate through the arrays together, counting the number of elements that don't exist in the other.

#include <string>
#include <vector>
#include <iostream>
#include <algorithm>
using namespace std;
int calcDeletions(vector<int> a, vector<int> b) {
 int deletions = 0;
 int ai = 0;
 int bi = 0;
 // step through arrays until the end of one is reached
 while (ai < a.size() && bi < b.size()) {
 if (a[ai] < b[bi]) {
 deletions++;
 ai++;
 }
 else if (a[ai] > b[bi]) {
 deletions++;
 bi++;
 }
 else {
 ai++;
 bi++;
 }
 }
 // carry over left-overs
 deletions += (a.size() - ai);
 deletions += (b.size() - bi);
 return deletions;
}
int number_needed(string a, string b) {
 vector<int> aInt;
 vector<int> bInt;
 // create two int vectors to store ascii val of strings
 for (int i = 0; i < a.length(); i++) {
 aInt.push_back((int)a.at(i));
 }
 for (int i = 0; i < b.length(); i++) {
 bInt.push_back((int)b.at(i));
 }
 // sort vectors
 sort(aInt.begin(), aInt.end());
 sort(bInt.begin(), bInt.end());
 // call calcDeletions function with the longer vector passed as first arg
 if (aInt.size() > bInt.size()) {
 return calcDeletions(aInt, bInt);
 }
 else {
 return calcDeletions(bInt, aInt);
 }
}
int main() {
 string a;
 cin >> a;
 string b;
 cin >> b;
 cout << number_needed(a, b) << endl;
 return 0;
}

What is a better, more efficient way of solving this problem?

Question 2

I'm trying to figure out how my code could be improved. I solved a HackerRank problem from the Cracking the Coding Interview section, and am thinking that it should be able to be solved a simpler than what I did.

Probably. But these hacker problems are usually very specific. You can probably google a perfect solution in 10 minutes.

What is a better, more efficient way of solving this problem?

Create a count of the number of each letter from both strings. To be anagrams these counts must be the same. So look across the counts and sum the differences.

I am using C++, and since strings are immutable,

No they are not. Strings are perfectly mutable.

what I decided to do was to create two int arrays (vectors, actually) that hold the ASCII value of the chars in each string. Then I would sort the vectors. Then I would iterate through the arrays together, counting the number of elements that don't exist in the other.

The complexity of this is (ignoring the copy to a vector)

O(n1.log(n1)) // Sort first string
O(n2.log(n2)) // Sort second string
O(max(n1,n2)) // Loop Over counting.

That seems to work. I like getting the count of each letter better. The difference in the counts is the number of letters to be deleted.

O(n1) // Count each character into an array.
O(n2) // Count each character into an array.
O(1) // count the diffs.
 // There is a loop over 255 characters (26 if you only count lower case letters).
 // This is constant and not dependant on the size of
 // the input and thus is `1`

Code Review.

Using Namespace

Please stop doing this.

using namespace std;

Its a bad habit that will cause you grief one day. See every other C++ code review for a good explanation.

Pass by const reference

You are passing the parameters by value. This will cause both the vectors to be copied.

int calcDeletions(vector<int> a, vector<int> b) {

Since you don't modify the values it can be made simpler by passing a const reference of each parameter. Thus having no copy.

int calcDeletions(vector<int> const& a, vector<int> const& b) {

Prefer Pre Increment

 deletions++;
 ai++;

Yes. Yes. It makes no difference for integers. But it does no harm to use pre-increment on integers either. But again it's one of those habits you should get into. There are situations where it does make a difference so it's best to pre-increment and then it will always be correct (even if you change the types of your objects).

Prefer `operator[]` over `at()`

The difference between the two is that at() performs bounds checking.

 for (int i = 0; i < a.length(); i++) {
 aInt.push_back((int)a.at(i));
 }

In this context you are guaranteed that there is no out of bounds access as you are already checking that i is smaller than a.length(). Thus you are effectively doing the check twice.

Don't use C style casts.

C style casts are hard to spot.

 (int)a.at(i)
 ^^^^^

They are also extremely dangerous (they basically tell the compiler to shut up and do what you're told as I the programmer am god and know better than you the simple compiler). In reality this is usually wrong: the compiler always knows better than you and telling it to shut up is usually hiding an error message.

As a result, C++ has its own casts. There are actually four of them:

static_cast<>()
const_cast<>()
reinterpret_cast<>()
dynamic_cast<>()

Your above cast is actually best done by a static_cast<>()

static_cast<int>(a.at(i));

(削除) BUT There is no actual need to cast a char to an integer. This will happen automatically (as a char is an integer type the conversion is automatic) with no loss in precision. (削除ここまで)

Is there a need to put the largest first.

I don't see you taking advantage of this in the function above!

 // call calcDeletions function with the longer vector passed as first arg
 if (aInt.size() > bInt.size()) {
 return calcDeletions(aInt, bInt);
 }
 else {
 return calcDeletions(bInt, aInt);
 }

Looking at the Cool Shark implementation you provided it still has a couple of issues.

Question 3

There is one time that a cast would be necessary for char->int: that's if both are the same size and plain char is unsigned (I doubt that's the case on OP's platform).

Question 4

@TobySpeight It is implementation defined if char is signed or unsigned.

Question 5

@TobySpeight: See n4800 Section: 6.7.1 Fundamental types [basic.fundamental] Paragraph 7:

Type char is a distinct type that has an implementation-defined choice of "signed char" or "unsigned char" as its underlying type.

Question 6

@TobySpeight: But note there is also a min size for int (which is 16 bits). Though char is defined as 1 byte it is possible that there are implementations that have 1 byte as 16 bits (ie CHAR_BITS == 16) stackoverflow.com/a/271132/14065

Question 7

But yes lets leave the cast in it will make things work.

Question 8

So there is already a lot mentioned by the other ones, but i would like to focus on your algorithm, which quite frankly is badly suited for that problem.

What you are doing is sorting the two strings and comparing every individual position afterwards. However, as the problem states, the only thing you need is the histogram of the words.

both strings must contain the same exact letters in the same exact frequency

So what you actually need to do is simply counting the occurrences of every letter without any sorting necessary. That is best served by a std::map<char, int>, where you insert the characters and increase their count.

std::map<char, int> createHistogram(const std::string &word)
 std::map<char, int> histogram;
 for (auto &character : word) {
 if (histogram.find(character) == histogram.end()) {
 histogram.insert(std::make_pair(character, 1);
 } else {
 histogram[character]++;
 }
 }
 return histogram;
}

Now that you have the two histograms you can walk through them and count the difference in the frequency of the characters.

size_t compareFrequencies(std::map<char, int> &hist1, std::map<char, int> &hist2) {
 size_t result = 0;
 for (auto it = hist1.begin(); it != hist1.end(); ++it) {
 if (hist2.find(it->first) == hist2.end()) {
 result += it->second;
 } else {
 result += std::abs(it->second - hist2[it-first]);
 hist2.erase(it-first);
 }
 }
 /* We know all remaining characters are unique to word2 */
 for (auto it = hist1.begin(); it != hist1.end(); ++it) {
 result += it->second;
 }
 return result;
}

Now you can combine these two and get

#include <iostream>
#include <map>
#include <cmath>
#include <string>
#include <utility>
std::map<char, int> createHistogram(const std::string &word)
 std::map<char, int> histogram;
 for (auto &character : word) {
 if (histogram.find(character) == histogram.end()) {
 histogram.insert(std::make_pair(character, 1);
 } else {
 histogram[character]++;
 }
 }
 return histogram;
} 
size_t compareFrequencies(std::map<char, int> &hist1, std::map<char, int> &hist2) {
 size_t result = 0;
 for (auto it = hist1.begin(); it != hist1.end(); ++it) {
 if (hist2.find(it->first) == hist2.end()) {
 result += it->second;
 } else {
 result += std::abs(it->second - hist2[it-first]);
 hist2.erase(it-first);
 }
 }
 /* We know all remaining characters are unique to word2 */
 for (auto it = hist1.begin(); it != hist1.end(); ++it) {
 result += it->second;
 }
 return result;
}
int main() {
 std::string word1, word2;
 std::cin >> word1 >> word2;
 std::map<char, int> hist1 = createHistogram(word1);
 std::map<char, int> hist2 = createHistogram(word2);
 std::cout << compareFrequencies(hist1, hist2);
}

Question 9

In the createHistogram() function, you don't need to check if the letter is in the map. The missing letter will be inserted. Just do histogram[character]++;.

Question 10

Absolutely, that said I am unsure whether it is really the right thing to edit it or keep it as it is

Question 11

The crux of the problem is that how many different characters (including their count) both sliced strings have, gives the operations need to be done from sice1 string to slice 2 for anagram creation.

For example the string 1 'abb' against string 2 'bbc'. For string 2 to be anagram of string 1, it needs 'a' character with frequency 1. Because condition of b's frequency 2 is already met.

We will create hashes of frequencies for both strings and if frequency goes to 0 or less then 0 between two hashes, it means string 1 can create anagram string 2 with it's characters successfully without operation for that character. if frequency is more then 0, that is an operation need to be done on string 2 to make it anagram of string 1 for that character.

At the end we sum all operations against each character and return.

function countAnagramOperations(s) {
 if (s.length % 2 != 0) return -1;
 const midIndex = s.length / 2;
 const s1 = s.substring(0, midIndex).split('');
 const s2 = s.substring(midIndex, s.length).split('');
 const m1 = {}; // here count the frequency of each character
 const m2 = {}; 
 s1.forEach(c => {
 m1[c] = m1[c] ? m1[c] + 1 : 1; // make frequencies map
 });
 s2.forEach(c => {
 m2[c] = m2[c] ? m2[c] + 1 : 1;
 });
 Object.keys(m1).forEach(key => {
 // find how many characters are different in both hashes,
 if (m2[key]) {
 m1[key] = m1[key] - m2[key]; 
 if (m1[key] <= 0) {
 delete m1[key];
 }
 }
 });
 // return total sum of operations against each character
 return Object.values(m1).reduce((a, b) => a + b, 0);
}
console.log(countAnagramOperations('abccde')); // 2
console.log(countAnagramOperations('ab')); // 1
console.log(countAnagramOperations('abbbbc')); // 1

Question 12

This is an interesting answer, but the code doesn't appear to be C++ like the code in the question. Please be advised that answers should use the same language as the question.

Question 13

String wo1 = "fowl", wo2 = "owl", wo3 = "howl", wo4 = "low";

 HashMap<String, Integer> wo1Map = new HashMap<String, Integer>();
 HashMap<String, Integer> wo2Map = new HashMap<String, Integer>();
 HashMap<String, Integer> wo3Map = new HashMap<String, Integer>();
 HashMap<String, Integer> wo4Map = new HashMap<String, Integer>();
 wo1Map = convertToHashMap(wo1.toLowerCase());
 wo2Map = convertToHashMap(wo2.toLowerCase());
 wo3Map = convertToHashMap(wo3.toLowerCase());
 wo4Map = convertToHashMap(wo4.toLowerCase());
 HashSet<String> unionKeys = new HashSet<>(wo4Map.keySet());
 unionKeys.addAll(wo1Map.keySet());
 unionKeys.removeAll(wo4Map.keySet());
 System.out.println("remove letter"+unionKeys);
 HashSet<String> intersectionKeys = new HashSet<>();
 for(String i : wo1Map.keySet()) {
 for(String j : wo3Map.keySet()) {
 if( i.equals(j) )
 intersectionKeys.add(i);
 }
 }
 System.out.println("common letters "+intersectionKeys);

Question 14

Welcome to Code Review! You have presented an alternative solution, but haven't reviewed the code. Please edit to show what aspects of the question code prompted you to write this version, and in what ways it's an improvement over the original. It may be worth (re-)reading How to Answer.

Loki Astari Loki Astari 97.6k5 gold badges126 silver badges341 bronze badges · Accepted Answer · 2016-10-16 13:14:19Z

I'm trying to figure out how my code could be improved. I solved a HackerRank problem from the Cracking the Coding Interview section, and am thinking that it should be able to be solved a simpler than what I did.

Probably. But these hacker problems are usually very specific. You can probably google a perfect solution in 10 minutes.

What is a better, more efficient way of solving this problem?

Create a count of the number of each letter from both strings. To be anagrams these counts must be the same. So look across the counts and sum the differences.

I am using C++, and since strings are immutable,

No they are not. Strings are perfectly mutable.

what I decided to do was to create two int arrays (vectors, actually) that hold the ASCII value of the chars in each string. Then I would sort the vectors. Then I would iterate through the arrays together, counting the number of elements that don't exist in the other.

The complexity of this is (ignoring the copy to a vector)

O(n1.log(n1)) // Sort first string
O(n2.log(n2)) // Sort second string
O(max(n1,n2)) // Loop Over counting.

That seems to work. I like getting the count of each letter better. The difference in the counts is the number of letters to be deleted.

O(n1) // Count each character into an array.
O(n2) // Count each character into an array.
O(1) // count the diffs.
 // There is a loop over 255 characters (26 if you only count lower case letters).
 // This is constant and not dependant on the size of
 // the input and thus is `1`

Code Review.

Using Namespace

Please stop doing this.

using namespace std;

Its a bad habit that will cause you grief one day. See every other C++ code review for a good explanation.

Pass by const reference

You are passing the parameters by value. This will cause both the vectors to be copied.

int calcDeletions(vector<int> a, vector<int> b) {

Since you don't modify the values it can be made simpler by passing a const reference of each parameter. Thus having no copy.

int calcDeletions(vector<int> const& a, vector<int> const& b) {

Prefer Pre Increment

 deletions++;
 ai++;

Yes. Yes. It makes no difference for integers. But it does no harm to use pre-increment on integers either. But again it's one of those habits you should get into. There are situations where it does make a difference so it's best to pre-increment and then it will always be correct (even if you change the types of your objects).

Prefer `operator[]` over `at()`

The difference between the two is that at() performs bounds checking.

 for (int i = 0; i < a.length(); i++) {
 aInt.push_back((int)a.at(i));
 }

In this context you are guaranteed that there is no out of bounds access as you are already checking that i is smaller than a.length(). Thus you are effectively doing the check twice.

Don't use C style casts.

C style casts are hard to spot.

 (int)a.at(i)
 ^^^^^

They are also extremely dangerous (they basically tell the compiler to shut up and do what you're told as I the programmer am god and know better than you the simple compiler). In reality this is usually wrong: the compiler always knows better than you and telling it to shut up is usually hiding an error message.

As a result, C++ has its own casts. There are actually four of them:

static_cast<>()
const_cast<>()
reinterpret_cast<>()
dynamic_cast<>()

Your above cast is actually best done by a static_cast<>()

static_cast<int>(a.at(i));

(削除) BUT There is no actual need to cast a char to an integer. This will happen automatically (as a char is an integer type the conversion is automatic) with no loss in precision. (削除ここまで)

Is there a need to put the largest first.

I don't see you taking advantage of this in the function above!

 // call calcDeletions function with the longer vector passed as first arg
 if (aInt.size() > bInt.size()) {
 return calcDeletions(aInt, bInt);
 }
 else {
 return calcDeletions(bInt, aInt);
 }

Looking at the Cool Shark implementation you provided it still has a couple of issues.

There is one time that a cast would be necessary for char->int: that's if both are the same size and plain char is unsigned (I doubt that's the case on OP's platform).
@TobySpeight It is implementation defined if char is signed or unsigned.
@TobySpeight: See n4800 Section: 6.7.1 Fundamental types [basic.fundamental] Paragraph 7: Type char is a distinct type that has an implementation-defined choice of "signed char" or "unsigned char" as its underlying type.
@TobySpeight: But note there is also a min size for int (which is 16 bits). Though char is defined as 1 byte it is possible that there are implementations that have 1 byte as 16 bits (ie CHAR_BITS == 16) stackoverflow.com/a/271132/14065

Stack Exchange Network

Given two strings, a and b, determine the minimum number of character deletions required to make a and b anagrams

Strings: Making Anagrams

Input Format

Constraints

Output Format

Sample Input

Sample Output

Explanation

My Implementation

4 Answers 4

Code Review.

Using Namespace

Pass by const reference

Prefer Pre Increment

Prefer `operator[]` over `at()`

Don't use C style casts.

Is there a need to put the largest first.

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Given two strings, a and b, determine the minimum number of character deletions required to make a and b anagrams

Strings: Making Anagrams

Input Format

Constraints

Output Format

Sample Input

Sample Output

Explanation

My Implementation

4 Answers 4

Code Review.

Using Namespace

Pass by const reference

Prefer Pre Increment

Prefer operator[] over at()

Don't use C style casts.

Is there a need to put the largest first.

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions

Prefer `operator[]` over `at()`