Function which takes a string as input and returns the most frequent character

Question 1

I wrote a feature. And it works correctly. But in my opinion it is not optimal. But I have not found what standard features you can use to unleash it. Since I made it a simple sort of values. And it takes a long time to execute.

char getMaxOccuringChar(char* str)
{
 int count[ASCII_SIZE] = {0};
 int len = strlen(str);
 int max = 0;
 char result;
 for (int i = 0; i < len; i++) {
 count[str[i]]++;
 if (max < count[str[i]]) {
 max = count[str[i]];
 result = str[i];
 }
 }
 return result;
}

Question 2

Well, you could keep temporary values of x = str[i] and y = count[x], in order to reduce the number of memory-access operations.

Question 3

If your string is huge, it's probably more efficient to loop through count once to determine the max element at the end instead of doing the if (max < count[str[i]]) { for every character in the string.

Question 4

count[str[i]]++; are you sure this is what you want? and not str[i] - 'a' or -'A' ? to map it correctly to an index within your count array?

Question 5

BTW, you may as well tag this question C, as there is no C++ syntax involved here.

Question 6

What if len equals zero? Why char* iso std::string_view (or const char * pre 17)? What makes you think this ain't optimal?

Question 7

Here’s how I’d write this using the standard library:

#include <algorithm>
#include <unordered_map>
#include <string_view>
char most_frequent_char(std::string_view str) {
 std::unordered_map<char, int> counts;
 for (auto c : str) counts[c] += 1;
 return std::max_element(
 begin(counts), end(counts),
 [](auto a, auto b) {return a.second < b.second;}
 )->first;
}

But to be honest I’m not happy with manually iterating over the string to counts its characters. In actual code I’d probably abstract away the creation of a frequency table, which would presumably also have a max function. In Python this directly corresponds to the collections.Counter class. If we assume the existence of this utility class (left as an exercise to the reader), the implementation becomes

char most_frequent_char(std::string_view str) {
 return max(freq_table{str}).first;
}

Incidentally, in statistics this property is known as the "mode" of a distribution.

Question 8

I'd suggest adding the explicit mentioning of that you use a lambda expression? Assuming the guy who wrote the question is a beginner, this might not be clear to that person, or the concept might be alien altogether. With the term being explicitly named, one has something to search for.

Question 9

@Aziuth I'm a girl. I know what lambda is))

Question 10

We don't modify the passed string, so we should accept a const char*.

ASCII_SIZE isn't a standard identifier. A better size for the storage would be UCHAR_MAX+1 (found in the <climits> header).

The counting elements can be unsigned - I recommend std::size_t, as that's the maximum length of a string.

Any character greater than ASCII_SIZE or less than zero will index out of range, which is undefined behaviour (a bug).

We don't need to call std::strlen() to measure the string before we start. We can just advance the index (or pointer) until we reach the terminating NUL character.

There's a bug if an empty string is passed as argument: in that case, result is never assigned, so we return an indeterminate result.

Alternative version

Keeping the existing logic, but fixing the above issues and using a more modern std::array instead of a raw (C style) array:

#include <array>
#include <climits>
#include <cstdio> // for EOF
/*
 * Returns the most frequent character, represented as unsigned char.
 * If the string is empty, returns EOF
 */
int getMaxOccuringChar(char const* s)
{
 std::array<std::size_t, UCHAR_MAX+1> count = {};
 std::size_t max = 0;
 int result = EOF;
 while (*s) {
 auto const c = static_cast<unsigned char>(*s++);
 auto current = ++count[c];
 if (max < current) {
 max = current;
 result = c;
 }
 }
 return result;
}

A simple test:

#include <iostream>
int main()
{
 std::cout << static_cast<char>(getMaxOccuringChar("QQMMMX")) << '\n';
}

If the input is likely to be more than UCHAR_MAX characters, then it may be better to omit updating max as we update count, and instead find the maximum count after the loop (using std::max_element(), from <algorithm>).

Multiset version

We could instead use a multiset to count each character (in its constructor), and then use std::max_element() to find the mode, like this:

#include <algorithm>
#include <string_view>
#include <unordered_set>
/*
 * Returns the most frequent character, represented as unsigned char.
 * If the string is empty, returns the null character
 */
int getMaxOccuringChar(std::string_view s)
{
 auto const count = std::unordered_multiset<char>{s.begin(), s.end()};
 if (count.empty()) {
 return 0;
 }
 auto const comparator =
 [count](auto a, auto b){ return count.count(a) < count.count(b); };
 return *std::max_element(count.begin(), count.end(), comparator);
}

You might find that more satisfying (though performance may suffer through having to re-allocate as the set grows, which isn't a problem when we use an array).

Consider other character types

The second version can be made a template, to work with other character types, such as wchar_t:

template<typename Char>
Char getMaxOccuringChar(std::basic_string_view<Char> s)
{
 auto const count = std::unordered_multiset<Char>{s.begin(), s.end()};
 if (count.empty()) {
 return Char{};
 }
 auto const comparator =
 [count](auto a, auto b){ return count.count(a) < count.count(b); };
 return *std::max_element(count.begin(), count.end(), comparator);
}

Konrad Rudolph Konrad Rudolph 6,68823 silver badges35 bronze badges · Accepted Answer · 2019-10-23 10:43:54Z

Here’s how I’d write this using the standard library:

#include <algorithm>
#include <unordered_map>
#include <string_view>
char most_frequent_char(std::string_view str) {
 std::unordered_map<char, int> counts;
 for (auto c : str) counts[c] += 1;
 return std::max_element(
 begin(counts), end(counts),
 [](auto a, auto b) {return a.second < b.second;}
 )->first;
}

But to be honest I’m not happy with manually iterating over the string to counts its characters. In actual code I’d probably abstract away the creation of a frequency table, which would presumably also have a max function. In Python this directly corresponds to the collections.Counter class. If we assume the existence of this utility class (left as an exercise to the reader), the implementation becomes

char most_frequent_char(std::string_view str) {
 return max(freq_table{str}).first;
}

Incidentally, in statistics this property is known as the "mode" of a distribution.

I'd suggest adding the explicit mentioning of that you use a lambda expression? Assuming the guy who wrote the question is a beginner, this might not be clear to that person, or the concept might be alien altogether. With the term being explicitly named, one has something to search for.

Stack Exchange Network

Function which takes a string as input and returns the most frequent character

2 Answers 2

Alternative version

Multiset version

Consider other character types

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Function which takes a string as input and returns the most frequent character

2 Answers 2

Alternative version

Multiset version

Consider other character types

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions