I wrote a feature. And it works correctly. But in my opinion it is not optimal. But I have not found what standard features you can use to unleash it. Since I made it a simple sort of values. And it takes a long time to execute.
char getMaxOccuringChar(char* str)
{
int count[ASCII_SIZE] = {0};
int len = strlen(str);
int max = 0;
char result;
for (int i = 0; i < len; i++) {
count[str[i]]++;
if (max < count[str[i]]) {
max = count[str[i]];
result = str[i];
}
}
return result;
}
2 Answers 2
Here’s how I’d write this using the standard library:
#include <algorithm>
#include <unordered_map>
#include <string_view>
char most_frequent_char(std::string_view str) {
std::unordered_map<char, int> counts;
for (auto c : str) counts[c] += 1;
return std::max_element(
begin(counts), end(counts),
[](auto a, auto b) {return a.second < b.second;}
)->first;
}
But to be honest I’m not happy with manually iterating over the string to counts its characters. In actual code I’d probably abstract away the creation of a frequency table, which would presumably also have a max
function. In Python this directly corresponds to the collections.Counter
class. If we assume the existence of this utility class (left as an exercise to the reader), the implementation becomes
char most_frequent_char(std::string_view str) {
return max(freq_table{str}).first;
}
Incidentally, in statistics this property is known as the "mode" of a distribution.
-
1\$\begingroup\$ I'd suggest adding the explicit mentioning of that you use a lambda expression? Assuming the guy who wrote the question is a beginner, this might not be clear to that person, or the concept might be alien altogether. With the term being explicitly named, one has something to search for. \$\endgroup\$Aziuth– Aziuth2019年10月23日 10:56:38 +00:00Commented Oct 23, 2019 at 10:56
-
2\$\begingroup\$ @Aziuth I'm a girl. I know what lambda is)) \$\endgroup\$oksana vovl– oksana vovl2019年10月23日 11:08:20 +00:00Commented Oct 23, 2019 at 11:08
We don't modify the passed string, so we should accept a const char*
.
ASCII_SIZE
isn't a standard identifier. A better size for the storage would be UCHAR_MAX+1
(found in the <climits>
header).
The counting elements can be unsigned - I recommend std::size_t
, as that's the maximum length of a string.
Any character greater than ASCII_SIZE
or less than zero will index out of range, which is undefined behaviour (a bug).
We don't need to call std::strlen()
to measure the string before we start. We can just advance the index (or pointer) until we reach the terminating NUL character.
There's a bug if an empty string is passed as argument: in that case, result
is never assigned, so we return an indeterminate result.
Alternative version
Keeping the existing logic, but fixing the above issues and using a more modern std::array
instead of a raw (C style) array:
#include <array>
#include <climits>
#include <cstdio> // for EOF
/*
* Returns the most frequent character, represented as unsigned char.
* If the string is empty, returns EOF
*/
int getMaxOccuringChar(char const* s)
{
std::array<std::size_t, UCHAR_MAX+1> count = {};
std::size_t max = 0;
int result = EOF;
while (*s) {
auto const c = static_cast<unsigned char>(*s++);
auto current = ++count[c];
if (max < current) {
max = current;
result = c;
}
}
return result;
}
A simple test:
#include <iostream>
int main()
{
std::cout << static_cast<char>(getMaxOccuringChar("QQMMMX")) << '\n';
}
If the input is likely to be more than UCHAR_MAX
characters, then it may be better to omit updating max
as we update count
, and instead find the maximum count after the loop (using std::max_element()
, from <algorithm>
).
Multiset version
We could instead use a multiset to count each character (in its constructor), and then use std::max_element()
to find the mode, like this:
#include <algorithm>
#include <string_view>
#include <unordered_set>
/*
* Returns the most frequent character, represented as unsigned char.
* If the string is empty, returns the null character
*/
int getMaxOccuringChar(std::string_view s)
{
auto const count = std::unordered_multiset<char>{s.begin(), s.end()};
if (count.empty()) {
return 0;
}
auto const comparator =
[count](auto a, auto b){ return count.count(a) < count.count(b); };
return *std::max_element(count.begin(), count.end(), comparator);
}
You might find that more satisfying (though performance may suffer through having to re-allocate as the set grows, which isn't a problem when we use an array).
Consider other character types
The second version can be made a template, to work with other character types, such as wchar_t
:
template<typename Char>
Char getMaxOccuringChar(std::basic_string_view<Char> s)
{
auto const count = std::unordered_multiset<Char>{s.begin(), s.end()};
if (count.empty()) {
return Char{};
}
auto const comparator =
[count](auto a, auto b){ return count.count(a) < count.count(b); };
return *std::max_element(count.begin(), count.end(), comparator);
}
x = str[i]
andy = count[x]
, in order to reduce the number of memory-access operations. \$\endgroup\$count
once to determine the max element at the end instead of doing theif (max < count[str[i]]) {
for every character in the string. \$\endgroup\$count[str[i]]++;
are you sure this is what you want? and not str[i] - 'a' or -'A' ? to map it correctly to an index within your count array? \$\endgroup\$C
, as there is noC++
syntax involved here. \$\endgroup\$char*
isostd::string_view
(orconst char *
pre 17)? What makes you think this ain't optimal? \$\endgroup\$