Specification:
Given
first, last
(which areForwardIterator
s, and whosestd::iterator_trais::value_type
isLessThanComparable
), find the most frequent element in the sequence and return pair of iterator to the last occurrence of the element and frequency count. When using overload withcomparator
(which isCompare
), the restriction on value type of iterators is lifted.
Usage guidelines:
Should be used when the elements in the sequence are non copyable or too expensive to copy. Should be avoided when entropy of the sequence is very high, most of the values in the sequence are distinct, size of value type of iterators is within range of a few integers and the sequence is very large.
Code:
#ifndef AREA51_ALGORITHM_HPP
#define AREA51_ALGORITHM_HPP
#include <utility>
#include <map>
#include <iterator>
#include <cstddef>
template <typename ForwardIterator, typename Comparator>
std::pair<ForwardIterator, std::size_t> most_frequent(ForwardIterator first,
ForwardIterator last, Comparator comparator)
{
auto comp = [&comparator](const auto& lhs, const auto& rhs)
{
return comparator(lhs.get(), rhs.get());
};
std::map<std::reference_wrapper<typename std::iterator_traits<ForwardIterator>::value_type>,
std::size_t, decltype(comp)> counts(comp);
std::size_t frequency = 0;
auto most_freq = first;
while (first != last)
{
std::size_t current = ++counts[*first];
if (current > frequency)
{
frequency = current;
most_freq = first;
}
++first;
}
return std::make_pair(most_freq, frequency);
}
template <typename ForwardIterator>
std::pair<ForwardIterator, std::size_t> most_frequent(ForwardIterator first, ForwardIterator last)
{
return most_frequent(first, last, std::less<>{});
}
#endif //AREA51_ALGORITHM_HPP
It took roughly 3 milliseconds to find the most frequent integer in sequence of 100'000 integers that varies from 0 to 100 (release build, still faster than human reaction). The benchmark was very simplistic, so in the real world scenario performance can be different. Some further twisting of input (still simple tests) showed that range of the input (e.g. how many distinct elements are in the sequence) gives algorithmic performance degradation. Usage:
#include <iostream>
#include <string>
struct integer
{
int x;
integer(int y):
x(y)
{}
integer(const integer& other) = delete; //non copyable
integer& operator=(const integer& other) = delete;
};
bool operator<(const integer& lhs, const integer& rhs)
{
return lhs.x < rhs.x;
}
std::ostream& operator<<(std::ostream& os, const integer& x)
{
return os << x.x;
}
int main()
{
int arr[] = {1, 2, 3, 4 , 5, 1};
std::string names[] = {"Olzhas", "Erasyl", "Aigerym", "Akbota", "Akbota", "Erasyl", "Olzhas", "Olzhas"};
auto answer = most_frequent(std::begin(arr), std::end(arr));
std::cout << "The most frequent integer is " <<
*answer.first << " which occured " <<
answer.second << " times\n";
auto most_frequent_name = most_frequent(std::begin(names), std::end(names));
std::cout << "The most frequent name is " <<
*most_frequent_name.first << " which occured " <<
most_frequent_name.second << " times\n";
integer weird_integers[] = {0, 1, 2, 3, 4, 5, 6, 1};
auto most_frequent_integer = most_frequent(std::begin(weird_integers), std::end(weird_integers));
std::cout << "The most frequent weird integer is " <<
*most_frequent_integer.first << " which occured " <<
most_frequent_integer.second << " times\n";
}
The code executes in time faster than human reaction, so I think for the first version this should be enough.
I'm interested in naming (I believe most_frequent
doesn't really match the algorithm), readability and conformance to specification (from the transform_iterator
, I found that it is quite hard to conform it, though it works for my needs). I also thought about std::unordered_map
, but then I would specify too many input variables and types.
By the way, the names are kazakh names :)
2 Answers 2
Does not compile unless I add the move operators to integer
.
struct integer
{
int x;
integer(int y):
x(y)
{}
integer(const integer& other) = delete; //non copyable
integer& operator=(const integer& other) = delete;
// Added these
integer(integer&&) = default;
integer& operator=(integer&&) = default;
};
You don't need that lamda in the your main function. You just did not specify your comparison operator correctly.
return most_frequent(first, last, std::less<>{});
// Should be
return most_frequent(first, last, std::less<typename std::iterator_traits<ForwardIterator>::value_type>{});
Then you can remove:
auto comp = [&comparator](const auto& lhs, const auto& rhs)
{
return comparator(lhs.get(), rhs.get());
};
and just use Comparator
where you use comp
There is no need to have two versions of the function most_frequent
just use default parameter values.
// This is not needed
// You can remove it.
template <typename ForwardIterator>
std::pair<ForwardIterator, std::size_t> most_frequent(ForwardIterator first, ForwardIterator last)
{
return most_frequent(first, last, std::less<>{});
}
Modify the declaration of the main function:
template <typename ForwardIterator, typename Comparator = std::less<typename std::iterator_traits<ForwardIterator>::value_type>>
// ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
std::pair<ForwardIterator, std::size_t> most_frequent(
ForwardIterator first,
ForwardIterator last,
Comparator comparator = Comparator())
// ^^^^^^^^^^^^
The code executes in time faster than human reaction, so I think for the first version this should be enough.
I would hope so.
Your tests sets are relatively small. Haw long does it take to scan the whole library of congress would be a better test.
Final Version:
template <typename ForwardIterator, typename Comparator = std::less<typename std::iterator_traits<ForwardIterator>::value_type>>
std::pair<ForwardIterator, std::size_t> most_frequent(ForwardIterator first,
ForwardIterator last, Comparator comparator = Comparator())
{
std::map<std::reference_wrapper<typename std::iterator_traits<ForwardIterator>::value_type>,
std::size_t, Comparator> counts(comparator);
std::size_t frequency = 0;
auto most_freq = first;
while (first != last)
{
std::size_t current = ++counts[*first];
if (current > frequency)
{
frequency = current;
most_freq = first;
}
++first;
}
return std::make_pair(most_freq, frequency);
}
-
\$\begingroup\$ I always forget about default typename. I believe in some cases
reference_wrapper
won't perform implicit conversion, though I don't really know in which, so I just wanted to be safe. By the way, it did compile on clang++-5.0 trunk. \$\endgroup\$Incomputable– Incomputable2017年03月10日 19:18:07 +00:00Commented Mar 10, 2017 at 19:18
You can avoid use of std::reference_wrapper
by using ForwardIterator
as the key type in the map
for keeping track of occurrences of elements in the container.
template <typename ForwardIterator,
typename Comparator = std::less<typename std::iterator_traits<ForwardIterator>::value_type>>
std::pair<ForwardIterator, std::size_t> most_frequent(ForwardIterator first,
ForwardIterator last,
Comparator comparator = Comparator())
{
auto comp = [&comparator](ForwardIterator lhs, ForwardIterator rhs)
{
return comparator(*lhs, *rhs);
};
std::map<ForwardIterator, std::size_t, decltype(comp)> counts(comp);
std::size_t frequency = 0;
ForwardIterator most_freq = first;
while (first != last)
{
std::size_t current = ++counts[first];
if (current > frequency)
{
frequency = current;
most_freq = first;
}
++first;
}
return {most_freq, frequency};
}