Sorting elements according to frequency of occurence in C++

Question 1

I am sorting the elements in an array according to their frequency of occurrence in decreasing order and if two elements have same frequency then they are sorted in increasing order. I have used std::map to count the frequency and then copy the elements from std::map to std::vector<std::pair<K, V>>.

#include <iostream>
#include <vector>
#include <map>
#include <algorithm>
struct greater
{
 template<class T>
 bool operator()(T const &a, T const &b) const { return a.second > b.second; }
};
template<class T>
void sortByFreq(std::vector<T>& v)
{
 std::map<T, T> count;
 for (int i : v){
 count[i]++;
 }
 std::vector<std::pair<T, T>> mapcopy;
 for(auto itr = count.begin(); itr != count.end(); itr++)
 {
 mapcopy.push_back(*itr);
 }
 std::sort(mapcopy.begin(), mapcopy.end(), greater());
 std::cout << "Elements sorted according to Frequency \n";
 auto itr = mapcopy.begin();
 while(itr != mapcopy.end())
 {
 for(int i = 0; i < itr->second; i++)
 {
 std::cout << itr->first <<" ";
 }
 itr++;
 }
}
int main()
{
 std::vector<int> vec = {2, 3, 2, 4, 5, 12, 12, 3, 4, 3};
 sortByFreq(vec);
 std::vector<char> c ={ 'e', 't', 'r', 'e', 't', 't', 'f'};
 sortByFreq(c);
return 0;
}

The Output is

3 3 3 2 2 4 4 12 12 5
t t t e e f r

Review this code and help me to improve it.

Question 2

Advice 1

Do not print in the algorithms. If your friend wants to reuse your implementation on large vectors, he will be overwhelmed with the plethora of output.

Advice 2

You can simplify your sorting algorithm via an appropriate comparator:

template<class T>
void sortByFreq2(std::vector<T>& v)
{
 std::unordered_map<T, size_t> count;
 for (T i : v) {
 count[i]++;
 }
 std::sort(
 v.begin(), 
 v.end(),
 [&count](T const& a, T const& b) {
 if (a == b) {
 return false;
 }
 if (count[a] > count[b]) {
 return true;
 }
 else if (count[a] < count[b]) {
 return false;
 }
 return a < b;
 });
}

Note unordered_map<T, size_t> instead of map<T, T> since you don't need to keep the keys T in order (more efficient).

Hope that helps.

Question 3

Interesting, you take it in a different direction.

Question 4

@Deduplicator Well, it's just a matter of sorting after all.

Question 5

I took the approach that the function did what it should, though badly and with a bad name. You took the approach that the name was right, and all else lacking. That's what I meant with different direction.

Question 6

sortByFrequency is mis-named. What it actually does is print a histogram, and to make it pretty it orders the output canonically.
Call it printHistogram instead.
You should allow printing to any user-supplied ostream, though std::cout is a good default.
Above function has an argument of type std::vector<T>&. Aside from unnecessarily restricting your input to full std::vector's, you are also taking it by mutable reference, signalling intent to modify. You should instead accept an iterator-range (with the range-TS that should change to iterator+sentinel, or any range).
Your secondary sort-criterium is not only found in your description. And the computer is unlikely to honour that.
Anyway, you don't need your own custom comparator if you define your elements properly.
You don't seem too clear on where you have to use the type-parameter, and where you should use a type for the count of occurrences, which by the way should be an unsigned type as it cannot be negative.
Consider whether a std::unordered_map or even an array might be more appropriate / efficient than a std::map.
Avoid copying whenever you can. auto&& is your friend there.
Avoid postfix-increment / -decrement in favour of prefix operations. While it doesn't matter much for builtin types (the compiler should optimize it the same), sometimes the copy mandated by the former cannot be optimized away by the compiler, and some types are expensive to copy.
i per convention denotes an index, not an element. Better use x.
You seem to feel under an obligation to only use each loop-type once. Consider always using the most appropriate loop, in your case always for-range.
Inserting a single character instead of a length-1-string into a stream is potentially slightly more efficient.
Sometimes you don't indent the last line of a block. And your indentation isn't uniform. Fix that.
As an aside, return 0; is implicit in main(), though only there.

Modified code (online on coliru):

#include <iostream>
#include <vector>
#include <unordered_map>
#include <algorithm>
#include <iterator>
#include <utility>
template <class It>
void printHistogram(It first, It last, std::ostream& os = std::cout)
{
 using val_t = typename std::iterator_traits<It>::value_type;
 using diff_t = typename std::iterator_traits<It>::difference_type;
 std::unordered_map<val_t, diff_t> counts;
 for (; first != last; ++first)
 ++counts[*first];
 std::vector<std::pair<diff_t, val_t>> histo;
 histo.reserve(counts.size());
 for (auto&& x : counts)
 histo.emplace_back(-x.second, std::move(x.first));
 std::sort(begin(histo), end(histo));
 os << "Elements sorted according to Frequency\n";
 for (auto&& x : histo)
 for (auto i = x.first; i++;)
 os << x.second << ' ';
}
int main()
{
 std::vector<int> vec = {2, 3, 2, 4, 5, 12, 12, 3, 4, 3};
 printHistogram(begin(vec), end(vec));
 std::cout << '\n';
 std::vector<char> c ={ 'e', 't', 'r', 'e', 't', 't', 'f'};
 printHistogram(begin(c), end(c));
}

Question 7

printHistogram looks nice.

Question 8

In this line

for (int i : v)

is assumed T can be forcibly casted to int. The code would not work with other template types (like std::string).

You might want to distinguish more clearly the template type T and the internal counter type (currently mixed int and T, I would recommend unsigned, by the way). This includes changing the type of count to std::map<T, unsigned> count;, and the type of mapcopy accordingly.
Same line: after changing the type to T, you might even want to use a reference: for (T &i: v).

This won't do anything useful when dealing with (say) int types. But when T is a more complex type (like a map, or a rather largeish gigabyte-sized image - it's a template type after all: we don't know what a future developer will use it for), it's better not to create a temporary copy when there is no need to do so.
Depending on the size/internal structure of T, it might save a lot of time to do a mapcopy.reserve(count.size()) - as you perfectly know the final vector size in advance. This
- will eliminate some complete unnecessary data copying and vector resizing, and
- might even save memory (up to 50%), depending on the resizing strategy of std::vector (as it will often allocate more than needed to be prepared for future push_back opererations).
Remember, T might be a very complex type with an expensive copy operator... it would be nice not to relocate instances of T without any need when the vector has to grow.

coderodde coderodde 31.7k15 gold badges77 silver badges202 bronze badges · Accepted Answer · 2017-08-19 11:21:05Z

Advice 1

Do not print in the algorithms. If your friend wants to reuse your implementation on large vectors, he will be overwhelmed with the plethora of output.

Advice 2

You can simplify your sorting algorithm via an appropriate comparator:

template<class T>
void sortByFreq2(std::vector<T>& v)
{
 std::unordered_map<T, size_t> count;
 for (T i : v) {
 count[i]++;
 }
 std::sort(
 v.begin(), 
 v.end(),
 [&count](T const& a, T const& b) {
 if (a == b) {
 return false;
 }
 if (count[a] > count[b]) {
 return true;
 }
 else if (count[a] < count[b]) {
 return false;
 }
 return a < b;
 });
}

Note unordered_map<T, size_t> instead of map<T, T> since you don't need to keep the keys T in order (more efficient).

Hope that helps.

@Deduplicator Well, it's just a matter of sorting after all.
I took the approach that the function did what it should, though badly and with a bad name. You took the approach that the name was right, and all else lacking. That's what I meant with different direction.

Stack Exchange Network

Sorting elements according to frequency of occurence in C++

3 Answers 3

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Sorting elements according to frequency of occurence in C++

3 Answers 3

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions