I am sorting the elements in an array according to their frequency of occurrence in decreasing order and if two elements have same frequency then they are sorted in increasing order. I have used std::map
to count the frequency and then copy the elements from std::map
to std::vector<std::pair<K, V>>
.
#include <iostream>
#include <vector>
#include <map>
#include <algorithm>
struct greater
{
template<class T>
bool operator()(T const &a, T const &b) const { return a.second > b.second; }
};
template<class T>
void sortByFreq(std::vector<T>& v)
{
std::map<T, T> count;
for (int i : v){
count[i]++;
}
std::vector<std::pair<T, T>> mapcopy;
for(auto itr = count.begin(); itr != count.end(); itr++)
{
mapcopy.push_back(*itr);
}
std::sort(mapcopy.begin(), mapcopy.end(), greater());
std::cout << "Elements sorted according to Frequency \n";
auto itr = mapcopy.begin();
while(itr != mapcopy.end())
{
for(int i = 0; i < itr->second; i++)
{
std::cout << itr->first <<" ";
}
itr++;
}
}
int main()
{
std::vector<int> vec = {2, 3, 2, 4, 5, 12, 12, 3, 4, 3};
sortByFreq(vec);
std::vector<char> c ={ 'e', 't', 'r', 'e', 't', 't', 'f'};
sortByFreq(c);
return 0;
}
The Output is
3 3 3 2 2 4 4 12 12 5
t t t e e f r
Review this code and help me to improve it.
3 Answers 3
Advice 1
Do not print in the algorithms. If your friend wants to reuse your implementation on large vectors, he will be overwhelmed with the plethora of output.
Advice 2
You can simplify your sorting algorithm via an appropriate comparator:
template<class T>
void sortByFreq2(std::vector<T>& v)
{
std::unordered_map<T, size_t> count;
for (T i : v) {
count[i]++;
}
std::sort(
v.begin(),
v.end(),
[&count](T const& a, T const& b) {
if (a == b) {
return false;
}
if (count[a] > count[b]) {
return true;
}
else if (count[a] < count[b]) {
return false;
}
return a < b;
});
}
Note unordered_map<T, size_t>
instead of map<T, T>
since you don't need to keep the keys T
in order (more efficient).
Hope that helps.
-
\$\begingroup\$ Interesting, you take it in a different direction. \$\endgroup\$Deduplicator– Deduplicator2017年08月19日 11:23:25 +00:00Commented Aug 19, 2017 at 11:23
-
\$\begingroup\$ @Deduplicator Well, it's just a matter of sorting after all. \$\endgroup\$coderodde– coderodde2017年08月19日 11:25:26 +00:00Commented Aug 19, 2017 at 11:25
-
1\$\begingroup\$ I took the approach that the function did what it should, though badly and with a bad name. You took the approach that the name was right, and all else lacking. That's what I meant with different direction. \$\endgroup\$Deduplicator– Deduplicator2017年08月19日 11:26:38 +00:00Commented Aug 19, 2017 at 11:26
sortByFrequency
is mis-named. What it actually does is print a histogram, and to make it pretty it orders the output canonically.
Call itprintHistogram
instead.You should allow printing to any user-supplied
ostream
, thoughstd::cout
is a good default.Above function has an argument of type
std::vector<T>&
. Aside from unnecessarily restricting your input to fullstd::vector
's, you are also taking it by mutable reference, signalling intent to modify. You should instead accept an iterator-range (with the range-TS that should change to iterator+sentinel, or any range).Your secondary sort-criterium is not only found in your description. And the computer is unlikely to honour that.
Anyway, you don't need your own custom comparator if you define your elements properly.
You don't seem too clear on where you have to use the type-parameter, and where you should use a type for the count of occurrences, which by the way should be an unsigned type as it cannot be negative.
Consider whether a
std::unordered_map
or even an array might be more appropriate / efficient than astd::map
.Avoid copying whenever you can.
auto&&
is your friend there.Avoid postfix-increment / -decrement in favour of prefix operations. While it doesn't matter much for builtin types (the compiler should optimize it the same), sometimes the copy mandated by the former cannot be optimized away by the compiler, and some types are expensive to copy.
i
per convention denotes an index, not an element. Better usex
.You seem to feel under an obligation to only use each loop-type once. Consider always using the most appropriate loop, in your case always for-range.
Inserting a single character instead of a length-1-string into a stream is potentially slightly more efficient.
Sometimes you don't indent the last line of a block. And your indentation isn't uniform. Fix that.
As an aside,
return 0;
is implicit inmain()
, though only there.
Modified code (online on coliru):
#include <iostream>
#include <vector>
#include <unordered_map>
#include <algorithm>
#include <iterator>
#include <utility>
template <class It>
void printHistogram(It first, It last, std::ostream& os = std::cout)
{
using val_t = typename std::iterator_traits<It>::value_type;
using diff_t = typename std::iterator_traits<It>::difference_type;
std::unordered_map<val_t, diff_t> counts;
for (; first != last; ++first)
++counts[*first];
std::vector<std::pair<diff_t, val_t>> histo;
histo.reserve(counts.size());
for (auto&& x : counts)
histo.emplace_back(-x.second, std::move(x.first));
std::sort(begin(histo), end(histo));
os << "Elements sorted according to Frequency\n";
for (auto&& x : histo)
for (auto i = x.first; i++;)
os << x.second << ' ';
}
int main()
{
std::vector<int> vec = {2, 3, 2, 4, 5, 12, 12, 3, 4, 3};
printHistogram(begin(vec), end(vec));
std::cout << '\n';
std::vector<char> c ={ 'e', 't', 'r', 'e', 't', 't', 'f'};
printHistogram(begin(c), end(c));
}
-
\$\begingroup\$
printHistogram
looks nice. \$\endgroup\$coderodde– coderodde2017年08月19日 11:26:57 +00:00Commented Aug 19, 2017 at 11:26
In this line
for (int i : v)
is assumed
T
can be forcibly casted toint
. The code would not work with other template types (likestd::string
).You might want to distinguish more clearly the template type
T
and the internal counter type (currently mixedint
andT
, I would recommendunsigned
, by the way). This includes changing the type ofcount
tostd::map<T, unsigned> count;
, and the type ofmapcopy
accordingly.Same line: after changing the type to
T
, you might even want to use a reference:for (T &i: v)
.This won't do anything useful when dealing with (say)
int
types. But whenT
is a more complex type (like amap
, or a rather largeish gigabyte-sized image - it's a template type after all: we don't know what a future developer will use it for), it's better not to create a temporary copy when there is no need to do so.Depending on the size/internal structure of
T
, it might save a lot of time to do amapcopy.reserve(count.size())
- as you perfectly know the final vector size in advance. This- will eliminate some complete unnecessary data copying and vector resizing, and
- might even save memory (up to 50%), depending on the resizing strategy of
std::vector
(as it will often allocate more than needed to be prepared for futurepush_back
opererations).
Remember,
T
might be a very complex type with an expensive copy operator... it would be nice not to relocate instances ofT
without any need when the vector has to grow.