I'm trying to learn to use the C++ Standard Library and some of the modern C++11 features. Can someone review my counting sort algorithm below and critique my style/algorithm/use of the STL? Thank you!
#include <algorithm>
#include <chrono>
#include <iostream>
#include <iterator>
#include <random>
#include <vector>
const int kSize = 100000000; // Size of container to sort
const int kRangeFrom = -1000000; // first of range for random number generator
const int kRangeTo = 1000000; // last of range for random number generator
// Linear time sorting algorithm for integers
template<typename InputIterator>
void counting_sort(InputIterator first, InputIterator last) {
auto minmax_el = std::minmax_element(first, last);
auto min = *minmax_el.first;
auto max = *minmax_el.second;
std::vector<std::size_t> counts(max - min + 1, 0);
std::for_each(first, last, [&](auto x) {
++counts[x - min]; // Store value counts
});
for (auto it_c = counts.begin(); it_c != counts.end(); ++it_c) {
auto idx = std::distance(counts.begin(), it_c);
std::fill_n(first, *it_c, idx + min); // Store in sorted order
std::advance(first, *it_c);
}
}
int main() {
std::random_device rd;
std::mt19937 mt(rd());
std::uniform_int_distribution<int> dist(kRangeFrom,kRangeTo);
std::vector<int> v1(kSize);
std::generate(v1.begin(), v1.end(), [&](){ return dist(mt); });
std::vector<int> v2(kSize);
std::copy(v1.begin(), v1.end(), v2.begin());
auto first1 = std::chrono::steady_clock::now();
counting_sort(v1.begin(), v1.end());
auto last1 = std::chrono::steady_clock::now();
auto first2 = std::chrono::steady_clock::now();
std::sort(v2.begin(), v2.end());
auto last2 = std::chrono::steady_clock::now();
std::cout << "counting sort time: " << std::chrono::duration<double, std::milli>(last1 - first1).count() << " ms" << '\n';
std::cout << "std::sort time: " << std::chrono::duration<double, std::milli>(last2 - first2).count() << " ms" << '\n';
std::cout << "v1 == v2: " << std::equal(v1.begin(), v1.end(), v2.begin()) << '\n';
return 0;
}
2 Answers 2
Associative container
Update: Because of the O(n.log(n)) nature of std::map we have concluded this is not a good idea (But it was worth the test).
Rather than using a vector to store the count you can use an associative container.
std::vector<std::size_t> counts(max - min + 1, 0);
// replace with
using ValueType = std::iterator_traits<InputIterator>::value_type;
std::map<ValueType, std::size_t> counts;
- This will limit the amount of memory you use otherwise the amount of space you use could potentially exceed memory.
- Also iterating over a sparse array would be expensive (As there will be lots of zero counts). By using an associative container you only iterate over valid values.
Range based for
Use the new range based for.
for (auto it_c = counts.begin(); it_c != counts.end(); ++it_c) {
// replace with:
for(auto const& value: counts) {
Combine range based for and associative containers
void counting_sort(InputIterator first, InputIterator last)
{
using ValueType = std::iterator_traits<InputIterator>::value_type;
std::map<ValueType, std::size_t> counts;
for(auto value: boost::make_iterator_range(first, last)) {
++counts[value];
}
for(auto count: counts) {
ValueType& value = count.first;
std::size_t& size = count.second;
std::fill_n(first, size, value);
std::advance(first, size);
}
}
-
\$\begingroup\$ Thanks! The map makes it a lot cleaner but I think it makes it O(nlogn) rather than O(n).
++counts[value];
runs pretty slowly \$\endgroup\$ryan– ryan2016年04月26日 07:04:01 +00:00Commented Apr 26, 2016 at 7:04 -
\$\begingroup\$ I would replace the order-based log2(n) insertion-time
std::map
with it's hash-table equivalent (and thus O(1) amortized insertion time)std::unordered_map
. \$\endgroup\$Bizkit– Bizkit2016年04月26日 13:59:31 +00:00Commented Apr 26, 2016 at 13:59 -
2\$\begingroup\$ @Bizkit: Unfortunately that's not going to work. As the map iterators are ordered while the unordered_map iterators are unordered. We are relying on the property of ordered to get the sort to work. \$\endgroup\$Loki Astari– Loki Astari2016年04月26日 14:37:29 +00:00Commented Apr 26, 2016 at 14:37
-
\$\begingroup\$ @ryan: You are correct. But because of the property of sparseness your code is only
O(n)
for large values ofn
. For small values ofn
the complexity is more likeO(range(n))
. I would add a check for small rangesif (std::distance(first, last) < 1000000) {std::sort(first, last);return}
. \$\endgroup\$Loki Astari– Loki Astari2016年04月26日 15:05:34 +00:00Commented Apr 26, 2016 at 15:05 -
\$\begingroup\$ Thanks. I could also turn it into radix sort to fix the sparseness problem. \$\endgroup\$ryan– ryan2016年04月26日 17:36:44 +00:00Commented Apr 26, 2016 at 17:36
Counting sort is an excellent tool when you know that you will be sorting integer values in a tight range, so I would keep it as is with its « problems » instead of trying to turn it into a more generic algorithm. If you want a more flexible integer sorting algorithm, radix sort would be the way to go :)
However, while keeping the spirit and simplicity of counting sort, there are still several small things you can do to make it better:
InputIterator
is never a suitable iterator category for an inplace sorting algorithm since input iterators are single-pass, which means that it's not guaranteed that you can write values back into them once you have read them. Yourcounting_sort
implementation even reads the iterated values several times. The iterator category you want isForwardIterator
.Note: it might be a mere hint for the reader today, but this kind of thing will be even more meaningful when concepts will be included into the language.
You can return early and save memory when the original collection is already sorted: while the standard library does not provide such a function (it's too specific), it is possible to perform
std::minmax_element
andstd::is_sorted
at once without any significant additional cost. You can find such a function in Boost.Sortspreadsort
implementation:template <class RandomAccessIter> inline bool is_sorted_or_find_extremes(RandomAccessIter current, RandomAccessIter last, RandomAccessIter & max, RandomAccessIter & min) { min = max = current; //This assumes we have more than 1 element based on prior checks. while (!(*(current + 1) < *current)) { //If everything is in sorted order, return if (++current == last - 1) return true; } //The maximum is the last sorted element max = current; //Start from the first unsorted element while (++current < last) { if (*max < *current) max = current; else if (*current < *min) min = current; } return false; }
Just make sure beforehand that
fist != last
if you don't want surprises due to dereferencing iterators that theoretically point to nowhere.std::advance
is \$O(n)\$ for forward iterators and bidirectional iterators, which means that if you want to sort anstd::list
or anstd::forward_list
with yourcounting_sort
, you will end advancing twice the same iterator in \$O(n)\$ with the following lines:std::fill_n(first, size, value); std::advance(first, size);
This is a waste of time since
std::fill_n
has already computed the iterator where to advancefirst
internally. Fortunately, since C++11,std::fill_n
returns the iterator past the last element assigned, so you can simply turn the two lines above into a single more efficient one:first = std::fill_n(first, size, value);
Now
counting_sort
should be a bit more efficient when given forward or bidirectional iterators.