Counting sort using STL

Question 1

I'm trying to learn to use the C++ Standard Library and some of the modern C++11 features. Can someone review my counting sort algorithm below and critique my style/algorithm/use of the STL? Thank you!

#include <algorithm>
#include <chrono>
#include <iostream>
#include <iterator>
#include <random>
#include <vector>
const int kSize = 100000000; // Size of container to sort
const int kRangeFrom = -1000000; // first of range for random number generator
const int kRangeTo = 1000000; // last of range for random number generator
// Linear time sorting algorithm for integers
template<typename InputIterator>
void counting_sort(InputIterator first, InputIterator last) {
 auto minmax_el = std::minmax_element(first, last);
 auto min = *minmax_el.first;
 auto max = *minmax_el.second;
 std::vector<std::size_t> counts(max - min + 1, 0);
 std::for_each(first, last, [&](auto x) { 
 ++counts[x - min]; // Store value counts
 }); 
 for (auto it_c = counts.begin(); it_c != counts.end(); ++it_c) { 
 auto idx = std::distance(counts.begin(), it_c);
 std::fill_n(first, *it_c, idx + min); // Store in sorted order
 std::advance(first, *it_c);
 }
}
int main() {
 std::random_device rd;
 std::mt19937 mt(rd());
 std::uniform_int_distribution<int> dist(kRangeFrom,kRangeTo);
 std::vector<int> v1(kSize);
 std::generate(v1.begin(), v1.end(), [&](){ return dist(mt); });
 std::vector<int> v2(kSize);
 std::copy(v1.begin(), v1.end(), v2.begin());
 auto first1 = std::chrono::steady_clock::now();
 counting_sort(v1.begin(), v1.end());
 auto last1 = std::chrono::steady_clock::now();
 auto first2 = std::chrono::steady_clock::now();
 std::sort(v2.begin(), v2.end());
 auto last2 = std::chrono::steady_clock::now();
 std::cout << "counting sort time: " << std::chrono::duration<double, std::milli>(last1 - first1).count() << " ms" << '\n';
 std::cout << "std::sort time: " << std::chrono::duration<double, std::milli>(last2 - first2).count() << " ms" << '\n';
 std::cout << "v1 == v2: " << std::equal(v1.begin(), v1.end(), v2.begin()) << '\n';
 return 0;
}

Question 2

Associative container

Update: Because of the O(n.log(n)) nature of std::map we have concluded this is not a good idea (But it was worth the test).

Rather than using a vector to store the count you can use an associative container.

std::vector<std::size_t> counts(max - min + 1, 0);
// replace with 
using ValueType = std::iterator_traits<InputIterator>::value_type;
std::map<ValueType, std::size_t> counts;

This will limit the amount of memory you use otherwise the amount of space you use could potentially exceed memory.
Also iterating over a sparse array would be expensive (As there will be lots of zero counts). By using an associative container you only iterate over valid values.

Range based for

Use the new range based for.

for (auto it_c = counts.begin(); it_c != counts.end(); ++it_c) {
// replace with:
for(auto const& value: counts) {

Combine range based for and associative containers

void counting_sort(InputIterator first, InputIterator last)
{
 using ValueType = std::iterator_traits<InputIterator>::value_type;
 std::map<ValueType, std::size_t> counts;
 for(auto value: boost::make_iterator_range(first, last)) { 
 ++counts[value];
 }
 for(auto count: counts) {
 ValueType& value = count.first;
 std::size_t& size = count.second; 
 std::fill_n(first, size, value);
 std::advance(first, size);
 }
}

Question 3

Thanks! The map makes it a lot cleaner but I think it makes it O(nlogn) rather than O(n). ++counts[value]; runs pretty slowly

Question 4

I would replace the order-based log2(n) insertion-time std::map with it's hash-table equivalent (and thus O(1) amortized insertion time) std::unordered_map.

Question 5

@Bizkit: Unfortunately that's not going to work. As the map iterators are ordered while the unordered_map iterators are unordered. We are relying on the property of ordered to get the sort to work.

Question 6

@ryan: You are correct. But because of the property of sparseness your code is only O(n) for large values of n. For small values of n the complexity is more like O(range(n)). I would add a check for small ranges if (std::distance(first, last) < 1000000) {std::sort(first, last);return}.

Question 7

Thanks. I could also turn it into radix sort to fix the sparseness problem.

Question 8

Counting sort is an excellent tool when you know that you will be sorting integer values in a tight range, so I would keep it as is with its « problems » instead of trying to turn it into a more generic algorithm. If you want a more flexible integer sorting algorithm, radix sort would be the way to go :)

However, while keeping the spirit and simplicity of counting sort, there are still several small things you can do to make it better:

InputIterator is never a suitable iterator category for an inplace sorting algorithm since input iterators are single-pass, which means that it's not guaranteed that you can write values back into them once you have read them. Your counting_sort implementation even reads the iterated values several times. The iterator category you want is ForwardIterator.

Note: it might be a mere hint for the reader today, but this kind of thing will be even more meaningful when concepts will be included into the language.

You can return early and save memory when the original collection is already sorted: while the standard library does not provide such a function (it's too specific), it is possible to perform std::minmax_element and std::is_sorted at once without any significant additional cost. You can find such a function in Boost.Sort spreadsort implementation:

template <class RandomAccessIter>
inline bool
is_sorted_or_find_extremes(RandomAccessIter current, RandomAccessIter last,
 RandomAccessIter & max, RandomAccessIter & min)
{
 min = max = current;
 //This assumes we have more than 1 element based on prior checks.
 while (!(*(current + 1) < *current)) {
 //If everything is in sorted order, return
 if (++current == last - 1)
 return true;
 }
 //The maximum is the last sorted element
 max = current;
 //Start from the first unsorted element
 while (++current < last) {
 if (*max < *current)
 max = current;
 else if (*current < *min)
 min = current;
 }
 return false;
}

Just make sure beforehand that fist != last if you don't want surprises due to dereferencing iterators that theoretically point to nowhere.

std::advance is \$O(n)\$ for forward iterators and bidirectional iterators, which means that if you want to sort an std::list or an std::forward_list with your counting_sort, you will end advancing twice the same iterator in \$O(n)\$ with the following lines:
```
std::fill_n(first, size, value);
std::advance(first, size);
```
This is a waste of time since std::fill_n has already computed the iterator where to advance first internally. Fortunately, since C++11, std::fill_n returns the iterator past the last element assigned, so you can simply turn the two lines above into a single more efficient one:
```
first = std::fill_n(first, size, value);
```
Now counting_sort should be a bit more efficient when given forward or bidirectional iterators.

Loki Astari Loki Astari 97.7k5 gold badges126 silver badges341 bronze badges · Accepted Answer · 2016-04-26 05:32:29Z

Associative container

Update: Because of the O(n.log(n)) nature of std::map we have concluded this is not a good idea (But it was worth the test).

Rather than using a vector to store the count you can use an associative container.

std::vector<std::size_t> counts(max - min + 1, 0);
// replace with 
using ValueType = std::iterator_traits<InputIterator>::value_type;
std::map<ValueType, std::size_t> counts;

This will limit the amount of memory you use otherwise the amount of space you use could potentially exceed memory.
Also iterating over a sparse array would be expensive (As there will be lots of zero counts). By using an associative container you only iterate over valid values.

Range based for

Use the new range based for.

for (auto it_c = counts.begin(); it_c != counts.end(); ++it_c) {
// replace with:
for(auto const& value: counts) {

Combine range based for and associative containers

void counting_sort(InputIterator first, InputIterator last)
{
 using ValueType = std::iterator_traits<InputIterator>::value_type;
 std::map<ValueType, std::size_t> counts;
 for(auto value: boost::make_iterator_range(first, last)) { 
 ++counts[value];
 }
 for(auto count: counts) {
 ValueType& value = count.first;
 std::size_t& size = count.second; 
 std::fill_n(first, size, value);
 std::advance(first, size);
 }
}

Thanks! The map makes it a lot cleaner but I think it makes it O(nlogn) rather than O(n). ++counts[value]; runs pretty slowly
I would replace the order-based log2(n) insertion-time std::map with it's hash-table equivalent (and thus O(1) amortized insertion time) std::unordered_map.
@Bizkit: Unfortunately that's not going to work. As the map iterators are ordered while the unordered_map iterators are unordered. We are relying on the property of ordered to get the sort to work.
@ryan: You are correct. But because of the property of sparseness your code is only O(n) for large values of n. For small values of n the complexity is more like O(range(n)). I would add a check for small ranges if (std::distance(first, last) < 1000000) {std::sort(first, last);return}.
Thanks. I could also turn it into radix sort to fix the sparseness problem.

Stack Exchange Network

Counting sort using STL

2 Answers 2

Associative container

Range based for

Combine range based for and associative containers

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Counting sort using STL

2 Answers 2

Associative container

Range based for

Combine range based for and associative containers

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions