9
\$\begingroup\$

I'm trying to learn to use the C++ Standard Library and some of the modern C++11 features. Can someone review my counting sort algorithm below and critique my style/algorithm/use of the STL? Thank you!

#include <algorithm>
#include <chrono>
#include <iostream>
#include <iterator>
#include <random>
#include <vector>
const int kSize = 100000000; // Size of container to sort
const int kRangeFrom = -1000000; // first of range for random number generator
const int kRangeTo = 1000000; // last of range for random number generator
// Linear time sorting algorithm for integers
template<typename InputIterator>
void counting_sort(InputIterator first, InputIterator last) {
 auto minmax_el = std::minmax_element(first, last);
 auto min = *minmax_el.first;
 auto max = *minmax_el.second;
 std::vector<std::size_t> counts(max - min + 1, 0);
 std::for_each(first, last, [&](auto x) { 
 ++counts[x - min]; // Store value counts
 }); 
 for (auto it_c = counts.begin(); it_c != counts.end(); ++it_c) { 
 auto idx = std::distance(counts.begin(), it_c);
 std::fill_n(first, *it_c, idx + min); // Store in sorted order
 std::advance(first, *it_c);
 }
}
int main() {
 std::random_device rd;
 std::mt19937 mt(rd());
 std::uniform_int_distribution<int> dist(kRangeFrom,kRangeTo);
 std::vector<int> v1(kSize);
 std::generate(v1.begin(), v1.end(), [&](){ return dist(mt); });
 std::vector<int> v2(kSize);
 std::copy(v1.begin(), v1.end(), v2.begin());
 auto first1 = std::chrono::steady_clock::now();
 counting_sort(v1.begin(), v1.end());
 auto last1 = std::chrono::steady_clock::now();
 auto first2 = std::chrono::steady_clock::now();
 std::sort(v2.begin(), v2.end());
 auto last2 = std::chrono::steady_clock::now();
 std::cout << "counting sort time: " << std::chrono::duration<double, std::milli>(last1 - first1).count() << " ms" << '\n';
 std::cout << "std::sort time: " << std::chrono::duration<double, std::milli>(last2 - first2).count() << " ms" << '\n';
 std::cout << "v1 == v2: " << std::equal(v1.begin(), v1.end(), v2.begin()) << '\n';
 return 0;
}
200_success
145k22 gold badges190 silver badges478 bronze badges
asked Apr 26, 2016 at 3:21
\$\endgroup\$

2 Answers 2

7
\$\begingroup\$

Associative container

Update: Because of the O(n.log(n)) nature of std::map we have concluded this is not a good idea (But it was worth the test).

Rather than using a vector to store the count you can use an associative container.

std::vector<std::size_t> counts(max - min + 1, 0);
// replace with 
using ValueType = std::iterator_traits<InputIterator>::value_type;
std::map<ValueType, std::size_t> counts;
  1. This will limit the amount of memory you use otherwise the amount of space you use could potentially exceed memory.
  2. Also iterating over a sparse array would be expensive (As there will be lots of zero counts). By using an associative container you only iterate over valid values.

Range based for

Use the new range based for.

for (auto it_c = counts.begin(); it_c != counts.end(); ++it_c) {
// replace with:
for(auto const& value: counts) {

Combine range based for and associative containers

void counting_sort(InputIterator first, InputIterator last)
{
 using ValueType = std::iterator_traits<InputIterator>::value_type;
 std::map<ValueType, std::size_t> counts;
 for(auto value: boost::make_iterator_range(first, last)) { 
 ++counts[value];
 }
 for(auto count: counts) {
 ValueType& value = count.first;
 std::size_t& size = count.second; 
 std::fill_n(first, size, value);
 std::advance(first, size);
 }
}
answered Apr 26, 2016 at 5:32
\$\endgroup\$
5
  • \$\begingroup\$ Thanks! The map makes it a lot cleaner but I think it makes it O(nlogn) rather than O(n). ++counts[value]; runs pretty slowly \$\endgroup\$ Commented Apr 26, 2016 at 7:04
  • \$\begingroup\$ I would replace the order-based log2(n) insertion-time std::map with it's hash-table equivalent (and thus O(1) amortized insertion time) std::unordered_map. \$\endgroup\$ Commented Apr 26, 2016 at 13:59
  • 2
    \$\begingroup\$ @Bizkit: Unfortunately that's not going to work. As the map iterators are ordered while the unordered_map iterators are unordered. We are relying on the property of ordered to get the sort to work. \$\endgroup\$ Commented Apr 26, 2016 at 14:37
  • \$\begingroup\$ @ryan: You are correct. But because of the property of sparseness your code is only O(n) for large values of n. For small values of n the complexity is more like O(range(n)). I would add a check for small ranges if (std::distance(first, last) < 1000000) {std::sort(first, last);return}. \$\endgroup\$ Commented Apr 26, 2016 at 15:05
  • \$\begingroup\$ Thanks. I could also turn it into radix sort to fix the sparseness problem. \$\endgroup\$ Commented Apr 26, 2016 at 17:36
3
\$\begingroup\$

Counting sort is an excellent tool when you know that you will be sorting integer values in a tight range, so I would keep it as is with its « problems » instead of trying to turn it into a more generic algorithm. If you want a more flexible integer sorting algorithm, radix sort would be the way to go :)

However, while keeping the spirit and simplicity of counting sort, there are still several small things you can do to make it better:

  • InputIterator is never a suitable iterator category for an inplace sorting algorithm since input iterators are single-pass, which means that it's not guaranteed that you can write values back into them once you have read them. Your counting_sort implementation even reads the iterated values several times. The iterator category you want is ForwardIterator.

    Note: it might be a mere hint for the reader today, but this kind of thing will be even more meaningful when concepts will be included into the language.

  • You can return early and save memory when the original collection is already sorted: while the standard library does not provide such a function (it's too specific), it is possible to perform std::minmax_element and std::is_sorted at once without any significant additional cost. You can find such a function in Boost.Sort spreadsort implementation:

    template <class RandomAccessIter>
    inline bool
    is_sorted_or_find_extremes(RandomAccessIter current, RandomAccessIter last,
     RandomAccessIter & max, RandomAccessIter & min)
    {
     min = max = current;
     //This assumes we have more than 1 element based on prior checks.
     while (!(*(current + 1) < *current)) {
     //If everything is in sorted order, return
     if (++current == last - 1)
     return true;
     }
     //The maximum is the last sorted element
     max = current;
     //Start from the first unsorted element
     while (++current < last) {
     if (*max < *current)
     max = current;
     else if (*current < *min)
     min = current;
     }
     return false;
    }
    

    Just make sure beforehand that fist != last if you don't want surprises due to dereferencing iterators that theoretically point to nowhere.

  • std::advance is \$O(n)\$ for forward iterators and bidirectional iterators, which means that if you want to sort an std::list or an std::forward_list with your counting_sort, you will end advancing twice the same iterator in \$O(n)\$ with the following lines:

    std::fill_n(first, size, value);
    std::advance(first, size);
    

    This is a waste of time since std::fill_n has already computed the iterator where to advance first internally. Fortunately, since C++11, std::fill_n returns the iterator past the last element assigned, so you can simply turn the two lines above into a single more efficient one:

    first = std::fill_n(first, size, value);
    

    Now counting_sort should be a bit more efficient when given forward or bidirectional iterators.

answered May 6, 2016 at 18:39
\$\endgroup\$

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.