I'm reading CLRS, and to practice, I rolled my version of radix sort and counting sort. I was looking at some reference implementations, particularly the LSD one from Rosetta Code, and mine performs significantly better (~10 times), especially on 64 bit inputs and if the maximum input range is known.
I think one place that could be improved is the creation of res
inside counting sort. It'd be faster if I could somehow use reserve
as right now I default initialize everything and then assign over them. However, I can't grow it using push_back
since in the following loop, res
is grown randomly, rather than linearly as required by push_back
. I don't know of any data structure that can have a runtime set size with unintialized values that also gets optimized nicely in the copy
line after the loop.
I experimented with variable length arrays such as T res[n]
and it works up to the stack limit. However, limiting input size to the size of my stack is not acceptable. Anyway, this change did not have a noticeable impact on performance.
Suggestions for improvements would be appreciated.
Counting sort
// counting sort, assumes each input is integral between 0 to k
// O(n) if k = O(n)
template <typename Iter, typename Op>
void cnt_sort(Iter begin, Iter end, size_t k, Op op) {
vector<int> counts(k); // init to 0
for (auto i = begin; i != end; ++i) // count # elems == i
++counts[op(*i)];
for (size_t i = 1; i < k; ++i)
counts[i] += counts[i-1]; // turn into # elems <= i
vector<typename Iter::value_type> res(distance(begin, end)); // doing useless initialization here
for (auto j = end;;) {
--j;
res[--counts[op(*j)]] = *j;
if (j == begin) break;
}
copy(res.begin(), res.end(), begin); // compiler optimizes this out
}
Radix sort
// radix sort, more practical than counting sort
// O(d(n + k)) running time where d is # digits, k is size of digit
class Digit_cmp { // functor for comparing a "digit" (particular bits)
const long long mask; // 0..63 bitfield to test against
const size_t to_shift;
public:
Digit_cmp(long long m, size_t ts) : mask(m), to_shift(ts) {}
template <typename T>
T operator()(T n) const {
return (n & mask) >> to_shift; // mask then shift to unit digit ex. 0xfab20000 >> 16
}
};
template <typename Iter>
void rdx_sort(Iter begin, Iter end, int bits) {
// bits is # bits to consider up to if a max val is known ahead of time
// most efficent when digits are base n, having lg(n) bits
size_t r {static_cast<size_t>(log2(end - begin))}; // # bits in digit
size_t k {static_cast<size_t>(pow(2, r))}; // range of digit
size_t d {0}; // current digit num
for (long long mask = ~(~0 << r);; // ex. 0x000000ff for setting lower 8 bits on 32 bit num
mask <<= r) {
cnt_sort(begin, end, k, Digit_cmp(mask, r*d));
++d;
if (mask & (1 << (bits-1))) break; // finished masking most significant digit
}
}
template <typename Iter> // range of input not known, just use max ex. 32 bits for ints
void rdx_sort(Iter begin, Iter end) {
int bits {sizeof(typename Iter::value_type)*CHAR_BIT};
rdx_sort(begin, end, bits);
}
1 Answer 1
I found a list a few things that you could improve, but nothing that would significantly change the way your sorts work:
At the end of the counting sort, you could use
std::move
instead ofstd::copy
to move the elements to the original collection. For simple integers, it shouldn't change anything but it could make a significant difference if you try to sort big integers.There is a more idiomatic way to compute
sizeof(typename Iter::value_type)*CHAR_BIT
in the standard library thanks tostd::numeric_limits::digits
, so you can actually rewriterdx_sort
as follows:template <typename Iter> // range of input not known, just use max ex. 32 bits for ints void rdx_sort(Iter begin, Iter end) { int bits {std::numeric_limits<typename Iter::value_type>::digits}; rdx_sort(begin, end, bits); }
Any number library (like a big number library) is allowed to specialize
std::numeric_limits
and every decent numbers library does so, so I shouldn't make your code less portable. By the way, the returned value is not alwaysint
, so the best thing you could do is to use a type alias to make things clearer:template <typename Iter> // range of input not known, just use max ex. 32 bits for ints void rdx_sort(Iter begin, Iter end) { using value_type = typename Iter::value_type value_type bits {std::numeric_limits<value_type>::digits}; rdx_sort(begin, end, bits); }
Be careful of wild assumptions:
const long long mask; // 0..63 bitfield to test against
long long
is not guaranteed to be a 64-bit integer, even if it is more often the case than not. If you want to avoid problems, you should usestd::int64_t
from<cstdint>
. This type only exists if the architecture actually has a 64-bit integer, bug I guess that you will have more problems anyway if your implementation does not.const std::int64_t mask; // 0..63 bitfield to test against
It seems that you are using
using namespace std;
, which is something you should really avoid in a header-only library like yours. Using this directive will import every name fromstd::
into the global namespace for anyone using your library and you will for sure end with name clashes at some point. Considerstd::
-qualifying everything from the C and C++ standard libraries (except macros of course), evenstd::size_t
: some implementations don't import the names from the C standard library into the global namespace.Dropping random vowels from function names isn't going to help anymore. Consider using full names like
counting_sort
andradix_sort
instead.