LSD radix sort (force avoid div or idiv compiler instruction) in C++

Question 1

I was trying to optimize the Radix Sort code, just because. Nothing more, nothing less. Here it is!!! So tell me, how does it look?

Summary:

instead of a div or mod ,i use left shift and right shift to avoid div or idiv compiler instruction.
the dual instruction is for avoiding jumps and the usual stuff.

btw you can do arr pointer with same code just change obj_arr*&arr with obj_arr**&arr and add some function to the class obj_arr to get unsigned int
example: arr[i]->get_unsigned_int().
or operator if that you like.

postdata: take in mind you are using arr of obj class.

#pragma once
#define modpow2(num,div_pow2,mod_pow2) (num>>div_pow2) - (num >> mod_pow2+div_pow2 << mod_pow2)
template<class obj_arr>
unsigned int getmax(obj_arr*& arr, size_t& size);
template<class obj_arr>
void countrad(obj_arr*& arr, obj_arr*& aux_arr, size_t& size, unsigned int*& count, const unsigned int& mod_pow2, const unsigned int& div_pow2);
template<class obj_arr>
void radix_sort_ftl(obj_arr* arr, size_t size, unsigned int mod_pow2 = 4);
template<class obj_arr>
unsigned int getmax(obj_arr*& arr, size_t& size) {
 
 size_t i = modpow2(size,0,1);
 unsigned int aux=arr[0];
 for (; i < size; i += 2) {
 aux = arr[i] * (aux < arr[i]) + aux * (aux >= arr[i]);
 aux = arr[i + 1] * (aux < arr[i + 1]) + aux * (aux >= arr[i + 1]);
 }
 return aux;
}
template<class obj_arr>
void radix_sort_ftl(obj_arr* arr, size_t size, unsigned int mod_pow2) {
 unsigned int max = getmax(arr, size) + 1;
 obj_arr* aux_arr = new obj_arr[size];
 unsigned int* count = new unsigned int[1<< mod_pow2 + 1];
 for (unsigned int divpow2 = 0; max > (1<< divpow2); divpow2 += mod_pow2){
 countrad(arr,aux_arr,size,count,mod_pow2, divpow2);
 }
 delete[] aux_arr;
 delete[] count;
}
template<class obj_arr>
void countrad(obj_arr*& arr, obj_arr*& aux_arr, size_t& size, unsigned int*& count, const unsigned int& mod_pow2, const unsigned int& div_pow2) {
 unsigned int i, move,max_mod=1<<mod_pow2;
 i = modpow2(max_mod + 1,0,1);
 count[0] = 0;
 for (; i < max_mod + 1; i += 2) {
 count[i] = 0;
 count[i + 1] = 0;
 }
 i = modpow2(size, 0, 1);
 count[modpow2(arr[0],div_pow2,mod_pow2)+ 1] += i;
 for (; i < size; i += 2) {
 count[modpow2(arr[i], div_pow2, mod_pow2) + 1]++;
 count[modpow2(arr[i+1], div_pow2, mod_pow2) + 1]++;
 }
 
 i = modpow2(max_mod + 2, 0, 1) + 1;
 for (; i < max_mod + 1; i += 2) {
 count[i] += count[i - 1];
 count[i + 1] += count[i];
 }
 i = modpow2(size, 0, 1);
 move = modpow2(arr[0], div_pow2, mod_pow2);
 aux_arr[count[move]] = arr[0];
 count[move] += i;
 for (; i < size; i += 2) {
 move = modpow2(arr[i], div_pow2, mod_pow2);
 aux_arr[count[move]] = arr[i];
 count[move]++;
 move = modpow2(arr[i+1], div_pow2, mod_pow2);
 aux_arr[count[move]] = arr[i + 1];
 count[move]++;
 }
 i = modpow2(size, 0, 1);
 arr[0] = aux_arr[0];
 for (; i < size; i += 2) {
 arr[i] = aux_arr[i];
 arr[i + 1] = aux_arr[i + 1];
 }
}
```

Question 2

I got confuse here and i put MSD insted of LSD

Question 3

Avoid macros

There is no need to make modpow2 a macro. Make it a function instead:

static constexpr size_t modpow2(size_t num, size_t div_pow2, size_t mod_pow2) {
 return (num >> div_pow2) - (num >> (mod_pow2 + div_pow2) << mod_pow2);
}

Avoid forward declarations

You should not need to write forward declarations, unless you have circular dependencies between functions. Otherwise, they are unnecessary code duplication, and there is a potential for mistakes that is better avoided.

Be consistent with the template type

You made your functions templates, so I assume you want to be able to run radix sort on arrays of various types. However, I see code like this:

unsigned int aux = arr[0];

What if I have an array of unsigned long instead? Or float? When you make a template, you want to avoid making assumptions about the type of the variables. So either use the template type name:

obj_arr aux = arr[0];

Or better yet, just use auto when possible:

auto aux = arr[0];

Also use this for the return type. So:

template<class obj_arr>
auto getmax(obj_arr*& arr, size_t& size) {
 auto aux = arr[0];
 ...
 return aux;
}

When to pass by reference

You should only pass large objects by reference. Small objects, like pointers and primitive types like int and size_t should be passed by value if possible. So:

template<class obj_arr>
auto getmax(obj_arr* arr, size_t size) {
 ...
}

If you always use references, you risk adding unnecessary pointer indirection to your code.

Use `const` where appropriate

You should mark pointers and references as const when possible, since that might allow the compiler to generate better code, and it will also give a compiler error if you accidentily do modify a value that should be constant. So:

template<class obj_arr>
auto getmax(const obj_arr* arr, size_t size) {
 ...
}

You can even be more strict and also make the parameter size_t size const, and even local variables like max_mod inside countrad().

Use `size_t` consistenly for counts and indices

Don't use unsigned int for counts, unless you are absolutely sure you will never have to deal with arrays with more elements than can be counted in whatever an unsigned int might be on all the platforms you want to support.

Use idiomatic `for`-loops

Why have you moved the initializer statement out of all your for-loops? In most cases this does not seem necessary at all. Just write for-loops like you normally do:

auto aux = arr[0];
for (i = modpow2(size, 0, 1); i < size; i += 2) {
 aux = arr[i] * (aux < arr[i]) + aux * (aux >= arr[i]);
 aux = arr[i + 1] * (aux < arr[i + 1]) + aux * (aux >= arr[i + 1]);
}

Avoid premature optimization

Your implementation of getmax() seems to be written to avoid conditional jumps. However, many architectures have conditional move instructions that will be able to perform a maximum operation without needing conditional jumps. And this is much cheaper than two multiplication operations. Instead of calling getmax(), I would just use std::max_element() like so:

auto max = *std::max_element(arr, arr + size) + 1;

Let the compiler worry about optimizing this. If you really want to be sure, measure the speed of your radix sorting function when using your getmax() vs. when using std::max_element().

Also, in general the compiler will be able to optimize multiplication and division by constants for you, and will convert them to shifts if it sees they are powers of two. However, there are a few places where you call modpow2() with variable divisors, so it does make sense to explicitly use shifts here.

Use `std::unique_ptr` to manage allocated memory

It's best to avoid calling new and delete manually, and use smart pointers that do this for you. This avoids potential mistakes such as leaking memory or deleting something twice. If you can use C++14, then use std::make_unique() to create an array for you, like so:

auto aux_arr = std::make_unique<obj_arr[]>(size);
auto count = std::make_unique<size_t[])(1 << mod_pow2 + 1);

If you are stuck with an older version, you still need to use new:

std::unqiue_ptr<obj_arr> aux_arr = new obj_arr[size];
std::unique_ptr<size_t> count = new size_t[1 << mod_pow2 + 1];

But at least you don't have to call delete anymore. You could also use std::vector for even cleaner looking code, but then you lose a bit of performance due to the initialization of the elements.

Naming things

The names of your variables and functions are inconsistent and sometimes misleading. For example, choose whether you want to use underscores to separate words (which I recommend) or not, and stick with that choice.

The name obj_arr suggests that the type is an array, but actually it's the type of the array elements. In any case, I would use the more idiomatic T for this:

template<typename T>
auto getmax(const T* arr, size_t size) {
 ...
}

The word count suggests it is a single value, but actually it is an array. I recommend you use the plural form of a noun for arrays, or append _arr like you do with aux_arr for example.

Question 4

thx very much, i learn a lot.

G. Sliepen G. Sliepen 68.9k3 gold badges74 silver badges180 bronze badges · Accepted Answer · 2021-01-28 22:34:49Z

Avoid macros

There is no need to make modpow2 a macro. Make it a function instead:

static constexpr size_t modpow2(size_t num, size_t div_pow2, size_t mod_pow2) {
 return (num >> div_pow2) - (num >> (mod_pow2 + div_pow2) << mod_pow2);
}

Avoid forward declarations

You should not need to write forward declarations, unless you have circular dependencies between functions. Otherwise, they are unnecessary code duplication, and there is a potential for mistakes that is better avoided.

Be consistent with the template type

You made your functions templates, so I assume you want to be able to run radix sort on arrays of various types. However, I see code like this:

unsigned int aux = arr[0];

What if I have an array of unsigned long instead? Or float? When you make a template, you want to avoid making assumptions about the type of the variables. So either use the template type name:

obj_arr aux = arr[0];

Or better yet, just use auto when possible:

auto aux = arr[0];

Also use this for the return type. So:

template<class obj_arr>
auto getmax(obj_arr*& arr, size_t& size) {
 auto aux = arr[0];
 ...
 return aux;
}

When to pass by reference

You should only pass large objects by reference. Small objects, like pointers and primitive types like int and size_t should be passed by value if possible. So:

template<class obj_arr>
auto getmax(obj_arr* arr, size_t size) {
 ...
}

If you always use references, you risk adding unnecessary pointer indirection to your code.

Use `const` where appropriate

You should mark pointers and references as const when possible, since that might allow the compiler to generate better code, and it will also give a compiler error if you accidentily do modify a value that should be constant. So:

template<class obj_arr>
auto getmax(const obj_arr* arr, size_t size) {
 ...
}

You can even be more strict and also make the parameter size_t size const, and even local variables like max_mod inside countrad().

Use `size_t` consistenly for counts and indices

Don't use unsigned int for counts, unless you are absolutely sure you will never have to deal with arrays with more elements than can be counted in whatever an unsigned int might be on all the platforms you want to support.

Use idiomatic `for`-loops

Why have you moved the initializer statement out of all your for-loops? In most cases this does not seem necessary at all. Just write for-loops like you normally do:

auto aux = arr[0];
for (i = modpow2(size, 0, 1); i < size; i += 2) {
 aux = arr[i] * (aux < arr[i]) + aux * (aux >= arr[i]);
 aux = arr[i + 1] * (aux < arr[i + 1]) + aux * (aux >= arr[i + 1]);
}

Avoid premature optimization

Your implementation of getmax() seems to be written to avoid conditional jumps. However, many architectures have conditional move instructions that will be able to perform a maximum operation without needing conditional jumps. And this is much cheaper than two multiplication operations. Instead of calling getmax(), I would just use std::max_element() like so:

auto max = *std::max_element(arr, arr + size) + 1;

Let the compiler worry about optimizing this. If you really want to be sure, measure the speed of your radix sorting function when using your getmax() vs. when using std::max_element().

Also, in general the compiler will be able to optimize multiplication and division by constants for you, and will convert them to shifts if it sees they are powers of two. However, there are a few places where you call modpow2() with variable divisors, so it does make sense to explicitly use shifts here.

Use `std::unique_ptr` to manage allocated memory

It's best to avoid calling new and delete manually, and use smart pointers that do this for you. This avoids potential mistakes such as leaking memory or deleting something twice. If you can use C++14, then use std::make_unique() to create an array for you, like so:

auto aux_arr = std::make_unique<obj_arr[]>(size);
auto count = std::make_unique<size_t[])(1 << mod_pow2 + 1);

If you are stuck with an older version, you still need to use new:

std::unqiue_ptr<obj_arr> aux_arr = new obj_arr[size];
std::unique_ptr<size_t> count = new size_t[1 << mod_pow2 + 1];

But at least you don't have to call delete anymore. You could also use std::vector for even cleaner looking code, but then you lose a bit of performance due to the initialization of the elements.

Naming things

The names of your variables and functions are inconsistent and sometimes misleading. For example, choose whether you want to use underscores to separate words (which I recommend) or not, and stick with that choice.

The name obj_arr suggests that the type is an array, but actually it's the type of the array elements. In any case, I would use the more idiomatic T for this:

template<typename T>
auto getmax(const T* arr, size_t size) {
 ...
}

The word count suggests it is a single value, but actually it is an array. I recommend you use the plural form of a noun for arrays, or append _arr like you do with aux_arr for example.

\$\begingroup\$ thx very much, i learn a lot. \$\endgroup\$

bigotes0invisibles
– bigotes0invisibles

2021年01月29日 02:52:16 +00:00
Commented Jan 29, 2021 at 2:52

Stack Exchange Network

LSD radix sort (force avoid div or idiv compiler instruction) in C++

1 Answer 1

Avoid macros

Avoid forward declarations

Be consistent with the template type

When to pass by reference

Use `const` where appropriate

Use `size_t` consistenly for counts and indices

Use idiomatic `for`-loops

Avoid premature optimization

Use `std::unique_ptr` to manage allocated memory

Naming things

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

LSD radix sort (force avoid div or idiv compiler instruction) in C++

1 Answer 1

Avoid macros

Avoid forward declarations

Be consistent with the template type

When to pass by reference

Use const where appropriate

Use size_t consistenly for counts and indices

Use idiomatic for-loops

Avoid premature optimization

Use std::unique_ptr to manage allocated memory

Naming things

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions

Use `const` where appropriate

Use `size_t` consistenly for counts and indices

Use idiomatic `for`-loops

Use `std::unique_ptr` to manage allocated memory