I was trying to optimize the Radix Sort code, just because. Nothing more, nothing less. Here it is!!! So tell me, how does it look?
Summary:
instead of a div or mod ,i use left shift and right shift to avoid div or idiv compiler instruction.
the dual instruction is for avoiding jumps and the usual stuff.
btw you can do arr pointer with same code just change obj_arr*&arr
with obj_arr**&arr
and add some function to the class obj_arr
to get unsigned int
example: arr[i]->get_unsigned_int()
.
or operator if that you like.
postdata: take in mind you are using arr of obj class.
#pragma once
#define modpow2(num,div_pow2,mod_pow2) (num>>div_pow2) - (num >> mod_pow2+div_pow2 << mod_pow2)
template<class obj_arr>
unsigned int getmax(obj_arr*& arr, size_t& size);
template<class obj_arr>
void countrad(obj_arr*& arr, obj_arr*& aux_arr, size_t& size, unsigned int*& count, const unsigned int& mod_pow2, const unsigned int& div_pow2);
template<class obj_arr>
void radix_sort_ftl(obj_arr* arr, size_t size, unsigned int mod_pow2 = 4);
template<class obj_arr>
unsigned int getmax(obj_arr*& arr, size_t& size) {
size_t i = modpow2(size,0,1);
unsigned int aux=arr[0];
for (; i < size; i += 2) {
aux = arr[i] * (aux < arr[i]) + aux * (aux >= arr[i]);
aux = arr[i + 1] * (aux < arr[i + 1]) + aux * (aux >= arr[i + 1]);
}
return aux;
}
template<class obj_arr>
void radix_sort_ftl(obj_arr* arr, size_t size, unsigned int mod_pow2) {
unsigned int max = getmax(arr, size) + 1;
obj_arr* aux_arr = new obj_arr[size];
unsigned int* count = new unsigned int[1<< mod_pow2 + 1];
for (unsigned int divpow2 = 0; max > (1<< divpow2); divpow2 += mod_pow2){
countrad(arr,aux_arr,size,count,mod_pow2, divpow2);
}
delete[] aux_arr;
delete[] count;
}
template<class obj_arr>
void countrad(obj_arr*& arr, obj_arr*& aux_arr, size_t& size, unsigned int*& count, const unsigned int& mod_pow2, const unsigned int& div_pow2) {
unsigned int i, move,max_mod=1<<mod_pow2;
i = modpow2(max_mod + 1,0,1);
count[0] = 0;
for (; i < max_mod + 1; i += 2) {
count[i] = 0;
count[i + 1] = 0;
}
i = modpow2(size, 0, 1);
count[modpow2(arr[0],div_pow2,mod_pow2)+ 1] += i;
for (; i < size; i += 2) {
count[modpow2(arr[i], div_pow2, mod_pow2) + 1]++;
count[modpow2(arr[i+1], div_pow2, mod_pow2) + 1]++;
}
i = modpow2(max_mod + 2, 0, 1) + 1;
for (; i < max_mod + 1; i += 2) {
count[i] += count[i - 1];
count[i + 1] += count[i];
}
i = modpow2(size, 0, 1);
move = modpow2(arr[0], div_pow2, mod_pow2);
aux_arr[count[move]] = arr[0];
count[move] += i;
for (; i < size; i += 2) {
move = modpow2(arr[i], div_pow2, mod_pow2);
aux_arr[count[move]] = arr[i];
count[move]++;
move = modpow2(arr[i+1], div_pow2, mod_pow2);
aux_arr[count[move]] = arr[i + 1];
count[move]++;
}
i = modpow2(size, 0, 1);
arr[0] = aux_arr[0];
for (; i < size; i += 2) {
arr[i] = aux_arr[i];
arr[i + 1] = aux_arr[i + 1];
}
}
```
-
\$\begingroup\$ I got confuse here and i put MSD insted of LSD \$\endgroup\$bigotes0invisibles– bigotes0invisibles2021年01月31日 18:49:09 +00:00Commented Jan 31, 2021 at 18:49
1 Answer 1
Avoid macros
There is no need to make modpow2
a macro. Make it a function instead:
static constexpr size_t modpow2(size_t num, size_t div_pow2, size_t mod_pow2) {
return (num >> div_pow2) - (num >> (mod_pow2 + div_pow2) << mod_pow2);
}
Avoid forward declarations
You should not need to write forward declarations, unless you have circular dependencies between functions. Otherwise, they are unnecessary code duplication, and there is a potential for mistakes that is better avoided.
Be consistent with the template type
You made your functions templates, so I assume you want to be able to run radix sort on arrays of various types. However, I see code like this:
unsigned int aux = arr[0];
What if I have an array of unsigned long
instead? Or float
? When you make a template, you want to avoid making assumptions about the type of the variables. So either use the template type name:
obj_arr aux = arr[0];
Or better yet, just use auto
when possible:
auto aux = arr[0];
Also use this for the return type. So:
template<class obj_arr>
auto getmax(obj_arr*& arr, size_t& size) {
auto aux = arr[0];
...
return aux;
}
When to pass by reference
You should only pass large objects by reference. Small objects, like pointers and primitive types like int
and size_t
should be passed by value if possible. So:
template<class obj_arr>
auto getmax(obj_arr* arr, size_t size) {
...
}
If you always use references, you risk adding unnecessary pointer indirection to your code.
Use const
where appropriate
You should mark pointers and references as const
when possible, since that might allow the compiler to generate better code, and it will also give a compiler error if you accidentily do modify a value that should be constant. So:
template<class obj_arr>
auto getmax(const obj_arr* arr, size_t size) {
...
}
You can even be more strict and also make the parameter size_t size
const
, and even local variables like max_mod
inside countrad()
.
Use size_t
consistenly for counts and indices
Don't use unsigned int
for counts, unless you are absolutely sure you will never have to deal with arrays with more elements than can be counted in whatever an unsigned int
might be on all the platforms you want to support.
Use idiomatic for
-loops
Why have you moved the initializer statement out of all your for
-loops? In most cases this does not seem necessary at all. Just write for
-loops like you normally do:
auto aux = arr[0];
for (i = modpow2(size, 0, 1); i < size; i += 2) {
aux = arr[i] * (aux < arr[i]) + aux * (aux >= arr[i]);
aux = arr[i + 1] * (aux < arr[i + 1]) + aux * (aux >= arr[i + 1]);
}
Avoid premature optimization
Your implementation of getmax()
seems to be written to avoid conditional jumps. However, many architectures have conditional move instructions that will be able to perform a maximum operation without needing conditional jumps. And this is much cheaper than two multiplication operations. Instead of calling getmax()
, I would just use std::max_element()
like so:
auto max = *std::max_element(arr, arr + size) + 1;
Let the compiler worry about optimizing this. If you really want to be sure, measure the speed of your radix sorting function when using your getmax()
vs. when using std::max_element()
.
Also, in general the compiler will be able to optimize multiplication and division by constants for you, and will convert them to shifts if it sees they are powers of two. However, there are a few places where you call modpow2()
with variable divisors, so it does make sense to explicitly use shifts here.
Use std::unique_ptr
to manage allocated memory
It's best to avoid calling new
and delete
manually, and use smart pointers that do this for you. This avoids potential mistakes such as leaking memory or deleting something twice. If you can use C++14, then use std::make_unique()
to create an array for you, like so:
auto aux_arr = std::make_unique<obj_arr[]>(size);
auto count = std::make_unique<size_t[])(1 << mod_pow2 + 1);
If you are stuck with an older version, you still need to use new
:
std::unqiue_ptr<obj_arr> aux_arr = new obj_arr[size];
std::unique_ptr<size_t> count = new size_t[1 << mod_pow2 + 1];
But at least you don't have to call delete
anymore. You could also use std::vector
for even cleaner looking code, but then you lose a bit of performance due to the initialization of the elements.
Naming things
The names of your variables and functions are inconsistent and sometimes misleading. For example, choose whether you want to use underscores to separate words (which I recommend) or not, and stick with that choice.
The name obj_arr
suggests that the type is an array, but actually it's the type of the array elements. In any case, I would use the more idiomatic T
for this:
template<typename T>
auto getmax(const T* arr, size_t size) {
...
}
The word count
suggests it is a single value, but actually it is an array. I recommend you use the plural form of a noun for arrays, or append _arr
like you do with aux_arr
for example.
-
\$\begingroup\$ thx very much, i learn a lot. \$\endgroup\$bigotes0invisibles– bigotes0invisibles2021年01月29日 02:52:16 +00:00Commented Jan 29, 2021 at 2:52