I made a small adaptive function that sorts vector:
template<class T>
vector<T> BubbleSort(vector<T> arr, bool (*compare)(T, T)) {
const int arraySize = arr.size(); //Size of vector. Is it ok to leave it const?
//int i = 1 - to be able to swap with previous(i-1) element
for (int i = 1; i < arraySize; i++) {
if (compare(arr[i], arr[i - 1])) {
//Swap 2 elements. Is swap() better?
T helper = arr[i];
arr[i] = arr[i - 1];
arr[i - 1] = helper;
if (i != 1) i -= 2; //Because in the next cycle i will be ++
}
}
return arr;
}
It can be a little confusing, so I wrote some comments...
I also tested it with sort()
function, which actually a way more faster. I've tested it on 10000 numbers.
First, for
was like for (int i = 0; i < arr.size(); i++)
. And time of sort was ~5 seconds. When I changed arr.size()
to variable, it is now ~4.5 seconds.
...while sort()
takes only ~0.04 seconds...
I know sort()
may use another sorting algorithm, but could you please show how BubbleSort()
function can be more faster?
Edit
Here is part of the code I tested this function:
vector<int> arr;
ifstream dataset("Numbers.txt");
int n;
while (dataset >> n) arr.push_back(n);
//Here comes the test:
clock_t start = clock();
arr = sortfunctions::BubbleSort<int>(arr, [](int a, int b) {
return a < b;
});
clock_t end = clock();
I checked the result(arr
) before and after. It is actually sorted. And if you will look at this result in the console, you can see a visual pattern by the way.
1 Answer 1
You understand that making the code faster means reducing the value of k1 in the complexity of k1*n2
while using a different algorithm would change that to k2*n*log(n)
. But understanding that...
template<class T> vector<T> BubbleSort(vector<T> arr, bool (*compare)(T, T)) {
One speed difference between std::sort
and the C library's sort is that the former takes the comparison operation as a functor typed by a template argument.
Passing a function-calling-object rather than a pointer to a regular function allows the compiler to inline the call. Generally, compilers won't follow pointers to functions, even when it's clear to the human reader that the specific function is known at the time of that specific call. But passing a functor makes the choice of function part of the type system, and the compiler naturally inlines that.
Many function calls to a trivial compare operation will be a lot of overhead.
So, take the comparison operation in the same way that the standard library takes code parameters in sort
and any of the standard algorithms, not as a C-style plain pointer to function.
This should be the largest contributor.
In the same signature, I see you are passing arr
by value, and then returning a copy of that. The compiler will use the move
constructor for the return
which will eliminate copying of the dynamically allocated memory.
But you are, by design, copying the array and sorting that copy. The std::sort
sorts in-place so does not copy the whole thing.
const int arraySize = arr.size(); //Size of vector. Is it ok to leave it const?
Well, does the size change?
//Swap 2 elements. Is swap() better?
Yes! Does it make a difference in your actual benchmark? Well, you never showed the test code so I don't know what type T
is. Are they expensive to construct and destruct? Are they very large and thus expensive to copy the bytes around? You are not using move
semantics and you are causing a duplicate to exist for a while, and a swap
for some type could be optimized to not do any of that. For a string
(that's not using SSO in both instances) you will find swap
to be much faster, for example.
Your code is using subscripts and a vector
, probably porting textbook code that sorts an array. But in C++ we like to use iterators and write a template that works for any kind of sequential container, not just vector
. The std::sort
can also be called with a subrange of elements in a vector, no problem; you see it's much more versatile.
That doesn't affect the performance though. Subscripting a vector
is fast.
Are you benchmarking an optimized build? I'm surprised that saving .size()
made that much of a difference!
Does it actually work?
I only see one loop, making one pass and bubbling the elements up by one position. It doesn't repeat that pass n times or until no exchanges were needed.
This makes your slow performance rather confusing... if it's just making one pass through the array, it should be instantaneous for only 10000 numbers.
Update: As Pavlo Slavynskyy pointed out in a comment, your code is actually Gnome Sort or stupid sort.
The gnome sort is a sorting algorithm which is similar to insertion sort in that it works with one item at a time but gets the item to the proper place by a series of swaps, similar to a bubble sort. It is conceptually simple, requiring no nested loops. The average running time is O(n2) but tends towards O(n) if the list is initially almost sorted.
-
1\$\begingroup\$ Regarding
.size()
- if element type (T) is allowed to alias the size variable (or in vector case the "begin and "end" pointer variables which are used to calculate size), then a loop that has ".size()" in the header will re-load the size on every iteration (instead of keeping it in a register), because compiler needs to produce correct code for the event of that size being overwritten. Example: godbolt.org/z/deofze8va \$\endgroup\$stepan– stepan2021年06月17日 16:40:52 +00:00Commented Jun 17, 2021 at 16:40 -
1\$\begingroup\$ Do you have a reference that
return arr;
andreturn std::move(arr)
are different? I understand that the copy-elision doesn't apply, but surprised that returning a local copy isn't allowed to treat it as an rvalue. In my test, the resultant code is identical, and the only difference is that thestd::move()
version causes a GCC warning about the redundant move operation. \$\endgroup\$Toby Speight– Toby Speight2021年06月17日 16:44:57 +00:00Commented Jun 17, 2021 at 16:44 -
1\$\begingroup\$ @TobySpeight I see that return does automatically move from parameters. I must have been confusing it with something else (NRVO?) or the CPPCON presentation I remember watching might be slightly different from the final standard. \$\endgroup\$JDługosz– JDługosz2021年06月18日 13:54:20 +00:00Commented Jun 18, 2021 at 13:54
-
1\$\begingroup\$ Learn something new every day. Never heard of "Gnome" sort before. Bit cruel calling it stupid sort when it has a best case of
n
and a very low k value. I like it. Obviously bad for large data sets but should be good for small sets (where the n^2 has not overwhelmed the k value). \$\endgroup\$Loki Astari– Loki Astari2021年06月18日 14:49:12 +00:00Commented Jun 18, 2021 at 14:49 -
\$\begingroup\$ It's stupid because it forgets where it left off. Introducing a variable would turn it into an insertion sort. Swapping and comparing every element is only better than doing a binary search and bulk copying of the elements to be moved when the number of elements to move is about 5, for such simple numeric type elements. \$\endgroup\$JDługosz– JDługosz2021年06月18日 15:22:18 +00:00Commented Jun 18, 2021 at 15:22
std:::swap(a, b)
is in general more performant and non-throwing due to specialization. You can still use the two-stepusing std::swap; swap(a, b);
which earlier versions needed for the same advantages. Or evenstd::iter_swap(&a, &b);
. \$\endgroup\$