C++ vector bubble sort in my edition

Question 1

I made a small adaptive function that sorts vector:

template<class T>
vector<T> BubbleSort(vector<T> arr, bool (*compare)(T, T)) {
 const int arraySize = arr.size(); //Size of vector. Is it ok to leave it const?
 //int i = 1 - to be able to swap with previous(i-1) element
 for (int i = 1; i < arraySize; i++) {
 if (compare(arr[i], arr[i - 1])) {
 //Swap 2 elements. Is swap() better?
 T helper = arr[i];
 arr[i] = arr[i - 1];
 arr[i - 1] = helper;
 if (i != 1) i -= 2; //Because in the next cycle i will be ++
 }
 }
 return arr;
}

It can be a little confusing, so I wrote some comments...
I also tested it with sort() function, which actually a way more faster. I've tested it on 10000 numbers.

First, for was like for (int i = 0; i < arr.size(); i++). And time of sort was ~5 seconds. When I changed arr.size() to variable, it is now ~4.5 seconds.

...while sort() takes only ~0.04 seconds...

I know sort() may use another sorting algorithm, but could you please show how BubbleSort() function can be more faster?

Edit

Here is part of the code I tested this function:

vector<int> arr;
ifstream dataset("Numbers.txt");
int n;
while (dataset >> n) arr.push_back(n);
//Here comes the test:
clock_t start = clock();
arr = sortfunctions::BubbleSort<int>(arr, [](int a, int b) {
 return a < b;
});
clock_t end = clock();

I checked the result(arr) before and after. It is actually sorted. And if you will look at this result in the console, you can see a visual pattern by the way.

Question 2

This is some strange variation of bubble sort, if it is bubble sort at al. Bubble sort is when the biggest element "floats" to the top, then the second "floats" under it and so on. So, you need two loops: i from 1 to n-1 (including), j from 1 to n-i (including), so [j] and [j-1] gets swapped if needed. In this one you're stepping back when swap occurs - so elements are "drawn" sometimes. I'm not sure if you'll have a reasonable number of comparisons here. If you need performance - you should use the best algorithm available. That's basics. No use of optimizing bad algorithm.

Question 3

If you are interested in optimizing your searches, I suggest you take a look at quicksort, merge sort and heapsort, and maybe a hybrid, such as introsort.

Question 4

Invoking the C++20 CPO std:::swap(a, b) is in general more performant and non-throwing due to specialization. You can still use the two-step using std::swap; swap(a, b); which earlier versions needed for the same advantages. Or even std::iter_swap(&a, &b);.

Question 5

This is an insertion sort: you have a sorted sublist that starts at size 1, and you repeatedly take the next non-sorted element and sort it into its place in the sublist until the entire list is sorted. The main "quirk" making this different from a textbook insertion sort is that you don't keep track of the length of the sorted sublist, so you have to find it again after each insertion.

Question 6

Thanks @trentcl - this is en.wikipedia.org/wiki/Gnome_sort Even changing to bubble sort will speed it up.

Question 7

You understand that making the code faster means reducing the value of k1 in the complexity of k1*n2 while using a different algorithm would change that to k2*n*log(n). But understanding that...

template<class T> vector<T> BubbleSort(vector<T> arr, bool (*compare)(T, T)) {
One speed difference between std::sort and the C library's sort is that the former takes the comparison operation as a functor typed by a template argument. Passing a function-calling-object rather than a pointer to a regular function allows the compiler to inline the call. Generally, compilers won't follow pointers to functions, even when it's clear to the human reader that the specific function is known at the time of that specific call. But passing a functor makes the choice of function part of the type system, and the compiler naturally inlines that. Many function calls to a trivial compare operation will be a lot of overhead.

So, take the comparison operation in the same way that the standard library takes code parameters in sort and any of the standard algorithms, not as a C-style plain pointer to function.

This should be the largest contributor.

In the same signature, I see you are passing arr by value, and then returning a copy of that. The compiler will use the move constructor for the return which will eliminate copying of the dynamically allocated memory. But you are, by design, copying the array and sorting that copy. The std::sort sorts in-place so does not copy the whole thing.

const int arraySize = arr.size(); //Size of vector. Is it ok to leave it const?
Well, does the size change?

//Swap 2 elements. Is swap() better?
Yes! Does it make a difference in your actual benchmark? Well, you never showed the test code so I don't know what type T is. Are they expensive to construct and destruct? Are they very large and thus expensive to copy the bytes around? You are not using move semantics and you are causing a duplicate to exist for a while, and a swap for some type could be optimized to not do any of that. For a string (that's not using SSO in both instances) you will find swap to be much faster, for example.

Your code is using subscripts and a vector, probably porting textbook code that sorts an array. But in C++ we like to use iterators and write a template that works for any kind of sequential container, not just vector. The std::sort can also be called with a subrange of elements in a vector, no problem; you see it's much more versatile. That doesn't affect the performance though. Subscripting a vector is fast.

Are you benchmarking an optimized build? I'm surprised that saving .size() made that much of a difference!

Does it actually work?

I only see one loop, making one pass and bubbling the elements up by one position. It doesn't repeat that pass n times or until no exchanges were needed.

This makes your slow performance rather confusing... if it's just making one pass through the array, it should be instantaneous for only 10000 numbers.

Update: As Pavlo Slavynskyy pointed out in a comment, your code is actually Gnome Sort or stupid sort.

The gnome sort is a sorting algorithm which is similar to insertion sort in that it works with one item at a time but gets the item to the proper place by a series of swaps, similar to a bubble sort. It is conceptually simple, requiring no nested loops. The average running time is O(n2) but tends towards O(n) if the list is initially almost sorted.

Question 8

Regarding .size() - if element type (T) is allowed to alias the size variable (or in vector case the "begin and "end" pointer variables which are used to calculate size), then a loop that has ".size()" in the header will re-load the size on every iteration (instead of keeping it in a register), because compiler needs to produce correct code for the event of that size being overwritten. Example: godbolt.org/z/deofze8va

Question 9

Do you have a reference that return arr; and return std::move(arr) are different? I understand that the copy-elision doesn't apply, but surprised that returning a local copy isn't allowed to treat it as an rvalue. In my test, the resultant code is identical, and the only difference is that the std::move() version causes a GCC warning about the redundant move operation.

Question 10

@TobySpeight I see that return does automatically move from parameters. I must have been confusing it with something else (NRVO?) or the CPPCON presentation I remember watching might be slightly different from the final standard.

Question 11

Learn something new every day. Never heard of "Gnome" sort before. Bit cruel calling it stupid sort when it has a best case of n and a very low k value. I like it. Obviously bad for large data sets but should be good for small sets (where the n^2 has not overwhelmed the k value).

Question 12

It's stupid because it forgets where it left off. Introducing a variable would turn it into an insertion sort. Swapping and comparing every element is only better than doing a binary search and bulk copying of the elements to be moved when the number of elements to move is about 5, for such simple numeric type elements.

JDługosz JDługosz 11.7k19 silver badges40 bronze badges · Accepted Answer · 2021-06-17 14:53:35Z

You understand that making the code faster means reducing the value of k1 in the complexity of k1*n2 while using a different algorithm would change that to k2*n*log(n). But understanding that...

template<class T> vector<T> BubbleSort(vector<T> arr, bool (*compare)(T, T)) {
One speed difference between std::sort and the C library's sort is that the former takes the comparison operation as a functor typed by a template argument. Passing a function-calling-object rather than a pointer to a regular function allows the compiler to inline the call. Generally, compilers won't follow pointers to functions, even when it's clear to the human reader that the specific function is known at the time of that specific call. But passing a functor makes the choice of function part of the type system, and the compiler naturally inlines that. Many function calls to a trivial compare operation will be a lot of overhead.

So, take the comparison operation in the same way that the standard library takes code parameters in sort and any of the standard algorithms, not as a C-style plain pointer to function.

This should be the largest contributor.

In the same signature, I see you are passing arr by value, and then returning a copy of that. The compiler will use the move constructor for the return which will eliminate copying of the dynamically allocated memory. But you are, by design, copying the array and sorting that copy. The std::sort sorts in-place so does not copy the whole thing.

const int arraySize = arr.size(); //Size of vector. Is it ok to leave it const?
Well, does the size change?

//Swap 2 elements. Is swap() better?
Yes! Does it make a difference in your actual benchmark? Well, you never showed the test code so I don't know what type T is. Are they expensive to construct and destruct? Are they very large and thus expensive to copy the bytes around? You are not using move semantics and you are causing a duplicate to exist for a while, and a swap for some type could be optimized to not do any of that. For a string (that's not using SSO in both instances) you will find swap to be much faster, for example.

Your code is using subscripts and a vector, probably porting textbook code that sorts an array. But in C++ we like to use iterators and write a template that works for any kind of sequential container, not just vector. The std::sort can also be called with a subrange of elements in a vector, no problem; you see it's much more versatile. That doesn't affect the performance though. Subscripting a vector is fast.

Are you benchmarking an optimized build? I'm surprised that saving .size() made that much of a difference!

Does it actually work?

I only see one loop, making one pass and bubbling the elements up by one position. It doesn't repeat that pass n times or until no exchanges were needed.

This makes your slow performance rather confusing... if it's just making one pass through the array, it should be instantaneous for only 10000 numbers.

Update: As Pavlo Slavynskyy pointed out in a comment, your code is actually Gnome Sort or stupid sort.

The gnome sort is a sorting algorithm which is similar to insertion sort in that it works with one item at a time but gets the item to the proper place by a series of swaps, similar to a bubble sort. It is conceptually simple, requiring no nested loops. The average running time is O(n2) but tends towards O(n) if the list is initially almost sorted.

Regarding .size() - if element type (T) is allowed to alias the size variable (or in vector case the "begin and "end" pointer variables which are used to calculate size), then a loop that has ".size()" in the header will re-load the size on every iteration (instead of keeping it in a register), because compiler needs to produce correct code for the event of that size being overwritten. Example: godbolt.org/z/deofze8va
Do you have a reference that return arr; and return std::move(arr) are different? I understand that the copy-elision doesn't apply, but surprised that returning a local copy isn't allowed to treat it as an rvalue. In my test, the resultant code is identical, and the only difference is that the std::move() version causes a GCC warning about the redundant move operation.
@TobySpeight I see that return does automatically move from parameters. I must have been confusing it with something else (NRVO?) or the CPPCON presentation I remember watching might be slightly different from the final standard.
Learn something new every day. Never heard of "Gnome" sort before. Bit cruel calling it stupid sort when it has a best case of n and a very low k value. I like it. Obviously bad for large data sets but should be good for small sets (where the n^2 has not overwhelmed the k value).
It's stupid because it forgets where it left off. Introducing a variable would turn it into an insertion sort. Swapping and comparing every element is only better than doing a binary search and bulk copying of the elements to be moved when the number of elements to move is about 5, for such simple numeric type elements.

Stack Exchange Network

C++ vector bubble sort in my edition

Edit

1 Answer 1

Does it actually work?

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

C++ vector bubble sort in my edition

Edit

1 Answer 1

Does it actually work?

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions