What changes should I implement in my templated sorter class so that it performs efficiently?

Question 1

I have a college assignment to implement a class called sorter, whose object is a sorted vector (the user is allowed to sort it by the method he/she desires). I have been able to implement it as following. Please review it and suggest me the changes I should make to make it efficient.

template <typename T>
class Sorter {
 std::vector<T> vector;
 int count(T* array) {
 register int i = 0;
 while (*(array + i)) {
 i++;
 }
 i--;
 return i;
 }
 void swap(T& t1, T& t2) {
 T temp = t1;
 t1 = t2;
 t2 = temp;
 }
 void bubbleSort() {
 for (int i = 0; i < vector.size(); i++) {
 for (int j = 0; j < vector.size() - 1; j++) {
 if (vector[j] > vector[j + 1]) {
 swap(vector[j], vector[j + 1]);
 }
 }
 }
 }
 std::vector<T> mergeSort(std::vector<T> vec) {
 int begin = 0;
 int end = vec.size() - 1;
 auto combine = [=](std::vector<T> lv, std::vector<T> rv) {
 std::vector<T> retVec;
 int i = 0;
 int j = 0;
 while (i < lv.capacity() && j < rv.capacity()) {
 if (lv[i] < rv[j]) {
 retVec.push_back(lv[i]);
 i++;
 }
 else {
 retVec.push_back(rv[j]);
 j++;
 }
 }
 while (i < lv.capacity()) {
 retVec.push_back(lv[i]);
 i++;
 }
 while (j < rv.capacity()) {
 retVec.push_back(rv[j]);
 j++;
 }
 return retVec;
 };
 if (vec.size() == 1) {
 return vec;
 }
 int mid = (begin + end) / 2;
 auto it = vec.begin();
 std::vector<T> left = mergeSort(std::vector<T>(it, it + mid + 1));
 std::vector<T> right = mergeSort(std::vector<T>(it + mid + 1, vec.end()));
 return combine(left, right);
 }
 void sort() {
 std::cout << "Choose your option:" << "\n"; 
 std::cout << "1. bubblesort" << "\n";
 std::cout << "2. mergesort" << "\n";
 int op = 2;
 //std::cin >> op;
 switch (op) {
 case 1:
 bubbleSort();
 break;
 case 2:
 vector = mergeSort(vector);
 break;
 }
 }
public:
 Sorter() = delete;
 Sorter(std::vector<T> vec) : vector(vec) {
 sort();
 }
 Sorter(T* array) {
 int num = count(array);
 for (int i = 0; i < num; i++) {
 this->vector.push_back(*(array + i));
 }
 sort();
 }
 std::vector<T> getSorted() {
 return vector;
 }
};

The updated version is always available at this link

Question 2

How much testing of this code have you done? There are several nonobivous problems with it that may be revealed by testing.

Question 3

I still haven't tested it with any object types, like vector or lists.

Question 4

I've refactored the class. And also added a quicksort function. Am I supposed to post it as a new question with this one linking to that (and that one linking to this) or do I update the code here itself?

Question 5

Post your updated code in a new question (since you've already received answers to this one).

Question 6

In the sort() function its currently hardcoded to merge sort which I assume is not intentional. However, if you do allow the user to give an input you should do some error checking. You'll currently get unexpected behaviour if the user enters anything other than 1 or 2.
You should use std::swap rather than implementing your own swap function. Some types can make optimisations over what you've done.
From a design point of view I would argue that c++ is not an inherently object oriented language and that bubblesort and mergesort should be free functions taking a pair of iterators, to allow them to be more easily reused. You've pretty much done this for merge sort already, it uses none of the members of the Sorter class so could just be made a free function.
Your bubble sort implementation is inefficient. It should check if the vector is sorted after each pass, rather than doing the outer loops you could do while(!isSorted(vector)) or similar.
Merge sort has a very big lambda in it for combine. From a readability point of view I would pull that out into a separate function.
Both mergesort and bubblesort use indices with operator[] to get the elements from the vector. The preferred way to do this in c++ is using iterators, see https://stackoverflow.com/questions/131241/why-use-iterators-instead-of-array-indices
In count() register is deprecated (and strictly is removed since c++17). You should probably avoid using it to improve compatibility.
I would need to think about it more, but I question if while (*(array + i)) in count will always behave as expected. You assume that the array has an end sentinel value that will dereference to null. I don't that's guaranteed. As far as I know there is not an easy way to get the size from a c style array. The normal way to handle this is to require the user to pass in the size as well.
Update on capacity(). Your use of capacity() in the merge sort is not correct. You should use size() instead. size() tells you the number of elements currently in the vector whereas capacity() tells you how many elements the vector reserved space for internally. i.e. if you insert more than this number of things into the vector it will internally reserve more space and do reallocate all the elements. It's important to note that the capacity can be greater than the number of elements in the vector. In your case it works ok as the constructor with a pair of iterators initialises the capacity equal to the size. However, I don't think this is guaranteed and is complier dependant. To see the difference try the following code:

 std::vector<int> v;
 std::cout << 0 << " " << v.size() << " " << v.capacity() <<"\n";
 for( int i = 0; i < 10; ++i)
 {
 v.push_back(i);
 std::cout << i << " " << v.size() << " " << v.capacity() <<"\n";
 }
0 0 0
0 1 1
1 2 2
2 3 4
3 4 4
4 5 8
5 6 8
6 7 8
7 8 8
8 9 16
9 10 16

Question 7

while (*(array + i)) is definitely wrong, it assumes there's a sentinel value that can be converted to a false. Definitely not guaranteed, and usually doesn't even compile.

Question 8

(begin + end) / 2 risks overflow, but that's more of an edge case.

Question 9

@nivag, I agree with your opinion completely. However, I always wondered about point 6 you have mentioned, from like when I learned c++. I know that std::vector::begin will return me an iterator pointing to the first element and then, to point to the, umm.. lets say third element (any element in-between actually), I will have to increment the iterator that many times. This is not as simple as saying std::vector<int> v = { 1,2,3,4,5 }; std::cout << v[3]. So, is there any equivalent function in c++ which will return me an iterator directly pointing to nth element? If there is, how do I use it?

Question 10

If you really want a particular element v[3] can be fine, but in your case you are mainly just looping through all the elements. Also vector has a random access iterator so supports e.g. v.begin()+3 if needed. I did mean to say something about capacity but ran out of time yesterday. I'll try and add something later if no one else does.

Question 11

@cpplover: Well, if the array has 1.5 million entries, and begin is 1,499,999,999, and end if 1,500,000,000, then begin+end is 2,999,999,999, but the maximum that fits in an int is 2,147,483,648, so begin+end will result in an unspecified number, usually -852,516,351. Then -852,516,351/2 is -426,258,175, so your code will try to access index -426,258,175 of the array. int mid = begin/2 + end/2; doesn't have this overflow problem.

Question 12

READABILITY

Function: int count(T* array)

I don't prefer this way of acessing a (T) while(*(array+i)), because of readability reasons. Better way: while(array[i])

PERFORMANCE

Function: void swap(T& t1, T& t2)

Faster way is to use XOR (^) swap algorithm. it is faster and small amout of memory is used.

if (t1 != t2)
 {
 *t1 ^= *t2;
 *t2 ^= *t1;
 *t1 ^= *t2;
 }

and function declaration should be inline void swap(T& t1, T& t2)

Function: std::vector<T> getSorted()

Should be inline std::vector<T> getSorted()

Question 13

I wouldn't recommend an xor swap not least because it won't for lots of types and this code looks like it expects to sort a vector of any type. See stackoverflow.com/questions/10549155/… for more reasons.

Question 14

Also there are bigger problems, namely that count usually reads past the end of the array.

Question 15

The XOR algorithm is much slower and uses no less memory than the normal swap. But a bigger problem is that swap(a, a) will result in a = 0. And marking functions defined in the class has zero use.

Question 16

Also, I feel, it is better to call Pointer declared arrays as *(array+i) instead of array[i]. This helps in emphasizing that the array has been declared as a pointer and allocated dynamically, instead of static allocation.

Question 17

Oops, I meant "marking functions defined in the class inline" in my last comment ... Sorry

nivag nivag 4012 silver badges6 bronze badges · Accepted Answer · 2020-03-26 17:44:59Z

In the sort() function its currently hardcoded to merge sort which I assume is not intentional. However, if you do allow the user to give an input you should do some error checking. You'll currently get unexpected behaviour if the user enters anything other than 1 or 2.
You should use std::swap rather than implementing your own swap function. Some types can make optimisations over what you've done.
From a design point of view I would argue that c++ is not an inherently object oriented language and that bubblesort and mergesort should be free functions taking a pair of iterators, to allow them to be more easily reused. You've pretty much done this for merge sort already, it uses none of the members of the Sorter class so could just be made a free function.
Your bubble sort implementation is inefficient. It should check if the vector is sorted after each pass, rather than doing the outer loops you could do while(!isSorted(vector)) or similar.
Merge sort has a very big lambda in it for combine. From a readability point of view I would pull that out into a separate function.
Both mergesort and bubblesort use indices with operator[] to get the elements from the vector. The preferred way to do this in c++ is using iterators, see https://stackoverflow.com/questions/131241/why-use-iterators-instead-of-array-indices
In count() register is deprecated (and strictly is removed since c++17). You should probably avoid using it to improve compatibility.
I would need to think about it more, but I question if while (*(array + i)) in count will always behave as expected. You assume that the array has an end sentinel value that will dereference to null. I don't that's guaranteed. As far as I know there is not an easy way to get the size from a c style array. The normal way to handle this is to require the user to pass in the size as well.
Update on capacity(). Your use of capacity() in the merge sort is not correct. You should use size() instead. size() tells you the number of elements currently in the vector whereas capacity() tells you how many elements the vector reserved space for internally. i.e. if you insert more than this number of things into the vector it will internally reserve more space and do reallocate all the elements. It's important to note that the capacity can be greater than the number of elements in the vector. In your case it works ok as the constructor with a pair of iterators initialises the capacity equal to the size. However, I don't think this is guaranteed and is complier dependant. To see the difference try the following code:

 std::vector<int> v;
 std::cout << 0 << " " << v.size() << " " << v.capacity() <<"\n";
 for( int i = 0; i < 10; ++i)
 {
 v.push_back(i);
 std::cout << i << " " << v.size() << " " << v.capacity() <<"\n";
 }
0 0 0
0 1 1
1 2 2
2 3 4
3 4 4
4 5 8
5 6 8
6 7 8
7 8 8
8 9 16
9 10 16

while (*(array + i)) is definitely wrong, it assumes there's a sentinel value that can be converted to a false. Definitely not guaranteed, and usually doesn't even compile.
(begin + end) / 2 risks overflow, but that's more of an edge case.
@nivag, I agree with your opinion completely. However, I always wondered about point 6 you have mentioned, from like when I learned c++. I know that std::vector::begin will return me an iterator pointing to the first element and then, to point to the, umm.. lets say third element (any element in-between actually), I will have to increment the iterator that many times. This is not as simple as saying std::vector<int> v = { 1,2,3,4,5 }; std::cout << v[3]. So, is there any equivalent function in c++ which will return me an iterator directly pointing to nth element? If there is, how do I use it?
If you really want a particular element v[3] can be fine, but in your case you are mainly just looping through all the elements. Also vector has a random access iterator so supports e.g. v.begin()+3 if needed. I did mean to say something about capacity but ran out of time yesterday. I'll try and add something later if no one else does.
@cpplover: Well, if the array has 1.5 million entries, and begin is 1,499,999,999, and end if 1,500,000,000, then begin+end is 2,999,999,999, but the maximum that fits in an int is 2,147,483,648, so begin+end will result in an unspecified number, usually -852,516,351. Then -852,516,351/2 is -426,258,175, so your code will try to access index -426,258,175 of the array. int mid = begin/2 + end/2; doesn't have this overflow problem.

Stack Exchange Network

What changes should I implement in my templated sorter class so that it performs efficiently?

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

What changes should I implement in my templated sorter class so that it performs efficiently?

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions