C++ code to sort a list of elements

Question 1

This is a code uses mergesort algorithm to sort a list of elements.What are the possible improvements on the code?Also are there any unhealthy coding practice involved in this code?

#include<iostream>
#include<vector>
#include<random>
#include<chrono>
using namespace std;
using namespace std::chrono;
vector<int> merge(vector<int> a,vector<int> b){
 int i=0;
 int j=0;
 vector<int> result;
 while(i<a.size()&&j<b.size()){
 if(a[i]<b[j]){
 result.push_back(a[i]);
 i++;
 }
 else{
 result.push_back(b[j]);
 j++;
 }
 }
 while(i<a.size()){
 result.push_back(a[i]);
 i++;
 }
 while(j<b.size()){
 result.push_back(b[j]);
 j++;
 }
 return result;
}
vector<int> mergesort(vector<int> a,int low,int hi){
 vector<int> b,c,result;
 if(low==hi)
 return {a[hi]};
 b=mergesort(a,low,(low+hi)/2);
 c=mergesort(a,(hi+low)/2+1,hi);
 result=merge(b,c);
 return result;
}
int main(){
 std::random_device rd; // obtain a random number from hardware
 std::mt19937 eng(rd()); // seed the generator
 std::uniform_int_distribution<> distr(0,90); // define the range
 vector <int> num;
 for(int i=0;i<30;i++)
 num.push_back(distr(eng)); 
 vector<int> a=mergesort(num,0,num.size()-1);
 for(auto &i:a)
 cout<<i<<" "; 
}

Question 2

#include<iostream>
#include<vector>
#include<random>
#include<chrono>

Leave some space between the #includes and the rest of your code, that's easier to read. Besides, I really like <chrono> myself too, but you don't seem to use it afterwards.

using namespace std;
using namespace std::chrono;

using directives for a whole namespace is a bad idea. You risk conflicts between the names defined in std (and there's quite a lot of them) and those you define yourself. Safety is worth a few more keystrokes.

vector<int> merge(vector<int> a,vector<int> b){

That function signature means you're passing a and b by value, that is to say you copy the argument vectors. It might be the good move sometimes, but here it isn't, because you only read them. So const std::vector<int>& a, const std::vector<int>& b is better.

 int i=0;
 int j=0;

i, j, why not? but it doesn't scream what you'll be using those variables for.

 vector<int> result;
 while(i<a.size()&&j<b.size()){

I would rather use iterators in the first place (I mean, in the function's signature), because:

it avoids comparison issues (vector.size() is unsigned, whereas i and j are signed)
it makes it easier to write and use generic code
it allows you to merge sequences which are only a part of a container, or even inside the same container.
```
if(a[i]<b[j]){
```

There's a trick here: if a[i] == b[j] you'll insert b[j] first. It might be innocuous -it is for an int- but programmers expect that min(a, b) will return a if a == b. Since a sorting algorithm is a good candidate for abstraction (a vector of strings is sorted the same way as a vector of ints or floats), you should respect that rule.

 result.push_back(a[i]);
 i++;

Unless you need to keep track of the old value, use pre-incrementation (++i) since i++ creates and returns a temporary value.

 }
 else{
 result.push_back(b[j]);
 j++;
 }
 }
 while(i<a.size()){
 result.push_back(a[i]);
 i++;
 }
 while(j<b.size()){
 result.push_back(b[j]);
 j++;
 }

copy and paste is an anti-pattern. Standard algorithms will provide what you need in a more elegant way, especially if you use iterators: std::copy(current_a, a.end(), std::back_inserter(result));

 return result;
}
vector<int> mergesort(vector<int> a,int low,int hi){
 vector<int> b,c,result;
 if(low==hi)
 return {a[hi]};
 b=mergesort(a,low,(low+hi)/2);
 c=mergesort(a,(hi+low)/2+1,hi);
 result=merge(b,c);
 return result;
}

This part is all right, even if I believe you should try to reduce the place you need to merge-sort or at least the number of allocations.

int main(){
 std::random_device rd; // obtain a random number from hardware
 std::mt19937 eng(rd()); // seed the generator
 std::uniform_int_distribution<> distr(0,90); // define the range
 vector <int> num;
 for(int i=0;i<30;i++)
 num.push_back(distr(eng)); 
 vector<int> a=mergesort(num,0,num.size()-1);
 for(auto &i:a)

You generally don't take integers by reference (a reference is the size of or bigger than an int, and you don't take non-const references when you don't modify the variable, which you don't.

 cout<<i<<" "; 
}

Question 3

Great answer. One other tip I'd recommend is to reserve space for the result vector, since we know exactly how big it will need to be. result.reserve(a.size() + b.size());

Question 4

@FredLarson: I used to think so too, but then I read Bjarne Stroustrup C++ FAQ: "People sometimes worry about the cost of std::vector growing incrementally. I used to worry about that and used reserve() to optimize the growth. After measuring my code and repeatedly having trouble finding the performance benefits of reserve() in real programs, I stopped using it except where it is needed to avoid iterator invalidation (a rare case in my code). Again: measure before you optimize."

Question 5

Interesting. Looks like I'll have to do some experiments of my own.

Question 6

Thank you so much.Got to learn so much from this answer.Btw i used chrono to get the execution time but forgot to remove it while pasting here.Once again thanks a lot! @papagaga

Question 7

I made a version of this code that sorts strings instead of integers. Using reserve does seem to produce a measurable difference. Feeding it some 300,000 strings, it seems to save about 50ms -- around 160ms vs. 210ms. 50ms doesn't sound like much, but that's over 20%!

Question 8

The functions should be templatized so that they can sort any kind of comparable values.

You have made the two classic mistakes in mergesort:

Unstable sorting.
```
if(a[i]<b[j]){
 result.push_back(a[i]);
 i++;
}
else{
 result.push_back(b[j]);
 j++;
}
```
One of the useful properties of mergesort is that it is a stable sorting algorithm. To achieve that, though, when two a[i] and b[j] are equal, you have to prefer to take the item from a first, because a was originally to the left of b. Therefore, use <= for that comparison.

Of course, stability doesn't matter when sorting integers. But you might as well write it correctly so that it also works for sorting arbitrary objects.
Overflow.
```
b=mergesort(a,low,(low+hi)/2);
c=mergesort(a,(hi+low)/2+1,hi);
```
low+high is vulnerable to integer overflow. The better technique would be low + (high - low) / 2.

Question 9

Merge Sort is a great choice when the data don't fit into working memory. However, our inputs and outputs are all in memory, so we're better off using a more suitable algorithm, such as Quicksort.

Luckily, the C++ Standard Library provides std::sort(), which is intended to be a good choice when inputs fit into memory (it's the implementer's choice of algorithm, not necessarily Quicksort). A good implementation won't require such a large memory overhead as the hand-written sort (we need space for the entire input and output during the sort - i.e. twice that of the elements, leading to an increased risk of std::bad_alloc; most std::sort() require little or no extra memory during operation).

Summary: use std::sort() and save your brainpower for other challenges.

Question 10

Do you mind adding a little more detail about why merge sort is good when it doesn't fit in working memory and why quicksort (or std::sort) is more suitable when it does?

Question 11

@Danno, almost any textbook will explain that; I don't think I could do as good a job. If you want to know more, you could even start with Wikipedia.

papagaga papagaga 5,80712 silver badges25 bronze badges · Answer 1 · 2018-10-09 12:15:11Z

#include<iostream>
#include<vector>
#include<random>
#include<chrono>

Leave some space between the #includes and the rest of your code, that's easier to read. Besides, I really like <chrono> myself too, but you don't seem to use it afterwards.

using namespace std;
using namespace std::chrono;

using directives for a whole namespace is a bad idea. You risk conflicts between the names defined in std (and there's quite a lot of them) and those you define yourself. Safety is worth a few more keystrokes.

vector<int> merge(vector<int> a,vector<int> b){

That function signature means you're passing a and b by value, that is to say you copy the argument vectors. It might be the good move sometimes, but here it isn't, because you only read them. So const std::vector<int>& a, const std::vector<int>& b is better.

 int i=0;
 int j=0;

i, j, why not? but it doesn't scream what you'll be using those variables for.

 vector<int> result;
 while(i<a.size()&&j<b.size()){

I would rather use iterators in the first place (I mean, in the function's signature), because:

it avoids comparison issues (vector.size() is unsigned, whereas i and j are signed)
it makes it easier to write and use generic code
it allows you to merge sequences which are only a part of a container, or even inside the same container.
```
if(a[i]<b[j]){
```

There's a trick here: if a[i] == b[j] you'll insert b[j] first. It might be innocuous -it is for an int- but programmers expect that min(a, b) will return a if a == b. Since a sorting algorithm is a good candidate for abstraction (a vector of strings is sorted the same way as a vector of ints or floats), you should respect that rule.

 result.push_back(a[i]);
 i++;

Unless you need to keep track of the old value, use pre-incrementation (++i) since i++ creates and returns a temporary value.

 }
 else{
 result.push_back(b[j]);
 j++;
 }
 }
 while(i<a.size()){
 result.push_back(a[i]);
 i++;
 }
 while(j<b.size()){
 result.push_back(b[j]);
 j++;
 }

copy and paste is an anti-pattern. Standard algorithms will provide what you need in a more elegant way, especially if you use iterators: std::copy(current_a, a.end(), std::back_inserter(result));

 return result;
}
vector<int> mergesort(vector<int> a,int low,int hi){
 vector<int> b,c,result;
 if(low==hi)
 return {a[hi]};
 b=mergesort(a,low,(low+hi)/2);
 c=mergesort(a,(hi+low)/2+1,hi);
 result=merge(b,c);
 return result;
}

This part is all right, even if I believe you should try to reduce the place you need to merge-sort or at least the number of allocations.

int main(){
 std::random_device rd; // obtain a random number from hardware
 std::mt19937 eng(rd()); // seed the generator
 std::uniform_int_distribution<> distr(0,90); // define the range
 vector <int> num;
 for(int i=0;i<30;i++)
 num.push_back(distr(eng)); 
 vector<int> a=mergesort(num,0,num.size()-1);
 for(auto &i:a)

You generally don't take integers by reference (a reference is the size of or bigger than an int, and you don't take non-const references when you don't modify the variable, which you don't.

 cout<<i<<" "; 
}

Great answer. One other tip I'd recommend is to reserve space for the result vector, since we know exactly how big it will need to be. result.reserve(a.size() + b.size());
@FredLarson: I used to think so too, but then I read Bjarne Stroustrup C++ FAQ: "People sometimes worry about the cost of std::vector growing incrementally. I used to worry about that and used reserve() to optimize the growth. After measuring my code and repeatedly having trouble finding the performance benefits of reserve() in real programs, I stopped using it except where it is needed to avoid iterator invalidation (a rare case in my code). Again: measure before you optimize."
Interesting. Looks like I'll have to do some experiments of my own.
Thank you so much.Got to learn so much from this answer.Btw i used chrono to get the execution time but forgot to remove it while pasting here.Once again thanks a lot! @papagaga
I made a version of this code that sorts strings instead of integers. Using reserve does seem to produce a measurable difference. Feeding it some 300,000 strings, it seems to save about 50ms -- around 160ms vs. 210ms. 50ms doesn't sound like much, but that's over 20%!

200_success 200_success 146k22 gold badges190 silver badges479 bronze badges · Answer 2 · 2018-10-09 14:01:35Z

The functions should be templatized so that they can sort any kind of comparable values.

You have made the two classic mistakes in mergesort:

Unstable sorting.
```
if(a[i]<b[j]){
 result.push_back(a[i]);
 i++;
}
else{
 result.push_back(b[j]);
 j++;
}
```
One of the useful properties of mergesort is that it is a stable sorting algorithm. To achieve that, though, when two a[i] and b[j] are equal, you have to prefer to take the item from a first, because a was originally to the left of b. Therefore, use <= for that comparison.

Of course, stability doesn't matter when sorting integers. But you might as well write it correctly so that it also works for sorting arbitrary objects.
Overflow.
```
b=mergesort(a,low,(low+hi)/2);
c=mergesort(a,(hi+low)/2+1,hi);
```
low+high is vulnerable to integer overflow. The better technique would be low + (high - low) / 2.

Toby Speight Toby Speight 88k14 gold badges104 silver badges325 bronze badges · Answer 3 · 2018-10-09 14:36:50Z

Merge Sort is a great choice when the data don't fit into working memory. However, our inputs and outputs are all in memory, so we're better off using a more suitable algorithm, such as Quicksort.

Luckily, the C++ Standard Library provides std::sort(), which is intended to be a good choice when inputs fit into memory (it's the implementer's choice of algorithm, not necessarily Quicksort). A good implementation won't require such a large memory overhead as the hand-written sort (we need space for the entire input and output during the sort - i.e. twice that of the elements, leading to an increased risk of std::bad_alloc; most std::sort() require little or no extra memory during operation).

Summary: use std::sort() and save your brainpower for other challenges.

Do you mind adding a little more detail about why merge sort is good when it doesn't fit in working memory and why quicksort (or std::sort) is more suitable when it does?
@Danno, almost any textbook will explain that; I don't think I could do as good a job. If you want to know more, you could even start with Wikipedia.

Stack Exchange Network

C++ code to sort a list of elements

3 Answers 3

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

C++ code to sort a list of elements

3 Answers 3

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions