Back from a long C++ hiatus. I thought to implement mergesort from memory, using containers and not based on CLRS's pseudocode and arrays. The compiles and runs ok on the test cases.
/* TODOs
* Optimizations:
* [1] Have Mergesort sort values in-place: leftVec and rightVec contains references
*
* Improvements:
* [1] Have Mergesort take template arguments, allowing it to sort user classes
*/
#include "../template/template.h"
// Forward declarations
vector<int> Merge(const vector<int>& leftHalf, const vector<int>& rightHalf, int indentLevel);
vector<int> Mergesort(vector<int>& vals, int clogLevel = 0)
{
// Base case is vals.size >= 2
if (vals.size() <= 1) return vals;
// Divide vals into two sub-containers: leftHalf, rightHalf
int r = (vals.size() / 2);
auto leftHalf = vector<int>(vals.begin(), vals.begin() + r);
auto rightHalf = vector<int>(vals.begin() + r, vals.end());
// Debug print
//clog("leftHalf: " + StrItems(leftHalf, false), true, clogLevel);
//clog("rightHalf: " + StrItems(rightHalf, false), true, clogLevel);
auto sortedLeft = Mergesort(leftHalf, clogLevel + 1);
auto sortedRight = Mergesort(rightHalf, clogLevel + 1);
auto sorted = Merge(sortedLeft, sortedRight, clogLevel);
//clog("Sorted: " + StrItems(sorted, false), true, clogLevel);
return sorted;
}
// Returns a vector containing elements from leftHalf and rightHalf in ascending value
vector<int> Merge(const vector<int>& leftHalf, const vector<int>& rightHalf, int clogLevel = 0)
{
// If leftHalf is empty, returning an empty (or non-empty) rightHalf is what user would expect
if (leftHalf.begin() == leftHalf.end()) return rightHalf;
if (rightHalf.begin() == rightHalf.end()) return leftHalf; // Vice-versa
//clog("Merging: leftHalf & rightHalf; sizes " + to_string(leftHalf.size()) + "," + to_string(rightHalf.size()), true, clogLevel);
auto mergedVec = vector<int>();
auto leftItr = leftHalf.begin();
auto rightItr = rightHalf.begin();
while (leftItr != leftHalf.end() && rightItr != rightHalf.end()) {
if (*leftItr < *rightItr) {
mergedVec.push_back(*leftItr);
leftItr++;
} else if (*leftItr > *rightItr) {
mergedVec.push_back(*rightItr);
rightItr++;
} else {
// Both elements are equal: append both elements
mergedVec.push_back(*leftItr);
mergedVec.push_back(*rightItr);
leftItr++;
rightItr++;
}
}
// If both vectors are exhausted of elements, return
if (leftItr == leftHalf.end() && rightItr == rightHalf.end())
return mergedVec;
// If leftHalf is exhausted, append the rest of elements from rightHalf; vice-versa
if (leftItr == leftHalf.end())
mergedVec.insert(mergedVec.end(), rightItr, rightHalf.end());
else
mergedVec.insert(mergedVec.end(), leftItr, leftHalf.end());
return mergedVec;
}
int main(int argc, char **argv)
{
vector<int> Test1 {-1, 10, -2, 4, -5, 1, 3, 5};
cout << "Test1 (before sort): " << StrItems(Test1, true);
auto Result1 = Mergesort(Test1);
cout << "Test1 (after sort): " << StrItems(Result1, true);
vector<int> Test2 {3, -2, 3, 3, 0};
cout << "Test2: (before sort): " << StrItems(Test2, true);
auto Result2 = Mergesort(Test2);
cout << "Test2: (after sort): " << StrItems(Result2, true);
return 0;
}
Template.h (Boilerplate)
#ifndef TEMPLATE_H
#define TEMPLATE_H
#include <iostream>
#include <vector>
#include <array>
using std::string;
using std::to_string;
using std::cout;
using std::vector;
using std::array;
// Returns a string representation of the container as a space-separated concatenation of its
// elements. If newline is true, appends newline to the end of the string. The string is
// preceded by (indentLevel * indentWidth) number of indentChars.
template <class T>
string StrItems(T container, bool newline = true, int indentLevel = 0, int indentWidth = 2, char indentChar = ' ')
{
string repr = string(indentWidth * indentLevel, indentChar);
for (auto it = container.begin(); it != container.end(); it++) {
repr.append(to_string(*it) + " ");
}
if (newline)
repr.back() = '\n';
else
repr.erase(repr.end() - 1); // Removes the trailing space
return repr;
}
// Console-log. Writes msg to console. If newline is true (default), appends newline to the end of the
// string. The msg is preceded by (indentLevel * indentWidth) number of indentChars.
void clog(const string& msg, bool newline = true, int indentLevel = 0, int indentWidth = 2, char indentChar = ' ')
{
string indentation = string(indentWidth * indentLevel, indentChar);
cout << indentation << msg;
if (newline) cout << '\n';
}
#endif
-
2\$\begingroup\$ Whereas your generic review question is basically on-topic, those specific questions (1-5) are not - to have them answered specifically, you would need to approach a different StackExchange site. \$\endgroup\$Reinderien– Reinderien2020年07月07日 15:48:33 +00:00Commented Jul 7, 2020 at 15:48
-
\$\begingroup\$ Questions removed. \$\endgroup\$Minh Tran– Minh Tran2020年07月07日 19:00:30 +00:00Commented Jul 7, 2020 at 19:00
2 Answers 2
Don't using std
...especially in a header. In a source file it's not so bad; but in a header, you're forcing anyone who includes it to pollute their namespace.
Const references
Since Mergesort
is not in-place, vals
should be passed as const
.
Make your tests tests
The tests currently in main should have assert
s so that they act as actual tests.
Your questions
By having
Merge
take references toleftHalf
andrightHalf
, I'm working with whatever memory was allocated (in this case, stack memory fromMergesort
) and not copies of those vectors right?
You're correct that Merge
will not make copies at the beginning of the call due to the reference. However, accepting a reference does not guarantee that the referred variable was allocated on the stack, nor should that matter to the function.
Lastly, I don't need to worry about free-ing
leftHalf
,rightHalf
,sortedLeft
,sortedRight
,sorted
, andmergedVec
because they're allocated on the stack and returned by value, right?
Right(ish). Even if you did need to free the memory, you wouldn't use free()
- this is C++, so you would use delete
.
Is there a way to check which region of memory an object lies in (e.g., stack, heap, global, etc.)?
You should never need to do this outside of maybe a very narrow and non-production debugging or profiling effort.
we can say the address range 0x4FFFFFFFDDDDDDDD to 0x5000000000000000 is always where a program stores stack frames
Absolutely not, and this is dependent on a number of things, including OS and processor (32-bit vs. 64-bit).
Some operating systems go out of their way to randomize this range to make certain exploits more difficult.
-
-
\$\begingroup\$
Since Mergesort is not in-place, vals should be passed as const
. Fair enough. I didn't have a good reason for making valsconst
but I suppose it's better to be conservative until you have a good reason to keep it non-const (like implementing in-place sorting). \$\endgroup\$Minh Tran– Minh Tran2020年07月08日 12:40:18 +00:00Commented Jul 8, 2020 at 12:40 -
\$\begingroup\$
The tests currently in main should have asserts so that they act as actual tests.
If I were writing unit tests, I'd look for or write my own method that checks two containers for element-wise equality. I thought it'd be overkill for this example. \$\endgroup\$Minh Tran– Minh Tran2020年07月08日 12:43:09 +00:00Commented Jul 8, 2020 at 12:43 -
\$\begingroup\$
Right(ish). Even if you did need to free the memory, you wouldn't use free() - this is C++, so you would use delete.
Good catch. Yep -- it'snew/delete
notmalloc/free
. Also, I recently read SO: Do vectors use stack or heap memory and it seems the container elements are actually created on the heap but the class cleans up itself when it goes out of scope. \$\endgroup\$Minh Tran– Minh Tran2020年07月08日 12:50:00 +00:00Commented Jul 8, 2020 at 12:50 -
\$\begingroup\$
You should never need to do this outside of maybe a very narrow and non-production debugging or profiling effort.
I wanted to use it for educational purposes, not for development reasons. \$\endgroup\$Minh Tran– Minh Tran2020年07月08日 12:52:04 +00:00Commented Jul 8, 2020 at 12:52
Overview
A couple of things I would note:
1: Your code only works for integers. Why do you care? You have shown you know about templates so it would be reasonable to make this sort any type of object that is comparable.
I suppose you acknowledge this in the comments:
// Have Mergesort take template arguments, allowing it to sort user classes
But when you try this it will throw up questions around copy (the default and not bad for integers) against move (A better idea for complex/expensive objects).
2: Why only vectors. It is fine to internally use vectors for storage for the intermediate results. But the interface should not limit you to sorting one specific type of container.
Now you could templatize on the container type. But usually in C++ we abstract the container from the algorithm by using iterators. So I would have used iterators as the input to the sort (so that you can use any container simply passing the iterators to the algorithm).
3: You use an "Extra" 2x
the input size of memory compared the input data as temporary storage. You can do this with only 1x
extra memory and with a bit of work you only do the allocation once (doing the allocation again and again can be expensive).
4: You return a new array (with the sorted content). Why not sort the content in place. You don't need to force the creation of a new container. If the original user wants a new container then the user can make a copy and then use a sort in place algorithm. I think the creation of a new container is an extra unnecessary step that you are forcing your user to pay for that they may not want to.
You sort of mention this as an improvement:
// Have Mergesort sort values in-place: leftVec and rightVec contains references
But I don't think you can have l/r Vec be references when you do this.
Comments
Back from a long C++ hiatus.
Welcome Back.
I thought to implement mergesort from memory
Its a fun learning example. I like bubble sort my self.
using containers and not based on CLRS's pseudocode and arrays.
I had to look up what CLRS mean. You learn something new every day.
The compiles and runs ok on the test cases.
Good. That means you read the rules :-)
Code review:
If you are not modifying the orignal pass by const reference to catch mistakes.
vector<int> Mergesort(vector<int> const& vals, int clogLevel = 0)
^^^^^
auto leftHalf = vector(vals.begin(), vals.begin() + r); auto rightHalf = vector(vals.begin() + r, vals.end());
Worth mentioning this is a copy operation. For anything more complex a move would be better (but that would also require the original be modifiable which suggests a merge sort in place).
Note: There are specific move iterators that you can use.
Remove Dead code:
// Debug print
//clog("leftHalf: " + StrItems(leftHalf, false), true, clogLevel);
//clog("rightHalf: " + StrItems(rightHalf, false), true, clogLevel);
That is what source control is for.
How I hate redundant comments.
// Returns a vector containing elements from leftHalf and rightHalf in ascending value
Don't explain what the code should do. That should be done using self documenting code (function/variable names). Your comments should explain things that are not easy to see be captured by the code (WHY).
The problem is that comments rot over time. So they need to be maintained with the code. If your comments rot to a point where they don't reflect the code which is wrong? What do you fix. So let the code explain how it does it let the comments explain why (or things that are not obvious in the code).
Why are you testing the iterators here?
if (leftHalf.begin() == leftHalf.end()) return rightHalf;
if (rightHalf.begin() == rightHalf.end()) return leftHalf; // Vice-versa
I think the more meaningful test is to simply test to see if it is empty.
if (leftHalf.empty()) return rightHalf;
if (rightHalf.empty()) return leftHalf;
I believe that conveys the intent of the code much more clearly.
I would simplify this:
if (*leftItr < *rightItr) {
mergedVec.push_back(*leftItr);
leftItr++;
} else if (*leftItr > *rightItr) {
mergedVec.push_back(*rightItr);
rightItr++;
} else {
// Both elements are equal: append both elements
mergedVec.push_back(*leftItr);
mergedVec.push_back(*rightItr);
leftItr++;
rightItr++;
}
// Like this:
if (*leftItr <= *rightItr) {
mergedVec.push_back(*leftItr);
++leftItr;
}
else {
mergedVec.push_back(*rightItr);
++rightItr;
}
// PS:
// prefer pre-increment over post.
++rightItr;
Most of the time they are equivalent. But now and then the pre-increment is slightly more efficient. Based on the standard way of implementing it.
see: How to overload the operator++ in two different ways for postfix a++ and prefix ++a?
Again you are complicating this.
// If both vectors are exhausted of elements, return
if (leftItr == leftHalf.end() && rightItr == rightHalf.end())
return mergedVec;
// If leftHalf is exhausted, append the rest of elements from rightHalf; vice-versa
if (leftItr == leftHalf.end())
mergedVec.insert(mergedVec.end(), rightItr, rightHalf.end());
else
mergedVec.insert(mergedVec.end(), leftItr, leftHalf.end());
// Simplify like this:
mergedVec.insert(mergedVec.end(), rightItr, rightHalf.end());
mergedVec.insert(mergedVec.end(), leftItr, leftHalf.end());
Yes one of those vectors will be empty. But inserting an empty range is not going to have a cost.
Tiny bit generic for guards.
#ifndef TEMPLATE_H
#define TEMPLATE_H
Put your code into your own namespace. Then add your namespace as part of your include guard.
Never do this.
using std::string;
using std::to_string;
using std::cout;
using std::vector;
using std::array;
Its bad in a source file. In a header file you can break other peoples code. Its simpler to just always use the prefix std::
(its only 5 more characters). Don't be lazy.
string repr = string(indentWidth * indentLevel, indentChar);
Sure you can build a string using append and addition. But I would personally use a std::stringstream
. Just like a stream but does it into a string. Nice for building printable objects.
A recent addition to the C++ language is range based for:
for (auto it = container.begin(); it != container.end(); it++) {
repr.append(to_string(*it) + " ");
}
This can be written as:
for(auto const& val: container) {
repr.append(to_string(val)) + " ");
}
The range based for uses std::begin()
and std::end()
on the container
object and assigns the result of the de-referenced object to the val
.
for(Type val: container) {
<CODE>
}
Is syntactically equivalent to:
{
ContainerType::iterator end = std::end(container);
ContainerType::iterator loop = std::begin(container);
for(;loop != end; ++loop) {
Type Val = *loop;
<CODE>
}
}
Example
I have done a previosu code review on merge sort.
https://codereview.stackexchange.com/a/137939/507
At the end of my answer I provide a good implementation.
-
\$\begingroup\$
1. Your code only works for integers.
If I knew how to apply templates, I would have done so :-). I thought to improve this code incrementally -- make it work for ints,git commit
, make it sort in place,git commit
, make it work for any user types, etc. Copy/move constructors are one of those concepts I haven't learned yet. Will get to it soon, though. Cherno has some good tuts for this. \$\endgroup\$Minh Tran– Minh Tran2020年07月08日 12:55:06 +00:00Commented Jul 8, 2020 at 12:55 -
\$\begingroup\$
2. I would have used iterators as the input to the sort (so that you can use any container simply passing the iterators to the algorithm).
Good idea. It sounds like this advice goes in tandem with advice #1. \$\endgroup\$Minh Tran– Minh Tran2020年07月08日 13:00:51 +00:00Commented Jul 8, 2020 at 13:00 -
\$\begingroup\$
make vals const to prevent inadvertently modifying its value.
Fair point. \$\endgroup\$Minh Tran– Minh Tran2020年07月08日 13:03:06 +00:00Commented Jul 8, 2020 at 13:03 -
\$\begingroup\$
// Returns a vector containing elements from leftHalf and rightHalf in ascending value
Oh common. It wasn't that bad! It was meant to be a brief doc string because I figured just by looking at the signature ofMerge
, it wasn't immediately obvious whether the function returned sorted values in ascending or descending order. I suppose renaming the methodMergeAscending()
orMerge(bool ascending)
would have been just as clear. \$\endgroup\$Minh Tran– Minh Tran2020年07月08日 13:11:04 +00:00Commented Jul 8, 2020 at 13:11 -
\$\begingroup\$
Your comments should explain things that are not easy to see be captured by the code (WHY).
I'm familiar with the coding guideline that advocates explaining why code does something and not how it's done. Stroustrup has a nice note on this @1:01:54. I'd never write a comment that could be ascertained by simply reading the code. However, it's a judgement call -- I've seen functions that read as sentences (i.e., unit test functions), which can look just as bad. \$\endgroup\$Minh Tran– Minh Tran2020年07月08日 13:17:12 +00:00Commented Jul 8, 2020 at 13:17