Introduction
I started going through some basic algorithms as I'll have algorithms course next semester. The last time I've written any heap operations is around 5 years ago. This time around it took me an hour to debug the case where the right child wouldn't exist, but left child did. Otherwise it was straightforward.
The code below implements four basic operations on a heap:
Push (sift down)
Pop (swap with the last element and then perform push on truncated heap)
Build heap (push one by one starting from the middle, achieving linear complexity)
Sort heap (continuosly pop elements and truncate the heap, pushing greater elements to the end of the underlying container)
The code works on iterator ranges, requires RandomAccessIterator
, Swappable
value types of the iterator type.
Code
#include <vector>
#include <iostream>
#include <random>
#include <unordered_set>
#include <numeric>
#include <algorithm>
std::vector<int> generate_vector(std::size_t size)
{
if (size == 0)
return {};
if (size == 1)
return {0};
static std::mt19937 twister{};
std::vector<int> numbers(size);
std::iota(numbers.begin(), numbers.end(), 0);
std::shuffle(numbers.begin(), numbers.end(), twister);
return numbers;
}
#include <iterator>
namespace shino
{
template <typename RandomAccessIterator>
void push_heap(RandomAccessIterator first,
RandomAccessIterator last,
RandomAccessIterator element)
{
auto dist = std::distance(first, element) + 1;
auto left_child = first + dist * 2 - 1;
if (left_child >= last)
return;
auto right_child = first + dist * 2;
if (right_child >= last)
{
if (*element < *left_child)
{
std::iter_swap(element, left_child);
push_heap(first, last, left_child);
}
return;
}
if (*element >= *left_child and *element >= *right_child)
return;
auto next_location =
(*left_child >= *right_child) ? left_child : right_child;
std::iter_swap(next_location, element);
shino::push_heap(first, last, next_location);
}
template <typename RandomAccessIterator>
void build_heap(RandomAccessIterator first, RandomAccessIterator last)
{
for (auto middle = first + std::distance(first, last) / 2;
middle != first; --middle)
{
shino::push_heap(first, last, middle);
}
shino::push_heap(first, last, first);
}
template <typename RandomAccessIterator>
void pop_heap(RandomAccessIterator first, RandomAccessIterator last)
{
if (std::distance(first, last) < 2)
return;
auto new_last = std::prev(last);
std::iter_swap(first, new_last);
shino::push_heap(first, new_last, first);
}
template <typename RandomAccessIterator>
void sort_heap(RandomAccessIterator first, RandomAccessIterator last)
{
auto current_last = last;
while (first != current_last)
{
shino::pop_heap(first, current_last);
--current_last;
}
}
}
#include <algorithm>
int main()
{
for (std::size_t i = 0; i <= 10'000; ++i)
{
std::vector<int> v(generate_vector(i));
std::cout << "heapifying vector of size " << i << '\n';
shino::build_heap(v.begin(), v.end());
if (not std::is_heap(v.begin(), v.end()))
std::cerr << "incorrect heapifying on size " << i << '\n';
shino::sort_heap(v.begin(), v.end());
if (not std::is_sorted(v.begin(), v.end()))
std::cerr << "incorrect heap sorting on size " << i << '\n';
}
}
Concerns
push_heap
looks very uglyThe control flow is so obfuscated and hard to follow. I couldn't make it better though.
Indexing looks ugly too
One of the weakest point of iterators: algorithms where indexing is important. Is there a way to make it better?
Anything else
-
\$\begingroup\$ Did you spot a bug? I had the feeling that something is wrong, but tests were silent. If there is something wrong with the post, it would be great to get a feedback so I could fix it. \$\endgroup\$Incomputable– Incomputable2018年05月25日 21:53:44 +00:00Commented May 25, 2018 at 21:53
-
\$\begingroup\$ What class are you taking? \$\endgroup\$JDługosz– JDługosz2018年05月25日 22:09:34 +00:00Commented May 25, 2018 at 22:09
-
\$\begingroup\$ @JDługosz, it’s in my university. I can’t recall the exact name, but it is something similar to intro to algorithms \$\endgroup\$Incomputable– Incomputable2018年05月25日 22:11:21 +00:00Commented May 25, 2018 at 22:11
-
\$\begingroup\$ What level? If you already studied them 5 years ago, and you can write that now, you hardly need an "intro"! \$\endgroup\$JDługosz– JDługosz2018年05月25日 22:16:10 +00:00Commented May 25, 2018 at 22:16
-
1\$\begingroup\$ To whoever is voting to close this question: please clarify what should be improved. \$\endgroup\$Mast– Mast ♦2018年05月26日 06:51:37 +00:00Commented May 26, 2018 at 6:51
2 Answers 2
It looks like you know what you’re doing.
One thing I spotted: Don’t write std::iter_swap
quallified like that. You have to use ADL to pick up the right version provided for the template arguments. You need the "two step":
using std::iter_swap;
iter_swap (a,b);
just like with regular swap
.
You don’t need to qualify your own names when you are in that namespace. Or is that intentional to guard against ADL? I never see that technique used.
Check out Catch2 for beefing up your main
into a comprehensive unit test.
generate_vector
is missing out on NVRO due to your precondition tests. Declare numbers
at the top; always return that.
-
\$\begingroup\$ Qualified names and some other weird things are artifacts from multiple improvement iterations (at the start it wasn’t even a template and it lacked namespace). Thanks for Catch2, I’ve heard positive things about it, except compilation times. \$\endgroup\$Incomputable– Incomputable2018年05月25日 22:05:00 +00:00Commented May 25, 2018 at 22:05
-
\$\begingroup\$ Putting
CATCH_CONFIG_MAIN
in a separate CPP file that never changes makes the compilation time OK. My current project takes 1.4 seconds for optimized build. \$\endgroup\$JDługosz– JDługosz2018年05月25日 22:08:42 +00:00Commented May 25, 2018 at 22:08 -
\$\begingroup\$ that looks nice. I’ll give it a shot tomorrow. \$\endgroup\$Incomputable– Incomputable2018年05月25日 22:10:07 +00:00Commented May 25, 2018 at 22:10
-
\$\begingroup\$ I've never seen
iter_swap
used as a customization point, so qualifying it looks perfectly reasonable to me. \$\endgroup\$Rakete1111– Rakete11112018年05月30日 06:59:46 +00:00Commented May 30, 2018 at 6:59 -
\$\begingroup\$ @Rakete1111 I wonder why they would bother with defining a function where using it is longer than not using it (
swap(*a,*b)
vsiter_swap(a,b)
) unless it was there to abstract something, like a special optimal way of swapping when using a specific kind of iterator. What else would it be for? \$\endgroup\$JDługosz– JDługosz2018年05月30日 20:57:56 +00:00Commented May 30, 2018 at 20:57
List all your #include
directives at the top of the file. Then you'd see that you #include <algorithm>
twice.
-
\$\begingroup\$ Does it effect compile time too much? I mean, I know the practice has always been redundant. \$\endgroup\$Aniket Chowdhury– Aniket Chowdhury2018年05月26日 02:03:49 +00:00Commented May 26, 2018 at 2:03
-
1\$\begingroup\$ @AniketChowdhury It will have a very small (possibly minimal) effect on compile time. The exact amount will depend on how that header is implemented. \$\endgroup\$1201ProgramAlarm– 1201ProgramAlarm2018年05月26日 03:33:54 +00:00Commented May 26, 2018 at 3:33
-
\$\begingroup\$ @1201ProgramAlarm, I believe that is an artifact from a debugging session. I spent considerable time on finding a bug in the
shino::push_heap()
. \$\endgroup\$Incomputable– Incomputable2018年05月26日 06:36:44 +00:00Commented May 26, 2018 at 6:36