This is my first attempt at implementing a heap data structure and I've decided to cover the methods Peek
, Pop
, Push
, and Replace
. Think appear to work correctly but I admittedly had a lot of trouble figuring out the "ideal" way to implement each function as the reference implementations I found were all over the place.
Any feedback, especially if it addresses concerns that aren't related to formatting/naming, is most appreciated.
public class BinaryHeap<T>
{
private readonly Func<T, T, bool> m_comparer;
private readonly IList<T> m_values;
private int m_nextIndex;
public int Capacity => m_values.Count;
public int Count => m_nextIndex;
public BinaryHeap(IList<T> values, Func<T, T, bool> comparer) {
m_comparer = comparer;
m_values = values;
Grow(checked((int)(Operations.NextPowerOf2(Capacity) - 1)));
Rebuild();
}
[MethodImpl(MethodImplOptions.AggressiveInlining)]
public T Peek() {
return m_values[0];
}
public T Pop() {
if (TryPop(out T value)) {
return value;
}
else {
throw new InvalidOperationException(message: "heap is empty");
}
}
public int Push(T value) {
var offset = Count;
if (!(offset < Capacity)) {
Grow(checked((int)(Operations.NextPowerOf2(Capacity + 1) - 1)));
}
while (0 < offset) {
var parent = ((offset - 1) >> 1);
if (m_comparer(m_values[parent], value)) {
break;
}
m_values[offset] = m_values[parent];
offset = parent;
}
m_values[offset] = value;
return m_nextIndex++;
}
public T Replace(T newValue) {
if (TryReplace(newValue, out T oldValue)) {
return oldValue;
}
else {
throw new InvalidOperationException(message: "heap is empty");
}
}
public bool TryReplace(T newValue, out T oldValue) {
if (0 < Count) {
oldValue = Peek();
m_values[0] = newValue;
Heapify(0);
return true;
}
else {
oldValue = default;
return false;
}
}
public bool TryPop(out T value) {
if (0 < Count) {
value = Peek();
m_values[0] = m_values[--m_nextIndex];
m_values[m_nextIndex] = default;
Heapify(0);
return true;
}
else {
value = default;
return false;
}
}
private void Grow(int maxCapacity) {
var currentCapacity = Capacity;
for (var i = currentCapacity; (i < maxCapacity); i++) {
m_values.Add(default);
}
m_nextIndex = currentCapacity;
}
private void Heapify(int offset) {
var count = Count;
while (offset < count) {
var left = ((offset << 1) + 1);
var right = (left + 1);
var parent = offset;
if ((left < count) && m_comparer(m_values[left], m_values[parent])) {
parent = left;
}
if ((right < count) && m_comparer(m_values[right], m_values[parent])) {
parent = right;
}
if (offset == parent) {
return;
}
Swap(ref offset, ref parent);
}
}
private void Rebuild() {
for (var i = ((Count - 1) >> 1); (-1 < i); i--) {
Heapify(i);
}
}
[MethodImpl(MethodImplOptions.AggressiveInlining)]
private void Swap(ref int x, ref int y) {
var temp = m_values[x];
m_values[x] = m_values[y];
m_values[y] = temp;
x = y;
}
}
As requested, here are the missing method(s):
public static class Operations
{
public static long NextPowerOf2(long value) {
return ((long)NextPowerOf2((ulong)value));
}
public static ulong NextPowerOf2(ulong value) {
return (FoldBits(value) + 1);
}
[MethodImpl(MethodImplOptions.AggressiveInlining)]
private static ulong FoldBits(ulong value) {
value |= (value >> 1);
value |= (value >> 2);
value |= (value >> 4);
value |= (value >> 8);
value |= (value >> 16);
return (value | (value >> 32));
}
}
4 Answers 4
All in all it seems to work all right.
Personally I don't like constructs like:
if (0 < Count) {...}
but:
if (Count > 0) {...}
And this:
if (!(offset < Capacity))
is less readable than:
if (offset >= Capacity)
If I interpret Operations.NextPowerOf2
in the right way, I think it makes the heap capacity grow unnecessarily for larger number of existing data. Istead I think, I would have a constant value to expand the heap with.
In the below the outer parentheses are redundant:
var left = ((offset << 1) + 1); var right = (left + 1);
for (var i = ((Count - 1) >> 1); (-1 < i); i--)
private void Grow(int maxCapacity) { var currentCapacity = Capacity; for (var i = currentCapacity; i < maxCapacity; i++) { m_values.Add(default); } m_nextIndex = currentCapacity; }
In Grow()
, I don't think that m_nextIndex
should change (and accidentally it won't here) . Grow()
should only change the capacity - not the state of the heap.
Because you are not interested in feedback about naming, I will not write, that I would call the class PriorityQueue<T>
(using a binary heap as data structure).
-
3\$\begingroup\$ I had to pause to think about your suggestion to grow by a constant amount, but I'd actually go further and say that growing more than necessary is simply a bad idea, because the
List
is already growing by a constant factor (for good reason), and this code should not pretend to do that job for it because it can only lead to wasted space. \$\endgroup\$VisualMelon– VisualMelon2018年10月10日 10:03:01 +00:00Commented Oct 10, 2018 at 10:03 -
5\$\begingroup\$ @VisualMelon: You certainly have a point. I think, the
Grow()
mechanism is a leftover from implementations using an array. Using a List, one can just add as needed. \$\endgroup\$user73941– user739412018年10月10日 10:09:46 +00:00Commented Oct 10, 2018 at 10:09 -
2\$\begingroup\$ You're exactly right Henrik,
Grow
only exists because I originally started with an array as the backing store. \$\endgroup\$Kittoes0124– Kittoes01242018年10月10日 10:56:42 +00:00Commented Oct 10, 2018 at 10:56
I don't like the Swap
method. The two ref
parameters suggest that the values passed are swapped. Instead, the values are taken as indexes in m_values
whose values are swapped. It also has an unexpected side-effect as it changes the index x
. Also, y
is passed as ref
, but it never changes.
Either call it SwapValuesAt
and take the input as indexes without ref
, or call it Swap
and swap the values passed with ref
.
private void Swap(ref T a, ref T b)
{
T temp = a;
a = b;
b = temp;
}
private void SwapValuesAt(int i, int j)
{
T temp = m_values[i];
m_values[i] = m_values[j];
m_values[j] = temp;
}
However, the first variant will produce the compiler error
A property or indexer may not be passed as an out or ref parameter
... if you try to call it with Swap(ref m_values[i], ref m_values[j])
, as an indexer does not yield a reference to the list position, but instead returns the value itself. It works with arrays, however, as []
is a true index into the array and not an indexer method.
Firstly, on a general point of style: running methods into each other with no separation is unusual and I find it distracting. If newlines inside methods to separate logical sections improve readability, surely you ought to separate methods too?
private readonly Func<T, T, bool> m_comparer;
It would be more idiomatic to use System.Collections.Generic.IComparer<T>
.
private int m_nextIndex; public int Capacity => m_values.Count; public int Count => m_nextIndex;
Why not Count => m_values.Count
? If you want to muck about with explicit capacities, use List<T>
instead of IList<T>
because it has a Capacity
property. But without an explanation justifying the need for capacity, I would ditch it entirely.
public BinaryHeap(IList<T> values, Func<T, T, bool> comparer) { m_comparer = comparer; m_values = values;
Yikes! I would expect any type of collection to take a copy of the values I pass to initialise it, not to wrap them and modify them.
Coming back to my earlier point about IList<T>
vs List<T>
, the main implementations of IList<T>
are List<T>
and T[]
. The latter doesn't support resizing, so it's not a suitable type for m_values
. I really think you should make m_values = new List<T>(values);
public bool TryPop(out T value) { if (0 < Count) { value = Peek(); m_values[0] = m_values[--m_nextIndex]; m_values[m_nextIndex] = default;
Good. Maybe add a comment to say why it's important not to keep the extra reference.
private void Heapify(int offset) { var count = Count; while (offset < count) { var left = ((offset << 1) + 1); var right = (left + 1); var parent = offset; if ((left < count) && m_comparer(m_values[left], m_values[parent])) { parent = left; } if ((right < count) && m_comparer(m_values[right], m_values[parent])) { parent = right; } if (offset == parent) { return; } Swap(ref offset, ref parent);
In Push
you minimised the assignments by only storing the moving value at its final destination. Why not do the same here?
-
\$\begingroup\$ Performance is a feature that shouldn't be overlooked, copying the values into a new list incurs unnecessary overhead that the caller would be completely unable to opt out of. Also, I'd absolutely love to minimize the amount of assignments as suggested but couldn't seem to figure out how to do so without breaking things; might you be able to demonstrate a working example? \$\endgroup\$Kittoes0124– Kittoes01242018年10月10日 11:29:49 +00:00Commented Oct 10, 2018 at 11:29
-
1\$\begingroup\$ Copying the values into a new list should be negligible compared to heapifying the whole list. As for refactoring
Heapify
: not without a thorough test suite, and writing one is going quite a bit further than I'm inclined to go for what's supposed to be a review. \$\endgroup\$Peter Taylor– Peter Taylor2018年10月10日 12:55:03 +00:00Commented Oct 10, 2018 at 12:55
The Swap
method can be much cooler with the tuple syntax to do the operation in only one line. At the same time make it an extension method and pass the two indexes with the new in
keyword to pass the values as a reference (value type and reference type both) which has a great potential to improve performance because no copying is involved.
public static void Swap<T>(this IList<T> source, in int x, in int y)
{
(source[x], source[y]) = (source[y], source[x]);
}
and use it with
m_values.Swap(in offset, in parent);
values
parameter bug
There is a bug in the constructor.
public BinaryHeap(IList<T> values, Func<T, T, bool> comparer) { ... _values = values; ... }
It assigns the values
to the prviate _values
field but an array also implements this interface. This means that if you try to create it with
var bh = new BinaryHeap<int>(new[] { 1, 2, 5, 8 }, (x, y) => x == y);
it'll crash because an array has a fixed size and Grow
will fail at _values.Add(default);
You should let the use pass an IEnumerable<T>
and call .ToList()
yourself to not only be sure it's modifiable but also to make sure nobody modifies it for you out side of the class. Currently I could .Clear()
the values
and it would crash again.
Oh, one more thing. Since BinaryHeap<T>
is a collection it should also implement the IEnumerable<T>
interface.
-
2\$\begingroup\$ +1 for demonstrating something I didn't know yet (
in
), but it does introduce an extra level of indirection which can make things slower. And on 64-bit systems pointers are larger than (32-bit) integers, so passing an integer directly should be faster than passing its address. \$\endgroup\$Pieter Witvoet– Pieter Witvoet2018年10月10日 14:35:33 +00:00Commented Oct 10, 2018 at 14:35 -
\$\begingroup\$ @PieterWitvoet mhmm, an interesting theory - anyone to prove it? ;-] \$\endgroup\$t3chb0t– t3chb0t2018年10月10日 15:13:06 +00:00Commented Oct 10, 2018 at 15:13
-
3\$\begingroup\$ I ran some tests before I commented, and using
in
(orref
) was only faster when using relatively large value-types (I tested with a struct containing 4 integers). Withint
,long
and a test class it was slower. \$\endgroup\$Pieter Witvoet– Pieter Witvoet2018年10月10日 15:32:03 +00:00Commented Oct 10, 2018 at 15:32 -
\$\begingroup\$
in
makes me sad. Hardly anyone understands it. I would advise against using it unless you have a good (demonstrable) reason (i.e. you are passing around largestruct
s and can show it provides a significant improvement). In theory, much of the benefit ofin
could be had through better optimisation on the part of the CLR (whichin
will actually get in the way of). As Pieter Witvoet says, it can easily degrade performance, and means the parameter is invariant. Reference types asin
have only semantic advantages (i.e.readonly
and returnable by-ref). But otherwise a good answer ;) \$\endgroup\$VisualMelon– VisualMelon2018年10月10日 16:11:49 +00:00Commented Oct 10, 2018 at 16:11 -
3\$\begingroup\$ @t3chb0t don't get me started on
in
as it is designed... there is no technical reason a ByVal can't bereadonly
;in
has those semantics because it has to, but means that someone else can modify the value for you if they want to (e.g. on a different thread or from a called method). Contemplatingin
has caused me a significant amount of misery, and as you correctly note, it would take a lot of documentation to explain in full. I'd better stop before I start hyperventilating.in
is bad for my health. \$\endgroup\$VisualMelon– VisualMelon2018年10月10日 17:01:12 +00:00Commented Oct 10, 2018 at 17:01