Binary Heap (list-based)

Question 1

This is my first attempt at implementing a heap data structure and I've decided to cover the methods Peek, Pop, Push, and Replace. Think appear to work correctly but I admittedly had a lot of trouble figuring out the "ideal" way to implement each function as the reference implementations I found were all over the place.

Any feedback, especially if it addresses concerns that aren't related to formatting/naming, is most appreciated.

public class BinaryHeap<T>
{
 private readonly Func<T, T, bool> m_comparer;
 private readonly IList<T> m_values;
 private int m_nextIndex;
 public int Capacity => m_values.Count;
 public int Count => m_nextIndex;
 public BinaryHeap(IList<T> values, Func<T, T, bool> comparer) {
 m_comparer = comparer;
 m_values = values;
 Grow(checked((int)(Operations.NextPowerOf2(Capacity) - 1)));
 Rebuild();
 }
 [MethodImpl(MethodImplOptions.AggressiveInlining)]
 public T Peek() {
 return m_values[0];
 }
 public T Pop() {
 if (TryPop(out T value)) {
 return value;
 }
 else {
 throw new InvalidOperationException(message: "heap is empty");
 }
 }
 public int Push(T value) {
 var offset = Count;
 if (!(offset < Capacity)) {
 Grow(checked((int)(Operations.NextPowerOf2(Capacity + 1) - 1)));
 }
 while (0 < offset) {
 var parent = ((offset - 1) >> 1);
 if (m_comparer(m_values[parent], value)) {
 break;
 }
 m_values[offset] = m_values[parent];
 offset = parent;
 }
 m_values[offset] = value;
 return m_nextIndex++;
 }
 public T Replace(T newValue) {
 if (TryReplace(newValue, out T oldValue)) {
 return oldValue;
 }
 else {
 throw new InvalidOperationException(message: "heap is empty");
 }
 }
 public bool TryReplace(T newValue, out T oldValue) {
 if (0 < Count) {
 oldValue = Peek();
 m_values[0] = newValue;
 Heapify(0);
 return true;
 }
 else {
 oldValue = default;
 return false;
 }
 }
 public bool TryPop(out T value) {
 if (0 < Count) {
 value = Peek();
 m_values[0] = m_values[--m_nextIndex];
 m_values[m_nextIndex] = default;
 Heapify(0);
 return true;
 }
 else {
 value = default;
 return false;
 }
 }
 private void Grow(int maxCapacity) {
 var currentCapacity = Capacity;
 for (var i = currentCapacity; (i < maxCapacity); i++) {
 m_values.Add(default);
 }
 m_nextIndex = currentCapacity;
 }
 private void Heapify(int offset) {
 var count = Count;
 while (offset < count) {
 var left = ((offset << 1) + 1);
 var right = (left + 1);
 var parent = offset;
 if ((left < count) && m_comparer(m_values[left], m_values[parent])) {
 parent = left;
 }
 if ((right < count) && m_comparer(m_values[right], m_values[parent])) {
 parent = right;
 }
 if (offset == parent) {
 return;
 }
 Swap(ref offset, ref parent);
 }
 }
 private void Rebuild() {
 for (var i = ((Count - 1) >> 1); (-1 < i); i--) {
 Heapify(i);
 }
 }
 [MethodImpl(MethodImplOptions.AggressiveInlining)]
 private void Swap(ref int x, ref int y) {
 var temp = m_values[x];
 m_values[x] = m_values[y];
 m_values[y] = temp;
 x = y;
 }
}

As requested, here are the missing method(s):

public static class Operations
{
 public static long NextPowerOf2(long value) {
 return ((long)NextPowerOf2((ulong)value));
 }
 public static ulong NextPowerOf2(ulong value) {
 return (FoldBits(value) + 1);
 }
 [MethodImpl(MethodImplOptions.AggressiveInlining)]
 private static ulong FoldBits(ulong value) {
 value |= (value >> 1);
 value |= (value >> 2);
 value |= (value >> 4);
 value |= (value >> 8);
 value |= (value >> 16);
 return (value | (value >> 32));
 }
}

Question 2

All in all it seems to work all right.

Personally I don't like constructs like:

if (0 < Count) {...}

but:

if (Count > 0) {...}

And this:

if (!(offset < Capacity))

is less readable than:

if (offset >= Capacity)

If I interpret Operations.NextPowerOf2 in the right way, I think it makes the heap capacity grow unnecessarily for larger number of existing data. Istead I think, I would have a constant value to expand the heap with.

In the below the outer parentheses are redundant:

 var left = ((offset << 1) + 1);
 var right = (left + 1);
for (var i = ((Count - 1) >> 1); (-1 < i); i--)

private void Grow(int maxCapacity)
{
 var currentCapacity = Capacity;
 for (var i = currentCapacity; i < maxCapacity; i++)
 {
 m_values.Add(default);
 }
 m_nextIndex = currentCapacity;
}

In Grow(), I don't think that m_nextIndex should change (and accidentally it won't here) . Grow() should only change the capacity - not the state of the heap.

Because you are not interested in feedback about naming, I will not write, that I would call the class PriorityQueue<T> (using a binary heap as data structure).

Question 3

I had to pause to think about your suggestion to grow by a constant amount, but I'd actually go further and say that growing more than necessary is simply a bad idea, because the List is already growing by a constant factor (for good reason), and this code should not pretend to do that job for it because it can only lead to wasted space.

Question 4

@VisualMelon: You certainly have a point. I think, the Grow() mechanism is a leftover from implementations using an array. Using a List, one can just add as needed.

Question 5

You're exactly right Henrik, Grow only exists because I originally started with an array as the backing store.

Question 6

I don't like the Swap method. The two ref parameters suggest that the values passed are swapped. Instead, the values are taken as indexes in m_values whose values are swapped. It also has an unexpected side-effect as it changes the index x. Also, y is passed as ref, but it never changes.

Either call it SwapValuesAt and take the input as indexes without ref, or call it Swap and swap the values passed with ref.

private void Swap(ref T a, ref T b)
{
 T temp = a;
 a = b;
 b = temp;
}
private void SwapValuesAt(int i, int j)
{
 T temp = m_values[i];
 m_values[i] = m_values[j];
 m_values[j] = temp;
}

However, the first variant will produce the compiler error

A property or indexer may not be passed as an out or ref parameter

... if you try to call it with Swap(ref m_values[i], ref m_values[j]), as an indexer does not yield a reference to the list position, but instead returns the value itself. It works with arrays, however, as [] is a true index into the array and not an indexer method.

Question 7

Firstly, on a general point of style: running methods into each other with no separation is unusual and I find it distracting. If newlines inside methods to separate logical sections improve readability, surely you ought to separate methods too?

 private readonly Func<T, T, bool> m_comparer;

It would be more idiomatic to use System.Collections.Generic.IComparer<T>.

 private int m_nextIndex;
 public int Capacity => m_values.Count;
 public int Count => m_nextIndex;

Why not Count => m_values.Count? If you want to muck about with explicit capacities, use List<T> instead of IList<T> because it has a Capacity property. But without an explanation justifying the need for capacity, I would ditch it entirely.

 public BinaryHeap(IList<T> values, Func<T, T, bool> comparer) {
 m_comparer = comparer;
 m_values = values;

Yikes! I would expect any type of collection to take a copy of the values I pass to initialise it, not to wrap them and modify them.

Coming back to my earlier point about IList<T> vs List<T>, the main implementations of IList<T> are List<T> and T[]. The latter doesn't support resizing, so it's not a suitable type for m_values. I really think you should make m_values = new List<T>(values);

 public bool TryPop(out T value) {
 if (0 < Count) {
 value = Peek();
 m_values[0] = m_values[--m_nextIndex];
 m_values[m_nextIndex] = default;

Good. Maybe add a comment to say why it's important not to keep the extra reference.

 private void Heapify(int offset) {
 var count = Count;
 while (offset < count) {
 var left = ((offset << 1) + 1);
 var right = (left + 1);
 var parent = offset;
 if ((left < count) && m_comparer(m_values[left], m_values[parent])) {
 parent = left;
 }
 if ((right < count) && m_comparer(m_values[right], m_values[parent])) {
 parent = right;
 }
 if (offset == parent) {
 return;
 }
 Swap(ref offset, ref parent);

In Push you minimised the assignments by only storing the moving value at its final destination. Why not do the same here?

Question 8

Performance is a feature that shouldn't be overlooked, copying the values into a new list incurs unnecessary overhead that the caller would be completely unable to opt out of. Also, I'd absolutely love to minimize the amount of assignments as suggested but couldn't seem to figure out how to do so without breaking things; might you be able to demonstrate a working example?

Question 9

Copying the values into a new list should be negligible compared to heapifying the whole list. As for refactoring Heapify: not without a thorough test suite, and writing one is going quite a bit further than I'm inclined to go for what's supposed to be a review.

Question 10

The Swap method can be much cooler with the tuple syntax to do the operation in only one line. At the same time make it an extension method and pass the two indexes with the new in keyword to pass the values as a reference (value type and reference type both) which has a great potential to improve performance because no copying is involved.

public static void Swap<T>(this IList<T> source, in int x, in int y)
{
 (source[x], source[y]) = (source[y], source[x]);
}

and use it with

m_values.Swap(in offset, in parent);

`values` parameter bug

There is a bug in the constructor.

public BinaryHeap(IList<T> values, Func<T, T, bool> comparer)
{
 ...
 _values = values;
 ...
}

It assigns the values to the prviate _values field but an array also implements this interface. This means that if you try to create it with

var bh = new BinaryHeap<int>(new[] { 1, 2, 5, 8 }, (x, y) => x == y);

it'll crash because an array has a fixed size and Grow will fail at _values.Add(default);

You should let the use pass an IEnumerable<T> and call .ToList() yourself to not only be sure it's modifiable but also to make sure nobody modifies it for you out side of the class. Currently I could .Clear() the values and it would crash again.

Oh, one more thing. Since BinaryHeap<T> is a collection it should also implement the IEnumerable<T> interface.

Question 11

+1 for demonstrating something I didn't know yet (in), but it does introduce an extra level of indirection which can make things slower. And on 64-bit systems pointers are larger than (32-bit) integers, so passing an integer directly should be faster than passing its address.

Question 12

@PieterWitvoet mhmm, an interesting theory - anyone to prove it? ;-]

Question 13

I ran some tests before I commented, and using in (or ref) was only faster when using relatively large value-types (I tested with a struct containing 4 integers). With int, long and a test class it was slower.

Question 14

in makes me sad. Hardly anyone understands it. I would advise against using it unless you have a good (demonstrable) reason (i.e. you are passing around large structs and can show it provides a significant improvement). In theory, much of the benefit of in could be had through better optimisation on the part of the CLR (which in will actually get in the way of). As Pieter Witvoet says, it can easily degrade performance, and means the parameter is invariant. Reference types as in have only semantic advantages (i.e. readonly and returnable by-ref). But otherwise a good answer ;)

Question 15

@t3chb0t don't get me started on in as it is designed... there is no technical reason a ByVal can't be readonly; in has those semantics because it has to, but means that someone else can modify the value for you if they want to (e.g. on a different thread or from a called method). Contemplating in has caused me a significant amount of misery, and as you correctly note, it would take a lot of documentation to explain in full. I'd better stop before I start hyperventilating. in is bad for my health.

user73941user73941 · Answer 1 · 2018-10-10 06:49:45Z

All in all it seems to work all right.

Personally I don't like constructs like:

if (0 < Count) {...}

but:

if (Count > 0) {...}

And this:

if (!(offset < Capacity))

is less readable than:

if (offset >= Capacity)

If I interpret Operations.NextPowerOf2 in the right way, I think it makes the heap capacity grow unnecessarily for larger number of existing data. Istead I think, I would have a constant value to expand the heap with.

In the below the outer parentheses are redundant:

 var left = ((offset << 1) + 1);
 var right = (left + 1);
for (var i = ((Count - 1) >> 1); (-1 < i); i--)

private void Grow(int maxCapacity)
{
 var currentCapacity = Capacity;
 for (var i = currentCapacity; i < maxCapacity; i++)
 {
 m_values.Add(default);
 }
 m_nextIndex = currentCapacity;
}

In Grow(), I don't think that m_nextIndex should change (and accidentally it won't here) . Grow() should only change the capacity - not the state of the heap.

Because you are not interested in feedback about naming, I will not write, that I would call the class PriorityQueue<T> (using a binary heap as data structure).

I had to pause to think about your suggestion to grow by a constant amount, but I'd actually go further and say that growing more than necessary is simply a bad idea, because the List is already growing by a constant factor (for good reason), and this code should not pretend to do that job for it because it can only lead to wasted space.
@VisualMelon: You certainly have a point. I think, the Grow() mechanism is a leftover from implementations using an array. Using a List, one can just add as needed.
You're exactly right Henrik, Grow only exists because I originally started with an array as the backing store.

score 8 · Answer 2 · 2018-10-09 17:47:32Z

I don't like the Swap method. The two ref parameters suggest that the values passed are swapped. Instead, the values are taken as indexes in m_values whose values are swapped. It also has an unexpected side-effect as it changes the index x. Also, y is passed as ref, but it never changes.

Either call it SwapValuesAt and take the input as indexes without ref, or call it Swap and swap the values passed with ref.

private void Swap(ref T a, ref T b)
{
 T temp = a;
 a = b;
 b = temp;
}
private void SwapValuesAt(int i, int j)
{
 T temp = m_values[i];
 m_values[i] = m_values[j];
 m_values[j] = temp;
}

However, the first variant will produce the compiler error

A property or indexer may not be passed as an out or ref parameter

... if you try to call it with Swap(ref m_values[i], ref m_values[j]), as an indexer does not yield a reference to the list position, but instead returns the value itself. It works with arrays, however, as [] is a true index into the array and not an indexer method.

Peter Taylor Peter Taylor 24.5k1 gold badge49 silver badges94 bronze badges · Answer 3 · 2018-10-10 09:13:44Z

Firstly, on a general point of style: running methods into each other with no separation is unusual and I find it distracting. If newlines inside methods to separate logical sections improve readability, surely you ought to separate methods too?

 private readonly Func<T, T, bool> m_comparer;

It would be more idiomatic to use System.Collections.Generic.IComparer<T>.

 private int m_nextIndex;
 public int Capacity => m_values.Count;
 public int Count => m_nextIndex;

Why not Count => m_values.Count? If you want to muck about with explicit capacities, use List<T> instead of IList<T> because it has a Capacity property. But without an explanation justifying the need for capacity, I would ditch it entirely.

 public BinaryHeap(IList<T> values, Func<T, T, bool> comparer) {
 m_comparer = comparer;
 m_values = values;

Yikes! I would expect any type of collection to take a copy of the values I pass to initialise it, not to wrap them and modify them.

Coming back to my earlier point about IList<T> vs List<T>, the main implementations of IList<T> are List<T> and T[]. The latter doesn't support resizing, so it's not a suitable type for m_values. I really think you should make m_values = new List<T>(values);

 public bool TryPop(out T value) {
 if (0 < Count) {
 value = Peek();
 m_values[0] = m_values[--m_nextIndex];
 m_values[m_nextIndex] = default;

Good. Maybe add a comment to say why it's important not to keep the extra reference.

 private void Heapify(int offset) {
 var count = Count;
 while (offset < count) {
 var left = ((offset << 1) + 1);
 var right = (left + 1);
 var parent = offset;
 if ((left < count) && m_comparer(m_values[left], m_values[parent])) {
 parent = left;
 }
 if ((right < count) && m_comparer(m_values[right], m_values[parent])) {
 parent = right;
 }
 if (offset == parent) {
 return;
 }
 Swap(ref offset, ref parent);

In Push you minimised the assignments by only storing the moving value at its final destination. Why not do the same here?

Performance is a feature that shouldn't be overlooked, copying the values into a new list incurs unnecessary overhead that the caller would be completely unable to opt out of. Also, I'd absolutely love to minimize the amount of assignments as suggested but couldn't seem to figure out how to do so without breaking things; might you be able to demonstrate a working example?
Copying the values into a new list should be negligible compared to heapifying the whole list. As for refactoring Heapify: not without a thorough test suite, and writing one is going quite a bit further than I'm inclined to go for what's supposed to be a review.

t3chb0t t3chb0t 44.6k9 gold badges84 silver badges190 bronze badges · Answer 4 · 2018-10-10 13:55:26Z

The Swap method can be much cooler with the tuple syntax to do the operation in only one line. At the same time make it an extension method and pass the two indexes with the new in keyword to pass the values as a reference (value type and reference type both) which has a great potential to improve performance because no copying is involved.

public static void Swap<T>(this IList<T> source, in int x, in int y)
{
 (source[x], source[y]) = (source[y], source[x]);
}

and use it with

m_values.Swap(in offset, in parent);

`values` parameter bug

There is a bug in the constructor.

public BinaryHeap(IList<T> values, Func<T, T, bool> comparer)
{
 ...
 _values = values;
 ...
}

It assigns the values to the prviate _values field but an array also implements this interface. This means that if you try to create it with

var bh = new BinaryHeap<int>(new[] { 1, 2, 5, 8 }, (x, y) => x == y);

it'll crash because an array has a fixed size and Grow will fail at _values.Add(default);

You should let the use pass an IEnumerable<T> and call .ToList() yourself to not only be sure it's modifiable but also to make sure nobody modifies it for you out side of the class. Currently I could .Clear() the values and it would crash again.

Oh, one more thing. Since BinaryHeap<T> is a collection it should also implement the IEnumerable<T> interface.

+1 for demonstrating something I didn't know yet (in), but it does introduce an extra level of indirection which can make things slower. And on 64-bit systems pointers are larger than (32-bit) integers, so passing an integer directly should be faster than passing its address.
@PieterWitvoet mhmm, an interesting theory - anyone to prove it? ;-]
I ran some tests before I commented, and using in (or ref) was only faster when using relatively large value-types (I tested with a struct containing 4 integers). With int, long and a test class it was slower.
in makes me sad. Hardly anyone understands it. I would advise against using it unless you have a good (demonstrable) reason (i.e. you are passing around large structs and can show it provides a significant improvement). In theory, much of the benefit of in could be had through better optimisation on the part of the CLR (which in will actually get in the way of). As Pieter Witvoet says, it can easily degrade performance, and means the parameter is invariant. Reference types as in have only semantic advantages (i.e. readonly and returnable by-ref). But otherwise a good answer ;)
@t3chb0t don't get me started on in as it is designed... there is no technical reason a ByVal can't be readonly; in has those semantics because it has to, but means that someone else can modify the value for you if they want to (e.g. on a different thread or from a called method). Contemplating in has caused me a significant amount of misery, and as you correctly note, it would take a lot of documentation to explain in full. I'd better stop before I start hyperventilating. in is bad for my health.

Stack Exchange Network

Binary Heap (list-based)

4 Answers 4

`values` parameter bug

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Binary Heap (list-based)

4 Answers 4

values parameter bug

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions

`values` parameter bug