The heap sort algorithm
Introduction
The heapsort algorithm is a sorting algorithm which has time complexity O(n
log n) and does not need additional space. Merge sort has the same time
complexity but usually needs additional space. As opposed to merge sort, the
heap sort algorithm is not stable i.e. it might reverse the order of equal
elements.
The basic program elements of the heapsort algorithm not only work for
sorting. They can be applied to implement priority queues. A priority queue is
a collection of elements where the largest element is accessible very fast.
The basic definitions
Abstract array
The program elements of heapsort work on abstract arrays, i.e. mutable
structures which inherit conformantly from
deferred class ABSTRACT_ARRAY[G] feature content: ghost LIST[G] -- List representing the content of the array. deferred end count: NATURAL -- The number of elements in the array deferred ensure Result = content.count end [] (i:NATURAL): G -- The ith element of the array require i < count deferred ensure Result ~ content[i] end swap(i,j:NATURAL) -- Swap the elements at position `i' and `j'. require i<count; j<count deferred ensure content = old content.swapped(i,j) end end
The module
We put all code for heap sorting into a module call “heap_sorter”. The module
defines a class HEAP_SORTER with a formal generic G which has the constraint
that the type G must have an order structure (i.e. a total order).
-- module: heap_sorter immutable class HEAP_SORTER[G:ORDER] end ...
The only purpose of the class HEAP_SORTER is to define the formal generic G
and its constraint. With this technique it can be avoided that each function,
procedure and asssertion within the module needs to define its own formal
generic.
Heap structure
The content of the abstract array will be viewed as a binary tree. The root of
the tree is the first element. The element at index position “i” has a left
child at position “2*i+1” and a right child at position “2*i+2”.
The binary tree is depicted in the following drawing.
l[0] / \ / \ / \ l[1] l[2] / \ / / \ / l[3] l[4] l[5] .... ...........
Note that each level except the last one of the binary tree is full.
We define some index functions to compute the index of the left and right
child and the index of the parent.
feature -- Index functions left(i:NATURAL): NATURAL ensure Result = 2*i+1 end right(i:NATURAL): NATURAL ensure Result = 2*i+2 end is_left(i:NATURAL): BOOLEAN ensure Result = (i mod 2 = 1) end is_right(i:NATURAL): BOOLEAN ensure Result = (i>0 and i mod 2 = 0) end parent(i:NATURAL): NATURAL require i>0 ensure Result = (i-1)/2 end all(i:NATURAL) ensure i.is_left => i>0 i.is_right => i>0 i.left.is_left i.right.is_right i.left.parent = i - i.right.parent = i i.is_left => i.parent.left = i i.is_right => i.parent.right = i end end
A sequence has a heap structure if each parent has a higher value than its
both childs. In order to define this property formally we use a ghost
predicate “is_heap”.
feature is_heap(l:LIST[G]): ghost BOOLEAN -- Does the list `l' has a heap structure, i.e. each child -- is less or equal than its parent? ensure Result = all(i:NATURAL) require 0<i i<l.count ensure l[i]<=l[i.parent] end end end
Since the order relation is transitive not only the direct childs of a parent
have lower values but also all the children of the children and so on.
If a sequence has a heap structure then the element with the highest value is
always the first element of the sequence.
Build a heap structure
Initially our array is completely unsorted. Before sorting the array we
rearrange the elements such that the content has a heap structure.
Outline of the algorithm
We need a loop to achieve a heap structure. Therefore we need an invariant
which is easy to establish and to maintain and which together with the exit
condition guarantees the heap structure.
Let us look at the sequence
[5, 0, 1, 5, 3, 4]
which is represented by the binary tree (2:1 is value 1 at index 2)
.. 0:5 .. / \ / \ 1:0 2:1 / \ / / \ / 3:5 4:3 5:4
The leave nodes at index 3, 4 and 5 have already a heap structure because they
have no children. In order to describe this situation we define the predicate
“is_heap_from”.
feature is_heap_from(l:LIST[G], i:NATURAL): ghost BOOLEAN -- Are all elements of the list `l' from index `i' on valid -- parents of a heap structure? ensure Result = all(j:NATURAL) require 0 < j j < l.count i <= j.parent ensure l[j]<=l[j.parent] end end all(l:LIST[G], c:NATURAL) ensure (c/2).left >= c l.is_heap_from(l.count/2) l.is_heap_from(0) => l.is_heap l.is_heap_from(i) => l.is_heap_from(i+1) end end
The predicate “l.is_heap_from(i)” is a good canditate for a loop invariant since
it is valid for “i=l.count/2” and the condition “i=0” implies the goal that
the list has a heap structure. I.e. the procedure “make_heap” has the outline
make_heap(a:ABSTRACT_ARRAY[G]) local i: NATURAL do from i := a.count/2 invariant i <= a.count a.content.is_heap_from(i) a.content.is_permutation(old a.content) variant i until i = 0 loop i := i-1 -- decrease the variant a.sift_down(i,a.count) -- reestablish invariant (see below) end ensure a.content.is_heap a.content.is_permutation(old a.content) end
Reestablish the invariant: “sift_down”
If the invariant “l.is_heap_from(i)” is valid and we decrease the index “i” we
might temporarily violate the invariant because we add a root element to the
partial heap which is not yet a valid parent for its children because its
value might be less or equal than the left or the right child.
E.g. in the above tree
.. 0:5 .. / \ / \ 1:0 2:1 / \ / / \ / 3:5 4:3 5:4
we have “l.is_heap_from(3)” because the elements at index 3, 4 and 5 are valid
parents. However the condition “l.is_heap_from(2) is not valid, because the
value at position 2 is less than the value of its left child at position 5
(the node at position 2 does not have a right child in that case).
This defect can be corrected easily. Just identify the child which has a
greater value than the root and swap the two values. This pushes the defect
one level down the tree. I.e. we have to continue until the node is a valid
parent or until it has no children. This can be easily done with a loop.
In order to describe the invariant we have to find a predicate which describes
a partial heap with one defect position.
feature -- partial heap with one defect position is_nearly_heap_from(l:LIST[G], i,k:NATURAL) -- Are all elements of the list `l' from index `i' on except at -- position `k' valid parents of a heap structure? ensure Result = all(j:NATURAL) require 0 < j j < l.count i <= j.parent k /= j.parent ensure l[j] <= l[j.parent] end end is_valid_parent(p:NATURAL, l:LIST[G]: ghost BOOLEAN -- Is the element at position `p' within the list `l' a valid -- parent of a heap structure (i.e. left and right child have -- lower values)? ensure Result = ((p.left<l.count => l[p.left]<=l[p]) and (p.right<l.count => l[p.right]<=l[p])) end all(l:LIST[G], i,k:NATURAL) require l.is_nearly_heap_from(i,k) l.count<=k.left or k.is_valid_parent(l) ensure l.is_heap_from(i) end all(l:LIST[G], i:NATURAL) ensure l.is_heap_from(i+1) => l.is_nearly_heap_from(i,i) end end
With this definition and assertion we can formulate the procedure “sift_down”.
feature sift_down(a:ABSTRACT_ARRAY[G], i,cnt:NATURAL) -- Sift down a defect at position `i' of a partial heap of the -- array `a'. Only consider the elements below `cnt'. require cnt <= a.count i < cnt a.content.take(cnt).is_heap_from(i+1) local parent,child: NATURAL good_parent: BOOLEAN do from parent,good_parent := i,False invariant parent < cnt a.content.take(cnt).is_nearly_heap_from(i,parent) a.content.is_permutation(old a.content) good_parent => parent.is_valid_parent(a.content.take(cnt)) variant cnt - parent until parent.left>=cnt or good_parent loop child := parent.left if child+1<a.count and a[child]<a[child+1] -- find the child with the highest value then child := child+1 end if a[child] > a[parent] then a.swap(parent,child) -- Swap if necessary else good_parent := True end parent := child end ensure a.content.take(cnt).is_heap_from(i) a.content.is_permutation(old a.content) end end
Sort the heap
If the array has already a heap structure (i.e. all parents have larger values
than their children) it is easy to sort the array.
We maintain the condition that the first part of the array has a heap
structure and the second part of the array is sorted. We describe sortedness
by the following predicate.
feature is_sorted(l:LIST[G]): ghost BOOLEAN ensure Result = all(a,b:G) [a,b] in l.tail_zipped => a<=b end end
Then conditions
a.content.take(i).is_heap a.content.drop(i).is_sorted all(e,f:G) e in a.content.take(i) => f in a.content.drop(i) => e<=f
are easy to establish with “i=a.count”. The condition “i=0” guarantees that
the whole array is sorted.
If we decrement “i” we violate the second part of the invariant. The condition
can be reestablished by swapping the i-th element with the first (the first is
the largest in the heap and all elements in the heap are smaller than the
elements of the already sorted part).
But this violates the first condition of the invariant because we put an
element at the top of the tree which has usually a too small value to be the
top of the tree. With a call to the procedure “sift_down” it is possible to
reestablish this condition as well.
feature sort_heap(a:ABSTRACT_ARRAY[G]) require a.content.is_heap local i:NATURAL do from i := a.count invariant i_1: i <= a.count i_2: a.content.take(i).is_heap i_3: a.content.drop(i).is_sorted i_4: all(e,f:G) e in a.content.take(i) => f in a.content.drop(i) => e<=f i_5: a.content.is_permutation(old a.content) variant i until i = 0 loop i := i-1 -- decrease variant a.swap(0,i) -- reestablish i_3 a.sift_down(0,i) -- reestablish i_2 -- i_1, i_4 and i_5 are never violated end ensure a.content.is_sorted end end
The complete heap sort
feature sort(a:ABSTRACT_ARRAY[G]) do a.make_heap a.sort_heap ensure a.content.is_sorted a.content.is_permutation(old a.content) end end
Priority queue
A priority queue is a sequence of elements which is maintained to have a heap
structure. With this condition it is possible to access the largest element
very fast since it is always the first element.
An empty sequence satisfies the condition of being a heap. Since arrayed
containers allow to insert elements rapidly at the end an insertion violates
the heap condition in a different manner as in the above algorithm
“make_heap”. We do not introduce a defect at the top of the tree, we introduce
a defect at the bottom of the tree.
A single defect in a heap
In order to reestablish the heap condition we need to sift the element up
in the tree until it reaches its appropriate position. The following predicate
describes a singe defect in a heap structure.
feature is_heap_except_child(l:LIST[G], i:NATURAL) ghost BOOLEAN -- Does `l' have a heap structure except for the child at `i'? ensure Result = all(j:NATURAL) require 0 < j j < l.count j/= i ensure l[j]<=l[j.parent] end end all(l:LIST[G], e:G) ensure l.is_heap_except_child(0) => l.is_heap l.is_heap => (l+e).is_heap_except_child(l.count) end all(l:LIST[G], i:NATURAL) require 0 < i i < l.count l.is_heap_except_child(i) l[i] <= l[i.parent] ensure l.is_heap end end
The procedure “sift_up”
The procedure “sift_up” accepts an array which has a heap structure when
neglecting the last element. The algorithm consists of a loop which pushes the
defect one level up until there is no longer any defect. I.e. the algorithm
reestablishes the heap condition for the full array including the last element.
feature sift_up(a:ABSTRACT_ARRAY[G]) require 0 < a.count a.content.without_last.is_heap local i:NATURAL do from i := a.count-1 invariant i < a.count a.content.is_heap_except_child(i) a.content.is_permutation(old a.content) variant i until i=0 or a[i]<=a[i.parent] loop a.swap(i.parent,i) i := i-1 end ensure a.content.is_heap a.content.is_permutation(old a.content) end end
Inserting into and deleting from a priority queue
With the procedures “sift_up” and “sift_down” it is easy to maintain a
priority queue with a arrayed structure.
– Insert: Insert a new element at the end of a valid heap and call “sift_up”
to put it into its proper position.
– Remove the top element: Swap the first and the last element, remove the last
element and sift the new top element down by a call to “sift_down” to put it
in its proper position.
Formal verification, Software Verification
This entry was posted on July 23, 2012, 1:46 pm and is filed under Uncategorized. You can follow any responses to this entry through RSS 2.0. You can leave a response, or trackback from your own site.
- Leave a comment