While studying a book on algorithms, I came across a question that essentially asked about $d$-ary Huffman coding, where the codeword alphabet has $d$ symbols (the usual case has $d=2$, with symbols 0ドル$ and 1ドル$, i.e., binary).
Adapting the usual Huffman coding algorithm to the $d$-ary case is simple enough, and things work as expected. However, it got me thinking about whether choosing a different data structure could improve the running time.
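For concreteness, here is a minimal sketch of such an adaptation in Python. This is my own illustration, not the book's algorithm: the function name `dary_huffman` and the use of Python's binary-heap-backed `heapq` as the priority queue are assumptions, and the padding with zero-frequency dummy symbols is the standard trick to make the tree full.

```python
import heapq
from itertools import count

def dary_huffman(freqs, d):
    """Build a d-ary Huffman code from {symbol: frequency} using heapq."""
    if d < 2:
        raise ValueError("d must be at least 2")

    tiebreak = count()  # keeps tuple comparisons from ever reaching the payload
    heap = [(f, next(tiebreak), sym) for sym, f in freqs.items()]

    # Pad with zero-frequency dummy leaves so the final tree is a *full*
    # d-ary tree, i.e. every merge step consumes exactly d nodes.
    while len(heap) > 1 and (len(heap) - 1) % (d - 1) != 0:
        heap.append((0, next(tiebreak), None))
    heapq.heapify(heap)

    # Each iteration: d extract-mins followed by one insertion.
    while len(heap) > 1:
        children = [heapq.heappop(heap) for _ in range(d)]
        total = sum(f for f, _, _ in children)
        heapq.heappush(heap, (total, next(tiebreak), children))

    # Walk the tree, emitting one digit in {0, ..., d-1} per edge.
    # (Digits are rendered as decimal strings, so for d > 10 you would
    # want a list of ints per codeword instead.)
    codes = {}
    def assign(node, prefix):
        payload = node[2]
        if isinstance(payload, list):          # internal node
            for digit, child in enumerate(payload):
                assign(child, prefix + str(digit))
        elif payload is not None:              # real leaf (dummies are skipped)
            codes[payload] = prefix or "0"
    if heap:
        assign(heap[0], "")
    return codes
```

For example, `dary_huffman({"a": 5, "b": 2, "c": 1, "d": 1}, 3)` pads one dummy leaf and then performs two merge steps, giving "a" and "b" one-digit codes and "c" and "d" two-digit codes.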
The question arises very naturally, since traditional binary Huffman coding uses a binary heap. Could using a $d$-ary heap for $d$-ary Huffman perhaps result in better runtime complexity?
I've tried my hand at it, and the calculations get ugly very fast. The worst part, however, is that I'm having a very hard time actually proving anything in the direction of optimality when big-O is involved. Since big-O can discard so much of an expression, I'm finding it difficult to handle when more than one parameter is involved.
In this particular case, we have three parameters:
- $n$, the size of the input
- $d$, the size of the code-word alphabet
- $k$, the type of heap used ($k$-ary heap)
The calculations seem to suggest that $k=3$ and $k=4$ should, theoretically, always outperform the binary heap, regardless of the values of $n$ and $d$. Indeed, a posteriori this seems to be something intrinsic to the heap structure.
However, a choice between these values, or a demonstration that higher values of $k$ are suboptimal, still seems elusive; and in this case the context (algorithm, values of $n$ and $d$) should play a role.
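To illustrate the kind of expression that comes out of this (my own rough count, with every heap size bounded by $n$ and lower-order terms dropped): the construction performs about $(n-1)/(d-1)$ merge steps, each doing $d$ extract-mins (roughly $k\log_k n$ comparisons each on a $k$-ary heap) plus one insertion (roughly $\log_k n$ comparisons), giving

$$T(n,d,k)\;\approx\;\frac{n-1}{d-1}\Bigl(d\cdot k\log_k n+\log_k n\Bigr)\;=\;\frac{(n-1)(dk+1)}{d-1}\cdot\frac{\ln n}{\ln k}.$$

For large $d$ the dependence on $k$ is essentially $k/\ln k$, which over the integers is minimized at $k=3$, with $k=2$ and $k=4$ coming out identical under this crude count; separating them, and pinning down the interaction with $d$, requires keeping exactly the constants and lower-order terms that big-O discards.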
1 Answer
You use a priority queue. Priority queues are usually implemented using a heap. That can be a binary or some other heap, but that decision has nothing whatsoever to do with your Huffman code implementation.
And there is no change in Big-O behaviour, just in the constant factors, which are implementation dependent.
But in the end, the time to build the code is negligible compared to the time needed to compress or decompress a large file.
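To make the constant-factor point concrete, here is a minimal sketch of a $k$-ary min-heap in Python (my own illustration, not part of the original answer). The arity $k$ only changes how many children are compared per level of a sift-down and how many levels the heap has; every operation remains $O(\log n)$, so swapping it in for the usual binary heap leaves the big-O of the Huffman construction unchanged.

```python
class KaryHeap:
    """Minimal k-ary min-heap; the arity k only affects constant factors."""

    def __init__(self, k=2):
        if k < 2:
            raise ValueError("arity must be at least 2")
        self.k = k
        self.data = []

    def push(self, item):
        self.data.append(item)
        self._sift_up(len(self.data) - 1)

    def pop(self):
        data = self.data
        top = data[0]          # raises IndexError on an empty heap
        last = data.pop()
        if data:
            data[0] = last
            self._sift_down(0)
        return top

    def _sift_up(self, i):
        # O(log_k n) levels, one comparison per level.
        data, k = self.data, self.k
        while i > 0:
            parent = (i - 1) // k
            if data[i] < data[parent]:
                data[i], data[parent] = data[parent], data[i]
                i = parent
            else:
                break

    def _sift_down(self, i):
        # O(log_k n) levels, up to k comparisons per level, i.e. roughly
        # (k / ln k) * ln n comparisons in total -- the constant that
        # changes with the heap's arity.
        data, k, n = self.data, self.k, len(self.data)
        while True:
            first = k * i + 1
            if first >= n:
                break
            smallest = min(range(first, min(first + k, n)),
                           key=data.__getitem__)
            if data[smallest] < data[i]:
                data[i], data[smallest] = data[smallest], data[i]
                i = smallest
            else:
                break
```

Using this in the Huffman loop instead of a binary heap changes only how many comparisons each extract-min and insertion costs, not the asymptotic behaviour of the construction.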
Different Huffman implementations change how the algorithm's loop behaves (a $d$-ary Huffman performs $d$ extract-min operations followed by one insertion per loop iteration). Moreover, the total number of loop iterations also changes (the number of nodes of a full $d$-ary tree on $n$ leaves depends on $d$). I assume these things matter when analyzing which $d$ results in fewer total operations. – Fimpellizzeri, Dec 7, 2021 at 16:52