While studying a book on algorithms, I came across a question that essentially asked about $d$-ary Huffman coding, where the codeword alphabet has $d$ symbols (the usual case has $d=2$, with symbols 0ドル$ and 1ドル$, i.e., binary).
Adapting the usual Huffman coding algorithm to the $d$-ary case is simple enough, and things work as expected. However, it got me thinking about whether choosing a different data structure could improve the running time.
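For concreteness, here is a minimal sketch of such an adaptation in Python. This is my own illustration, not the book's algorithm: the function name `dary_huffman` and the use of Python's binary-heap-backed `heapq` as the priority queue are assumptions, and the padding with zero-frequency dummy symbols is the standard trick to make the tree full.

```python
import heapq
from itertools import count

def dary_huffman(freqs, d):
    """Build a d-ary Huffman code from {symbol: frequency} using heapq."""
    if d < 2:
        raise ValueError("d must be at least 2")

    tiebreak = count()  # keeps tuple comparisons from ever reaching the payload
    heap = [(f, next(tiebreak), sym) for sym, f in freqs.items()]

    # Pad with zero-frequency dummy leaves so the final tree is a *full*
    # d-ary tree, i.e. every merge step consumes exactly d nodes.
    while len(heap) > 1 and (len(heap) - 1) % (d - 1) != 0:
        heap.append((0, next(tiebreak), None))
    heapq.heapify(heap)

    # Each iteration: d extract-mins followed by one insertion.
    while len(heap) > 1:
        children = [heapq.heappop(heap) for _ in range(d)]
        total = sum(f for f, _, _ in children)
        heapq.heappush(heap, (total, next(tiebreak), children))

    # Walk the tree, emitting one digit in {0, ..., d-1} per edge.
    # (Digits are rendered as decimal strings, so for d > 10 you would
    # want a list of ints per codeword instead.)
    codes = {}
    def assign(node, prefix):
        payload = node[2]
        if isinstance(payload, list):          # internal node
            for digit, child in enumerate(payload):
                assign(child, prefix + str(digit))
        elif payload is not None:              # real leaf (dummies are skipped)
            codes[payload] = prefix or "0"
    if heap:
        assign(heap[0], "")
    return codes
```

For example, `dary_huffman({"a": 5, "b": 2, "c": 1, "d": 1}, 3)` pads one dummy leaf and then performs two merge steps, giving "a" and "b" one-digit codes and "c" and "d" two-digit codes.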
The question arises very naturally, since traditional binary Huffman coding uses a binary heap. Could using a $d$-ary heap for $d$-ary Huffman perhaps result in better runtime complexity?
I've tried my hand at it, and the calculations get ugly very fast. The worst part, however, is that I'm having a very hard time actually proving anything in the direction of optimality when big-O is involved. Since big-O can discard so much of an expression, I'm finding it difficult to handle when more than one parameter is involved.
In this particular case, we have three parameters:
- $n$, the size of the input
- $d$, the size of the code-word alphabet
- $k$, the type of heap used ($k$-ary heap)
The calculations seem to suggest that $k=3$ and $k=4$ should, theoretically, always outperform the binary heap, regardless of the values of $n$ and $d$. Indeed, a posteriori this seems to be something intrinsic to the heap structure.
However, a choice between these values, or a demonstration that higher values of $k$ are suboptimal, still seems elusive; and in this case the context (algorithm, values of $n$ and $d$) should play a role.
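To illustrate the kind of expression that comes out of this (my own rough count, with every heap size bounded by $n$ and lower-order terms dropped): the construction performs about $(n-1)/(d-1)$ merge steps, each doing $d$ extract-mins (roughly $k\log_k n$ comparisons each on a $k$-ary heap) plus one insertion (roughly $\log_k n$ comparisons), giving

$$T(n,d,k)\;\approx\;\frac{n-1}{d-1}\Bigl(d\cdot k\log_k n+\log_k n\Bigr)\;=\;\frac{(n-1)(dk+1)}{d-1}\cdot\frac{\ln n}{\ln k}.$$

For large $d$ the dependence on $k$ is essentially $k/\ln k$, which over the integers is minimized at $k=3$, with $k=2$ and $k=4$ coming out identical under this crude count; separating them, and pinning down the interaction with $d$, requires keeping exactly the constants and lower-order terms that big-O discards.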
1 Answer
You use a priority queue. Priority queues are usually implemented using a heap. That can be a binary or some other heap, but that decision has nothing whatsoever to do with your Huffman code implementation.
And there is no change in Big-O behaviour, just in the constant factors, which are implementation dependent.
But in the end, the time to build the code is negligible compared to the time needed to compress or decompress a large file.
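To make the constant-factor point concrete, here is a minimal sketch of a $k$-ary min-heap in Python (my own illustration, not part of the original answer). The arity $k$ only changes how many children are compared per level of a sift-down and how many levels the heap has; every operation remains $O(\log n)$, so swapping it in for the usual binary heap leaves the big-O of the Huffman construction unchanged.

```python
class KaryHeap:
    """Minimal k-ary min-heap; the arity k only affects constant factors."""

    def __init__(self, k=2):
        if k < 2:
            raise ValueError("arity must be at least 2")
        self.k = k
        self.data = []

    def push(self, item):
        self.data.append(item)
        self._sift_up(len(self.data) - 1)

    def pop(self):
        data = self.data
        top = data[0]          # raises IndexError on an empty heap
        last = data.pop()
        if data:
            data[0] = last
            self._sift_down(0)
        return top

    def _sift_up(self, i):
        # O(log_k n) levels, one comparison per level.
        data, k = self.data, self.k
        while i > 0:
            parent = (i - 1) // k
            if data[i] < data[parent]:
                data[i], data[parent] = data[parent], data[i]
                i = parent
            else:
                break

    def _sift_down(self, i):
        # O(log_k n) levels, up to k comparisons per level, i.e. roughly
        # (k / ln k) * ln n comparisons in total -- the constant that
        # changes with the heap's arity.
        data, k, n = self.data, self.k, len(self.data)
        while True:
            first = k * i + 1
            if first >= n:
                break
            smallest = min(range(first, min(first + k, n)),
                           key=data.__getitem__)
            if data[smallest] < data[i]:
                data[i], data[smallest] = data[smallest], data[i]
                i = smallest
            else:
                break
```

Using this in the Huffman loop instead of a binary heap changes only how many comparisons each extract-min and insertion costs, not the asymptotic behaviour of the construction.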
Different Huffman implementations change how the algorithm's loop behaves (a $d$-ary Huffman performs $d$ extract-min operations followed by one insertion per loop iteration). Moreover, the total number of loop iterations also changes (the number of nodes of a full $d$-ary tree on $n$ leaves depends on $d$). I assume these things matter when analyzing which $d$ results in fewer total operations. – Fimpellizzeri, Dec 7, 2021 at 16:52