I'm reading Algorithms by Jeff Erickson, and I have trouble in solving the exercise in Greedy Algorithms:
(b) Suppose the total length N of the unencoded message is bounded by a polynomial in the alphabet size n. Prove that any Huffman tree for the frequencies f[1...n] has depth O(log n).
I don't know how to prove this, because I think the depth could reach n-1, like this: f[1...n] = {1, 1, 1, 3, 4, 7, 11, 18, 29, ...} (the Lucas number).
Maybe it's because of "N is bounded by a polynomial in the alphabet size n", but what does it mean? (for example, N = k n, and k is a constant?) If it's like this, I would say it's because when I merge two vertices u1, u2 (u1 <= u2) in the Huffman Tree, the new vertex v = u1 + u2, which is at least once bigger than u1, and I know that the root of the tree is n, so I will only merge vertices for no more than O(log n) times, so the depth is about O(log n). But I don't know if I'm right.
Could you please help me? Thanks!
-
This is not an assignment helper platform. Also show your different attempts and where you got stuck.darth vader– darth vader2024年06月26日 05:34:34 +00:00Commented Jun 26, 2024 at 5:34
-
1Please edit the question to limit it to a specific problem with enough detail to identify an adequate answer.Community– Community Bot2024年06月26日 05:35:17 +00:00Commented Jun 26, 2024 at 5:35
2 Answers 2
Yes, you are correct that without any constraints the depth could indeed reach n-1 in the extremely unbalanced case where you effectively have a linked list.
However, the added constraint of "... the total length N of the unencoded message is bounded by a polynomial in the alphabet size n." implies that N=O(n^k). Therefore, the average frequency of each character is N/n=O(n^(k−1)). This implies that no character’s frequency is disproportionately large or small compared to the others. The frequencies are distributed more evenly, preventing extreme imbalances during the tree construction.
If we consider a binary tree, the height is logarithmic in the number of leaves if the tree is balanced. For a perfectly balanced tree, the height is O(log n).
While the Huffman tree with this constraint is not always perfectly balanced, the balanced nature of the merges due to the tend of evenly distributed frequencies ensures that the height is close to O(log n).
Comments
Your Lucas-like sequence is the answer to part (a) of that question, which is a clue that it might be useful in answering part (b).
The total length of a message with Lucas frequencies is two Lucas numbers up from the last frequency, minus one. So to get a depth d Huffman tree, the minimum message length is Ld+1 - 1. (Note that L0=2, which is broken up into the initial {1,1}. That is why I said "Lucas-like" earlier. Then L1=1, L2=3.) For a message length N, we have that N ≥ Ld+1 - 1 to be able to get a depth d tree.
The Lucas sequence (as well as the Fibonacci sequence) grows exponentially as φn, where φ is the Golden ratio. In particular Ln > φn-1.
To get depth d, we have N > φd. So log(N) > d log(φ). p(n) ≥ N, where p is a polynomial, we'll say of degree k. So log(p(n)) > d log(φ). Using the highest power of p to get the order, we have d ≤ O(k log(n) / log(φ)), or simply d ≤ O(log(n)), since k and φ are constants independent of n. We know that d ≥ ⌈log(n)⌉, so d = O(log(n)).
1 Comment
Explore related questions
See similar questions with these tags.