3

I'm reading Algorithms by Jeff Erickson, and I have trouble in solving the exercise in Greedy Algorithms:

(b) Suppose the total length N of the unencoded message is bounded by a polynomial in the alphabet size n. Prove that any Huffman tree for the frequencies f[1...n] has depth O(log n).

I don't know how to prove this, because I think the depth could reach n-1, like this: f[1...n] = {1, 1, 1, 3, 4, 7, 11, 18, 29, ...} (the Lucas number).

Maybe it's because of "N is bounded by a polynomial in the alphabet size n", but what does it mean? (for example, N = k n, and k is a constant?) If it's like this, I would say it's because when I merge two vertices u1, u2 (u1 <= u2) in the Huffman Tree, the new vertex v = u1 + u2, which is at least once bigger than u1, and I know that the root of the tree is n, so I will only merge vertices for no more than O(log n) times, so the depth is about O(log n). But I don't know if I'm right.

Could you please help me? Thanks!

asked Jun 26, 2024 at 3:29
2
  • This is not an assignment helper platform. Also show your different attempts and where you got stuck. Commented Jun 26, 2024 at 5:34
  • 1
    Please edit the question to limit it to a specific problem with enough detail to identify an adequate answer. Commented Jun 26, 2024 at 5:35

2 Answers 2

1

Yes, you are correct that without any constraints the depth could indeed reach n-1 in the extremely unbalanced case where you effectively have a linked list.

However, the added constraint of "... the total length N of the unencoded message is bounded by a polynomial in the alphabet size n." implies that N=O(n^k). Therefore, the average frequency of each character is N/n=O(n^(k−1)). This implies that no character’s frequency is disproportionately large or small compared to the others. The frequencies are distributed more evenly, preventing extreme imbalances during the tree construction.

If we consider a binary tree, the height is logarithmic in the number of leaves if the tree is balanced. For a perfectly balanced tree, the height is O(log⁡ n).

While the Huffman tree with this constraint is not always perfectly balanced, the balanced nature of the merges due to the tend of evenly distributed frequencies ensures that the height is close to O(log⁡ n).

answered Jun 26, 2024 at 10:44
Sign up to request clarification or add additional context in comments.

Comments

1

Your Lucas-like sequence is the answer to part (a) of that question, which is a clue that it might be useful in answering part (b).

The total length of a message with Lucas frequencies is two Lucas numbers up from the last frequency, minus one. So to get a depth d Huffman tree, the minimum message length is Ld+1 - 1. (Note that L0=2, which is broken up into the initial {1,1}. That is why I said "Lucas-like" earlier. Then L1=1, L2=3.) For a message length N, we have that N ≥ Ld+1 - 1 to be able to get a depth d tree.

The Lucas sequence (as well as the Fibonacci sequence) grows exponentially as φn, where φ is the Golden ratio. In particular Ln > φn-1.

To get depth d, we have N > φd. So log(N) > d log(φ). p(n) ≥ N, where p is a polynomial, we'll say of degree k. So log(p(n)) > d log(φ). Using the highest power of p to get the order, we have dO(k log(n) / log(φ)), or simply dO(log(n)), since k and φ are constants independent of n. We know that d ≥ ⌈log(n)⌉, so d = O(log(n)).

answered Jun 26, 2024 at 15:25

1 Comment

Thanks! Now I understand the relationship between the frequencies and the depth of the Huffman Tree.

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.