On the previous page, we took a brief overview of dictionary compression in the DEFLATE algorithm. After applying dictionary compression, the DEFLATE algorithm applies a form of Huffman encoding, also known as Huffman compression.
(追記) (追記ここまで)It is worth understanding Huffman encoding because one of the options that the Deflater offers is to apply Huffman encoding only.
Huffman encoding is one of the most famous types of compression system. Even if it is not used on its own, it is often used in conjunction with another compression scheme (as indeed in the case here).
Huffman encoding works on a stream of bits. It takes an "alphabet" of symbols (for example, the possible "symbols" could be the bytes 0-255, but it could be some arbitrary range of numbers/values), and re-codes each symbol in the alphabet as a sequence of bits, so that the sequence corresponding to each symbol reflects the frequency of that symbol. The most frequently occurring symbol will be represented by the shortest bit sequence, and with the length of each bit sequence being roughly inversely proportional to its frequency.
Possible Huffman encodings can be worked out by creating a "Huffman tree" as follows:
The idea of Huffman encoding is to assign each symbol a representation that is inversely proportional to its frequency. But in reality, this is not usually possible.
The problem is that the exact number of bits requred for a given symbol's length to be inversely proportional to frequency could actually be a fractional number. Huffman codes are only actually optimal when the probabilities of symbols are negative powers of 2 (1/2, 1/4 etc): in other words, where there's one symbol occuring around 50% of the time, then another around 25% of the time, etc.
However, if you do have data distributed in this way, you may be able to take advantage of the Deflater's HUFFMAN_ONLY mode. You may also be able to transform your data so that the distribution of bytes is closer to the negative power of 2 relationship just described. On the next page, I discuss a simple transformation that generally improves text compression with the Deflater by between 10 and 20%.
If you enjoy this Java programming article, please share with friends and colleagues. Follow the author on Twitter for the latest news and rants. Follow @BitterCoffey
Editorial page content written by Neil Coffey. Copyright © Javamex UK 2021. All rights reserved.