next up previous
Next: Compressing Sets of Strings Up: Compressing Java Class Files Previous: Encoding Integers

Subsections

Compressing Bytecodes

Table 4: Compression for bytecode components

Benchmark programs

Compression for javac mpegaudio

Bytestream 48% 43%

Opcodes 36% 17%

using Stack State 35% 15%

using Custom opcodes 34% 13%

Branch offsets 41% 52%

Method references 35% 28%

Java bytecode sequences are a mixture of opcodes, integer constants, register numbers, constant pool references and branch offsets. As has been suggested previously [EEF⁺97], we might be able to achieve better compression if we separate that information into separate streams and compressed them independently.

Of course, note the ``we might''. It isn't guaranteed. For example, local 0 is initially (and generally throughout a method) used to store this (for non-static methods). There are some instruction patterns that depend on the registers and other values in the bytecode sequence. For example, an aload instruction is much more commonly followed by a getfield instruction when the aload instruction loads local 0. As it turns out, we would pick this up even though bytecodes are separated, because a special opcode is used for loading a reference from local 0 (aload_0). When we separate out the operands from the opcodes, we don't separate out the implicit operands in opcodes such as iconst_2 and aload_0.

Table 4 shows sample compression factors for bytecodes, and for various components of bytecodes. As you can see, we get substantially better compression factors for a sequence of opcodes than for a sequence of bytecodes. In some unusual cases, such as mpegaudio, we get absolutely incredible compression ratios. The other sequences don't always compress as well, but the overall effect is a substantial win.

Approximate Stack State

I also performed a calculation of the current stack state (a computation of the number and types of values on the stack before executing each instruction). This stack information was used to collapse opcodes. For example, if we know the type of the element on the top of the stack, we can collapse all four addition opcodes into the iadd instruction, and regenerate the correct opcode upon decompression. No backwards branches were considered, and I only remembered the stack state over one forward branch at any one time (because the decompressor has to duplicate this computation, it would be impossible to consider backward branches). Because of these limitations, the calculation was an approximation: sometimes, the system would not know what state the stack was in. The improvements realized by this optimization are modest, as seen in Table 4, but not expensive to computing while compressing or decompressing. Computing the stack information is also useful in compression references (§5). I have incorporated this optimization into my baseline results.

Using Custom Opcodes

I tried a custom opcode approach to compressing JVM opcodes [EEF⁺97,FP95]. The program looked for pairs of adjacent opcodes, that, if replaced by new opcode, would most reduce the estimated length of the program, where an opcode that occurred with frequency p was expected to require $\log_2(1/p)$ bits. It also considered skip-pairs, that allowed for a slot between the two opcodes being combined. After each new opcode was introduced, the frequencies were recalculated.

Although this approach substantially decreased number of opcodes, gzipping the resulting sequence of opcodes gave a result that was only about slightly better than gzipping the original sequence of opcodes (see ``Custom opcodes'' in Table 4). As implemented, computing the custom opcodes was relatively expensive, but was very inexpensive to decompress. However, given the meager improvements, I decided not to incorporate this technology in the results reported here. Using custom opcodes may be an attractive in situations where gzip compression is not being used (because it is not available on the client or it is too expensive to run on the client).

next up previous
Next: Compressing Sets of Strings Up: Compressing Java Class Files Previous: Encoding Integers

William Pugh