next
up
previous
Next: Compressing Sets of Strings
Up: Compressing Java Class Files
Previous: Encoding Integers
Subsections
Compressing Bytecodes
Table 4:
Compression for bytecode components
Benchmark programs
Compression for
javac
mpegaudio
Bytestream
48%
43%
Opcodes
36%
17%
using Stack State
35%
15%
using Custom opcodes
34%
13%
Register numbers
39%
34%
Branch offsets
41%
52%
Method references
35%
28%
Java bytecode sequences are a mixture of
opcodes, integer constants, register numbers, constant pool references
and branch offsets.
As has been suggested previously [EEF+97], we might be able to achieve better compression
if we separate that information into separate streams and compressed them independently.
Of course, note the ``we might''. It isn't guaranteed. For example,
local 0 is initially (and generally throughout a method) used to
store this (for non-static methods). There are some instruction patterns
that depend on the registers and other values in the bytecode sequence. For example,
an aload instruction is much more commonly followed by a getfield instruction
when the aload instruction loads local 0. As it turns out, we would pick
this up even though bytecodes are separated, because a special opcode
is used for loading a reference from local 0 (aload_0).
When we separate out the operands from the opcodes, we don't separate
out the implicit operands in opcodes such as iconst_2 and aload_0.
Table 4 shows
sample compression factors for bytecodes, and for various components
of bytecodes.
As you can see, we get substantially better compression factors
for a sequence of opcodes than for a sequence of bytecodes.
In some unusual cases, such as mpegaudio, we get absolutely incredible
compression ratios. The other sequences don't always compress as well,
but the overall effect is a substantial win.
Approximate Stack State
I also performed a calculation of the current stack
state (a computation of the number and types of values on the stack
before executing each instruction).
This stack information was used to collapse opcodes. For example,
if we know the type of the element on the top of the stack, we
can collapse all four addition opcodes into the
iadd instruction,
and regenerate the correct opcode upon decompression.
No backwards branches were considered, and I only remembered the stack
state over one forward branch at any one time (because the decompressor
has to duplicate this computation, it would be impossible to consider
backward branches).
Because of these
limitations, the calculation was an approximation: sometimes,
the system would not know what state the stack was in.
The improvements realized by
this optimization are modest, as seen in
Table
4, but not expensive to computing
while
compressing or decompressing. Computing the stack information is also
useful in compression references (§
5).
I have incorporated this optimization into
my baseline results.
I tried a custom opcode approach to compressing JVM opcodes [
EEF+97,
FP95].
The program looked
for pairs of adjacent opcodes, that, if replaced by new opcode,
would most reduce the estimated length of the program, where an
opcode that occurred with frequency
p was expected to
require
$\log_2(1/p)$
bits. It also considered skip-pairs, that allowed
for a slot between the two opcodes being combined. After each new opcode
was introduced, the frequencies were recalculated.
Although this approach substantially decreased number of opcodes, gzipping
the resulting sequence of opcodes gave a result that was only about slightly
better
than gzipping the original sequence of opcodes (see ``Custom opcodes''
in Table 4). As implemented, computing
the custom opcodes was relatively expensive, but was very inexpensive
to decompress. However, given the meager improvements,
I decided not to incorporate this technology in the results reported
here. Using custom opcodes may be an attractive in situations where gzip
compression is not being used (because it is not available on the client or
it is too expensive to run on the client).
next
up
previous
Next: Compressing Sets of Strings
Up: Compressing Java Class Files
Previous: Encoding Integers
William Pugh