compacting _Jv_Utf8Const
Andrew Haley
aph@redhat.com
Wed May 5 19:30:00 GMT 2004
Per Bothner writes:
> _Jv_Utf8Const names take up a fair amount of space. (However, I don't
> have numbers on this. Does anyone?) Some of it is "overhead": the
> length (2 bytes), hash code (2 bytes), final '\0' (1 byte), and
> alignment (0-1 bytes). How about this more compact encoding:
>
> struct _Jv_Utf8Const
> {
> unsigned char hash;
> /* The data length is split into 7-bit chunks. The chunks appear in
> * little-endian order (because that is easier to generate), with the
> * final chunk in a byte with a 0 high-order bit, while the preceding
> * ones have the high-order bit set. */
> /* unsigned char length[?]; -extra low-order 7-bit chunks as needed */
> unsigned char length0; /* high-order byte of length of data, in bytes */
> char data[1]; /* In Utf8 format; no final '\0'. */
> };
>
> The hash code is reduced to 1 byte, saving one byte, under the
> assumption that clashes will be rare. We reduce the length field to a
> single byte in all normal cases, saving another byte. We get rid of the
> final useless '\0', saving a third byte. And we remove the requirement
> for short-alignment, saving on average half a byte.
>
> So the savings would be 3.5 bytes per name. Is that enough to be worth
> while?
>
> We are also removing the restriction to a maximum of 0xFFFF bytes.
>
> Disadvantages: Slightly slower comparisons. More complex code. More
> awkward to print out _Jv_Utf8Const from gdb. Breaking binary
> compatibility. But the biggest is the actual work of changing the code.
>
> There is also the issue of whether this change is compatible with the
> plans for the new ABI.
I can speak to that: changes to _Jv_Utf8Const aren't yet being
considered, so I don't think the new ABI plans affect this. We'd have
to merge from trunk->branch, but we should do that anyway.
> Finally, one could compress the actual characters, i.e. use a more
> compact special-purpose encoding than UTF8. 6 bits per character, with
> some escape mechanism, should be enough, but saving 25% is probably not
> enough to justify the complexity.
You rather sound like you don't really want to change it...
Andrew.
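
For reference, the variable-length length field described in the quoted
struct comment might be sketched as below. This is only an illustration
under assumptions: the helper names encode_length and decode_length are
hypothetical and do not exist in libgcj, and the byte layout is one
reading of the comment (low-order 7-bit chunks first with the high bit
set, final high-order chunk with the high bit clear). That reading
resembles the unsigned LEB128 scheme used in DWARF.

```c
#include <stddef.h>

/* Hypothetical helper, not part of libgcj.  Writes LEN as 7-bit chunks,
   low-order chunk first.  Every byte except the last has its high-order
   bit set.  Returns the number of bytes written to OUT.  */
static size_t
encode_length (size_t len, unsigned char *out)
{
  size_t n = 0;
  while (len >= 0x80)
    {
      out[n++] = (unsigned char) ((len & 0x7f) | 0x80);
      len >>= 7;
    }
  out[n++] = (unsigned char) len;	/* final chunk, high bit clear */
  return n;
}

/* Hypothetical helper, not part of libgcj.  Reads a length written by
   encode_length and stores the number of bytes consumed in *USED.  */
static size_t
decode_length (const unsigned char *in, size_t *used)
{
  size_t len = 0;
  unsigned shift = 0;
  size_t n = 0;
  while (in[n] & 0x80)
    {
      len |= (size_t) (in[n++] & 0x7f) << shift;
      shift += 7;
    }
  len |= (size_t) in[n++] << shift;	/* final chunk */
  *used = n;
  return len;
}
```

Under this reading, any name shorter than 128 bytes needs a single
length byte, which is the "all normal cases" saving mentioned in the
proposal, and longer names grow by one byte per additional 7 bits of
length.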