compacting _Jv_Utf8Const

Andrew Haley aph@redhat.com
Wed May 5 19:30:00 GMT 2004


Per Bothner writes:
 > _Jv_Utf8Const names take up a fair amount of space. (However, I don't 
 > have numbers on this. Does anyone?) Some of it is "overhead": the 
 > length (2 bytes), hash code (2 bytes), final '\0' (1 byte), and 
 > alignment (0-1 bytes). How about this more compact encoding:
 > 
 > struct _Jv_Utf8Const
 > {
 > unsigned char hash;
 > /* The data length is split into 7-bit chunks. The chunks appear in
 > * little-endian order (because that is easier to generate), with the
 > * final chunk in a byte with a 0 high-order bit, while the preceding
 > * ones have the high-order bit set. */
 > /* unsigned char length[?]; -extra low-order 7-bit chunks as needed */
 > unsigned char length0; /* high-order byte of length of data, in bytes */
 > char data[1];		/* In Utf8 format; no final '\0'. */
 > };
 > 
 > The hash code is reduced to 1 byte, saving one byte, under the 
 > assumption that clashes will be rare. We reduce the length field to a 
 > single byte in all normal cases, saving another byte. We get rid of the 
 > final useless '\0', saving a third byte. And we remove the requirement 
 > for short-alignment, saving on average half a byte.
 > 
 > So the savings would be 3.5 bytes per name. Is that enough to be 
 > worthwhile?
 > 
 > We would also be removing the restriction to a maximum of 0xFFFF bytes.
 > 
 > Disadvantages: Slightly slower comparisons. More complex code. More 
 > awkward to print out _Jv_Utf8Const from gdb. Breaking binary 
 > compatibility. But the biggest is the actual work of changing the code.
 > 
 > There is also the issue of whether this change is compatible with the 
 > plans for the new ABI.

I can speak to that: changes to _Jv_Utf8Const aren't yet being
considered, so I don't think the new ABI plans affect this. We'd have
to merge from trunk->branch, but we should do that anyway.
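
For reference, the variable-length length field proposed above could be
read and written roughly as follows. This is only a sketch under stated
assumptions: the helper names utf8const_put_length and
utf8const_get_length are hypothetical and do not exist in libgcj, and
the byte order follows the comment in the proposed struct (low-order
7-bit chunks first, high bit set on every byte except the last).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of libgcj: store LEN as 7-bit chunks,
   low-order chunk first.  Every byte except the final one has its
   high-order bit set.  Returns the number of bytes written.  */
static size_t
utf8const_put_length (unsigned char *out, size_t len)
{
  size_t n = 0;

  assert (out != NULL);
  while (len > 0x7F)
    {
      out[n++] = (unsigned char) ((len & 0x7F) | 0x80);
      len >>= 7;
    }
  out[n++] = (unsigned char) len;	/* final chunk: high bit clear */
  return n;
}

/* Hypothetical helper, not part of libgcj: decode a length written by
   utf8const_put_length.  Stores the number of bytes consumed in *USED,
   so the caller knows where the character data starts.  */
static size_t
utf8const_get_length (const unsigned char *in, size_t *used)
{
  size_t len = 0;
  size_t n = 0;
  unsigned shift = 0;

  while (in[n] & 0x80)
    {
      len |= (size_t) (in[n++] & 0x7F) << shift;
      shift += 7;
    }
  len |= (size_t) in[n++] << shift;
  *used = n;
  return len;
}
```

Under this scheme any name shorter than 128 bytes needs a single length
byte, which is the "all normal cases" situation described above, and a
length of 0x10000 (beyond the current limit) fits in three bytes.
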
 > Finally, one could compress the actual characters, i.e. use a more 
 > compact special-purpose encoding than UTF8. 6 bits per character, with 
 > some escape mechanism, should be enough, but saving 25% is probably not 
 > enough to justify the complexity.

You rather sound like you don't really want to change it...

Andrew.

