Decoding a histogram

Question 1

This is a follow-up to my previous question, posted here:

I've successfully encoded a large histogram into a byte array, which is written to a file. I'm now focusing on turning the byte array back into the token-based String. Details on how the tokens work are in the previous question.

I've created a method, below, which takes the byte array as read from the file and outputs a char array. Because the size of the output is unknown at this point, I'm using a StringBuilder to accumulate the decoded text. The DecodingResult class is just a simple POJO holding the output String as a char[] and the size of the histogram as an int.
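For reference, the DecodingResult holder described above might look roughly like the following. This is only a sketch: the field and accessor names are assumptions, since the original class is not shown.

```java
/** Hypothetical sketch of the result holder; field and accessor names are assumptions. */
final class DecodingResult {
    private final char[] output;        // decoded token string
    private final int histogramLength;  // number of histogram entries that were decoded

    DecodingResult(char[] output, int histogramLength) {
        this.output = output;
        this.histogramLength = histogramLength;
    }

    char[] getOutput() {
        return output;
    }

    int getHistogramLength() {
        return histogramLength;
    }
}
```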

 /** Decodes the raw byte into a decoding result object.
 * @param bytes bytes to decode
 * @return decodingResult object
 */
public static DecodingResult decodeBinarySPECtoRAW(byte[] bytes) {
 StringBuilder sb = new StringBuilder();
 int height = 0;
 int length = 0;
 int val;
 int histogramLength = 0;
 for (int i = 0; i < bytes.length; i++) {
 char token = (char) bytes[i];
 sb.append(token);
 boolean nonSpecial = false;
 for (Token t : Token.values()) {
 if (token == t.name().charAt(0)) {
 nonSpecial = true;
 height = t.getHeight();
 length = t.getLength();
 }
 }
 if (nonSpecial) {
 //length
 if (length != 0 && length != 1) {
 if (length == 8) {
 //1 byte
 sb.append(getPaddedString(String.valueOf(bytes[i + 1] & 0xFF), 3));
 histogramLength += bytes[i + 1] & 0xFF;
 i++;
 } else if (length == 16) {
 //2 bytes
 val = Tools.convertFromByteArray2(bytes[i + 1], bytes[i + 2]);
 histogramLength += val;
 sb.append(getPaddedString(String.valueOf(val), 5));
 i += 2;
 } else {
 //4 bytes
 val = Tools.convertFromByteArray4(bytes[i + 1], bytes[i + 2], bytes[i + 3], bytes[i + 4]);
 histogramLength += val;
 sb.append(getPaddedString(String.valueOf(val), 10));
 i += 4;
 }
 } else {
 histogramLength++;
 }
 //height
 if (height != 0 && height != 1) {
 if (height == 8) {
 //1 byte
 sb.append(getPaddedString(String.valueOf(bytes[i + 1] & 0xFF), 3));
 i++;
 } else if (height == 16) {
 //2 bytes
 val = Tools.convertFromByteArray2(bytes[i + 1], bytes[i + 2]);
 sb.append(getPaddedString(String.valueOf(val), 5));
 i += 2;
 } else {
 //4 bytes
 val = Tools.convertFromByteArray4(bytes[i + 1], bytes[i + 2], bytes[i + 3], bytes[i + 4]);
 sb.append(getPaddedString(String.valueOf(val), 10));
 i += 4;
 }
 }
 } else {
 switch (token) {
 case 'R': {
 int numReads = (int) bytes[i + 1] & 0xFF;
 i++;
 sb.append(getPaddedString(String.valueOf(numReads), 3));
 for (int j = 0; j < numReads; j++) {
 int nextNum = bytes[i + 1] & 0xFF;
 sb.append(getPaddedString(String.valueOf(nextNum), 3));
 histogramLength++;
 i++;
 }
 break;
 }
 case 'S': {
 int numReads = (int) bytes[i + 1] & 0xFF;
 i++;
 sb.append(getPaddedString(String.valueOf(numReads), 3));
 for (int j = 0; j < numReads; j++) {
 histogramLength++;
 sb.append(getPaddedString(String.valueOf(Tools.convertFromByteArray2(bytes[i + 1], bytes[i + 2])), 5));
 i += 2;
 }
 break;
 }
 case 'T': {
 sb.append(getPaddedString(String.valueOf(Tools.convertFromByteArray4(bytes[i + 1], bytes[i + 2], bytes[i + 3], bytes[i + 4])), 10));
 i += 4;
 sb.append(getPaddedString(String.valueOf(Tools.convertFromByteArray4(bytes[i + 1], bytes[i + 2], bytes[i + 3], bytes[i + 4])), 10));
 i += 4;
 break;
 }
 case 'U': {
 sb.append(getPaddedString(String.valueOf(Tools.convertFromByteArray4(bytes[i + 1], bytes[i + 2], bytes[i + 3], bytes[i + 4])), 10));
 i += 4;
 break;
 }
 case 'V': {
 List<Byte> VBytes = new ArrayList<>();
 // Collect bytes up to the 0 terminator. The bounds check is part of the loop
 // condition so a missing terminator cannot cause an endless loop.
 while (i + 1 < bytes.length && bytes[i + 1] != 0) {
 VBytes.add(bytes[i + 1]);
 i += 1;
 }
 for (byte b : VBytes) {
 sb.append((char) b);
 }
 sb.append(getPaddedString(String.valueOf(bytes[i + 1] & 0xFF), 3));
 i += 1;
 break;
 }
 case 'W': {
 for (int j = 0; j < 6; j++) {
 sb.append("000");
 i += 1;
 }
 sb.append(getPaddedString(String.valueOf(Tools.convertFromByteArray4(bytes[i + 1], bytes[i + 2], bytes[i + 3], bytes[i + 4])), 10));
 i += 4;
 break;
 }
 case 'X': {
 sb.append(getPaddedString(String.valueOf(bytes[i + 1] & 0xFF), 3));
 i += 1;
 //get length of the statement
 int statementLength = bytes[i + 1] & 0xFF;
 sb.append(getPaddedString(String.valueOf(statementLength), 3));
 i += 1;
 for (int j = i + 1; j < i + 1 + statementLength; j++) {
 sb.append((char) bytes[j]);
 }
 i += statementLength;
 sb.append(getPaddedString(String.valueOf(Tools.convertFromByteArray2(bytes[i + 1], bytes[i + 2])), 5));
 i += 2;
 //endseq
 int endLength = bytes[i + 1] & 0xFF; // mask so lengths above 127 are not read as negative
 sb.append(getPaddedString(String.valueOf(endLength), 3));
 i += 1;
 if (endLength != 0) {
 for (int j = i + 1; j < i + 1 + endLength; j++) {
 sb.append((char) bytes[j]);
 }
 i += endLength;
 }
 //flankseq
 int flankLength = bytes[i + 1] & 0xFF; // mask so lengths above 127 are not read as negative
 sb.append(getPaddedString(String.valueOf(flankLength), 3));
 i += 1;
 if (flankLength != 0) {
 for (int j = i + 1; j < i + 1 + flankLength; j++) {
 sb.append((char) bytes[j]);
 }
 i += flankLength;
 }
 break;
 }
 case 'Y': {
 //must be Y
 sb.append(getPaddedString(String.valueOf(Tools.convertFromByteArray4(bytes[i + 1], bytes[i + 2], bytes[i + 3], bytes[i + 4])), 10));
 i += 4;
 break;
 }
 }
 }
 }
 return new DecodingResult(sb.toString().toCharArray(), histogramLength);
}
public static String getPaddedString(String s, int max){
 StringBuilder b = new StringBuilder(max);
 for(int i = 0; i < max - s.length(); i++){
 b.append('0');
 }
 b.append(s);
 return b.toString();
}

The token code, just so no one has to go back and forth to the last post:

/** All lengths and heights in bits.
 * All 1's are to be ignored in writing,
 * i.e. 1 - 0 is transcoded as A.
 * 1 - 1 is transcoded as E
 * 1 - 209 is transcoded as I209
 * 1 - 2 is transcoded as I002
 * 1 - 40000 is transcoded as M40000
 * 1 - 290 is transcoded as M00290
 */
public enum Token {
    A (1, 0),
    B (8, 0),
    I (1, 8),
    E (1, 1),
    F (8, 1),
    J (8, 8),
    N (8, 16),
    M (1, 16),
    C (16, 0),
    D (32, 0),
    G (16, 1),
    H (32, 1),
    K (16, 8),
    L (32, 8),
    O (16, 16),
    P (32, 16),
    Q (16, 32),
    Z (1, 32);

    private final int length;
    private final int height;

    Token(int length, int height) {
        this.length = length;
        this.height = height;
    }

    public int getLength() {
        return length;
    }

    public int getHeight() {
        return height;
    }
}
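The examples in the enum's comment can be reproduced with a small sketch. It assumes a reduced copy of the enum (only four tokens) and covers only the case where the length is the implied 1. The digit widths 3, 5 and 10 are taken from the padding used in the decoding method; the class and method names are hypothetical.

```java
class TokenExample {
    // Reduced copy of the enum above: only the tokens used in this example.
    enum Token {
        A(1, 0), E(1, 1), I(1, 8), M(1, 16);

        final int length;
        final int height;

        Token(int length, int height) {
            this.length = length;
            this.height = height;
        }
    }

    /** Number of decimal digits the text form uses for a field of the given bit width. */
    static int digitsFor(int bits) {
        switch (bits) {
            case 8:  return 3;
            case 16: return 5;
            case 32: return 10;
            default: return 0; // 0 and 1 are implied by the token letter itself
        }
    }

    /** Text form of a token whose length is the implied 1, with the given height value. */
    static String render(Token t, int heightValue) {
        int digits = digitsFor(t.height);
        StringBuilder sb = new StringBuilder(t.name());
        if (digits == 0) {
            return sb.toString();
        }
        String text = Integer.toString(heightValue);
        for (int i = text.length(); i < digits; i++) {
            sb.append('0'); // left-pad with zeros to the fixed width
        }
        return sb.append(text).toString();
    }

    public static void main(String[] args) {
        System.out.println(render(Token.A, 0));     // A
        System.out.println(render(Token.I, 209));   // I209
        System.out.println(render(Token.I, 2));     // I002
        System.out.println(render(Token.M, 40000)); // M40000
        System.out.println(render(Token.M, 290));   // M00290
    }
}
```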

Also the convertFromByteArray code.

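// Combines two bytes into an int; byte1 is the LOW byte, byte2 the HIGH byte.
public static int convertFromByteArray2(byte byte1, byte byte2) {
    return ((byte2 & 0xFF) << 8 | (byte1 & 0xFF));
}

// Combines four bytes into an int; byte1 is the HIGH byte, byte4 the LOW byte.
// Note that this is the opposite byte order to convertFromByteArray2.
public static int convertFromByteArray4(byte byte1, byte byte2, byte byte3, byte byte4) {
    return byte1 << 24 | (byte2 & 0xFF) << 16 | (byte3 & 0xFF) << 8 | (byte4 & 0xFF);
}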
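As a quick check of what the two helpers expect, here is a small sketch using the values 290 and 40000 from the token comment. The helpers are copied unchanged; the wrapper class name is hypothetical. It shows that the 2-byte helper takes the low byte first, while the 4-byte helper takes the high byte first.

```java
class ByteOrderExample {
    // Copies of the two helpers from the question, unchanged.
    static int convertFromByteArray2(byte byte1, byte byte2) {
        return ((byte2 & 0xFF) << 8 | (byte1 & 0xFF));
    }

    static int convertFromByteArray4(byte byte1, byte byte2, byte byte3, byte byte4) {
        return byte1 << 24 | (byte2 & 0xFF) << 16 | (byte3 & 0xFF) << 8 | (byte4 & 0xFF);
    }

    public static void main(String[] args) {
        // 290 = 0x0122: the low byte 0x22 is passed first, then the high byte 0x01.
        System.out.println(convertFromByteArray2((byte) 0x22, (byte) 0x01)); // 290

        // 40000 = 0x00009C40: the high byte is passed first, the low byte 0x40 last.
        System.out.println(convertFromByteArray4((byte) 0x00, (byte) 0x00, (byte) 0x9C, (byte) 0x40)); // 40000
    }
}
```

If the encoder writes both widths in the same byte order, one of the two helpers would read its value incorrectly, so it may be worth confirming this against the encoding side.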

Two steps here are repeated many times and take a lot of time, but I'm not sure whether there's a better way of doing them. The first is the byte -> String conversion; I have created a padding method to avoid using String.format.

The second is that, for every byte read, it has to loop through every Token to find the matching one.

Question 2

I thought you were told about the cost of String.format()

Question 3

@SharonBenAsher This new code avoids String.format().

Answer 1 · mtj · 2018-04-18 08:04:27Z

The loop can simply be replaced with a lookup map which you prepare once before the process runs:

Map<Character, Token> tokenLookup = EnumSet.allOf(Token.class).stream()
 .collect(Collectors.toMap(tok -> tok.name().charAt(0), Function.identity()));

Then, instead of the loop just:

Token t = tokenLookup.get(token);
if(t != null) {
 nonSpecial = true;
 height = t.getHeight();
 length = t.getLength();
}
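Put together, the lookup could be sketched as follows. This is a self-contained illustration rather than the full decoder: the enum is a reduced copy with three tokens, and the class and constant names are hypothetical.

```java
import java.util.EnumSet;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

class TokenLookupExample {
    // Reduced copy of the Token enum, just enough for the demonstration.
    enum Token {
        A(1, 0), I(1, 8), M(1, 16);

        private final int length;
        private final int height;

        Token(int length, int height) {
            this.length = length;
            this.height = height;
        }

        int getLength() { return length; }
        int getHeight() { return height; }
    }

    // Built once when the class is loaded, then reused for every byte that is decoded.
    static final Map<Character, Token> TOKEN_LOOKUP = EnumSet.allOf(Token.class).stream()
            .collect(Collectors.toMap(tok -> tok.name().charAt(0), Function.identity()));

    public static void main(String[] args) {
        Token t = TOKEN_LOOKUP.get('I');
        System.out.println(t + " " + t.getLength() + " " + t.getHeight()); // I 1 8

        // Special tokens such as 'R' are not in the enum, so the lookup returns null.
        System.out.println(TOKEN_LOOKUP.get('R')); // null
    }
}
```

A null result takes the place of the nonSpecial flag staying false, so the decoder can fall through to its switch statement for the special tokens.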

Regarding the getPaddedString() method: you could at least eliminate the repeated call to s.length() for every loop operation:

for(int i = max - s.length(); i > 0; i--)
 ...
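Written out in full, the adjusted method might look like the first method below. The second method, appendPadded, is not from this thread: it is a hypothetical variant that writes the padded number straight into the caller's StringBuilder, skipping the intermediate String objects, and it assumes the value is non-negative.

```java
class PaddingExample {
    /** Left-pads s with '0' up to max characters; the loop bound is computed once. */
    static String getPaddedString(String s, int max) {
        StringBuilder b = new StringBuilder(max);
        for (int i = max - s.length(); i > 0; i--) {
            b.append('0');
        }
        return b.append(s).toString();
    }

    /**
     * Hypothetical variant: appends a non-negative value, zero-padded to width,
     * directly to an existing builder without creating temporary Strings.
     */
    static void appendPadded(StringBuilder sb, int value, int width) {
        int digits = 1;
        for (int v = value; v >= 10; v /= 10) {
            digits++; // count the decimal digits of value
        }
        for (int i = width - digits; i > 0; i--) {
            sb.append('0');
        }
        sb.append(value);
    }

    public static void main(String[] args) {
        System.out.println(getPaddedString("2", 3)); // 002

        StringBuilder sb = new StringBuilder("M");
        appendPadded(sb, 290, 5);
        System.out.println(sb); // M00290
    }
}
```

Whether the second variant is measurably faster would need to be checked with a benchmark on real input.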

Thanks for the suggestions. That's definitely a better way to do it than iterating through the Tokens.
