I have written code that takes an input text file, which should contain ASCII values, and compresses it. A new file is created with ".lzw" appended to the name, and that file can then be decompressed. I want to know if I can improve this code, e.g. by simplifying how bytes are read from and written to the buffer using RandomAccessFile.
Compression Class
package lzw;
import java.io.*;
import java.util.*;
public class LZWCompression {
// Define a HashMap and other variables that will be used in the program
public HashMap<String, Integer> dictionary = new HashMap<>();
public int dictSize = 256;
public String str = "";
public byte inputByte;
public byte[] buffer = new byte[3];
public boolean onleft = true;
/**
* Takes in a file name that is uncompressed, and will compress its file
* contents and append a ".lzw" to the end of the current file name
*
* @param uncompressed - Name of uncompressed file being compressed
* @throws java.io.IOException - File input/output failure
*/
public void compress(String uncompressed) throws IOException {
// Dictionary size limit, builds dictionary
for (int i = 0; i < 256; i++) {
dictionary.put(Character.toString((char) i), i);
}
// Read input uncompress file & Write out compressed file
RandomAccessFile read = new RandomAccessFile(uncompressed, "r");
RandomAccessFile out = new RandomAccessFile(uncompressed.concat(
".lzw"), "rw");
try {
// Reads the First Character from input file into the String
inputByte = read.readByte();
int i = new Byte(inputByte).intValue();
if (i < 0) {
i += 256;
}
char ch = (char) i;
str = "" + ch;
// Reads Character by Character
while (true) {
inputByte = read.readByte();
i = new Byte(inputByte).intValue();
if (i < 0) {
i += 256;
}
System.out.print(i + ", ");
ch = (char) i;
// If str + ch is in the dictionary..
// Set str to str + ch
if (dictionary.containsKey(str + ch)) {
str = str + ch;
} else {
String s12 = to12bit(dictionary.get(str));
// Store the 12 bits into an array and then write it to the
// output file
if (onleft) {
buffer[0] = (byte) Integer.parseInt(
s12.substring(0, 8), 2);
buffer[1] = (byte) Integer.parseInt(
s12.substring(8, 12) + "0000", 2);
} else {
buffer[1] += (byte) Integer.parseInt(
s12.substring(0, 4), 2);
buffer[2] = (byte) Integer.parseInt(
s12.substring(4, 12), 2);
for (int b = 0; b < buffer.length; b++) {
out.writeByte(buffer[b]);
buffer[b] = 0;
}
}
onleft = !onleft;
// Add str + ch to the dictionary
if (dictSize < 4096) {
dictionary.put(str + ch, dictSize++);
}
// Set str to ch
str = "" + ch;
}
}
/**
* Handles input/output file failure by converting 8bit to 12bit
* then storing integers to byte and writing to output file else add
* the buffers to [1] or use buffer[2] then using the length and a
* for loop to output the bytes and then zero out the buffer, note
* this code is similar to above code, which insures bits are stored
*/
} catch (IOException e) {
String str12bit = to12bit(dictionary.get(str));
if (onleft) {
buffer[0] = (byte) Integer.parseInt(str12bit.substring(0, 8), 2);
buffer[1] = (byte) Integer.parseInt(str12bit.substring(8, 12)
+ "0000", 2);
out.writeByte(buffer[0]);
out.writeByte(buffer[1]);
} else {
buffer[1] += (byte) Integer.parseInt(str12bit.substring(0, 4), 2);
buffer[2] = (byte) Integer.parseInt(str12bit.substring(4, 12), 2);
for (int b = 0; b < buffer.length; b++) {
out.writeByte(buffer[b]);
buffer[b] = 0;
}
}
read.close();
out.close();
}
}
/**
* Converts 8 bits to 12 bits
*
* @param i - Integer value
* @return - String value of integer in 12 bit
*/
public String to12bit(int i) {
String str = Integer.toBinaryString(i);
while (str.length() < 12) {
str = "0" + str;
}
return str;
}
/**
* After creating a lzw object scans user input for file to compress, and
* prints out contents of file being compressed along with integer values of
* the characters being compressed, and will return your file name with an
* appended ".lzw"
*
* @param args - The command line arguments
* @throws java.io.IOException - File input/output failure
*/
public static void main(String[] args) throws IOException {
try {
LZWCompression lzw = new LZWCompression();
Scanner input = new Scanner(System.in);
System.out.println("Enter the name of your (input.txt) file.");
String str = input.nextLine();
File file = new File(str);
Scanner fileScanner = new Scanner(file);
String line = "";
while (fileScanner.hasNext()) {
line = fileScanner.nextLine();
System.out.println("Contents of your file being compressed: \n"
+ line);
}
lzw.compress(str);
System.out.println("\nCompression of your file is complete!");
System.out.println("Your new file is named: " + str.concat(".lzw"));
} catch (FileNotFoundException e) {
System.out.println("File was not found!");
}
}
}
Decompression Class
package lzw;
import java.io.*;
import java.util.*;
public class LZWDecompression {
// Define a HashMap and other variables that will be used in the program
public HashMap<Integer, String> dictionary = new HashMap<>();
public String[] Array_char;
public int dictSize = 256;
public int currword;
public int priorword;
public byte[] buffer = new byte[3];
public boolean onleft = true;
/**
* Decompress Method that takes in input, output as a file path Then
* decompress the input to same file as the one passed to compress method
* without losing any information. In the decompression method it reads in
* 3 bytes of information and write 2 characters corresponding to the bits
* read.
*
* @param input - Name of input file path
* @throws java.io.IOException - File input/output failure
*/
public void LZW_Decompress(String input) throws IOException {
// DictSize builds up to 4k, Array_Char holds these values
Array_char = new String[4096];
for (int i = 0; i < 256; i++) {
dictionary.put(i, Character.toString((char) i));
Array_char[i] = Character.toString((char) i);
}
// Read input as uncompressed file & Write out compressed file
RandomAccessFile in = new RandomAccessFile(input, "r");
RandomAccessFile out = new RandomAccessFile(input.replace(
".lzw", ""), "rw");
try {
// Gets the first word in code and outputs its corresponding char
buffer[0] = in.readByte();
buffer[1] = in.readByte();
priorword = getvalue(buffer[0], buffer[1], onleft);
onleft = !onleft;
out.writeBytes(Array_char[priorword]);
// Reads every 3 bytes and generates corresponding characters
while (true) {
if (onleft) {
buffer[0] = in.readByte();
buffer[1] = in.readByte();
currword = getvalue(buffer[0], buffer[1], onleft);
} else {
buffer[2] = in.readByte();
currword = getvalue(buffer[1], buffer[2], onleft);
}
onleft = !onleft;
if (currword >= dictSize) {
if (dictSize < 4096) {
Array_char[dictSize] = Array_char[priorword]
+ Array_char[priorword].charAt(0);
}
dictSize++;
out.writeBytes(Array_char[priorword]
+ Array_char[priorword].charAt(0));
} else {
if (dictSize < 4096) {
Array_char[dictSize] = Array_char[priorword]
+ Array_char[currword].charAt(0);
}
dictSize++;
out.writeBytes(Array_char[currword]);
}
priorword = currword;
}
} catch (EOFException e) {
in.close();
out.close();
}
}
/**
* Extract the 12 bit key from 2 bytes and gets the integer value of the key
*
* @param b1 - First byte
* @param b2 - Second byte
* @param onleft - True if on left, false if not
* @return - An Integer which holds the value of the key
*/
public int getvalue(byte b1, byte b2, boolean onleft) {
String temp1 = Integer.toBinaryString(b1);
String temp2 = Integer.toBinaryString(b2);
while (temp1.length() < 8) {
temp1 = "0" + temp1;
}
if (temp1.length() == 32) {
temp1 = temp1.substring(24, 32);
}
while (temp2.length() < 8) {
temp2 = "0" + temp2;
}
if (temp2.length() == 32) {
temp2 = temp2.substring(24, 32);
}
if (onleft) {
return Integer.parseInt(temp1 + temp2.substring(0, 4), 2);
} else {
return Integer.parseInt(temp1.substring(4, 8) + temp2, 2);
}
}
/**
* After creating a lzw object scans user input for file to compress, and
* prints out contents of file being compressed along with integer values of
* the characters being compressed, and will return your file name with an
* appended ".lzw"
*
* @param args - The command line arguments
* @throws java.io.IOException - File input/output failure
*/
public static void main(String[] args) throws IOException {
try {
LZWDecompression lzw = new LZWDecompression();
Scanner input = new Scanner(System.in);
System.out.println("Enter the name of your (input.txt.lzw) file.");
String str = input.nextLine();
File file = new File(str);
Scanner fileScanner = new Scanner(file);
String line = "";
while (fileScanner.hasNext()) {
line = fileScanner.nextLine();
System.out.println("Contents of your file being decompressed:\n"
+ line);
}
lzw.LZW_Decompress(str);
System.out.println("Decompression of your file is complete!");
System.out.println("Your new file is named: "
+ str.replace(".lzw", ""));
} catch (FileNotFoundException e) {
System.out.println("File was not found!");
}
}
}
1 Answer
StringBuilder
str = "" + ch;
If you find yourself doing a lot of string concatenation, consider using a StringBuilder (or StringBuffer if you need thread safety).
builder.setLength(0);
builder.append(ch);
This saves creating a new string object with each new character.
if (dictionary.containsKey(str + ch)) {
    str = str + ch;
} else {
becomes
builder.append(ch);
if (!dictionary.containsKey(builder.toString())) {
which simplifies things.
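As a rough illustration of the StringBuilder suggestion, here is a minimal sketch of the matching loop that returns codes in a list rather than packing them into bytes. The class name LzwEncodeSketch and the method encode are hypothetical, and the sketch assumes every input character is in the range 0 to 255. Note that once the new character has been appended, the code to emit belongs to the builder's contents without its last character.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: not the asker's class, and no bit packing is done here.
class LzwEncodeSketch {

    /** Encodes the input into a list of dictionary codes. Assumes chars 0..255. */
    static List<Integer> encode(String input) {
        Map<String, Integer> dictionary = new HashMap<>();
        for (int i = 0; i < 256; i++) {
            dictionary.put(Character.toString((char) i), i);
        }

        List<Integer> codes = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char ch : input.toCharArray()) {
            current.append(ch);
            if (!dictionary.containsKey(current.toString())) {
                // Emit the code for the longest known prefix (everything but ch).
                String prefix = current.substring(0, current.length() - 1);
                codes.add(dictionary.get(prefix));
                // Register the new sequence while 12-bit codes are still available.
                if (dictionary.size() < 4096) {
                    dictionary.put(current.toString(), dictionary.size());
                }
                // Restart the match with ch alone, reusing the same builder.
                current.setLength(0);
                current.append(ch);
            }
        }
        // Flush whatever match is still pending at the end of the input.
        if (current.length() > 0) {
            codes.add(dictionary.get(current.toString()));
        }
        return codes;
    }
}
```

The toString() call still allocates a string for each lookup, so the gain is mostly in readability and in avoiding the repeated "" + ch concatenations.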
Use bitwise operations
String s12 = to12bit(dictionary.get(str));
// Store the 12 bits into an array and then write it to the
// output file
if (onleft) {
    buffer[0] = (byte) Integer.parseInt(s12.substring(0, 8), 2);
    buffer[1] = (byte) Integer.parseInt(s12.substring(8, 12) + "0000", 2);
} else {
    buffer[1] += (byte) Integer.parseInt(s12.substring(0, 4), 2);
    buffer[2] = (byte) Integer.parseInt(s12.substring(4, 12), 2);
    for (int b = 0; b < buffer.length; b++) {
        out.writeByte(buffer[b]);
        buffer[b] = 0;
    }
}
This is clever but more complicated than is necessary.
int compressed = dictionary.get(str);
// Store the 12 bits into an array and then write it to the
// output file
if (onleft) {
buffer[0] = (byte) (compressed & 0xff);
This keeps only the low eight bits of the compressed value. Note that 0xff is the same as binary 11111111, so ANDing compressed with 0xff leaves each of the low eight bits exactly as it was and clears every bit above them.
buffer[1] = (byte) ((compressed >> 8) << 4);
We right shift eight bits, which discards the bits we just put in buffer[0]. Then we left shift four bits, which has the same effect as appending "0000" does in the original code. This relies on compressed never being greater than or equal to 4096. Otherwise the conversion to byte will drop some of the information.
} else {
buffer[1] += (byte) (compressed & 0xf);
This masks out everything but the last four bits.
buffer[2] = (byte) (compressed >> 4);
Remove the four bits that we put in buffer[1] and put the rest in buffer[2]. Again, this relies on compressed never being greater than or equal to 4096. Otherwise the conversion to byte will drop some of the information.
for (int b = 0; b < buffer.length; b++) {
out.writeByte(buffer[b]);
buffer[b] = 0;
}
}
By using the bitwise operators, we save the entire to12bit method. We also avoid creating a String just so that we could use substring.
You also may want to put this into its own method. Then when you do it again in the catch block, you could just call the method.
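One way the extracted method might look is sketched below. The class TwelveBitPacker and the method pack are hypothetical names, and both codes are assumed to be in the range 0 to 4095. The layout is the one from the bitwise snippets above, which stores the low eight bits of the first code first. That is not the same byte order as the question's string-based version, so the writer and the reader have to be changed together.

```java
// Hypothetical helper: packs two 12-bit codes into three bytes.
class TwelveBitPacker {

    /** Assumes 0 <= first, second < 4096. */
    static byte[] pack(int first, int second) {
        byte[] buffer = new byte[3];
        buffer[0] = (byte) (first & 0xFF);        // low 8 bits of the first code
        buffer[1] = (byte) ((first >> 8) << 4);   // high 4 bits of the first code, upper nibble
        buffer[1] |= (byte) (second & 0xF);       // low 4 bits of the second code, lower nibble
        buffer[2] = (byte) (second >> 4);         // high 8 bits of the second code
        return buffer;
    }

    public static void main(String[] args) {
        byte[] packed = pack(0xABC, 0x123);
        for (byte b : packed) {
            // Mask with 0xFF so negative bytes print as two hex digits.
            System.out.printf("%02X ", b & 0xFF);
        }
        System.out.println();
    }
}
```

Both the main loop and the end-of-file handling could then call the same helper instead of repeating the bit manipulation.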
Decompression
Undoing this is more complicated. Because byte is a signed type in Java, we have to promote it to int and mask off the excess bits to make the shifts work right. E.g. ((int) b & 255) will produce the correct eight bit pattern for a byte b.
public int getvalue(byte b1, byte b2, boolean onleft) {
int value;
if (onleft) {
value = ((int) b1 & 0xFF) + ((((int) b2 & 0xFF) >> 4) << 8);
} else {
value = ((int) b1 & 0xF) + (((int) b2 & 0xFF) << 4);
}
return value;
}
With the second byte, we shift right four places to discard the four least significant bits and then left eight to align it above the first byte. Then for the other half, we mask off all but the four least significant bits of the second byte and add them to the third byte shifted four places to the left.
We have to cast to int and mask each value so that it doesn't get treated as a negative number.
I'm not crazy about the name getvalue, which I would expect to be a getter. But I kept it for consistency's sake. Consider changing to something like toIntValue instead, which better reflects what it actually does.
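To make the sign issue concrete, the following sketch prints a byte with and without the mask and then decodes three sample bytes. The class name TwelveBitUnpacker is hypothetical; toIntValue is the rename suggested above, with | used in place of + (equivalent here because the two operands never share set bits).

```java
// Hypothetical sketch of the decoding side, using the layout described above.
class TwelveBitUnpacker {

    /** Rebuilds a 12-bit code from two adjacent bytes of a three-byte group. */
    static int toIntValue(byte b1, byte b2, boolean onLeft) {
        if (onLeft) {
            // b1 holds the low 8 bits; the upper nibble of b2 holds the high 4 bits.
            return (b1 & 0xFF) | (((b2 & 0xFF) >> 4) << 8);
        }
        // The lower nibble of b1 holds the low 4 bits; b2 holds the high 8 bits.
        return (b1 & 0xF) | ((b2 & 0xFF) << 4);
    }

    public static void main(String[] args) {
        byte b = (byte) 0xBC;
        System.out.println((int) b);   // sign-extended, prints -68
        System.out.println(b & 0xFF);  // masked, prints 188

        byte[] group = {(byte) 0xBC, (byte) 0xA3, (byte) 0x12};
        System.out.println(toIntValue(group[0], group[1], true));   // prints 2748 (0xABC)
        System.out.println(toIntValue(group[1], group[2], false));  // prints 291 (0x123)
    }
}
```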
HashMap tracks its own size
dictionary.put(str + ch, dictSize++);
You don't need a dictSize variable. You could just say
dictionary.put(str + ch, dictionary.size());
This saves managing the dictSize variable.
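A small sketch of that idea, with the 4096-entry limit kept in place, could look like this. The names DictionarySizeSketch, newDictionary, and addEntry are hypothetical. It relies on entries never being removed from the map, which holds for the compressor shown in the question.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: the map's own size serves as the next free code.
class DictionarySizeSketch {
    static final int MAX_CODES = 4096; // 12-bit codes: 0..4095

    /** Builds the initial dictionary of all single-byte strings. */
    static Map<String, Integer> newDictionary() {
        Map<String, Integer> dictionary = new HashMap<>();
        for (int i = 0; i < 256; i++) {
            dictionary.put(Character.toString((char) i), i);
        }
        return dictionary;
    }

    /** Adds the entry under the next free code; returns false if full or already present. */
    static boolean addEntry(Map<String, Integer> dictionary, String entry) {
        if (dictionary.size() >= MAX_CODES || dictionary.containsKey(entry)) {
            return false;
        }
        // size() is evaluated before put() inserts, so the first new code is 256.
        dictionary.put(entry, dictionary.size());
        return true;
    }
}
```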
catch EOFException
Your code runs forever until it encounters an IOException. Then it assumes that the IOException is an EOFException and writes out whatever is waiting to be written. Instead, consider catching just the EOFException. Then an IOException will crash the program. Which is what an IOException thrown in the catch block would do anyway.
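The difference can be sketched with a DataInputStream over an in-memory array, which throws EOFException from readByte() in the same way RandomAccessFile does. The class EofHandlingSketch and its countBytes methods are hypothetical and only count bytes; the point is that the catch clause names EOFException alone, so any other IOException propagates to the caller.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical sketch: only end-of-input is treated as a normal condition.
class EofHandlingSketch {

    static int countBytes(DataInputStream in) throws IOException {
        int count = 0;
        try {
            while (true) {
                in.readByte(); // throws EOFException once the input is exhausted
                count++;
            }
        } catch (EOFException endOfInput) {
            // Expected: the loop ends here. Pending output would be flushed at this point.
        }
        return count;
    }

    /** Convenience overload for in-memory data, so callers need not handle IOException. */
    static int countBytes(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            return countBytes(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

An alternative that avoids exceptions for control flow altogether is InputStream.read(), which returns -1 at end of input.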
Use buffered I/O
You are using RandomAccessFile, which works but is unnecessary. You only do sequential operations. You don't use the random access capability at all. You could just use a buffered I/O method.
You could even write your own wrapper for it. Then you could say something like
out.write12bits(compressed);
and let your wrapper handle the details.
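Such a wrapper might be sketched as follows. TwelveBitOutputStream and the packAll helper are hypothetical names, write12bits is the method name used above, and codes are assumed to be in the range 0 to 4095. It wraps the target in a BufferedOutputStream and uses the same byte layout as the bitwise snippets earlier in this answer.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

// Hypothetical wrapper: hides the two-codes-in-three-bytes bookkeeping from the caller.
class TwelveBitOutputStream implements AutoCloseable {
    private final OutputStream out;
    private int pending;        // first code of the current pair, if any
    private boolean hasPending;

    TwelveBitOutputStream(OutputStream target) {
        this.out = new BufferedOutputStream(target);
    }

    /** Assumes 0 <= code < 4096. */
    void write12bits(int code) throws IOException {
        if (!hasPending) {
            pending = code;
            hasPending = true;
            return;
        }
        // OutputStream.write(int) writes only the low 8 bits of its argument.
        out.write(pending & 0xFF);
        out.write(((pending >> 8) << 4) | (code & 0xF));
        out.write(code >> 4);
        hasPending = false;
    }

    /** Writes a trailing unpaired code as two bytes, then flushes and closes. */
    @Override
    public void close() throws IOException {
        if (hasPending) {
            out.write(pending & 0xFF);
            out.write((pending >> 8) << 4);
        }
        out.close();
    }

    /** Demo helper: packs the given codes into an in-memory byte array. */
    static byte[] packAll(int... codes) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (TwelveBitOutputStream out = new TwelveBitOutputStream(bytes)) {
            for (int code : codes) {
                out.write12bits(code);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }
}
```

With this in place the compression loop no longer needs the buffer array or the onleft flag.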
Use descriptive variable names if you can
Some variables don't have a descriptive name that makes sense, so we just call them string or something like that. But some do.
String str = input.nextLine();
In this case, the String represents a file name. So call it that.
String filename = input.nextLine();
Use try with resources
try {
    LZWCompression lzw = new LZWCompression();
    Scanner input = new Scanner(System.in);
    System.out.println("Enter the name of your (input.txt) file.");
    String str = input.nextLine();
    File file = new File(str);
    Scanner fileScanner = new Scanner(file);
    String line = "";
    while (fileScanner.hasNext()) {
        line = fileScanner.nextLine();
        System.out.println("Contents of your file being compressed: \n" + line);
    }
    lzw.compress(str);
    System.out.println("\nCompression of your file is complete!");
    System.out.println("Your new file is named: " + str.concat(".lzw"));
But the early part of this can't throw a FileNotFoundException, so it doesn't need to be in the try block.
LZWCompression lzw = new LZWCompression();
Scanner input = new Scanner(System.in);
System.out.println("Enter the name of your (input.txt) file.");
String filename = input.nextLine();
try (Scanner fileScanner = new Scanner(new File(filename))) {
while (fileScanner.hasNext()) {
String line = fileScanner.nextLine();
System.out.println("Contents of your file being compressed: \n"
+ line);
}
lzw.compress(filename);
System.out.println("\nCompression of your file is complete!");
System.out.println("Your new file is named: " + filename.concat(".lzw"));
Now the fileScanner will be managed by the try statement.
Confirm what we should know
} catch (FileNotFoundException e) {
    System.out.println("File was not found!");
}
Which file was not found?
} catch (FileNotFoundException e) {
System.out.println("File '" + filename + "' was not found!");
}
Now we know what the program thought it wanted to find.
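Putting the last two points together, a complete version of the file-echo step might look like the sketch below. The class FileEchoSketch and the method echo are hypothetical, and the compression call is left out so that the example stands alone.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;

// Hypothetical sketch: try-with-resources plus an error message that names the file.
class FileEchoSketch {

    /** Prints each line of the file; returns the line count, or -1 if the file is missing. */
    static int echo(String filename) {
        // The Scanner is closed automatically when the try block exits, even on error.
        try (Scanner fileScanner = new Scanner(new File(filename))) {
            int lines = 0;
            while (fileScanner.hasNextLine()) {
                System.out.println(fileScanner.nextLine());
                lines++;
            }
            return lines;
        } catch (FileNotFoundException e) {
            System.out.println("File '" + filename + "' was not found!");
            return -1;
        }
    }

    public static void main(String[] args) {
        Scanner input = new Scanner(System.in);
        System.out.println("Enter the name of your (input.txt) file.");
        String filename = input.nextLine();
        echo(filename);
    }
}
```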
FYI, this answer was cross-referenced in this Stack Overflow post: Java decompress with lzw. – rolfl, Oct 10, 2019 at 3:55
Some versions of LZW compression were patented (not copyrighted). For a number of years the patent owner, Unisys, demanded royalties from companies using the algorithm; those patents have since expired.