I've rewritten a piece of JavaScript in Python 3.7 that decompresses a file compressed in a specific format. The project that I got this from is available here.
The code I've come up with is as close an analog as I could manage (I'm not the best at JavaScript):
```python
def decompress_lz2(data):
    global loop_count
    lb_len = 0
    lb_dist = 0
    escape = 0x16
    off_input = 0
    output = b''
    while off_input < len(data):
        loop_count += 1
        if lb_len:
            off_output = len(output) - lb_dist
            repeat = max(0, off_output + lb_len - len(output))
            chunk = output[off_output:off_output + lb_len - repeat]
            output += chunk
            if repeat:
                repeat_chunk = bytes([chunk[-1]]) * repeat
                output += repeat_chunk
            lb_len = 0
        if escape:
            chunk = data[off_input:off_input + escape]
            output += chunk
            off_input += escape
            escape = 0
        flag = data[min(off_input, len(data) - 1)]
        off_input += 1
        lb_len = flag >> 5
        if lb_len:
            if lb_len == 7:
                while True:
                    next_ = data[off_input]
                    off_input += 1
                    lb_len += next_
                    if next_ != 0xff:
                        break
            lb_len += 2
            lb_dist = (flag & 0x1F) << 8
            lb_dist += (1 + data[off_input])
            off_input += 1
            if lb_dist == 0x2000:
                lb_dist += (data[off_input] << 8)
                off_input += 1
                lb_dist += data[off_input]
                off_input += 1
        else:
            escape = flag + 1
    return output
```
where `data` is a byte string read in from a file opened in binary mode. My code and the original code both produce the same output, but where the original code takes only a few seconds to execute, mine takes ~10 minutes on the same file. Testing with multiple files yields similar benchmarks. My specific efficiency question is: what can I do to increase the speed of execution of this script on the same system while maintaining output accuracy?
I'm open to the idea of multithreading/multiprocessing, though I don't think it's possible due to the nature of this compression type.
Example file, though it's very small and runs quickly on both implementations. It must be fed to `decompress_lz2` as `bytes`.
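For reference, here is the minimal driver I use to feed it (the filename here is just a placeholder, and `loop_count` has to exist at module level because the function declares it `global`):

```python
loop_count = 0  # module-level counter that decompress_lz2 updates

# placeholder filename; any file in this compressed format works
with open("example.bin", "rb") as f:  # binary mode, so data is bytes
    data = f.read()

result = decompress_lz2(data)
print(f"{len(result)} bytes decompressed in {loop_count} loop iterations")
```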
1 Answer
I can't test it at the moment, but I suspect that the culprit is `output += chunk` in a couple of places. Because `output` is an immutable `bytes` object, each `+=` copies `output` to a new place in memory that has room to append `chunk` to it, which makes the loop O(n^2) overall. The usual fix is to append the chunks to a list and concatenate once with `b''.join()` at the end; here, though, the back-references slice into the bytes already produced, so a plain list of chunks won't do. A `bytearray` covers both needs: its `+=` extends the buffer in place in amortized O(1) time, and it supports the same slicing as `bytes`.
```python
def decompress_lz2(data):
    global loop_count
    lb_len = 0
    lb_dist = 0
    escape = 0x16
    off_input = 0
    output = bytearray()  # <-- accumulate in a mutable bytearray
    while off_input < len(data):
        loop_count += 1
        if lb_len:
            off_output = len(output) - lb_dist
            repeat = max(0, off_output + lb_len - len(output))
            chunk = output[off_output:off_output + lb_len - repeat]
            output += chunk  # <-- bytearray += extends in place
            if repeat:
                repeat_chunk = bytes([chunk[-1]]) * repeat
                output += repeat_chunk  # <-- extends in place
            lb_len = 0
        if escape:
            chunk = data[off_input:off_input + escape]
            output += chunk  # <-- extends in place
            off_input += escape
            escape = 0
        flag = data[min(off_input, len(data) - 1)]
        off_input += 1
        lb_len = flag >> 5
        if lb_len:
            if lb_len == 7:
                while True:
                    next_ = data[off_input]
                    off_input += 1
                    lb_len += next_
                    if next_ != 0xff:
                        break
            lb_len += 2
            lb_dist = (flag & 0x1F) << 8
            lb_dist += (1 + data[off_input])
            off_input += 1
            if lb_dist == 0x2000:
                lb_dist += (data[off_input] << 8)
                off_input += 1
                lb_dist += data[off_input]
                off_input += 1
        else:
            escape = flag + 1
    return bytes(output)  # <-- convert back to immutable bytes
```
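To see the difference directly, here is a small, self-contained timing sketch comparing repeated `bytes` concatenation with `bytearray` extension; the exact numbers will vary by machine, but the gap grows quadratically with the number of appends:

```python
import time

CHUNK = b'x' * 16
N = 20_000  # number of appends; final buffer is ~320 KB

def build_bytes(n):
    out = b''
    for _ in range(n):
        out += CHUNK  # copies the whole buffer every time: O(n^2) total
    return out

def build_bytearray(n):
    out = bytearray()
    for _ in range(n):
        out += CHUNK  # extends in place: amortized O(1) per append
    return bytes(out)

for build in (build_bytes, build_bytearray):
    start = time.perf_counter()
    build(N)
    print(f"{build.__name__}: {time.perf_counter() - start:.3f}s")
```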
- Ha! I made this change and was about to update this when I saw your answer. I did essentially what you did, because I realized that list slicing and then using `+=` on two lists was the inefficiency, due to memory copying and the potential for huge reallocations. My new solution gets rid of all the slicing as well as the assignment of new lists, so that excessive copying and garbage collection need not happen, by iterating through the existing data and output lists to make this possible. I'm marking your answer as the answer because it does essentially the same thing. – Stephen, Sep 23, 2020 at 20:20
- @Stephen How much faster is this? From ~10 minutes down to how long? – superb rain, Sep 23, 2020 at 21:45
- This version? I haven't actually tested it, so I make no guarantees, but the version that I wrote up took the execution from ~10 minutes down to ~1.5 seconds, an insane improvement, which ultimately brought it into the same ballpark as the JavaScript implementation. – Stephen, Sep 23, 2020 at 21:57
- You can profile the function with line-profiler: `pip install line-profiler`. Add the `@profile` decorator to the desired function (in your case it is `decompress_lz2`). In the terminal: `kernprof -lv -o /tmp/out.lprof filename.py` (a sketch of where the decorator goes follows).
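To make that concrete: `kernprof` injects `profile` as a builtin when it runs the script, so no import is needed; the decorator goes directly on the function (a sketch, with the body elided):

```python
@profile  # injected by kernprof at runtime; running the script directly would raise NameError
def decompress_lz2(data):
    ...  # body as above
```

The `-l` flag enables line-by-line profiling and `-v` prints the results when the script finishes.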