I've rewritten a piece of JavaScript in Python 3.7 that decompresses a file compressed in a specific format. The project that I got this from is available here.
The code I've come up with is as close an analog as I could manage (I'm not the best at JavaScript):
```python
def decompress_lz2(data):
    global loop_count
    lb_len = 0
    lb_dist = 0
    escape = 0x16
    off_input = 0
    output = b''
    while off_input < len(data):
        loop_count += 1
        if lb_len:
            off_output = len(output) - lb_dist
            repeat = max(0, off_output + lb_len - len(output))
            chunk = output[off_output:off_output + lb_len - repeat]
            output += chunk
            if repeat:
                repeat_chunk = bytes([chunk[-1]]) * repeat
                output += repeat_chunk
            lb_len = 0
        if escape:
            chunk = data[off_input:off_input + escape]
            output += chunk
            off_input += escape
            escape = 0
        flag = data[min(off_input, len(data) - 1)]
        off_input += 1
        lb_len = flag >> 5
        if lb_len:
            if lb_len == 7:
                while True:
                    next_ = data[off_input]
                    off_input += 1
                    lb_len += next_
                    if next_ != 0xff:
                        break
            lb_len += 2
            lb_dist = (flag & 0x1F) << 8
            lb_dist += (1 + data[off_input])
            off_input += 1
            if lb_dist == 0x2000:
                lb_dist += (data[off_input] << 8)
                off_input += 1
                lb_dist += data[off_input]
                off_input += 1
        else:
            escape = flag + 1
    return output
```
where `data` is a byte string read in from a file opened in binary mode. My code and the original code both produce the same output, but where the original code takes only a few seconds to execute, mine takes ~10 minutes on the same file. Testing with multiple files yields similar benchmarks. My specific efficiency question is: what can I do to increase the speed of execution of this script on the same system while maintaining output accuracy?
I'm open to the idea of multithreading/multiprocessing, though I don't think it's possible due to the nature of this compression type.
Example file, though it's very small and runs quickly on both implementations. It must be fed to `decompress_lz2` as `bytes`.
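For reference, here is the minimal driver I use to feed it (the filename here is just a placeholder, and `loop_count` has to exist at module level because the function declares it `global`):

```python
loop_count = 0  # module-level counter that decompress_lz2 updates

# placeholder filename; any file in this compressed format works
with open("example.bin", "rb") as f:  # binary mode, so data is bytes
    data = f.read()

result = decompress_lz2(data)
print(f"{len(result)} bytes decompressed in {loop_count} loop iterations")
```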
1 Answer
I can't test it at the moment, but I suspect that the culprit is `output += chunk` in a couple of places. Because `output` is an immutable `bytes` object, each `+=` copies `output` to a new place in memory that has room to append `chunk` to it, which makes the loop O(n^2) overall. The usual fix is to append the chunks to a list and concatenate once with `b''.join()` at the end; here, though, the back-references slice into the bytes already produced, so a plain list of chunks won't do. A `bytearray` covers both needs: its `+=` extends the buffer in place in amortized O(1) time, and it supports the same slicing as `bytes`.
```python
def decompress_lz2(data):
    global loop_count
    lb_len = 0
    lb_dist = 0
    escape = 0x16
    off_input = 0
    output = bytearray()  # <-- accumulate in a mutable bytearray
    while off_input < len(data):
        loop_count += 1
        if lb_len:
            off_output = len(output) - lb_dist
            repeat = max(0, off_output + lb_len - len(output))
            chunk = output[off_output:off_output + lb_len - repeat]
            output += chunk  # <-- bytearray += extends in place
            if repeat:
                repeat_chunk = bytes([chunk[-1]]) * repeat
                output += repeat_chunk  # <-- extends in place
            lb_len = 0
        if escape:
            chunk = data[off_input:off_input + escape]
            output += chunk  # <-- extends in place
            off_input += escape
            escape = 0
        flag = data[min(off_input, len(data) - 1)]
        off_input += 1
        lb_len = flag >> 5
        if lb_len:
            if lb_len == 7:
                while True:
                    next_ = data[off_input]
                    off_input += 1
                    lb_len += next_
                    if next_ != 0xff:
                        break
            lb_len += 2
            lb_dist = (flag & 0x1F) << 8
            lb_dist += (1 + data[off_input])
            off_input += 1
            if lb_dist == 0x2000:
                lb_dist += (data[off_input] << 8)
                off_input += 1
                lb_dist += data[off_input]
                off_input += 1
        else:
            escape = flag + 1
    return bytes(output)  # <-- convert back to immutable bytes
```
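To see the difference directly, here is a small, self-contained timing sketch comparing repeated `bytes` concatenation with `bytearray` extension; the exact numbers will vary by machine, but the gap grows quadratically with the number of appends:

```python
import time

CHUNK = b'x' * 16
N = 20_000  # number of appends; final buffer is ~320 KB

def build_bytes(n):
    out = b''
    for _ in range(n):
        out += CHUNK  # copies the whole buffer every time: O(n^2) total
    return out

def build_bytearray(n):
    out = bytearray()
    for _ in range(n):
        out += CHUNK  # extends in place: amortized O(1) per append
    return bytes(out)

for build in (build_bytes, build_bytearray):
    start = time.perf_counter()
    build(N)
    print(f"{build.__name__}: {time.perf_counter() - start:.3f}s")
```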
- Ha! I made this change and was about to update this when I saw your answer. I did essentially what you did, because I realized that list slicing and then using `+=` on two lists was the inefficiency, due to memory copying and the potential for huge reallocations. My new solution gets rid of all the slicing as well as the assignment of new lists, so that excessive copying and garbage collection need not happen, by iterating through the existing data and output lists to make this possible. I'm marking your answer as the answer because it does essentially the same thing. – Stephen, Sep 23, 2020 at 20:20
- @Stephen How much faster is this? From ~10 minutes down to how long? – superb rain, Sep 23, 2020 at 21:45
- This version? I haven't actually tested it, so I make no guarantees, but the version that I wrote up took the execution from ~10 minutes down to ~1.5 seconds, an insane improvement, which ultimately brought it into the same ballpark as the JavaScript implementation. – Stephen, Sep 23, 2020 at 21:57
- You can profile the function with line-profiler: `pip install line-profiler`. Add the `@profile` decorator to the desired function (in your case it is `decompress_lz2`). In the terminal: `kernprof -lv -o /tmp/out.lprof filename.py` (a sketch of where the decorator goes follows).
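To make that concrete: `kernprof` injects `profile` as a builtin when it runs the script, so no import is needed; the decorator goes directly on the function (a sketch, with the body elided):

```python
@profile  # injected by kernprof at runtime; running the script directly would raise NameError
def decompress_lz2(data):
    ...  # body as above
```

The `-l` flag enables line-by-line profiling and `-v` prints the results when the script finishes.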