I'm looking for feedback on code that I wrote. It's a function that scans a file for zlib headers and returns the header, offset, and decompressed data. In main, below the function, I use that data to find the offset just past the final zlib-compressed stream, which is the value I need. Specifically, I'd like to make the code more compact and efficient. Any advice or improvements would be greatly appreciated. Here's my code:
import zlib


def inflate(infile):
    data = infile.read()
    offset = 0
    while offset < len(data):
        window = data[offset : offset + 2]
        for key, value in zlib_headers.items():
            if window == key:
                decomp_obj = zlib.decompressobj()
                yield key, offset, decomp_obj.decompress(data[offset:])
        if offset == len(data):
            break
        offset += 1


if __name__ == "__main__":
    zlib_headers = {b"\x78\x01": 3, b"\x78\x9c": 6, b"\x78\xda": 9}
    with open("input_file", "rb") as infile:
        *_, last = inflate(infile)
        key, offset, data = last
        start_offset = offset + len(zlib.compress(data, zlib_headers[key]))
        print(start_offset)
1 Answer
It works, it's reasonably clear in that you use a library. Mostly it's fine.
Give inflate a docstring which clearly says what it takes in and what it returns. Rename it, because it doesn't inflate one thing that's passed in.
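As a rough illustration of the docstring and rename, here is a minimal sketch. The name find_zlib_streams is only a suggestion, and the body is the question's logic with the inner loop collapsed into a membership test; nothing else is changed, so it still has no error handling.

```python
import io
import zlib

# Same header table as in the question, kept as a module-level dict here.
zlib_headers = {b"\x78\x01": 3, b"\x78\x9c": 6, b"\x78\xda": 9}


def find_zlib_streams(infile):
    """Scan a binary file for zlib streams.

    Args:
        infile: a file object opened in binary mode ("rb").

    Yields:
        (header, offset, data) tuples: the two-byte zlib header that
        matched, the byte offset where it was found, and the bytes
        decompressed starting at that offset.
    """
    data = infile.read()
    offset = 0
    while offset < len(data):
        window = data[offset : offset + 2]
        if window in zlib_headers:
            decomp_obj = zlib.decompressobj()
            yield window, offset, decomp_obj.decompress(data[offset:])
        offset += 1


# Example use on an in-memory file: four filler bytes, then one stream.
sample = io.BytesIO(b"junk" + zlib.compress(b"hello", 6))
results = list(find_zlib_streams(sample))
```

Taking a file object keeps the signature from the question; accepting bytes instead would be just as reasonable and would make the function easier to test.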
Switch the loop to be simply for offset in range(len(data)):
You may need to deal with errors thrown by the library when the zlib 2-byte header randomly appears inside a file.
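The last two points might look something like the sketch below. It assumes the function receives the file contents as bytes and that a false-positive header should simply be skipped; the names scan_zlib_streams and ZLIB_HEADERS are placeholders, not anything from the original code.

```python
import zlib

# Hypothetical module-level constant holding the two-byte headers to look for.
ZLIB_HEADERS = {b"\x78\x01", b"\x78\x9c", b"\x78\xda"}


def scan_zlib_streams(data):
    """Yield (header, offset, decompressed) for each zlib stream in data."""
    for offset in range(len(data)):
        header = data[offset : offset + 2]
        if header not in ZLIB_HEADERS:
            continue
        try:
            payload = zlib.decompressobj().decompress(data[offset:])
        except zlib.error:
            # The two bytes looked like a header but were not the start
            # of a valid stream, so move on to the next offset.
            continue
        yield header, offset, payload


# A fake header followed by bytes that are not valid deflate data,
# then a real stream starting at offset 6.
sample = b"\x78\x9c" + b"\xff\xff\xff\xff" + zlib.compress(b"hello", 6)
found = list(scan_zlib_streams(sample))
```

One caveat: zlib.error is only raised when the library detects corrupt data. A stream that is merely cut short does not raise, so if that matters you could keep the decompression object in a variable and check its eof attribute after decompressing.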
zlib_headers is a global. Move it out of the if __name__ == '__main__': block and make it UPPERCASE. The function never uses the values (3, 6, 9), so it only needs the keys: make that a set (main can keep its own level lookup if it still needs one). Or it would be reasonable to make it a regex.
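Both options could be sketched roughly as follows. The constant and function names are made up for illustration, and the functions only locate candidate headers; they do not decompress anything.

```python
import re
import zlib

# Module-level constants: a set of headers, and an equivalent bytes regex.
ZLIB_HEADERS = frozenset({b"\x78\x01", b"\x78\x9c", b"\x78\xda"})
ZLIB_HEADER_RE = re.compile(rb"\x78[\x01\x9c\xda]")


def header_offsets_set(data):
    """Return offsets where a known header starts, using the set."""
    return [i for i in range(len(data)) if data[i : i + 2] in ZLIB_HEADERS]


def header_offsets_regex(data):
    """Return offsets where a known header starts, using the regex."""
    return [match.start() for match in ZLIB_HEADER_RE.finditer(data)]


# Three candidate headers at offsets 2, 6 and 8.
sample = b"ab\x78\x9c--\x78\x01\x78\xda"
set_hits = header_offsets_set(sample)
regex_hits = header_offsets_regex(sample)
```

finditer only reports non-overlapping matches, but that is harmless here: the second byte of a header is never 0x78, so two headers cannot overlap. The regex version also lets the search skip ahead inside the re module instead of slicing at every offset in Python.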