I created a hex viewer in Python as a timed challenge from a friend. My implementation is a class, with a separate main file that uses argparse
to let you choose the file (it runs on a demo file by default).
I was pretty satisfied with the final result. However, I used a lot of list comprehensions and map calls to save time. How can I improve the code, or follow style conventions more closely? Any other advice on the code or its functionality?
The code is divided into 3 files: the first holds general utilities for the task, the second is the main class, and the third is the runner:
gen.py
import string

def hexa (num, fill = 2):
    return hex(num)[2:].lower().zfill(fill)

def bina (num, fill = 8):
    return bin(num)[2:].zfill(fill)

def chunks (arr, size = 1):
    return [arr[i: i+size] for i in range(0, len(arr), size)]

def lmap (func, iterable):
    return list(map(func, iterable))

hex_digits_chunks = chunks(lmap(hexa, range(16)), 4)
printable_ascii = lmap(ord, string.digits + string.ascii_letters + string.punctuation)
hexview.py
from gen import *

class HexViewer ():
    def __init__ (self, file):
        self.data = open(file, 'rb').read()
        self.hex_data = lmap(hexa, self.data)
        self.hex_chunks = chunks(chunks(self.hex_data, 4), 4)
        self.ascii_data = [(chr(int(byte, 16)) if int(byte, 16) in printable_ascii else '.') for byte in self.hex_data]
        self.ascii_chunks = chunks(self.ascii_data, 16)
        self.rows = len(self.hex_chunks)
        self.addresses = lmap(lambda o: hexa(o * 16, 8), range(0, self.rows))

    def __str__ (self):
        table_format = ' {:<15}{:<60}{:<20}\n'
        str_rep = ''
        str_rep += table_format.format(
            'address'.upper(),
            ' '.join(' '.join(x) for x in hex_digits_chunks),
            'ascii'.upper())
        str_rep += '\n'
        for i in range(self.rows):
            str_rep += table_format.format(
                self.addresses[i],
                ' '.join(' '.join(x) for x in self.hex_chunks[i]),
                ''.join(self.ascii_chunks[i]))
        return str_rep
main.py
import traceback
import argparse
from gen import *
from hexview import *

try:
    parser = argparse.ArgumentParser(description='Hexadeciaml viewer.')
    parser.add_argument('file', type=str, nargs='?', default='demo.exe', help='the file to process')
    args = parser.parse_args()
    print('\n\n')
    print(HexViewer(args.file))
except SystemExit:
    pass
except:
    traceback.print_exc()
Demo:
C:\...\Hexed> main.py
ADDRESS 00 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f ASCII
00000000 4d 5a 90 00 03 00 00 00 04 00 00 00 ff ff 00 00 MZ..............
00000010 b8 00 00 00 00 00 00 00 40 00 00 00 00 00 00 00 ........@.......
00000020 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00000030 00 00 00 00 00 00 00 00 00 00 00 00 80 00 00 00 ................
00000040 0e 1f ba 0e 00 b4 09 cd 21 b8 01 4c cd 21 54 68 ........!..L.!Th
00000050 69 73 20 70 72 6f 67 72 61 6d 20 63 61 6e 6e 6f is.program.canno
00000060 74 20 62 65 20 72 75 6e 20 69 6e 20 44 4f 53 20 t.be.run.in.DOS.
00000070 6d 6f 64 65 2e 0d 0d 0a 24 00 00 00 00 00 00 00 mode....$.......
00000080 50 45 00 00 4c 01 02 00 00 00 00 00 00 00 00 00 PE..L...........
... and you get the idea ...
Comment by Mast ♦, Jan 31, 2017 at 7:14: Please do not add, remove, or edit code in a question after you've received an answer. The site policy is explained in What to do when someone answers.
1 Answer
General comments
This code is way too complicated for what it achieves. Its core is located within HexViewer.__str__,
with a little preprocessing around it. There is no need for a class here; a simple function with tiny helpers should suffice.
Also, even though separating concerns across files is a good thing for complex projects, I find it adds complexity for such a small task. You also fall into the bad habit of using from <xxx> import *
to try to avoid this complexity.
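Lastly, I have a hard time understanding the logic behind your exception handling. Nothing in your code explicitly raises SystemExit.
argparse does raise it on -h or on invalid arguments, but that is its normal way of exiting, and swallowing it only turns the error exit status into 0. So you can drop this except
clause, especially since you do nothing and exit anyway. And the bare except
that just prints the traceback is useless, as printing the traceback is the default behaviour for an uncaught exception anyway (the only difference is that yours hides the non-zero exit status).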
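A minimal sketch of the entry point without any try/except, assuming the same arguments as the question's main.py. The function name parse_arguments is hypothetical. It lets argparse raise SystemExit on its own. The print in the __main__ block is only a placeholder for calling the viewer.

```python
import argparse


def parse_arguments(argv=None):
    # Same interface as the question's main.py (typo in the description fixed).
    parser = argparse.ArgumentParser(description='Hexadecimal viewer.')
    parser.add_argument('file', nargs='?', default='demo.exe',
                        help='the file to process')
    # No try/except around this: on -h argparse prints the help and raises
    # SystemExit(0); on bad arguments it prints an error and raises
    # SystemExit(2). Letting that propagate is the intended way to exit.
    return parser.parse_args(argv)


if __name__ == '__main__':
    args = parse_arguments()
    # Placeholder: hand args.file to the viewer here. Any other exception
    # prints its traceback and exits with status 1 without extra code.
    print(args.file)
```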
Utilities
gen.py
is a terrible name for a file holding general utilities functions; as gen
mostly associates to generate/generation. utils.py
is more common in the Python's world.
In this file, bina
is never used, lmap
should be replaced by list comprehensions at the call site, and chunks
looks very much like the itertools
recipe grouper.
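For reference, here is a sketch of the classic form of the grouper recipe from the itertools documentation (newer Python docs show an extended variant), followed by a list comprehension replacing lmap(hexa, range(16)). One difference from chunks: grouper yields tuples and pads the last group with fillvalue instead of leaving it short.

```python
import itertools


def grouper(iterable, n, fillvalue=None):
    # Classic itertools recipe: n references to the *same* iterator, so
    # zip_longest pulls n consecutive items to build each tuple.
    args = [iter(iterable)] * n
    return itertools.zip_longest(*args, fillvalue=fillvalue)


# lmap(hexa, range(16)) becomes a plain list comprehension at the call site.
hex_digits = ['{:02x}'.format(n) for n in range(16)]
hex_digits_chunks = list(grouper(hex_digits, 4))
```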
Also, making printable_ascii
a list is a poor choice, given that it will be used for membership checks. You should at least use a set
instead, but I would favor two constants, BEGIN_PRINTABLES = 33
and END_PRINTABLES = 126,
as all these characters (from '!' to '~') are contiguous in the ASCII table.
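A small sketch checking that the two constants describe exactly the same bytes as the question's printable_ascii list (rebuilt here as a set), so a chained comparison can replace the list lookup. to_ascii is a hypothetical helper name.

```python
import string

BEGIN_PRINTABLES = 33  # '!'
END_PRINTABLES = 126   # '~'

# The question's list, turned into a set for O(1) membership tests.
PRINTABLE_SET = frozenset(
    map(ord, string.digits + string.ascii_letters + string.punctuation))


def to_ascii(byte):
    # Two integer comparisons, no container involved at all.
    return chr(byte) if BEGIN_PRINTABLES <= byte <= END_PRINTABLES else '.'


# Both approaches accept the same 94 byte values (space is excluded in both).
print(PRINTABLE_SET == set(range(BEGIN_PRINTABLES, END_PRINTABLES + 1)))  # True
```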
File processing
First off, you open the file but never close it: time to get familiar with the with
statement.
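As a minimal sketch, the question's open(file, 'rb').read() rewritten with a with statement. The helper name read_bytes is hypothetical.

```python
def read_bytes(filename):
    # The file is closed when the with-block exits, even if read() raises.
    with open(filename, 'rb') as stream:
        return stream.read()
```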
Second, instead of pre-processing the whole file at once and printing it later, you could read it in blocks of 16 bytes and process each block before moving on to the next. This saves a lot of memory and lets you view very large files.
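One way to read fixed-size blocks is the two-argument form of iter() with functools.partial. This is an alternative to the itertools.count loop used in the proposed code further down. iter_blocks is a hypothetical name.

```python
import functools


def iter_blocks(filename, block_size=16):
    # Only one block of block_size bytes is held in memory at a time.
    with open(filename, 'rb') as stream:
        # iter(callable, sentinel) keeps calling stream.read(block_size)
        # until it returns b'' (end of file); the last block may be shorter.
        for block in iter(functools.partial(stream.read, block_size), b''):
            yield block
```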
Third, instead of building the whole string at once and returning it, you can yield
each processed 16-byte block and let the caller be responsible for iterating over them to perform the desired operation (or feed them to '\n'.join,
for what it's worth).
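A stripped-down sketch of the generator approach. format_rows is a hypothetical helper that works on in-memory bytes to keep the example short. Each row is produced on demand, and the caller decides whether to print the rows one by one or join them.

```python
def format_rows(data, width=16):
    # Generator: rows are produced one at a time, nothing is accumulated.
    for offset in range(0, len(data), width):
        block = data[offset:offset + width]
        yield '{:08x} {}'.format(offset, ' '.join('{:02x}'.format(b) for b in block))


rows = format_rows(b'This program cannot be run in DOS mode.')
# The caller chooses what to do: print each row, or build one string.
text = '\n'.join(rows)
print(text)
```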
Fourth, you don't need hex
or chr
to convert integers to strings before formatting them: the format specifiers x
and c
for integers perform the same conversions. You can also combine them with 0>?,
where ?
is an integer, to play the role of zfill. Example:
>>> '{:0>4x}'.format(23)
'0017'
>>> '{:c}'.format(102)
'f'
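To check that the format spec really replaces the question's hexa helper, here is a sketch comparing the two. hexa_fmt is a hypothetical name. For non-negative integers, '02x' is the more common spelling of '0>2x' and gives the same result.

```python
def hexa(num, fill=2):
    # The question's helper from gen.py, kept for comparison.
    return hex(num)[2:].lower().zfill(fill)


def hexa_fmt(num, fill=2):
    # Zero-fill ('0'), right-align ('>'), width `fill`, lowercase hex ('x').
    return format(num, '0>{}x'.format(fill))


print(hexa(255), hexa_fmt(255))        # ff ff
print(hexa(80, 8), hexa_fmt(80, 8))    # 00000050 00000050
print('{:c}'.format(77), chr(77))      # M M
```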
Proposed improvements
import itertools
import argparse

BEGIN_PRINTABLES = 33
END_PRINTABLES = 126


def hex_group_formatter(iterable):
    # Bytes in a group of 4 are joined by one space, groups by three spaces.
    # zip_longest pads a short final group with 0, so a partial last row
    # shows trailing 00s.
    chunks = [iter(iterable)] * 4
    return '   '.join(
        ' '.join(format(x, '0>2x') for x in chunk)
        for chunk in itertools.zip_longest(*chunks, fillvalue=0))


def ascii_group_formatter(iterable):
    return ''.join(
        chr(x) if BEGIN_PRINTABLES <= x <= END_PRINTABLES else '.'
        for x in iterable)


def hex_viewer(filename, chunk_size=16):
    header = hex_group_formatter(range(chunk_size))
    # Pad 'ADDRESS' to the 8-character width of the addresses below.
    yield '{:<8} {:<53} ASCII'.format('ADDRESS', header)
    yield ''

    template = '{:0>8x} {:<53} {}'
    with open(filename, 'rb') as stream:
        # Count from 0 so that the first row is at address 00000000.
        for chunk_count in itertools.count():
            chunk = stream.read(chunk_size)
            if not chunk:
                return
            yield template.format(
                chunk_count * chunk_size,
                hex_group_formatter(chunk),
                ascii_group_formatter(chunk))


if __name__ == '__main__':
    parser = argparse.ArgumentParser(description='Hexadecimal viewer.')
    parser.add_argument('file', nargs='?', default='demo.exe', help='the file to process')
    args = parser.parse_args()
    print('\n\n')
    for line in hex_viewer(args.file):
        print(line)
You may also want to replace the magic number 53
with something that depends on chunk_size.
Given the implementation of hex_group_formatter,
it should be math.ceil(chunk_size / 4) * 14 - 3.
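A sketch of the width computation. It assumes that hex_group_formatter joins groups with three spaces, which is what the 14 and the -3 in the formula imply. The check runs the formula against the formatter's actual output. The width is then injected through a nested replacement field. hex_column_width and TEMPLATE are hypothetical names.

```python
import itertools
import math


def hex_group_formatter(iterable):
    # Same shape as the helper above: one space inside a group of 4 bytes,
    # three spaces between groups, short last group padded with 0.
    chunks = [iter(iterable)] * 4
    return '   '.join(
        ' '.join(format(x, '0>2x') for x in chunk)
        for chunk in itertools.zip_longest(*chunks, fillvalue=0))


def hex_column_width(chunk_size):
    # 'xx xx xx xx' is 11 chars, plus a 3-space separator = 14 per group,
    # minus the separator that does not follow the last group.
    return math.ceil(chunk_size / 4) * 14 - 3


# The width goes into a nested replacement field instead of a hard-coded 53.
TEMPLATE = '{:0>8x} {:<{width}} {}'

width = hex_column_width(16)
print(width)  # 53
print(TEMPLATE.format(0x40, hex_group_formatter(b'\x0e\x1f'), '..', width=width))
```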