Python Hex Viewer

Question 1

I have created a hex viewer in Python, as a timed challenge set by a friend. My implementation takes the form of a class, with a separate main file that uses argparse to choose the file (it runs with a demo file by default).

I was pretty satisfied with the final result. However, I used many list comprehensions and mappings to save time. How can I improve the code or bring it in line with styling standards? Any other advice regarding the code or the functionality?

The code is divided into 3 files: the first holds general utilities for the task, the second is the main class and the third is the runner:

gen.py

import string
def hexa (num, fill = 2):
    return hex(num)[2:].lower().zfill(fill)
def bina (num, fill = 8):
    return bin(num)[2:].zfill(fill)
def chunks (arr, size = 1):
    return [arr[i: i+size] for i in range(0, len(arr), size)]
def lmap (func, iterable):
    return list(map(func, iterable))
hex_digits_chunks = chunks(lmap(hexa, range(16)), 4)
printable_ascii = lmap(ord, string.digits + string.ascii_letters + string.punctuation)

hexview.py

from gen import *
class HexViewer ():
    def __init__ (self, file):
        self.data = open(file, 'rb').read()
        self.hex_data = lmap(hexa, self.data)
        self.hex_chunks = chunks(chunks(self.hex_data, 4), 4)
        self.ascii_data = [(chr(int(byte, 16)) if int(byte, 16) in printable_ascii else '.') for byte in self.hex_data]
        self.ascii_chunks = chunks(self.ascii_data, 16)
        self.rows = len(self.hex_chunks)
        self.addresses = lmap(lambda o: hexa(o * 16, 8), range(0, self.rows))
    def __str__ (self):
        table_format = ' {:<15}{:<60}{:<20}\n'
        str_rep = ''
        str_rep += table_format.format(
            'address'.upper(),
            ' '.join(' '.join(x) for x in hex_digits_chunks),
            'ascii'.upper())
        str_rep += '\n'
        for i in range(self.rows):
            str_rep += table_format.format(
                self.addresses[i],
                ' '.join(' '.join(x) for x in self.hex_chunks[i]),
                ''.join(self.ascii_chunks[i]))
        return str_rep

main.py

import traceback
import argparse
from gen import *
from hexview import *
try:
    parser = argparse.ArgumentParser(description='Hexadecimal viewer.')
    parser.add_argument('file', type=str, nargs='?', default='demo.exe', help='the file to process')
    args = parser.parse_args()
    print('\n\n')
    print(HexViewer(args.file))
except SystemExit:
    pass
except:
    traceback.print_exc()

Demo:

C:\...\Hexed> main.py
ADDRESS 00 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f ASCII
00000000 4d 5a 90 00 03 00 00 00 04 00 00 00 ff ff 00 00 MZ..............
00000010 b8 00 00 00 00 00 00 00 40 00 00 00 00 00 00 00 ........@.......
00000020 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00000030 00 00 00 00 00 00 00 00 00 00 00 00 80 00 00 00 ................
00000040 0e 1f ba 0e 00 b4 09 cd 21 b8 01 4c cd 21 54 68 ........!..L.!Th
00000050 69 73 20 70 72 6f 67 72 61 6d 20 63 61 6e 6e 6f is.program.canno
00000060 74 20 62 65 20 72 75 6e 20 69 6e 20 44 4f 53 20 t.be.run.in.DOS.
00000070 6d 6f 64 65 2e 0d 0d 0a 24 00 00 00 00 00 00 00 mode....$.......
00000080 50 45 00 00 4c 01 02 00 00 00 00 00 00 00 00 00 PE..L...........
... and you got the idea ...

Question 2

Please do not add, remove, or edit code in a question after you've received an answer. The site policy is explained in What to do when someone answers.

Question 3

General comments

This code is way too complicated for what it achieves. Its core is located within HexViewer.__str__, with a little bit of preprocessing around it. There is no need for a class here; a simple function with tiny helpers should suffice.

Also, even though separating concerns between files is a good thing for complex projects, I find it adds complexity for such a small task. You also fall into the bad habit of using from <xxx> import * to try and avoid this complexity.

Lastly, I have a hard time understanding the logic behind your exception handling. Nothing in your code explicitly raises a SystemExit (argparse raises it on its own for --help or a bad argument, and letting it propagate simply exits the program), so you can drop this except clause, especially if you plan on doing nothing and exiting anyway. And the bare except that just prints the exception is useless, as that is the default behaviour anyway.

Utilities

gen.py is a terrible name for a file holding general utility functions, as gen is mostly associated with generate/generation. utils.py is more common in the Python world.

In this file, bina is never used, lmap should be replaced by list comprehensions at the call site, and chunks looks very much like the itertools recipe grouper.
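For reference, here is a minimal sketch of the grouper idea, adapted from the recipes section of the itertools documentation. The sample input is made up for illustration.

```python
import itertools


def grouper(iterable, n, fillvalue=None):
    """Collect items into fixed-length tuples, padding the last one."""
    # n references to the *same* iterator: zip_longest pulls one item
    # from each in turn, so every tuple holds n consecutive items.
    iterators = [iter(iterable)] * n
    return itertools.zip_longest(*iterators, fillvalue=fillvalue)


groups = list(grouper(range(10), 4, fillvalue=0))
print(groups)  # [(0, 1, 2, 3), (4, 5, 6, 7), (8, 9, 0, 0)]
```

Unlike the slicing-based chunks, this works on any iterable (not only sequences), at the price of padding the final group.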

Also, printable_ascii being a list is a poor choice, knowing that it will be used for membership checks. You should at least use a set instead, but I would favor two constants, BEGIN_PRINTABLES = 33 and END_PRINTABLES = 126, as all these characters are contiguous in the ASCII table.
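A quick sketch to check that claim: it rebuilds the question's printable_ascii as a set and compares it with the contiguous range. The helper name is_printable is only illustrative.

```python
import string

# The question's printable_ascii, built the same way but stored as a set.
printable_ascii = {
    ord(c)
    for c in string.digits + string.ascii_letters + string.punctuation
}

BEGIN_PRINTABLES = 33   # '!'
END_PRINTABLES = 126    # '~'

# The set is exactly the contiguous range 33..126.
assert printable_ascii == set(range(BEGIN_PRINTABLES, END_PRINTABLES + 1))


def is_printable(byte):
    # Hypothetical helper: two comparisons, no container needed.
    return BEGIN_PRINTABLES <= byte <= END_PRINTABLES


print(is_printable(ord('A')), is_printable(ord(' ')), is_printable(0x90))
# True False False
```

Note that the space character (32) falls outside the range, which matches the dots shown in place of spaces in the demo output.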

File processing

First off, you open the file but never close it: time to get yourself familiar with the with statement.
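A minimal sketch of the pattern, using a throwaway temporary file so that it runs on its own (the file contents are arbitrary):

```python
import os
import tempfile

# Create a small throwaway file so the example is self-contained.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b'MZ\x90\x00')
    path = tmp.name

# The file is closed as soon as the block exits, even if read() raises.
with open(path, 'rb') as stream:
    data = stream.read()

print(stream.closed, data)  # True b'MZ\x90\x00'
os.remove(path)
```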

Second, instead of pre-processing the whole file at once and printing it later, you could read it in blocks of 16 bytes and process each block before moving on to the next. It will save a lot of memory and allow you to view very large files.
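One possible way to read in blocks is the two-argument form of iter. In this sketch, read_blocks is a hypothetical helper and io.BytesIO stands in for a file opened in binary mode.

```python
import io


def read_blocks(stream, block_size=16):
    """Return an iterator over blocks of at most block_size bytes."""
    # iter(callable, sentinel) keeps calling read() until it returns b''.
    return iter(lambda: stream.read(block_size), b'')


stream = io.BytesIO(bytes(range(40)))   # stand-in for open(filename, 'rb')
sizes = [len(block) for block in read_blocks(stream)]
print(sizes)  # [16, 16, 8]
```

Only one block is held in memory at a time, whatever the size of the file.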

Third, instead of building the whole string at once and returning it, you can yield each processed block of 16 bytes and let the caller be responsible for iterating over them to perform the desired operation (or feed them to '\n'.join for what it's worth).
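To illustrate what yielding buys the caller, here is a simplified sketch. hex_lines is a hypothetical generator that works on bytes already in memory rather than on a file, and it leaves out the ASCII column.

```python
import itertools


def hex_lines(data, width=16):
    """Yield one formatted line per block of `width` bytes."""
    for offset in range(0, len(data), width):
        block = data[offset:offset + width]
        hex_part = ' '.join(format(b, '0>2x') for b in block)
        yield '{:0>8x} {}'.format(offset, hex_part)


data = b'This program cannot be run in DOS mode.'

# The caller decides what to do with the lines: join them all...
text = '\n'.join(hex_lines(data))

# ...or consume only the first one without formatting the rest.
first = list(itertools.islice(hex_lines(data), 1))
print(first[0])
# 00000000 54 68 69 73 20 70 72 6f 67 72 61 6d 20 63 61 6e
```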

Fourth, you don't necessarily need to use hex or chr to convert integers to characters before formatting them: the format specifiers x and c for integers perform the same operations. You can also combine them with 0>? where ? is an integer, to play the role of zfill. Example:

>>> '{:0>4x}'.format(23)
'0017'
>>> '{:c}'.format(102)
'f'

Proposed improvements

import itertools
import argparse

# Inclusive bounds of the printable, non-space ASCII characters.
BEGIN_PRINTABLES = 33
END_PRINTABLES = 126


def hex_group_formatter(iterable):
    # Groups of four bytes; zip_longest pads the last group with zeros.
    chunks = [iter(iterable)] * 4
    return '   '.join(
        ' '.join(format(x, '0>2x') for x in chunk)
        for chunk in itertools.zip_longest(*chunks, fillvalue=0))


def ascii_group_formatter(iterable):
    return ''.join(
        chr(x) if BEGIN_PRINTABLES <= x <= END_PRINTABLES else '.'
        for x in iterable)


def hex_viewer(filename, chunk_size=16):
    header = hex_group_formatter(range(chunk_size))
    yield 'ADDRESS  {:<53} ASCII'.format(header)
    yield ''
    template = '{:0>8x} {:<53} {}'
    with open(filename, 'rb') as stream:
        # Count from 0 so that the first row is labelled 00000000.
        for chunk_count in itertools.count():
            chunk = stream.read(chunk_size)
            if not chunk:
                return
            yield template.format(
                chunk_count * chunk_size,
                hex_group_formatter(chunk),
                ascii_group_formatter(chunk))


if __name__ == '__main__':
    parser = argparse.ArgumentParser(description='Hexadecimal viewer.')
    parser.add_argument('file', nargs='?', default='demo.exe', help='the file to process')
    args = parser.parse_args()
    print('\n\n')
    for line in hex_viewer(args.file):
        print(line)

You may also want to replace the magic number 53 with something that depends on chunk_size. Given the implementation of hex_group_formatter, it should be math.ceil(chunk_size/4) * 14 - 3.
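The formula can be checked with a short sketch. It assumes groups of four two-digit bytes joined by three spaces, which is the layout the 14 and the 3 in the formula imply. hex_column_width is a hypothetical helper name.

```python
import itertools
import math


def hex_group_formatter(iterable):
    # Assumption: groups of four bytes separated by three spaces.
    chunks = [iter(iterable)] * 4
    return '   '.join(
        ' '.join(format(x, '0>2x') for x in chunk)
        for chunk in itertools.zip_longest(*chunks, fillvalue=0))


def hex_column_width(chunk_size):
    # Each group is 11 characters wide plus a 3-character separator,
    # minus the separator that does not follow the last group.
    return math.ceil(chunk_size / 4) * 14 - 3


for size in (4, 8, 16, 18, 32):
    assert len(hex_group_formatter(range(size))) == hex_column_width(size)

print(hex_column_width(16))  # 53
```

The computed width can then be passed to the template through a nested field, for instance '{:<{width}}'.format(header, width=hex_column_width(chunk_size)).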

score 7 · Accepted Answer · 2016-11-16 22:32:28Z
