This code is part of the Construct library. Docstrings explain what the code is supposed to do.
from construct.lib.py3compat import byte2int, int2byte, bytes2str, iteratebytes, iterateints
# Map an integer in the inclusive range 0-255 to its string byte representation
_printable = dict((i, ".") for i in range(256))
_printable.update((i, bytes2str(int2byte(i))) for i in range(32, 128))
def hexdump(data, linesize):
r"""
Turns bytes into a unicode string of the format:
>>>print(hexdump(b'0' * 100, 16))
0000 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 0000000000000000
0010 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 0000000000000000
0020 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 0000000000000000
0030 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 0000000000000000
0040 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 0000000000000000
0050 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 0000000000000000
0060 30 30 30 30 0000
"""
prettylines = []
fmt = "%%04X %%-%ds %%s" % (3 * linesize - 1,)
fmtlinesize = 7 + 3*linesize + 3 + linesize
for i in range(0, len(data), linesize):
line = data[i:i+linesize]
hextext = " ".join('%02x' % b for b in iterateints(line))
rawtext = "".join(_printable[b] for b in iterateints(line))
prettylines.append(fmt % (i, str(hextext), str(rawtext)))
if prettylines:
prettylines[-1] = prettylines[-1].ljust(fmtlinesize)
prettylines.append("")
return "\n".join(prettylines)
def hexundump(data, linesize):
r"""
Reverse of ``hexdump()``.
"""
raw = []
fmtlinesize = 7 + 3*linesize + 3 + linesize
for line in data.split("\n"):
bytes = [int2byte(int(s,16)) for s in line[7:7+3*linesize].split()]
raw.extend(bytes)
return b"".join(raw)
some of the code refers to: https://github.com/construct/construct/blob/master/construct/lib/py3compat.py
-
1\$\begingroup\$ Could you write a summary of what this code is meant to do \$\endgroup\$Tolani– Tolani2016年10月09日 22:41:42 +00:00Commented Oct 9, 2016 at 22:41
-
\$\begingroup\$ @Jamal Considering your meta answer meta.codereview.stackexchange.com/questions/1938/… I am surprised that you closed it. \$\endgroup\$vnp– vnp2016年10月09日 23:43:19 +00:00Commented Oct 9, 2016 at 23:43
-
\$\begingroup\$ Both answers there state that they aren't off-topic, but the other answer (that you've linked to) is consistent with what I've done here. \$\endgroup\$Jamal– Jamal2016年10月09日 23:53:52 +00:00Commented Oct 9, 2016 at 23:53
-
\$\begingroup\$ @Jamal The title and the docstring describe the code clearly enough to me. I've reopened the question. \$\endgroup\$200_success– 200_success2016年10月10日 04:05:28 +00:00Commented Oct 10, 2016 at 4:05
-
\$\begingroup\$ @ArekBulski: The docstring looks fine to me, but I think many users here like to have an introduction explaining what the code does and what you would like to get out of the review. You could say, "please read the docstring" or words to that effect. \$\endgroup\$Gareth Rees– Gareth Rees2016年10月10日 12:42:54 +00:00Commented Oct 10, 2016 at 12:42
1 Answer 1
The construct library implements a Python 2/3 compatibility layer. Would it make sense to use a third-party library here like
six
? Even if you have good reasons for using your own compatibility library, it might make sense to use the same interface names assix
, for exampleiterbytes
instead of youriterateints
. This would make it easier to collaborate with Python programmers who are used to thesix
interface.The name
iteratebytes
is imported but not used.There's no need to use
bytes2str
for compatibility here —int2byte(i).decode('ascii')
works in both Python 2 and 3, since you know that32 <= i < 128
.In fact, I doubt that you need a
bytes2str
compatibility shim at all —b.decode('latin1')
works for all byte strings in both Python 2 and 3.The
_printable
data structure is a dictionary mapping numbers between 0 and 255 to their string representation. This would be slightly simpler if implemented as a list (because then there would be no need to store the keys):_printable = [int2byte(i).decode('ascii') if 32 <= i < 128 else '.' for i in range(256)]
I think the
_printable
name could be improved — perhaps_raw_repr
to parallelrawtext
?If it makes sense to cache the raw representation of bytes (as you do in the
_printable
data structure, then it also makes sense to cache the hexadecimal representation:_hex_repr = [format(i, '02X') for i in range(256)]
There does not seem to be any reason to make the docstring an
r
-string."Unicode" has a capital "U".
The doctest in the docstring does not pass.
$ python -mdoctest cr143734.py ********************************************************************** File "cr143734.py", line 11, in cr143734.hexdump Failed example: print(hexdump(b'0' * 100, 16)) Expected: 0000 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 0000000000000000 0010 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 0000000000000000 0020 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 0000000000000000 0030 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 0000000000000000 0040 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 0000000000000000 0050 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 0000000000000000 0060 30 30 30 30 0000 Got: 0000 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 0000000000000000 0010 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 0000000000000000 0020 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 0000000000000000 0030 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 0000000000000000 0040 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 0000000000000000 0050 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 0000000000000000 0060 30 30 30 30 0000 <BLANKLINE> ********************************************************************** 1 items had failures: 1 of 1 in cr143734.hexdump ***Test Failed*** 1 failures.
The computation of
fmtlinesize
is off by one. It uses3*linesize
but the previous line has3 * linesize - 1
.Because of this off-by-one error, the last but one line of the output has an unnecessary extra space.
If there are more than 65535 bytes in
data
, then the formatting goes wrong:FFF0 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 0000000000000000 10000 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 0000000000000000
Since the
hexdump
function is given a byte string (and not, say, an iterable or a file object) it is easy to work out the necessary number of digits in the address.The first argument to
hexundump
should have a different name — this is not the data, it's the dump of the data.It seems inelegant for
hexundump
to take alinesize
argument when this can be deduced from the dump.
-
\$\begingroup\$ For disclosure, I am the current maintainer of Construct and I did purge six from dependencies. Since the library already had both compatibility module and six, I just moved what I needed into pycompat and purged six. One less depencency. Import is copy pasted from
__init__
, skip that. Docstrings were not meant as doctests, I have a huge test suite for that. \$\endgroup\$ArekBulski– ArekBulski2016年10月10日 20:49:00 +00:00Commented Oct 10, 2016 at 20:49 -
\$\begingroup\$ What I took:
_printable
made into a list (indexing faster than of dict),_hexprintable
also added, supporting 04X and 08X offsets on left side (larger wont be printed ever),fmtlinesize
removed, and finallylinesize
could be computed but the way library is made, that value is always available. \$\endgroup\$ArekBulski– ArekBulski2016年10月10日 20:53:46 +00:00Commented Oct 10, 2016 at 20:53 -
\$\begingroup\$ @ArekBulski: The point of doctest is not just to use the docstrings as part of your test suite, but also to check that the examples are correct. \$\endgroup\$Gareth Rees– Gareth Rees2016年10月10日 22:01:40 +00:00Commented Oct 10, 2016 at 22:01
Explore related questions
See similar questions with these tags.