0
\$\begingroup\$

This code is part of the Construct library. Docstrings explain what the code is supposed to do.

from construct.lib.py3compat import byte2int, int2byte, bytes2str, iteratebytes, iterateints
# Map an integer in the inclusive range 0-255 to its string byte representation
_printable = dict((i, ".") for i in range(256))
_printable.update((i, bytes2str(int2byte(i))) for i in range(32, 128))
def hexdump(data, linesize):
 r"""
 Turns bytes into a unicode string of the format:
 >>>print(hexdump(b'0' * 100, 16))
 0000 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 0000000000000000
 0010 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 0000000000000000
 0020 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 0000000000000000
 0030 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 0000000000000000
 0040 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 0000000000000000
 0050 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 0000000000000000
 0060 30 30 30 30 0000 
 """
 prettylines = []
 fmt = "%%04X %%-%ds %%s" % (3 * linesize - 1,)
 fmtlinesize = 7 + 3*linesize + 3 + linesize
 for i in range(0, len(data), linesize):
 line = data[i:i+linesize]
 hextext = " ".join('%02x' % b for b in iterateints(line))
 rawtext = "".join(_printable[b] for b in iterateints(line))
 prettylines.append(fmt % (i, str(hextext), str(rawtext)))
 if prettylines:
 prettylines[-1] = prettylines[-1].ljust(fmtlinesize)
 prettylines.append("")
 return "\n".join(prettylines)
def hexundump(data, linesize):
 r"""
 Reverse of ``hexdump()``.
 """
 raw = []
 fmtlinesize = 7 + 3*linesize + 3 + linesize
 for line in data.split("\n"):
 bytes = [int2byte(int(s,16)) for s in line[7:7+3*linesize].split()]
 raw.extend(bytes)
 return b"".join(raw)

some of the code refers to: https://github.com/construct/construct/blob/master/construct/lib/py3compat.py

asked Oct 9, 2016 at 21:57
\$\endgroup\$
5
  • 1
    \$\begingroup\$ Could you write a summary of what this code is meant to do \$\endgroup\$ Commented Oct 9, 2016 at 22:41
  • \$\begingroup\$ @Jamal Considering your meta answer meta.codereview.stackexchange.com/questions/1938/… I am surprised that you closed it. \$\endgroup\$ Commented Oct 9, 2016 at 23:43
  • \$\begingroup\$ Both answers there state that they aren't off-topic, but the other answer (that you've linked to) is consistent with what I've done here. \$\endgroup\$ Commented Oct 9, 2016 at 23:53
  • \$\begingroup\$ @Jamal The title and the docstring describe the code clearly enough to me. I've reopened the question. \$\endgroup\$ Commented Oct 10, 2016 at 4:05
  • \$\begingroup\$ @ArekBulski: The docstring looks fine to me, but I think many users here like to have an introduction explaining what the code does and what you would like to get out of the review. You could say, "please read the docstring" or words to that effect. \$\endgroup\$ Commented Oct 10, 2016 at 12:42

1 Answer 1

3
\$\begingroup\$
  1. The construct library implements a Python 2/3 compatibility layer. Would it make sense to use a third-party library here like six? Even if you have good reasons for using your own compatibility library, it might make sense to use the same interface names as six, for example iterbytes instead of your iterateints. This would make it easier to collaborate with Python programmers who are used to the six interface.

  2. The name iteratebytes is imported but not used.

  3. There's no need to use bytes2str for compatibility here — int2byte(i).decode('ascii') works in both Python 2 and 3, since you know that 32 <= i < 128.

  4. In fact, I doubt that you need a bytes2str compatibility shim at all — b.decode('latin1') works for all byte strings in both Python 2 and 3.

  5. The _printable data structure is a dictionary mapping numbers between 0 and 255 to their string representation. This would be slightly simpler if implemented as a list (because then there would be no need to store the keys):

    _printable = [int2byte(i).decode('ascii') if 32 <= i < 128 else '.'
     for i in range(256)]
    
  6. I think the _printable name could be improved — perhaps _raw_repr to parallel rawtext?

  7. If it makes sense to cache the raw representation of bytes (as you do in the _printable data structure, then it also makes sense to cache the hexadecimal representation:

    _hex_repr = [format(i, '02X') for i in range(256)]
    
  8. There does not seem to be any reason to make the docstring an r-string.

  9. "Unicode" has a capital "U".

  10. The doctest in the docstring does not pass.

    $ python -mdoctest cr143734.py
    **********************************************************************
    File "cr143734.py", line 11, in cr143734.hexdump
    Failed example:
     print(hexdump(b'0' * 100, 16))
    Expected:
     0000 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 0000000000000000
     0010 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 0000000000000000
     0020 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 0000000000000000
     0030 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 0000000000000000
     0040 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 0000000000000000
     0050 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 0000000000000000
     0060 30 30 30 30 0000 
    Got:
     0000 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 0000000000000000
     0010 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 0000000000000000
     0020 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 0000000000000000
     0030 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 0000000000000000
     0040 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 0000000000000000
     0050 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 0000000000000000
     0060 30 30 30 30 0000 
     <BLANKLINE>
    **********************************************************************
    1 items had failures:
     1 of 1 in cr143734.hexdump
    ***Test Failed*** 1 failures.
    
  11. The computation of fmtlinesize is off by one. It uses 3*linesize but the previous line has 3 * linesize - 1.

  12. Because of this off-by-one error, the last but one line of the output has an unnecessary extra space.

  13. If there are more than 65535 bytes in data, then the formatting goes wrong:

    FFF0 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 0000000000000000
    10000 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 0000000000000000
    

    Since the hexdump function is given a byte string (and not, say, an iterable or a file object) it is easy to work out the necessary number of digits in the address.

  14. The first argument to hexundump should have a different name — this is not the data, it's the dump of the data.

  15. It seems inelegant for hexundump to take a linesize argument when this can be deduced from the dump.

answered Oct 10, 2016 at 14:36
\$\endgroup\$
3
  • \$\begingroup\$ For disclosure, I am the current maintainer of Construct and I did purge six from dependencies. Since the library already had both compatibility module and six, I just moved what I needed into pycompat and purged six. One less depencency. Import is copy pasted from __init__, skip that. Docstrings were not meant as doctests, I have a huge test suite for that. \$\endgroup\$ Commented Oct 10, 2016 at 20:49
  • \$\begingroup\$ What I took: _printable made into a list (indexing faster than of dict), _hexprintable also added, supporting 04X and 08X offsets on left side (larger wont be printed ever), fmtlinesize removed, and finally linesize could be computed but the way library is made, that value is always available. \$\endgroup\$ Commented Oct 10, 2016 at 20:53
  • \$\begingroup\$ @ArekBulski: The point of doctest is not just to use the docstrings as part of your test suite, but also to check that the examples are correct. \$\endgroup\$ Commented Oct 10, 2016 at 22:01

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.