3
\$\begingroup\$

I need a function that hashes a list of different inputs (integral datatypes, strings, bytearrays) in a recursive manner.

def RecHash(v): 
 isSingleElementOfList = False
 if type(v) is list and len(v) == 1:
 isSingleElementList = True
 if type(v) is not list or isSingleElementOfList: # if v is a single element
 v0 = v
 if isSingleElementOfList: v0 = v[0]
 if type(v0) is bytearray or v0.__class__.__name__ == 'bytes':
 return hash(v0)
 if type(v0) is int:
 return hash(ToByteArray(v0))
 if type(v0) is str:
 return hash(v0.encode('utf-8')) 
 return bytes()
 else: # if v is a list
 res = bytearray()
 for vi in v:
 res += RecHash(vi) # recursion
 return hash(res) # hash the concatenated hashes

and the helper function:

def ToByteArray(x): 
 q, r = divmod(BitAbs(x),8)
 q += bool(r) 
 return ToByteArrayN(x, q)
def ToByteArrayN(x, n):
 B = bytearray()
 for i in range(0, int(n)):
 b = x % 256
 x = x // 256 # // = integer division => floor
 B.insert(0, b)
 return bytes(B)
def BitAbs(i):
 return i.bit_length()

It is working fine, however, it's very slow (and interestingly, it's not the hash() function that is slowing it down.).

Performance is worst when the list contains mostly numbers, so the ToByteArray function also doesn't seem to perform well.

asked Mar 4, 2017 at 15:46
\$\endgroup\$
2
  • \$\begingroup\$ Your function BitAbs is missing, so it is hard to suggest improvements of that part. \$\endgroup\$ Commented Mar 5, 2017 at 11:28
  • 1
    \$\begingroup\$ Thanks for all your answer. Using isinstance instead of type does indeed add a bit of performance. I meanwhile found out, that the recursion itself is a huge problem in python because of the large overhead of function calls. Doing it iteratively instead of recursive for lists in lists, would increase the performance by 100%. \$\endgroup\$ Commented Mar 5, 2017 at 15:32

2 Answers 2

2
\$\begingroup\$

As @pjz already noted in his answer, type(v0) is slow. However, I would recommend instead to use isinstance. This allows to use your class also with derived types. From help(isinstance):

isinstance(...)
 isinstance(object, class-or-type-or-tuple) -> bool
 Return whether an object is an instance of a class or of a subclass thereof.
 With a type as second argument, return whether that is the object's type.
 The form using a tuple, isinstance(x, (A, B, ...)), is a shortcut for
 isinstance(x, A) or isinstance(x, B) or ... (etc.).

Imagine if I rolled my own int class, Int, which is derived from the base int class:

class Int(int):
 pass

Your RecHash function would not work with this. If you use isinstance(v0, int), though, this would still be working.

if isinstance(v, list) and len(v) == 1:
 isSingleElementList = True
if not isinstance(v, list) or isSingleElementOfList: # if v is a single element
 v0 = v[0] if isSingleElementOfList else v
 type_v0 = type(v0)
 if isinstance(v0, bytearray) or v0.__class__.__name__ == 'bytes':
 return hash(v0)
 if isinstance(v0, int):
 return hash(ToByteArray(v0))
 if isinstance(v0, str):
 return hash(v0.encode('utf-8')) 
else: # if v is a list
 res = bytearray()
 for vi in v:
 res += RecHash(vi) # recursion
 return hash(res) # hash the concatenated hashes

Python has an official style-guide, PEP8. It recommends using lower_case for variable and function names, instead of PascalCase or camelCase.

answered Mar 5, 2017 at 11:29
\$\endgroup\$
2
\$\begingroup\$

ISTR that type(x) is relatively slow. Try saving it in a local variable:

typev = type(v)
if typev is list and len(v) == 1:
 isSingleElementList = True
if typev is not list or isSingleElementOfList: # if v is a single element
 v0 = v[0] if isSingleElementOfList else v
 type_v0 = type(v0)
 if type_v0 is bytearray or v0.__class__.__name__ == 'bytes':
 return hash(v0)
 if type_v0 is int:
 return hash(ToByteArray(v0))
 if type_v0 is str:
 return hash(v0.encode('utf-8')) 
else: # if v is a list
 res = bytearray()
 for vi in v:
 res += RecHash(vi) # recursion
 return hash(res) # hash the concatenated hashes

Also, btw, your code will choke on nested single element lists. Consider [[[1]]]

answered Mar 4, 2017 at 21:10
\$\endgroup\$

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.