2
\$\begingroup\$

I wrote a minimal implementation of a hashtable in python as an exercise.

from copy import copy
def init_nested_array(size):
 arr = []
 for i in range(size):
 arr.append([])
 return arr
class hashtable:
 def __init__(self):
 self.mod = 3
 self.num_items = 0
 self.arr = init_nested_array(self.mod)
 def __str__(self):
 return str(self.arr)
 def __repr__(self):
 return str(self)
 def __len__(self):
 return self.num_items
 def __getitem__(self, key):
 return self.get_item(key)
 def __setitem__(self, key, value):
 return self.set_item(key, value)
 def hash_function(self, key):
 return hash(key) % self.mod
 def get_item(self, key):
 bucket = self.arr[self.hash_function(key)]
 for pair in bucket:
 if pair[0] == key:
 return pair[1]
 raise KeyError("Item not found", key)
 def set_item(self, key, value):
 bucket = self.arr[self.hash_function(key)]
 for pair in bucket:
 if pair[0] == key:
 pair[1] = value
 return
 self.arr[self.hash_function(key)].append([key, value])
 self.num_items += 1
 if self.num_items >= .7*self.mod:
 self.expand()
 def expand(self):
 # Ideally we'd expand to new prime number. But at least we'll stick to odds
 self.mod = self.mod*2 + 1
 old_arr = copy(self.arr)
 num_items = self.num_items
 self.arr = init_nested_array(self.mod)
 for bucket in old_arr:
 for pair in bucket:
 self.set_item(pair[0], pair[1])
 self.num_items = num_items

The things I've noticed for potential improvement in my code:

  • it isn't immediately obvious why init_nested_array is necessary and when it should be used. A comment for that method would probably be appropriate
  • I used the rule of thumb that hash tables shouldn't exceed a load of 70% but this is buried as a magic number in set_item. This should probably either be a private class variable with a comment as to why I picked that number or a global constant. It seems that if I want to allow the user to adjust this I should make it a class member and otherwise make it a global constant. Do you agree?
  • I just learned that repr is meant to allow eval to be called on it. Since I currently don't allow the user to initialize the dictionary with a list of elements it doesn't seem possible for eval to recreate the original instance. I've been using repr as a way to interact a pretty print for the interactive shell. It seems like that's incorrect. Should I just leave repr unimplemented then?
  • hash_function should probably have a leading underscore since it seems best as a private method. Should I also change self.mod and self.num_items and self.arr to have leading underscores? Is it really necessary to have leading underscores for class variables?

Please assume that anything I didn't mention is because I don't yet know the relevant concept. Looking forward to any criticisms or observations.

200_success
145k22 gold badges190 silver badges478 bronze badges
asked Jan 29, 2016 at 1:37
\$\endgroup\$
1
  • \$\begingroup\$ Similar question (just for your information) \$\endgroup\$ Commented Jan 29, 2016 at 11:18

1 Answer 1

5
\$\begingroup\$
  • init_nested_array in idiomatic python is

    def init_nested_array(size):
     return [ [] for _ in range(size) ]
    

    Since you are concerned with its reason to exist (a desire to put a defensive comment usually indicates a problem of that sort), you may want to use it inline:

     self.arr = [ [] for _ in range(self.mod) ]
    

    It would be even more clear if you call it self.buckets instead of (pretty meaningless) self.arr.

  • Initial mod and the .7 factor should be default (or named) constructor arguments:

    def __init__(self, mod = 3, threshold = 0.7):
     self.mod = mod
     self.threshold = threshold
     ....
    
  • Mutual recursion of set_item and expand looks scary. I understand that it terminates, but the reviewer shudders for a moment. I would consider having a non-recursive set_item_safe() helper to be used along the lines of:

    def set_item():
     if need_to_expand:
     expand()
     set_item_safe()
    

    and

    def expand():
     ....
     rehash()
     set_item_safe()
    
answered Jan 29, 2016 at 2:06
\$\endgroup\$
2
  • \$\begingroup\$ I really like your suggestions! Is there any merit to the idea that the user shouldn't be able to pick the load/mod because they'll pick a bad value? I assume you think we're responsible adults and a comment advising the user how to pick the values is enough. The set_item_safe idea is interesting but I'm having trouble thinking of a descriptive name for set_item_safe. Do you think its good enough name as is? \$\endgroup\$ Commented Feb 3, 2016 at 5:17
  • \$\begingroup\$ @emschorsch Re good/bad values. Unless a theorem is proven that your defaults work best in all cases, let the client to decide. Put some guidelines in the docstring. Re set_item_safe. I don't know really, and I almost of have no objections, as long as it is not exposed to the client. place_item perhaps. \$\endgroup\$ Commented Feb 3, 2016 at 5:27

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.