I wrote a minimal implementation of a hashtable in python as an exercise.
from copy import copy
def init_nested_array(size):
arr = []
for i in range(size):
arr.append([])
return arr
class hashtable:
def __init__(self):
self.mod = 3
self.num_items = 0
self.arr = init_nested_array(self.mod)
def __str__(self):
return str(self.arr)
def __repr__(self):
return str(self)
def __len__(self):
return self.num_items
def __getitem__(self, key):
return self.get_item(key)
def __setitem__(self, key, value):
return self.set_item(key, value)
def hash_function(self, key):
return hash(key) % self.mod
def get_item(self, key):
bucket = self.arr[self.hash_function(key)]
for pair in bucket:
if pair[0] == key:
return pair[1]
raise KeyError("Item not found", key)
def set_item(self, key, value):
bucket = self.arr[self.hash_function(key)]
for pair in bucket:
if pair[0] == key:
pair[1] = value
return
self.arr[self.hash_function(key)].append([key, value])
self.num_items += 1
if self.num_items >= .7*self.mod:
self.expand()
def expand(self):
# Ideally we'd expand to new prime number. But at least we'll stick to odds
self.mod = self.mod*2 + 1
old_arr = copy(self.arr)
num_items = self.num_items
self.arr = init_nested_array(self.mod)
for bucket in old_arr:
for pair in bucket:
self.set_item(pair[0], pair[1])
self.num_items = num_items
The things I've noticed for potential improvement in my code:
- it isn't immediately obvious why init_nested_array is necessary and when it should be used. A comment for that method would probably be appropriate
- I used the rule of thumb that hash tables shouldn't exceed a load of 70% but this is buried as a magic number in set_item. This should probably either be a private class variable with a comment as to why I picked that number or a global constant. It seems that if I want to allow the user to adjust this I should make it a class member and otherwise make it a global constant. Do you agree?
- I just learned that repr is meant to allow eval to be called on it. Since I currently don't allow the user to initialize the dictionary with a list of elements it doesn't seem possible for eval to recreate the original instance. I've been using repr as a way to interact a pretty print for the interactive shell. It seems like that's incorrect. Should I just leave repr unimplemented then?
hash_function
should probably have a leading underscore since it seems best as a private method. Should I also changeself.mod
andself.num_items
andself.arr
to have leading underscores? Is it really necessary to have leading underscores for class variables?
Please assume that anything I didn't mention is because I don't yet know the relevant concept. Looking forward to any criticisms or observations.
-
\$\begingroup\$ Similar question (just for your information) \$\endgroup\$oliverpool– oliverpool2016年01月29日 11:18:09 +00:00Commented Jan 29, 2016 at 11:18
1 Answer 1
init_nested_array
in idiomatic python isdef init_nested_array(size): return [ [] for _ in range(size) ]
Since you are concerned with its reason to exist (a desire to put a defensive comment usually indicates a problem of that sort), you may want to use it inline:
self.arr = [ [] for _ in range(self.mod) ]
It would be even more clear if you call it
self.buckets
instead of (pretty meaningless)self.arr
.Initial
mod
and the.7
factor should be default (or named) constructor arguments:def __init__(self, mod = 3, threshold = 0.7): self.mod = mod self.threshold = threshold ....
Mutual recursion of
set_item
andexpand
looks scary. I understand that it terminates, but the reviewer shudders for a moment. I would consider having a non-recursiveset_item_safe()
helper to be used along the lines of:def set_item(): if need_to_expand: expand() set_item_safe()
and
def expand(): .... rehash() set_item_safe()
-
\$\begingroup\$ I really like your suggestions! Is there any merit to the idea that the user shouldn't be able to pick the load/mod because they'll pick a bad value? I assume you think we're responsible adults and a comment advising the user how to pick the values is enough. The set_item_safe idea is interesting but I'm having trouble thinking of a descriptive name for set_item_safe. Do you think its good enough name as is? \$\endgroup\$emschorsch– emschorsch2016年02月03日 05:17:42 +00:00Commented Feb 3, 2016 at 5:17
-
\$\begingroup\$ @emschorsch Re good/bad values. Unless a theorem is proven that your defaults work best in all cases, let the client to decide. Put some guidelines in the docstring. Re
set_item_safe
. I don't know really, and I almost of have no objections, as long as it is not exposed to the client.place_item
perhaps. \$\endgroup\$vnp– vnp2016年02月03日 05:27:19 +00:00Commented Feb 3, 2016 at 5:27