Basic impl of hash_table in python. Supports set, get, and delete. Uses open addressing with linear probing function. Any ideas would be greatly appreciated! Thanks!
class Entry:
"""
holds key, value pairs
"""
def __init__(self, key, value):
self.key = key
self.value = value
self.is_deleted = False
class Map(object):
"""
a basic, minimal implementation of a hash map
"""
def __init__(self):
"""
constructs a new Map
"""
self.table = [None] * 10
self.load_factor = .75
self.current_size = 0
def __setitem__(self, key, item):
"""
stores the key value combo in the table
implements open addressing collision resolution
"""
entry = Entry(key, item)
for i in range(len(self.table)):
index = self.__get_hash_code(key, i)
if self.table[index] is None or self.table[index].is_deleted:
self.table[index] = entry
self.current_size += 1
if float(self.current_size) / len(self.table) >= self.load_factor:
self.__resize_table()
break
def __getitem__(self, key):
"""
gets the value associated with the key
"""
for i in range(len(self.table)):
index = self.__get_hash_code(key, i)
if self.table[index] is not None:
if self.table[index].key == key:
if self.table[index].is_deleted:
raise KeyError('Key is not in the map')
else:
return self.table[index].value
elif self.table[index] is None:
raise KeyError('Key is not in the map')
raise KeyError('Hmm something has gone wrong here')
def __get_hash_code(self, key, value):
return (hash(key) + value) % len(self.table)
def __resize_table(self):
new_table = [None] * (len(self.table) * 2)
for i in range(len(self.table)):
new_table[i] = self.table[i]
self.table = new_table
def delete(self, key):
"""
deletes a value from the hash table
"""
for i in range(len(self.table)):
index = self.__get_hash_code(key, i)
if self.table[index] is not None:
if self.table[index].key == key:
if self.table[index].is_deleted:
raise KeyError('Key is not in the map')
else:
self.table[index].is_deleted = True
self.current_size -= 1
break
1 Answer 1
Your code looks generally quite good. I just have a few suggestions:
- Don't forget to define
__str__()
and__repr__()
for eachclass
- I dislike the use of the "magic number" constants in the
__init__()
of theHash_Map
. They really ought to be either passed as parameters, read from a config somewhere, or clearly marked in a parameter section as pseudo-constants. - The line
for i in range(len(self.table)):
appears multiple times. This style of loop is considered non-Pythonic. Python has theenumerate()
built-in that returns a succession of tuples consisting of both the item and its index. I'd preferfor (i,item) in enumerate(self.table):
- If there's a clash,
hash(key)
will unnecessarily be called multiple times. Note thatindex = self.__get_hash_code(key, i)
is called each time through thei
loop, but the first thing it does is callhash(key)
even though it may have already been computed. Better to computehash(key)
once outside the loop since it won't change. Multiple references to
self.table[index]
. The interpreter might be able to optimize them away, but the code will be shorter (and less error-prone) if it performs the lookup once and stores in a temp. (Or just useenumerate()
like my previous suggestion)In
def __get_hash_code(self, key, value):
the third parameter is poorly named. It's not the value of the element, it's the linear probing parameter.- There's a bug related to the interaction of
__get_hash_code()
and__resize_table()
which can lead to \$O(n)\$ performance, defeating the purpose of using a hashtable.
Explore related questions
See similar questions with these tags.
dict
s are implemented as hash tables, right (stackoverflow.com/questions/114830/…)? If that is the case and this is just for fun (or educational value), we have a tag for that: reinventing-the-wheel. \$\endgroup\$