\$\begingroup\$
\$\endgroup\$
8
I have implemented a very basic hash data structure with basic set/retrieve methods. A good example of use of this implementation would be a phone-book. I have used as underneath storage structure an array consisting of linked lists.
Please a code review within these requirements.
public class MyHash
{
private LinkedList<string>[] storage;
public MyHash()
{
storage = new LinkedList<string>[26]; //english alphabet
}
private int HashFunction(string key)
{
//maps first character of name to a 0:25 index
return key.ToLower().ToCharArray()[0] - 97;
}
public string this[string key]
{
get {
var list = storage[HashFunction(key)];
if (list != null)
{
foreach (var item in list)
{
if (item.Split(':')[0] == key)
return item.Split(':')[1];
}
}
return "Not found!";
}
set {
if (storage[HashFunction(key)] == null)
storage[HashFunction(key)] = new LinkedList<string>();
storage[HashFunction(key)].AddLast(key + ":" + value);
}
}
}
Joan DimkoJoan Dimko
asked Feb 23, 2018 at 10:51
1 Answer 1
\$\begingroup\$
\$\endgroup\$
There are several (major) problems with this class:
- Keys that contain a
:
character cannot be used to look up their value, and values that contain a:
character are only partially returned. Also, this class cannot handlenull
values (it returns an empty string instead). Don't mash keys and values together into a single value, using a key-value object instead (such asKeyValuePair<TKey, TValue>
). - Keys that start with anything but
[a-zA-Z]
cause anIndexOutOfRangeException
. This is not documented anywhere, and there's no obvious reason why such keys shouldn't be allowed (phone books aren't limited to countries with a Latin alphabet, after all). Why not use the modulus of the hash value and the number of buckets? - Non-existing keys produce a 'magic' value, which isn't documented anywhere. Moreover, a caller won't be able to distinguish between a non-existing key and an actual (valid)
"Not found!"
value. You may want to throw an exception instead. Alternately, you could provide aTryGetValue
method instead. - Overwriting the value for an existing key is not possible. Your class will store both values (why?), but lookups will continue to return the first value. I would expect the new entry to replace the earlier one. If that's not the intended behavior, then I'd expect that to be documented, and an appropriate exception to be thrown.
- Your hash function is poorly chosen. Ideally, you'll want to spread entries across buckets as evenly as possible, so each lookup will take roughly the same amount of time on average. If items are concentrated in only a few buckets (some characters are more frequently used than others) then lookups for such keys will be slower. Why not use
key.GetHashCode()
instead, or let the caller provide a hashing method in the constructor? - Using a fixed number of buckets means that lookup times are essentially \$O(n)\,ドル not \$O(1)\$. Locating a bucket takes constant time, while searching through multiple entries in a bucket takes linear time, so you'll want to limit the number of entries per bucket (ideally to 1). That requires some kind of resizing strategy.
Other minor issues:
- By calling
Split(':')
andHashFunction
multiple times you're doing the same work repeatedly. Just call them once and store the result in a variable. - Some names could be clarified a bit:
MyHashTable
orMyHashMap
instead ofMyHash
, andbuckets
instead ofstorage
. - Going for a generic implementation would not only be of more practical use, it would also have helped you to avoid some of the above problems: you wouldn't have been able to mash keys and values together or to use a sub-optimal leading-character 'hash'.
answered Feb 23, 2018 at 21:15
lang-cs
key + ":" + value
if you split it later and get the second item. What if a value contains a:
too? Why don't you simply addvalue
? Very weird. \$\endgroup\$:
characters. Another problem is that your class fails to handle keys that don't start witha-z
: keys like"1"
or"中"
cause anIndexOutOfRangeException
. Yet another problem is that"Not found!"
could actually be a valid input value, and besides that it's an undocumented 'magic' value. Besides that, you may want to be more specific in what kind of feedback you're looking for - is this for educational purposes, or for actual use? \$\endgroup\$