I'm trying to implement a HashTable. In fact, I want a HashSet with Persons that have a name and a phoneNumber, both string. I understood the theory with collisions and everything but I don't know how to really implement something. This is what I've done so far:
#ifndef HASHSET_H_
#define HASHSET_H_
#include <iostream>
using namespace std;
#include <cstdio>
template <typename Element>
class HashSet{
private:
class Node{
private:
Element info;
Node* next;
public:
Node(){ //constructor
this->next=NULL;
}
Node(Node* next, Element info){ //parameters constructor
this->info=info;
this->next=next;
}
Node(const Node& node){ //copy constructor
this->info=node.info;
this->next=node.next;
}
~Node(){} //destructor
Element getInfo(){
return this->info;
}
Node* getNext(){
return this->next;
}
void setInfo(Element newE){
this->info=newE;
}
void setNext(Node* value){
this->next=value;
}
};
Node** head;
int hashSize;
int bucketSize; //holds the total elements of a cell
int totElems; //total number of elements in the hashset
public:
HashSet(){
}
bool IsEmptyAtGivenKey(int key);
bool HashSetIsFull();
int HashFunction(string e);
int HashSetSize();
};
template<typename Element>
HashSet<Element>::HashSet(){
this->hashSize=11;
this->head=new Node*[hashSize];
for(int i=0; i< hashSize; i++){
this->head[i]=NULL;
}
this->totElems=0;
}
template<typename Element>
bool HashSet<Element>::IsEmptyAtGivenKey(int key){
if(key>=0 and key < hashSize){
return head[key]==NULL;
}
return true;
}
template<typename Element>
int HashSet<Element>::HashFunction(string e){
int length = e.size();
int sum;
for(int i=0; i < length; i++){
sum+=e[i];
}
int hashCode = sum % hashSize;
return hashCode;
}
#endif /* HASHSET_H_ */
Is it any good? Any advice that will help me improve is welcome!
2 Answers 2
You're attempting to read values from uninitialized variables which has undefined behavior:
int sum; // uninitialized` for(int i=0; i < length; i++){` sum+=e[i]; // reading from uninitialized variable "sum"
From "HASHSET_H_" I gather it's a header file. You should never use
using namespace std;
in a header file (and chances are it's a bad idea to use it anywhere else, too -- unless you're porting legacy 1980s C code to C++):If you can use C++11 then
nullptr
is a better choice thanNULL
: read this//
(削除) Note, however, that if you're NOT using C++11, then (削除ここまで)the following is a potential performance problem:int HashSet<Element>::HashFunction(string e)
(削除) Pre-C++11 (削除ここまで)this can create unnecessary copy, so prefer to pass-by-const-reference(削除) in C++98/03 (削除ここまで)instead.Compare:
int
is a wrong type for a variable holding size or count (such ashashSize
,bucketSize
,totElems
). In particular, this is very bad:int length = e.size();
Not only you're using a wrong type, but potentially also losing performance due to unsigned-to-signed conversion // which, perhaps more importantly, is completely unnecessary and only makes the code harder to read -- are you suggesting thatlength
can be negative? If not, then don't useint
.Prefer
std::size_t
instead (defined in header<cstddef>
):Make sure to use meaningful variable names. This includes not using ambiguous abbreviations.
totElems
is not a good name.totalElementsCount
is better.Additional idea: make sure to pick a naming convention and stick to it. For instance, some of your functions start with uppercase character, like
IsEmptyAtGivenKey
, but some with a lowercase character, likesetInfo
. It's better if they all follow the same convention (e.g., all functions start with lowercase, all types start with uppercase, etc.).As a hint, read the "Naming" section from Google C++ Style Guide.
// Extra hint: feel free to completely ignore the other sections, or even take them as examples of what NOT to do (like the one on exceptions), unless you're working at Google with Google's code base.
Manual memory management is troubling.
I've read from your comments that this is for school, so chances are the class / teaching methods simply aren't exactly up-to-date (thinking teaching C before C++ is a good idea is often a good indicator), but IRL invoking raw
new
(such as here:new Node*[hashSize];
) without a very good reason is nowadays unnecessary and considered a bad practice.If you want to learn best practices, start with this.
// Not to repeat myself from my previous answer, here's a link to it, if you'd like more information on memory management specifically: here.
-
1\$\begingroup\$
int HashSet<Element>::HashFunction(string e)
will create a copy in C++11. Move semantics require an rvalue reference (string&& e
). \$\endgroup\$Yuushi– Yuushi2013年05月07日 01:44:16 +00:00Commented May 7, 2013 at 1:44 -
\$\begingroup\$ @Yuushi Interesting, I've tested with a minimal example: * seems to invoke move constructor when
std::move
is applied: coliru.stacked-crooked.com/… ; * but indeed invokes copy constructor otherwise: coliru.stacked-crooked.com/… ; What's the sensible default to assume? \$\endgroup\$Matt– Matt2013年05月07日 18:01:14 +00:00Commented May 7, 2013 at 18:01 -
\$\begingroup\$ EDIT: done, thanks for the catch! // Given that const-ref optimizes better, I'm inclined toward editing my answer to recommend it here: tinyurl.com/cu5kzhp \$\endgroup\$Matt– Matt2013年05月07日 18:09:45 +00:00Commented May 7, 2013 at 18:09
A few more things:
Avoid "obvious" comments: things like "constructor" and "copy constructor" are completely superfluous to anyone that knows C++. Remember, good comments should be there to guide the reader through why something works the way it does, or perhaps for a particularly tricky bit of code.
Prefer using member initialization to initialization in the constructor body. For example:
Node(){
this->next=NULL;
}
should instead be:
Node()
: next(NULL)
{ }
This works similarly for the copy constructor. Member initialization can be more efficient, and is the only way to initialize some things (reference or const
members, for example).
There's not a huge amount of point to having next
and info
being private
in Node
when the class itself is already private
. Encapsulation is great for user-facing interfaces, but for simple things like Node
, it goes too far. You're now required to write 4 boilerplate set/get
functions - to me, a net negative (although I suppose beware this advice at a university that won't let you use things as basic as std::vector
- they may simply have a mandate that "all member variables must be protected/private
" - no matter if it is actually a good idea or not).
Generally, there is little point separating out template definitions from declarations. (Almost) all template classes are simply defined inline in their header file - that is, the definition accompanies the declaration. You also seem to be missing the implementations for two functions: int HashSetSize()
and bool HashSetIsFull()
.
Some of your functions don't make much sense to me - IsEmptyAtGivenKey(int key)
seems like an implementation detail - the user of a hash table shouldn't have to know or care if a given bucket is empty or not. In fact, they likely shouldn't know or care how many buckets there are in the first place - only the load factor of the table is really important.
HashFunction(string e)
(which should pass the string by const-reference - const string& e
) - is supposed to do...what? Hash the given string
? But I thought the HashTable
contained elements of type Element
? The name is also very confusing - is this giving the table a new hash function? As far as I can tell, this should probably be modified and renamed:
template<typename Element>
int HashSet<Element>::Hash(const Element& e)
and should probably be private
- the hashing itself is again an implementation detail.
You are missing destructors, copy constructors and copy assignment operators in your HashSet implementation. Without a destructor it leaks memory, without the other 2, any use of operator=
or the copy construction will do the wrong thing. At the very least, you should have a destructor for your HashSet
:
~HashSet()
{
//delete everything head points to
//Have temporary node point to head
//head = head->next
//delete temporary node
}
Likewise, you should either implement HashSet(const HashSet& rhs)
and HashSet& operator=(const HashSet& rhs)
or declare them both private
so that they cannot be incorrectly used.
std::vector
here. \$\endgroup\$