I am learning how to implement a basic undirected graph and perform a depth first traversal.
Please critique my code for correctness, best practices, security, and any other comments that may be helpful.
Graph.h:
#include <vector>
/************************************************************************
/****************** PUBLIC METHODS ************************
/************************************************************************
/**** addVertex - new vertex with no relationships ****
/**** addNeighbor - builds an edge between vertices ****
/**** DFT - performs a depth first traversal, ****
printing each vertex name when visisted ****
/************************************************************************
/************************************************************************/
struct Vertex {
std::string Name;
std::vector<Vertex*> neighbors; //vector of pointers to Vertices
bool visited;
Vertex::Vertex() {visited = false;}
};
class Graph {
public:
Graph();
void addVertex(std::string name);
void addNeighbor(std::string vertex, std::string neighbor);
void DFT(std::string start);
private:
int traversalCount;
std::vector<Vertex> graph;
int getHashVal(std::string name);
};
Graph.cpp
#include "Graph.h"
#include <iostream>
#include <string>
Graph::Graph() {
graph.resize(53);
traversalCount = 0;
}
//adds a vertex to the graph with no edges
void Graph::addVertex(std::string name) {
int HashVal = getHashVal(name);
if (graph[HashVal].Name.compare(name) != 0)
{
graph[HashVal].Name = name;
std::cout << name << " created at position " << HashVal << std::endl;
}
else
{
return;
}
}
//makes a relationship(edge) between two vertices, creating a vertex if it does not already exist
void Graph::addNeighbor(std::string vertex, std::string neighbor) {
int VertexHashVal = getHashVal(vertex);
int NeighborHashVal = getHashVal(neighbor);
if (graph[VertexHashVal].Name.compare(vertex) != 0)
{
graph[VertexHashVal].Name = vertex; //throw exception?
std::cout << "Collision";
}
else if (graph[NeighborHashVal].Name.compare(neighbor) !=0)
{
this->addVertex(neighbor);
this->addNeighbor(vertex, neighbor);
}
else
{
graph[VertexHashVal].neighbors.push_back(&graph[NeighborHashVal]);
graph[NeighborHashVal].neighbors.push_back(&graph[VertexHashVal]);
}
}
//my depth first traversal method - prints a vertex when visited, recursively traverses across each row
void Graph::DFT(std::string start) {
int HashVal = getHashVal(start);
if (graph[HashVal].visited == false)
{
std::cout << traversalCount << ": " << graph[HashVal].Name << std::endl;
graph[HashVal].visited = true;
traversalCount++;
}
for (int i = 0; i < graph[HashVal].neighbors.size(); i++)
{
if (graph[HashVal].neighbors[i]->visited == false)
{
std::cout << traversalCount << ": " << graph[HashVal].neighbors[i]->Name << std::endl;
graph[HashVal].neighbors[i]->visited = true;
traversalCount++;
DFT(graph[HashVal].neighbors[i]->Name);
}
}
}
//used for choosing position within the vector
int Graph::getHashVal(std::string name) {
int HashVal = 0;
for (int i = 0; i < name.size(); i++)
{
HashVal += name[i];
}
HashVal %= graph.size();
return HashVal;
}
main.cpp
#include "Graph.h"
#include <iostream>
int main() {
Graph AL; //adjacency list
AL.addVertex("Canyon Lake");
AL.addNeighbor("Canyon Lake", "Sun City");
AL.addNeighbor("Canyon Lake", "Lake Elsinore");
AL.addNeighbor("Sun City", "Menifee");
AL.addNeighbor("Lake Elsinore", "Murrieta");
AL.addNeighbor("Murrieta", "Temecula");
AL.addNeighbor("Murrieta", "Menifee");
AL.addNeighbor("Menifee", "Temecula");
std::cout << std::endl;
AL.DFT("Canyon Lake"); //depth first traversal
std::cin.get();
return 0;
}
3 Answers 3
Couple of major issues.
- You are writing your own hash function.
In addition some big problems.
- You are implementing your own hash based dictionary (badly).
- You need to use the visitor pattern (rather than have a mark in each node).
The hash function:
int Graph::getHashVal(std::string name) {
int HashVal = 0;
for (int i = 0; i < name.size(); i++)
{
HashVal += name[i];
}
HashVal %= graph.size();
return HashVal;
}
That is a relatively trivial hash function that is known to have a bad distribution. If your keys are the same length you are very likely to get clashes.
TAP
PAT
For any combination of 3 letter words the distance between any hashes can not be greater than 78. Without doing any analysis I bet the distribution of 3 letter words is actually normal so a whole bunch of words have a hash within 12 places of 234.
Writing hash functions is hard to do well so don't. Pick a well known one and use that.
Building your own dictionary
You use your hash function to build a dictionary like object using a vector. A couple of problems with this. But the main one is that clashes are not correctly resolved. Each entry in hash should be a list (so you can handle clashes correctly). Also 53 is rather a small size (though it is prime which is nice).
But there is already a dictionary like object in the standard library. std::map
or std::unordered_map
. The first one builds a structure with the same access characteristics as a balanced binary tree. The second one has access characteristics like a hash table. You can choose either and it will work fine.
In your node object you store pointers to the other nodes.
std::vector<Vertex*> neighbors
Personally I would store the iterator (but that's just a small personal preference).
Visitor Pattern
You are recording visitation to the nodes in the nodes themselves. This is a simple solution but requires you to reset the whole graph before each traversal. PS you don't reset the visited attribute so you can only call DFT()
once then it becomes invalid.
There is a better technique called a visitor pattern. This basically delegates traversal of the graph to the nodes. But Uses the visitor object as a callback to do processing. Thus you can control order from within your code.
Note: I would not think it as normal to describe a traversal as "Depth First" on cycle graphs. The graph may not be fully connected and there is no start point indicating a top.
Visitor pattern:
struct Visitor
{
virtual ~Visitor() {}
virtual bool visit(Vertex& node) = 0; // returns false if we have already
// seen the node, true otherwise
};
struct Vertex {
std::string Name;
std::vector<Vertex*> neighbors; //vector of pointers to Vertices
void accept(Visitor& visitor)
{
if (visitor.visit(*this)) // If this is the first time
{ // The do all the linked nodes
for(auto loop: neighbors)
{ loop->accept(visitor);
}
}
}
};
Now we can write an application specific visitor.
struct DFTVisitor: public Visitor
{
std::set<Vertex*> alreadyVisited;
int traversalCount;
DFTVisitor(): traversalCount(0) {}
virtual bool visit(Vertex& node)
{
// Check if we have been to this node.
if (alreadyVisited.find(&node) != alreadyVisited.end())
{ return false;
}
alreadyVisited.insert(&node);
std::cout << traversalCount << ": " << node.Name << std::endl;
traversalCount++;
return true;
}
};
Now the graph DFT becomes:
void Graph::DFT(std::string start) {
Vertix* s = getFirstVertex(start);
if (s != NULL)
{
DFTVisitor visitor;
s->accept(visitor);
}
}
-
\$\begingroup\$ I did write a quick reset method to clear the bool
visited
, but I very much like the visitor pattern shown here. As for the hashing function, I admit it is very poor. I wrote a much better (probably still not great) hashing function and dictionary for a previous assignment, but have yet to implement it into this one. Unfortunately, use of the STL is limited for these assignments as we're building our own implementations of many of the containers/functions. \$\endgroup\$randomusername– randomusername2014年05月13日 14:29:21 +00:00Commented May 13, 2014 at 14:29 -
\$\begingroup\$ No point writing a hash table until you have a list. Use a vector and search for nodes in the vector when looking them up. May be slower (but not significantly given the input data size) but at least it will work correctly. You can even sort the vector after each insertion (this can make finding elements quicker
O(ln(N))
. \$\endgroup\$Loki Astari– Loki Astari2014年05月13日 14:40:27 +00:00Commented May 13, 2014 at 14:40 -
\$\begingroup\$ As for hashing: Some Simple Hashes; a well known one for strings is
Bernstein hash
its not known why it works so well but it seems to give reasonable distributions for small keys (not perfect but very simple to implement). \$\endgroup\$Loki Astari– Loki Astari2014年05月13日 14:50:50 +00:00Commented May 13, 2014 at 14:50 -
\$\begingroup\$ I've written a doubly linked list and hash table (combining ideas from Bernstein and Knuth) that I'll be implementing here - my main concern was the logic of the graph itself. \$\endgroup\$randomusername– randomusername2014年05月13日 14:53:03 +00:00Commented May 13, 2014 at 14:53
Few things need explanation and correction.
what is 53 in
Graph::Graph
()?getHashVal
return value depends on the graph size, which may change as new vertices are added, so the same string may return different hash depending on the graph state. Is it intentional?Is there a way to clear
visited
attribute? If not, it is not possible to make another traversal.Why
Graph::addNeighbor()
treatsvertex
andneighbor
asymmetrically?traversalCount
is only used inDFT
, so it is better be a local variable than of a class member.
Update: tloveless commented:
53 is a prime number, I've read primes are better for hashing functions
I'd agree for a correct hash function. Sum of character values modulo something is equally bad regardless the primality of the modulo. Besides, for more than 53 entries its results are plain wrong.
getHashVal is intentionally using the graph size - I've thought about using linked lists to deal with collisions when returning the same hash value, but I'm not sure if that works with graph logic
The important moment is that the same string will generate different hashes depending on the graph size (see above). And in any case collisions must be resolved anyway - there's no perfect hash for arbitrary strings.
addNeighbor() treats vertex and neighbor asymmetrically as this is supposed to be an undirected graph,
Sorry I am not sure you understand. As coded addNeighbor
considers
(graph[VertexHashVal].Name.compare(vertex) != 0)
as a collision, while
(graph[NeighborHashVal].Name.compare(neighbor) !=0)
as not. Why?
I'd guess you coded it this way to cope with a recursive call to addNeighbor
. There is no reason for it to be recursive at all.
traversalCount would be reset as a local variable due to the recursive call to DFT. How would I make it local without resetting it with each recursive call?
Pass it down as a parameter, I suppose.
-
\$\begingroup\$ 53 is a prime number, I've read primes are better for hashing functions -
getHashVal
is intentionally using the graph size - I've thought about using linked lists to deal with collisions when returning the same hash value, but I'm not sure if that works with graph logic,addNeighbor()
treats vertex and neighbor asymmetrically as this is supposed to be an undirected graph, andtraversalCount
would be reset as a local variable due to the recursive call to DFT. How would I make it local without resetting it with each recursive call? \$\endgroup\$randomusername– randomusername2014年05月13日 03:55:43 +00:00Commented May 13, 2014 at 3:55 -
\$\begingroup\$ There's not enough room to follow up here; I'll take a liberty to update the answer with your comment inline. \$\endgroup\$vnp– vnp2014年05月13日 06:12:26 +00:00Commented May 13, 2014 at 6:12
This looks pretty clean. I have several things to address:
No need to assign a member in the constructor body:
Vertex::Vertex() {visited = false;}
For this, prefer an initializer list:
Vertex::Vertex() : visited(false) {}
Each member within the list is initialized in the form
member(value)
and is separated by commas. Another advantage of initializer lists is thatconst
members can be initialized.Each of your functions that take an
std::string
do not modify this string. When you pass in an object with no intention of modifying it, you should pass byconst&
. This avoids an unnecessary copy and prevents the parameter from being modified accidentally.void func(std::string const& str) {}
This:
if (graph[HashVal].visited == false)
can be simplified to this:
if (!graph[HashVal].visited)
Prefer to use
"\n"
overstd::endl
for newlines. The latter flushes the buffer, which is slower and is usually unnecessary. More info about this can be found here.std::cout << "This outputs three newlines\n\n\n";
You don't need a
return 0
at the end ofmain()
. Reaching this point already implies a successful termination, and the compiler will do the return for you.
-
\$\begingroup\$ Would you recommend the newline
\n
used as:std::cout << Count << ": " Name << "\n";
? \$\endgroup\$randomusername– randomusername2014年05月13日 03:05:13 +00:00Commented May 13, 2014 at 3:05 -
\$\begingroup\$ @tloveless: Yes, that also works. \$\endgroup\$Jamal– Jamal2014年05月13日 03:13:05 +00:00Commented May 13, 2014 at 3:13
-
\$\begingroup\$ Jamal - are there any performance gains with the boolean simplification, or just preference on style/readability? \$\endgroup\$randomusername– randomusername2014年05月13日 05:04:00 +00:00Commented May 13, 2014 at 5:04
-
\$\begingroup\$ @tloveless: It's just preference. It could also be safer in case you accidentally use
=
instead of==
. \$\endgroup\$Jamal– Jamal2014年05月13日 05:06:37 +00:00Commented May 13, 2014 at 5:06
Explore related questions
See similar questions with these tags.