Depth First Traversal of a Graph

Question 1

I am learning how to implement a basic undirected graph and perform a depth first traversal.

Please critique my code for correctness, best practices, security, and any other comments that may be helpful.

Graph.h:

#include <vector>
/************************************************************************
/****************** PUBLIC METHODS ************************
/************************************************************************
/**** addVertex - new vertex with no relationships ****
/**** addNeighbor - builds an edge between vertices ****
/**** DFT - performs a depth first traversal, ****
 printing each vertex name when visisted ****
/************************************************************************
/************************************************************************/
struct Vertex {
 std::string Name;
 std::vector<Vertex*> neighbors; //vector of pointers to Vertices
 bool visited;
 Vertex::Vertex() {visited = false;}
};
class Graph {
public:
 Graph();
 void addVertex(std::string name);
 void addNeighbor(std::string vertex, std::string neighbor);
 void DFT(std::string start);
private:
 int traversalCount;
 std::vector<Vertex> graph;
 int getHashVal(std::string name);
};

Graph.cpp

#include "Graph.h"
#include <iostream>
#include <string>
Graph::Graph() {
 graph.resize(53);
 traversalCount = 0;
}
//adds a vertex to the graph with no edges
void Graph::addVertex(std::string name) {
 int HashVal = getHashVal(name);
 if (graph[HashVal].Name.compare(name) != 0)
 {
 graph[HashVal].Name = name;
 std::cout << name << " created at position " << HashVal << std::endl;
 }
 else
 {
 return;
 }
}
//makes a relationship(edge) between two vertices, creating a vertex if it does not already exist
void Graph::addNeighbor(std::string vertex, std::string neighbor) {
 int VertexHashVal = getHashVal(vertex);
 int NeighborHashVal = getHashVal(neighbor);
 if (graph[VertexHashVal].Name.compare(vertex) != 0)
 {
 graph[VertexHashVal].Name = vertex; //throw exception?
 std::cout << "Collision";
 }
 else if (graph[NeighborHashVal].Name.compare(neighbor) !=0)
 {
 this->addVertex(neighbor);
 this->addNeighbor(vertex, neighbor);
 }
 else
 {
 graph[VertexHashVal].neighbors.push_back(&graph[NeighborHashVal]);
 graph[NeighborHashVal].neighbors.push_back(&graph[VertexHashVal]);
 }
}
//my depth first traversal method - prints a vertex when visited, recursively traverses across each row
void Graph::DFT(std::string start) {
 int HashVal = getHashVal(start);
 if (graph[HashVal].visited == false)
 {
 std::cout << traversalCount << ": " << graph[HashVal].Name << std::endl;
 graph[HashVal].visited = true;
 traversalCount++;
 }
 for (int i = 0; i < graph[HashVal].neighbors.size(); i++)
 {
 if (graph[HashVal].neighbors[i]->visited == false)
 {
 std::cout << traversalCount << ": " << graph[HashVal].neighbors[i]->Name << std::endl;
 graph[HashVal].neighbors[i]->visited = true;
 traversalCount++;
 DFT(graph[HashVal].neighbors[i]->Name);
 }
 }
}
//used for choosing position within the vector
int Graph::getHashVal(std::string name) {
 int HashVal = 0;
 for (int i = 0; i < name.size(); i++)
 {
 HashVal += name[i];
 }
 HashVal %= graph.size();
 return HashVal;
}

main.cpp

#include "Graph.h"
#include <iostream>
int main() {
 Graph AL; //adjacency list
 AL.addVertex("Canyon Lake");
 AL.addNeighbor("Canyon Lake", "Sun City");
 AL.addNeighbor("Canyon Lake", "Lake Elsinore"); 
 AL.addNeighbor("Sun City", "Menifee"); 
 AL.addNeighbor("Lake Elsinore", "Murrieta"); 
 AL.addNeighbor("Murrieta", "Temecula");
 AL.addNeighbor("Murrieta", "Menifee");
 AL.addNeighbor("Menifee", "Temecula");
 std::cout << std::endl;
 AL.DFT("Canyon Lake"); //depth first traversal
 std::cin.get();
 return 0;
}

Question 2

Couple of major issues.

You are writing your own hash function.

In addition some big problems.

You are implementing your own hash based dictionary (badly).
You need to use the visitor pattern (rather than have a mark in each node).

The hash function:

int Graph::getHashVal(std::string name) {
 int HashVal = 0;
 for (int i = 0; i < name.size(); i++)
 {
 HashVal += name[i];
 }
 HashVal %= graph.size();
 return HashVal;
}

That is a relatively trivial hash function that is known to have a bad distribution. If your keys are the same length you are very likely to get clashes.

 TAP
 PAT

For any combination of 3 letter words the distance between any hashes can not be greater than 78. Without doing any analysis I bet the distribution of 3 letter words is actually normal so a whole bunch of words have a hash within 12 places of 234.

Writing hash functions is hard to do well so don't. Pick a well known one and use that.

Building your own dictionary

You use your hash function to build a dictionary like object using a vector. A couple of problems with this. But the main one is that clashes are not correctly resolved. Each entry in hash should be a list (so you can handle clashes correctly). Also 53 is rather a small size (though it is prime which is nice).

But there is already a dictionary like object in the standard library. std::map or std::unordered_map. The first one builds a structure with the same access characteristics as a balanced binary tree. The second one has access characteristics like a hash table. You can choose either and it will work fine.

In your node object you store pointers to the other nodes.

std::vector<Vertex*> neighbors

Personally I would store the iterator (but that's just a small personal preference).

Visitor Pattern

You are recording visitation to the nodes in the nodes themselves. This is a simple solution but requires you to reset the whole graph before each traversal. PS you don't reset the visited attribute so you can only call DFT() once then it becomes invalid.

There is a better technique called a visitor pattern. This basically delegates traversal of the graph to the nodes. But Uses the visitor object as a callback to do processing. Thus you can control order from within your code.

Note: I would not think it as normal to describe a traversal as "Depth First" on cycle graphs. The graph may not be fully connected and there is no start point indicating a top.

Visitor pattern:

struct Visitor
{
 virtual ~Visitor() {}
 virtual bool visit(Vertex& node) = 0; // returns false if we have already
 // seen the node, true otherwise
};
struct Vertex {
 std::string Name;
 std::vector<Vertex*> neighbors; //vector of pointers to Vertices
 void accept(Visitor& visitor)
 {
 if (visitor.visit(*this)) // If this is the first time
 { // The do all the linked nodes
 for(auto loop: neighbors)
 { loop->accept(visitor);
 }
 }
 }
};

Now we can write an application specific visitor.

struct DFTVisitor: public Visitor
{
 std::set<Vertex*> alreadyVisited;
 int traversalCount;
 DFTVisitor(): traversalCount(0) {}
 virtual bool visit(Vertex& node)
 {
 // Check if we have been to this node.
 if (alreadyVisited.find(&node) != alreadyVisited.end())
 { return false;
 }
 alreadyVisited.insert(&node);
 std::cout << traversalCount << ": " << node.Name << std::endl;
 traversalCount++;
 return true;
 }
};

Now the graph DFT becomes:

void Graph::DFT(std::string start) {
 Vertix* s = getFirstVertex(start);
 if (s != NULL)
 {
 DFTVisitor visitor;
 s->accept(visitor);
 }
}

Question 3

I did write a quick reset method to clear the bool visited, but I very much like the visitor pattern shown here. As for the hashing function, I admit it is very poor. I wrote a much better (probably still not great) hashing function and dictionary for a previous assignment, but have yet to implement it into this one. Unfortunately, use of the STL is limited for these assignments as we're building our own implementations of many of the containers/functions.

Question 4

No point writing a hash table until you have a list. Use a vector and search for nodes in the vector when looking them up. May be slower (but not significantly given the input data size) but at least it will work correctly. You can even sort the vector after each insertion (this can make finding elements quicker O(ln(N)).

Question 5

As for hashing: Some Simple Hashes‌; a well known one for strings is Bernstein hash its not known why it works so well but it seems to give reasonable distributions for small keys (not perfect but very simple to implement).

Question 6

I've written a doubly linked list and hash table (combining ideas from Bernstein and Knuth) that I'll be implementing here - my main concern was the logic of the graph itself.

Question 7

Few things need explanation and correction.

what is 53 in Graph::Graph()?
getHashVal return value depends on the graph size, which may change as new vertices are added, so the same string may return different hash depending on the graph state. Is it intentional?
Is there a way to clear visited attribute? If not, it is not possible to make another traversal.
Why Graph::addNeighbor() treats vertex and neighbor asymmetrically?
traversalCount is only used in DFT, so it is better be a local variable than of a class member.

Update: tloveless commented:

53 is a prime number, I've read primes are better for hashing functions

I'd agree for a correct hash function. Sum of character values modulo something is equally bad regardless the primality of the modulo. Besides, for more than 53 entries its results are plain wrong.

getHashVal is intentionally using the graph size - I've thought about using linked lists to deal with collisions when returning the same hash value, but I'm not sure if that works with graph logic

The important moment is that the same string will generate different hashes depending on the graph size (see above). And in any case collisions must be resolved anyway - there's no perfect hash for arbitrary strings.

addNeighbor() treats vertex and neighbor asymmetrically as this is supposed to be an undirected graph,

Sorry I am not sure you understand. As coded addNeighbor considers

(graph[VertexHashVal].Name.compare(vertex) != 0)

as a collision, while

(graph[NeighborHashVal].Name.compare(neighbor) !=0)

as not. Why?

I'd guess you coded it this way to cope with a recursive call to addNeighbor. There is no reason for it to be recursive at all.

traversalCount would be reset as a local variable due to the recursive call to DFT. How would I make it local without resetting it with each recursive call?

Pass it down as a parameter, I suppose.

Question 8

53 is a prime number, I've read primes are better for hashing functions - getHashVal is intentionally using the graph size - I've thought about using linked lists to deal with collisions when returning the same hash value, but I'm not sure if that works with graph logic, addNeighbor() treats vertex and neighbor asymmetrically as this is supposed to be an undirected graph, and traversalCount would be reset as a local variable due to the recursive call to DFT. How would I make it local without resetting it with each recursive call?

Question 9

There's not enough room to follow up here; I'll take a liberty to update the answer with your comment inline.

Question 10

This looks pretty clean. I have several things to address:

No need to assign a member in the constructor body:
```
Vertex::Vertex() {visited = false;}
```
For this, prefer an initializer list:
```
Vertex::Vertex() : visited(false) {}
```
Each member within the list is initialized in the form member(value) and is separated by commas. Another advantage of initializer lists is that const members can be initialized.
Each of your functions that take an std::string do not modify this string. When you pass in an object with no intention of modifying it, you should pass by const&. This avoids an unnecessary copy and prevents the parameter from being modified accidentally.
```
void func(std::string const& str) {}
```

This:

if (graph[HashVal].visited == false)

can be simplified to this:

if (!graph[HashVal].visited)

Prefer to use "\n" over std::endl for newlines. The latter flushes the buffer, which is slower and is usually unnecessary. More info about this can be found here.
```
std::cout << "This outputs three newlines\n\n\n";
```
You don't need a return 0 at the end of main(). Reaching this point already implies a successful termination, and the compiler will do the return for you.

Question 11

Would you recommend the newline \n used as: std::cout << Count << ": " Name << "\n";?

Question 12

@tloveless: Yes, that also works.

Question 13

Jamal - are there any performance gains with the boolean simplification, or just preference on style/readability?

Question 14

@tloveless: It's just preference. It could also be safer in case you accidentally use = instead of ==.

Loki Astari Loki Astari 97.7k5 gold badges126 silver badges341 bronze badges · Accepted Answer · 2014-05-13 06:22:18Z

Couple of major issues.

You are writing your own hash function.

In addition some big problems.

You are implementing your own hash based dictionary (badly).
You need to use the visitor pattern (rather than have a mark in each node).

The hash function:

int Graph::getHashVal(std::string name) {
 int HashVal = 0;
 for (int i = 0; i < name.size(); i++)
 {
 HashVal += name[i];
 }
 HashVal %= graph.size();
 return HashVal;
}

That is a relatively trivial hash function that is known to have a bad distribution. If your keys are the same length you are very likely to get clashes.

 TAP
 PAT

For any combination of 3 letter words the distance between any hashes can not be greater than 78. Without doing any analysis I bet the distribution of 3 letter words is actually normal so a whole bunch of words have a hash within 12 places of 234.

Writing hash functions is hard to do well so don't. Pick a well known one and use that.

Building your own dictionary

You use your hash function to build a dictionary like object using a vector. A couple of problems with this. But the main one is that clashes are not correctly resolved. Each entry in hash should be a list (so you can handle clashes correctly). Also 53 is rather a small size (though it is prime which is nice).

But there is already a dictionary like object in the standard library. std::map or std::unordered_map. The first one builds a structure with the same access characteristics as a balanced binary tree. The second one has access characteristics like a hash table. You can choose either and it will work fine.

In your node object you store pointers to the other nodes.

std::vector<Vertex*> neighbors

Personally I would store the iterator (but that's just a small personal preference).

Visitor Pattern

You are recording visitation to the nodes in the nodes themselves. This is a simple solution but requires you to reset the whole graph before each traversal. PS you don't reset the visited attribute so you can only call DFT() once then it becomes invalid.

There is a better technique called a visitor pattern. This basically delegates traversal of the graph to the nodes. But Uses the visitor object as a callback to do processing. Thus you can control order from within your code.

Note: I would not think it as normal to describe a traversal as "Depth First" on cycle graphs. The graph may not be fully connected and there is no start point indicating a top.

Visitor pattern:

struct Visitor
{
 virtual ~Visitor() {}
 virtual bool visit(Vertex& node) = 0; // returns false if we have already
 // seen the node, true otherwise
};
struct Vertex {
 std::string Name;
 std::vector<Vertex*> neighbors; //vector of pointers to Vertices
 void accept(Visitor& visitor)
 {
 if (visitor.visit(*this)) // If this is the first time
 { // The do all the linked nodes
 for(auto loop: neighbors)
 { loop->accept(visitor);
 }
 }
 }
};

Now we can write an application specific visitor.

struct DFTVisitor: public Visitor
{
 std::set<Vertex*> alreadyVisited;
 int traversalCount;
 DFTVisitor(): traversalCount(0) {}
 virtual bool visit(Vertex& node)
 {
 // Check if we have been to this node.
 if (alreadyVisited.find(&node) != alreadyVisited.end())
 { return false;
 }
 alreadyVisited.insert(&node);
 std::cout << traversalCount << ": " << node.Name << std::endl;
 traversalCount++;
 return true;
 }
};

Now the graph DFT becomes:

void Graph::DFT(std::string start) {
 Vertix* s = getFirstVertex(start);
 if (s != NULL)
 {
 DFTVisitor visitor;
 s->accept(visitor);
 }
}

I did write a quick reset method to clear the bool visited, but I very much like the visitor pattern shown here. As for the hashing function, I admit it is very poor. I wrote a much better (probably still not great) hashing function and dictionary for a previous assignment, but have yet to implement it into this one. Unfortunately, use of the STL is limited for these assignments as we're building our own implementations of many of the containers/functions.
No point writing a hash table until you have a list. Use a vector and search for nodes in the vector when looking them up. May be slower (but not significantly given the input data size) but at least it will work correctly. You can even sort the vector after each insertion (this can make finding elements quicker O(ln(N)).
As for hashing: Some Simple Hashes‌; a well known one for strings is Bernstein hash its not known why it works so well but it seems to give reasonable distributions for small keys (not perfect but very simple to implement).
I've written a doubly linked list and hash table (combining ideas from Bernstein and Knuth) that I'll be implementing here - my main concern was the logic of the graph itself.

Stack Exchange Network

Depth First Traversal of a Graph

3 Answers 3

The hash function:

Building your own dictionary

Visitor Pattern

Visitor pattern:

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Depth First Traversal of a Graph

3 Answers 3

The hash function:

Building your own dictionary

Visitor Pattern

Visitor pattern:

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions