Background
I can't seem to find an existing answer solving this doubt of mine.
In academic books, it seems that DFS is usually presented in both recursive and iterative forms.
The implementations shown usually display the following complexities, where V
is the number of vertices, and E
the number of edges in the graph. The graph is represented by adjacency lists.
Recursive:
Time: O(V + E)
Space: O(V)
Iterative:
Time: O(V + E)
Space: O(E)
The reason for the discrepancy in the space complexity is because in the iterative version, we're adding all the edges of the current visiting vertex to be "queued" in a stack.
Because of the recursive implementation leading to potential stack-overflows in big graphs, a space-optimised iterative version is desired, which has been mentioned - but not shown - in some academic materials. The key here is to use iterators achieving a final complexity of:
Iterative (Iterators):
Time: O(V + E)
Space: O(V)
Question
Is my implementation correct? It works for my the test below, but I'm not sure if the test is general enough and it is only working for this special case.
Graph Representation
struct Node {
int data;
list<Node*> neighbours;
Node(int val) : data(val) {}
};
Algorithm (Pre-Order DFS)
void preDFSIter(vector<Node*>& adjList) {
stack<pair<list<Node*>::iterator, list<Node*>::iterator>> nodes;
unordered_set<Node*> visited;
for (Node* curStartNode : adjList) {
if (visited.count(curStartNode)) continue;
list<Node*> startList = {curStartNode};
nodes.push({startList.begin(), startList.end()});
while (!nodes.empty()) {
auto&[curNodeIt, curNodeItEnd] = nodes.top();
if (curNodeIt != curNodeItEnd) {
Node* curNode = *curNodeIt;
++curNodeIt;
if (!visited.count(curNode)) {
visited.insert(curNode);
cout << to_string(curNode->data) << " ";
nodes.push({curNode->neighbours.begin(), curNode->neighbours.end()});
}
} else {
nodes.pop();
}
}
}
cout << endl;
}
Algorithm (Post-Order DFS)
void postDFSIterVar(vector<Node*>& adjList) {
unordered_set<Node*> visited;
stack<tuple<Node*, list<Node*>::iterator, list<Node*>::iterator>> nodes;
for (Node* curStartNode : adjList) {
if (visited.count(curStartNode)) continue;
list<Node*> startList = {curStartNode};
nodes.push({nullptr, startList.begin(), startList.end()});
while (!nodes.empty()) {
auto&[curParent, curNodeIt, curNodeItEnd] = nodes.top();
if (curNodeIt != curNodeItEnd) {
Node* curNode = *curNodeIt;
++curNodeIt;
if (!visited.count(curNode)) {
visited.insert(curNode);
nodes.push({curNode, curNode->neighbours.begin(), curNode->neighbours.end()});
}
} else {
if (curParent) cout << to_string(curParent->data) << " ";
nodes.pop();
}
}
}
cout << endl;
}
Main
int main() {
Node* node0 = new Node(0);
Node* node1 = new Node(1);
Node* node2 = new Node(2);
Node* node3 = new Node(3);
Node* node4 = new Node(4);
Node* node5 = new Node(5);
Node* node6 = new Node(6);
Node* node7 = new Node(7);
Node* node8 = new Node(8);
node0->neighbours = {node1, node2};
node1->neighbours = {node3, node5};
node2->neighbours = {node5};
node3->neighbours = {node3, node6};
node4->neighbours = {node1, node3, node8};
node5->neighbours = {node0, node4, node7};
node6->neighbours = {node4};
node7->neighbours = {node4, node7};
node8->neighbours = {node6, node7};
vector <Node*> adjList = {node0, node1, node2, node3, node4, node5, node6, node7, node8};
preDFSIter(adjList);
postDFSIter(adjList);
return 0;
}
1 Answer 1
Iterative != space-optimized
Because of the recursive implementation leading to potential stack-overflows in big graphs, a space-optimised iterative version is desired
Some functions, like the factorial or a Fibonacci sequence generator, can be implemented quite compactly as a recursive function, but indeed suffer from a lot of stack usage. Iterative versions of these functions can be written that only need \$O(1)\$ space. However, as you've shown yourself, for the DFS search algorithm, both the recursive and iterative versions take \$O(V)\$ space. An iterative version just moves the problem from the program stack to a std::stack
that lives on the heap. Heap space is also not unlimited, although on most systems you do have more heap than stack space.
I don't see anything space-optimized in your code. In fact, there are lots of std::list
s that could have been std::vector
s, and std::unordered_set<Node*> visited
also seems quite wasteful. We can fix that, but first:
Create a struct Graph
Instead of passing around pointers to nodes or to adjacency lists, create a struct Graph
that represents an entire graph. This can then hide how the actual graph is stored, and have member functions to add, remove and visit nodes. In the rest of the examples I'll keep it simple though.
Make your graph more space-efficient
Instead of having lots of individually allocated nodes that point to each other, with all the memory management issues that entails (like the memory leak that JimmyHu pointed out), consider storing all the nodes in a single std::vector
. The position in the vector is then the node's ID. That means you can now also refer to nodes by their index instead of needing pointers:
struct Node {
int data;
std::vector<std::size_t> neighbours;
...
};
struct Graph {
std::vector<Node> nodes;
};
void preDFSIter(const Graph& graph) {
...
}
Now your main()
can look like:
int main() {
Graph graph;
graph.nodes.resize(9);
graph.nodes[0].neighbours = {1, 2};
graph.nodes[1].neighbours = {3, 5};
...
preDFSIter(graph);
...
}
Note that if you know the maximum amount of nodes you can have in a graph, you can use something smaller than std::size_t
as the index type.
Space-optimized DFS
Now that we have the graph in a more compact form, as well as having indices instead of pointers, we can optimize the DFS algorithm as well.
First, the "stack" can be simplified a lot. For every entry in the stack we need to know which node we are referring to, and what position we are in its list of neighbours:
struct StackEntry {
std::size_t node;
std::size_t neighbour;
};
std::stack<StackEntry> nodes;
Second, we can turn visited
in a vector of bools, one for each node in the graph:
std::vector<bool> visited(graph.nodes.size());
Then it's just a matter of converting your algorithm to the new datastructures:
for (std::size_t node = 0; node != graph.nodes.size(); ++node) {
if (visited[node])
continue;
nodes.emplace(node, 0);
while (!nodes.empty()) {
auto& current = nodes.top();
if (current.neighbour != nodes[current.node].neighbours.size()) {
auto neighbour = nodes[current.node].neighbours[current.neighbour];
++current.neighbour;
if (!visited[neighbour]) {
visited[neighbour] = true;
std::cout << nodes[neighbour].data << ' ';
nodes.emplace(neighbour, 0);
}
} else {
nodes.pop();
}
}
std::cout << '\n';
}
Note that this still uses \$O(V)\$ space, but we greatly reduced the constant factor. Also, we got rid of pointers and manual memory management.
Make it more generic
You hardcoded the type of data
to int
, and the only thing the DFS algorithms do is to print out data
in some order to std::cout
. But in a real application your data will probably have some other type, and you probably want to do something besides printing it. You can make your code much more usable by making it more generic: make Node
a template so you can have any type for data
, and allow a visitor function to be passed as an argument to your DFS functions that will be applied to each node. This way, the caller can decide what to do with each node that is visited:
template<typename T>
struct Node {
T data;
...
};
template<typename T>
struct Graph {
std::vector<Node<T>> nodes;
...
};
template<typename T, typename Visitor>
void preDFSIter(const Graph<T>& graph, Visitor function) {
...
if (!visited[neighbour]) {
visited[neighbour] = true;
function(nodes[neighbour]);
...
}
...
}
Avoid code duplication
preDFSIter()
and postDFSIter()
are almost identical, the only thing different is when they print out data. Consider creating a single function that can do either, based on a parameter that tells it which order to use. Or have it take two visitor functions, one applied pre-order, the other post-order, and then the caller can decide to pass a real function and a dummy function that does nothing, or perhaps even two useful functions.
-
\$\begingroup\$
std::vector<std::size_t> neighbours;
inside each Node is still far from ideal. A std::vector is normally a struct of 3 pointers, pointing to another heap-allocated chunk of memory. I guess that's unavoidable if we want an array / vector of Nodes, so they can't be variable-size (struct Node { int data; unsigned length; unsigned neighbours[]; }
flexible array member.) A 32-bit or 16-bit length and a single pointer would suffice for a graph that doesn't have to keep mutating. So maybe astd::unique_ptr<unsigned> neighbours
for an array of unsigned... \$\endgroup\$Peter Cordes– Peter Cordes2025年04月06日 22:53:03 +00:00Commented Apr 6 at 22:53 -
1\$\begingroup\$ Or to reduce allocator bookkeeping overhead, carve the neighbour lists out of one big allocation of a block of
unsigned
. So each Node just has a pointer (or even 32-bit index for non-huge graphs) into a big array, and a length. The allocator behindnew
/delete
only has to track thestd::vector<Node>
allocation and thestd::vector<unsigned> neighbour_buffer
. (malloc
/realloc
are more efficient for growing a large array than new/delete, since realloc can avoid copy/delete. And/or you can overallocate and shrink to fit with realloc. But if you want idiomatic C++, gotta usenew
) \$\endgroup\$Peter Cordes– Peter Cordes2025年04月06日 23:00:50 +00:00Commented Apr 6 at 23:00 -
\$\begingroup\$ There's also the possibility to store an undirected graph of \$N\$ nodes in an \$N \times N\$ array of bits to represent which edges are present. That's the most efficient for dense graphs. And of course you still need a vector for the data associated with each vertex. \$\endgroup\$G. Sliepen– G. Sliepen2025年04月07日 05:15:28 +00:00Commented Apr 7 at 5:15
Explore related questions
See similar questions with these tags.
main
function, younew
8Node*
, but you don't calldelete
to free the resource. This may lead to memory leak in large program. \$\endgroup\$