Let's consider a simple graph represented by the following edges, where true indicates a visible node and false indicates a non-visible node:
A (true) -> B (false)
B (false) -> C (true)
C (true) -> D (true)
The expected output of the CompressPaths method for this graph would be a list of EdgeColumn objects representing the following compressed paths:
A -> C
C -> D
Assumptions:
- The path always begins and ends with visible nodes; inputs such as A (false) -> B (true) or A (true) -> B (false) -> C (false) are not valid.
In simple words: for each visible start node, I find the nearest visible node and add them both (start and next) to the final result.
public class EdgeCompression
{
private Dictionary<string, List<Edge>> _edgesMap = new Dictionary<string, List<Edge>>();
public List<Edge> CompressPaths(List<Edge> edges)
{
if (edges == null) throw new ArgumentNullException(nameof(edges));
var result = new List<Edge>();
foreach (var edge in edges)
{
if (!_edgesMap.ContainsKey(edge.StartNode))
{
_edgesMap[edge.StartNode] = new List<Edge>();
}
_edgesMap[edge.StartNode].Add(edge);
}
foreach (var edge in edges)
{
if (edge.StartNodeVisible)
{
FindAndNextVisibleNode(edge.StartNode, edge, result, new HashSet<string>());
}
}
return result;
}
private void FindAndNextVisibleNode(string sourceObjectGUID, Edge edge, List<Edge> result, HashSet<string> visitedNodes)
{
if (!visitedNodes.Add(edge.NextNode)) return;
if (edge.NextNodeVisible)
{
result.Add(new Edge
{
StartNode = sourceObjectGUID,
NextNode = edge.NextNode,
StartNodeVisible = true,
NextNodeVisible = true
});
}
else
{
if (_edgesMap.ContainsKey(edge.NextNode))
{
foreach (var nextEdge in _edgesMap[edge.NextNode])
{
FindAndNextVisibleNode(sourceObjectGUID, nextEdge, result, visitedNodes);
}
}
}
}
}
public class Edge
{
public string StartNode { get; set; }
public string NextNode { get; set; }
public bool StartNodeVisible { get; set; }
public bool NextNodeVisible { get; set; }
}
I'm looking for feedback on:
- Any potential performance issues with large datasets.
- Ways to improve the recursion logic, especially concerning handling cycles and preventing infinite recursion (the visited logic).
I've tried to achieve the same result in SQL, with no success.
- DasKrümelmonster (Feb 18, 2024): As your aim is the removal of nodes, it might be worthwhile to transform your edge list into a node list first (with links). Then you can go through your node list once and remove any invisible ones. When you remove a node, you can add all of its successors to each of the predecessors and then consider it removed.
- Shahar Shokrani (Feb 18, 2024): @DasKrümelmonster thank you, this is what I did eventually
- Peter Csala (Feb 19, 2024): Is there any particular reason why you implement your own graph representation and traversal? Have you checked QuickGraph or its fork QuikGraph?
- Shahar Shokrani (Feb 19, 2024): @PeterCsala Nice lead, I'll check it
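To make the node-list idea from the comments concrete, here is a hedged sketch. The names `GraphNode` and `RemoveInvisible` are my own invention, and the sketch assumes there are no cycles made up entirely of invisible nodes: each invisible node is spliced out by wiring its predecessors directly to its successors.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical node-centric representation: each node carries its own
// predecessor and successor links, so no separate edge map is needed.
public class GraphNode
{
    public string Id { get; }
    public bool Visible { get; }
    public HashSet<GraphNode> Successors { get; } = new();
    public HashSet<GraphNode> Predecessors { get; } = new();

    public GraphNode(string id, bool visible) => (Id, Visible) = (id, visible);
}

public static class NodeCompression
{
    // Splice each invisible node out of the graph: every predecessor is
    // connected directly to every successor, then the node's own links are
    // dropped. Assumes no cycle consists solely of invisible nodes.
    public static void RemoveInvisible(List<GraphNode> nodes)
    {
        foreach (var node in nodes.Where(n => !n.Visible))
        {
            foreach (var pred in node.Predecessors)
            {
                pred.Successors.Remove(node);
                foreach (var succ in node.Successors)
                {
                    pred.Successors.Add(succ);
                    succ.Predecessors.Add(pred);
                }
            }
            foreach (var succ in node.Successors)
                succ.Predecessors.Remove(node);
        }
    }
}
```

On the sample graph A (true) -> B (false) -> C (true) -> D (true), splicing out B leaves A -> C and C -> D, matching the expected output above.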
1 Answer
OK, if I understand correctly, you are trying to traverse a graph and compress any edges that contain an invisible node.
I believe an improvement to your design is to have both a Node and an Edge class.
Edges will contain two nodes and a compress method, and nodes will contain a list of edges and a Visible property:
public class Edge
{
public Node StartNode { get; init; }
public Node? NextNode { get; init; }
public void Compress()
{
if (NextNode is null) return;
StartNode.Edges.AddRange(NextNode.Edges);
StartNode.Edges.Remove(this);
}
}
public class Node
{
public bool Visible { get; set; }
public List<Edge> Edges { get; } = new();
}
Also it is worth pointing out that this representation of a graph precludes the need to maintain an edge map, since each node owns an adjacency list.
In terms of performing the traversal, the approach I am going with is BFS (breadth-first search). This has a time complexity of O(|V| + |E|), where V is the set of vertices and E is the set of edges.
This would look something like this:
public static Edge TraverseAndCompress(Edge edge)
{
Queue<Edge> bfs = new();
HashSet<Edge> visited = new();
bfs.Enqueue(edge);
while(bfs.Count > 0)
{
var current = bfs.Dequeue();
if (!visited.Add(current)) continue; // skip edges that were enqueued more than once
if (current.NextNode is null) continue;
if (!current.NextNode.Visible)
{
current.Compress();
}
foreach(var startNodeEdge in current.StartNode.Edges)
{
if(!visited.Contains(startNodeEdge))
bfs.Enqueue(startNodeEdge);
}
}
return edge;
}
If you require the answer to be in the form of a List<Edge>, then the code here can be modified to accommodate that. It was written this way to reduce the space complexity of the algorithm.
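A hedged sketch of that modification might look like the following. The name `TraverseAndCollect` is hypothetical, and the Node/Edge classes are repeated so the snippet stands alone:

```csharp
#nullable enable
using System.Collections.Generic;

// Repeated here so the sketch compiles on its own.
public class Node
{
    public bool Visible { get; set; }
    public List<Edge> Edges { get; } = new();
}

public class Edge
{
    public Node StartNode { get; init; } = null!;
    public Node? NextNode { get; init; }

    public void Compress()
    {
        if (NextNode is null) return;
        StartNode.Edges.AddRange(NextNode.Edges);
        StartNode.Edges.Remove(this);
    }
}

public static class Traversal
{
    // BFS variant that also collects the surviving edges into a List<Edge>,
    // at the cost of O(|E|) extra space compared to the in-place version.
    public static List<Edge> TraverseAndCollect(Edge edge)
    {
        var result = new List<Edge>();
        Queue<Edge> bfs = new();
        HashSet<Edge> visited = new();
        bfs.Enqueue(edge);
        while (bfs.Count > 0)
        {
            var current = bfs.Dequeue();
            if (!visited.Add(current)) continue;
            if (current.NextNode is { Visible: false })
                current.Compress();
            else
                result.Add(current); // both endpoints visible: the edge survives
            foreach (var next in current.StartNode.Edges)
                if (!visited.Contains(next))
                    bfs.Enqueue(next);
        }
        return result;
    }
}
```

One caveat: edges spliced in by Compress still carry their original StartNode reference, so a fully faithful List<Edge> output may additionally need those edges rewritten to point at the surviving start node.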
However, all that said, I think the best approach to solving this would have been a NuGet package. There are plenty of great graph libraries out there capable of graph traversal and edge compression, and by using them you will also benefit from whatever performance enhancements the library creator has presumably made.
Hope that helps :)