I have a working way of deleting all vertices of degree less than 2 in a graph until no more can be found.
But my solution has a significant issue which is that it calls Graph.delete_vertices()
for each iteration. This call is very slow on igraph Graphs because igraph is quick on queries but slow on updates because of indexing.
This means that doing this on a big graph (millions of nodes/edges) which I need takes a will, especially if there are a lot of chained degree 2 nodes ending in a degree 1 node.
import igraph as ig
graph = ig.Graph.Formula('D-A:B:F:G, A-C-F-A, B-E-G-B, A-B, F-G, H-F:G, H-I-J, Z-A, Y-B')
layout = graph.layout("kamada_kawai")
ig.plot(graph, layout=layout, bbox=(0, 0, 400, 400))
# Code needing of help
while vertices := [v for v in graph.vs.select(_degree_le=1)]:
graph.delete_vertices(vertices)
ig.plot(graph, layout=layout, bbox=(0, 0, 400, 400))
Start graph: (updated)
End graph: (updated)
In this example here, four graph updates are needed. The first pass deletes six nodes, the second deletes the three nodes that becomes degree 1 after the first update; then two nodes and a single node on the last update.
Edit: Alright I have a better way of doing it that doesn't require multiple graph updates. Not sure if it is the best way but it definitely makes more sense and is faster.
vertices = {v for v in graph.vs.select(_degree_le=1)}
needs_to_be_checked = set(vertices)
while needs_to_be_checked:
vertex = needs_to_be_checked.pop()
for n_vertex in vertex.neighbors():
if n_vertex in vertices:
continue
if n_vertex.degree() == 2:
vertices.add(n_vertex)
needs_to_be_checked.add(n_vertex)
graph.delete_vertices(vertices)
EDIT 2 (taking in consideration @DeathIncarnate's comment):
vertices = {v for v in graph.vs.select(_degree_le=1)}
needs_to_be_checked = set(vertices)
while needs_to_be_checked:
vertex = needs_to_be_checked.pop()
for n_vertex in vertex.neighbors():
if n_vertex in vertices \
or sum(1 for v in n_vertex.neighbors() if v not in vertices) > 1:
continue
vertices.add(n_vertex)
needs_to_be_checked.add(n_vertex)
graph.delete_vertices(vertices)
1 Answer 1
Interesting problem. Let's first examine requirements, then the current solution (in EDIT 2), and finally an alternate solution.
requirements
This code needs to spell out its input and its output. The OP doesn't even specify that we're manipulating an undirected graph. I would like to see an explicit comment that multiple connected components are OK, e.g. a forest of trees.
The code should be packaged up with a name, perhaps def prune_leaves(g):
unit test
We should see a test suite exercising this code. That way reviewers could critique the adequacy of the test cases. For example, a cycle ABC with leaf D hanging off of C does not suffice; we'd want to see leaf E hanging off of C, as well.
post-condition
We should explicitly specify loop invariants and, most importantly, the promise we make about the result graph. A unit test should be able to easily call this predicate to verify we're good.
performance
This submission is about performance, but it includes no observed overall timing figures, no timings of which lines consume the most time, and no code to support benchmark measurements.
There should definitely be a comment in the code
explaining that .delete_vertices(thousand_vertices)
is much cheaper than a thousand calls that each
delete a single vertex.
Or let head-to-head benchmark timings tell that story.
There should certainly be a comment stating the big-Oh complexity of the algorithm. (I was a little disappointed that the igraph documentation does not mention the cost of deleting N vertices.)
current code
vertices = {v for v in graph.vs.select(_degree_le=1)}
informative names
vertices
is accurate, but not as helpful a name as it could be.
Consider calling it tombstones
or pending_deletions
.
The needs_to_be_checked
name is much better,
but try to make it a noun. Consider unchecked_nodes
,
or pending_nodes
, or use an adjective that relates
to the loop invariant.
pre-processing
It is unclear that .select(_degree_le=1)
is a big help. Maybe it is.
Be more specific about what "typical" input graphs look like,
and show us timing measurements,
so reviewers can evaluate how helpful it is.
Consider a cycle ABC followed by a thousand nodes D, E, F ...
having degree 2, and final node has degree 1.
Result will be just a cycle of three vertices.
The .select()
found just a single vertex;
a thousand more were found in the while
loop.
extract helper
The nested loops are not a "long" piece of code, but it's still worth considering breaking out the inner loop as a helper function. The chief benefit would be it gives you an opportunity to name the action it's taking. It would also facilitate small unit tests.
alternate algorithm
Consider a line, node A connected to a thousand degree-2 nodes connected to Z, which will become the empty graph. You already have progressed beyond the naïve approach of making 1002 deletion calls, preferring to pass in a container of 1002 nodes that are marked for deletion. Let's consider a slightly different approach.
Rather than tagging nodes with a "going to be deleted!" boolean, we could tag them with a "degree after pruning" integer. And use that tag to choose victim vertices.
cycle detection
It seems that .is_loop() already computes the inverse of what you want, since you essentially retain just nodes that are part of cycles. Consider passing in all edges and using the resulting booleans to filter which nodes should survive.
Again, the absence of unit tests and benchmark support isn't helping -- you will need that to evaluate whether a competing approach is winning.
line deletion
We desire O(|V|) running time. I think that "line deletion" is the only task that can take an interesting amount of time. Consider a cycle ABC with a hundred lines hanging off it, each a thousand nodes long.
In this case retaining the initial "find all degree-1 nodes"
is helpful. But between needs_to_be_checked
and the
nested loops, I'm afraid I'm not quite seeing the behavior
we're looking for.
Given a degree-1 victim node, it seems like we should greedily chase its neighbors in the hopes of eating a line, before we move on to examine some unrelated node.
This codebase achieves some of its design goals. It could be much easier for a newly hired engineer to call it and to make predictions about its behavior.
I would not be willing to delegate or accept maintenance tasks for this codebase in its current form.
-
\$\begingroup\$ Thanks for this great analysis. You are right that I should have at least included timed benchmarks for a "performance" related-question. I will update my post when I have time to be more explicit and complete. \$\endgroup\$Kalak– Kalak2023年04月27日 06:44:02 +00:00Commented Apr 27, 2023 at 6:44
degree
won't have decreased yet, so your check needs to count [pseudocode]degree - count(neigbours in vertices)
or forcibly decrease thedegree
of neighbours of removed vertices as it goes. \$\endgroup\$