Find Eulerian Tour Algorithm - with multithreading and random permutations

Question 1

After struggling for a long while on this through the algorithms course on Udacity, I finally understood enough of the concept to develop this implementation. However, it lacks efficiency and finesse. Would love some feedback on how to improve this and what it might be getting wrong about the concept of graphs, tours/paths, and depth-first search. The code is heavily commented, but let me know if you have any questions.

I lumped the graphs into three types and figured out how to solve each one and combined it in this implementation while adding the ability to make random permutations.

Intuitively, I think the "Simple" (all even nodes of degree 2) graphs would require O(E) to solve because the graph losses an edge every iteration. Then the depth-first search implementation...is not ideal. My intuition is that in the worst case (each of the initial node/edges traversed are exactly the wrong ones) this would require O([E*(E-1)]/2) because there must be a bound on on how deep it has to go before finding out that it as to backtrack. Given the initial checks, there is a tour/path there, so it will eventually find it. The last type of graph uses DFS plus some constant. Also, the algorithm tries to fall back on the "simple" graph solution when it can, so that makes the average case a little better. As for best case, I'd say O(1) when it can't be done...haha, or maybe O(E) for simple graphs or graphs that decompose to a simple graph. I worked on having the algorithm work forwards and backwards for the 2 odd degrees graphs, (which has passed a number of tests) because I didn't think those graphs could be solved as a simple graph because it would never get there. Not sure how to account for that in my conception of complexity...I feel like it would probably cause more issues in the limit than not...

Also, there is at least one graph that I found it doesn't work for: [(1, 2), (2, 3), (3, 2)], which is solvable - and it does tend to find the right answer. Any others?

Appreciate the feedback and corrections.

#!/bin/python/
import sys
import random
from queue import Queue
from threading import Thread
class Graph_Traveler(Thread):
 def __init__(self, queue, graph, path, start, tag):
 Thread.__init__(self)
 self.tag = tag
 self.queue = queue
 self.graph = graph[:]
 self.start_node = start
 if len(path) == 0:
 self.path = [start]
 else:
 self.path = path[:]
 def run(self):
 graph_cache, path_cache = self.graph[:], self.path[:]
 while True:
 if self.queue.empty() and len(graph_cache) == 0:
 self.queue.task_done()
 for edge in self.queue.get():
 if type(edge) == int:
 continue
 if self.start_node not in edge:
 self.queue.put(edge)
 else:
 ddegrees = get_graph_info(graph_cache, nodes = False)
 curr_idx = edge.index(path_cache[-1])
 node = edge[curr_idx ^ 1]
 path_cache.append(node)
 graph_cache, path_cache = dfs_traversal(graph_cache, 
 node, path_cache, ddegrees)
 if check_path_issues(path_cache):
 # Reset progress and move on to next node
 path_cache = self.path[:]
 graph_cache = self.graph[:]
 continue
 else:
 # Search has succesfully from the starting point 'node'
 #self.queue.put(1)
 self.queue.put(self.path)
 #print(self.path)
 if self.queue.empty():
 self.queue.task_done()
 else:
 continue
def find_eulerian_tour(graph, path = None):
 """Given a list of tuples representing a graph, 
 return the eulerian tour on that graph as a list of the nodes.
 Failing that, return the eulerian path.
 This implementation includes random permutations depending on the graph
 or it returns 'No possible eulerian tour' if there is no possible path or
 this algorithm cannot yet handle that.
 Primarily works by segmenting the incoming graph into different types and
 then calling different recursive algorithms depending on the type.
 OEO graphs (or Odd-Even-Odd graphs) are graphs that initially have two odd
 degree nodes and any number of even degree nodes. In order to make this a 
 tour, these will begin with one of the odd nodes and then use 
 the other general algorithms
 Simple graphs are graphs that have all degree two nodes. These are easily 
 solved as a tour by following the edges. The algorithms check if the
 graphs have reduced to this one and try more easily/quickly solve them.
 Looped graphs are any graphs that at least one even degree node that is 
 larger than two. These graphs have a loop that is not as easy to solve
 as the simple graphs. A breath-first search algorithm is used for to solve
 these, but it also checks if the graph has reduced to a simple graph
 and applies that.
 Althought this is a path and not a tour algorithm, it had to be named this
 way for a project that was submitted. For intitially even graphs,
 it will always produce a tour.
 """
 # Base case
 if path and not graph:
 return path
 search_graph = graph[:]
 ddegrees, even_nodes, odd_nodes = get_graph_info(search_graph)
 if path == None:
 if not is_eulerian(ddegrees):
 return "No possible eulerian tour" #Or not yet for some...
 # Case: OEO Graph{guaranteed to end on Odd node}
 if odd_nodes:
 start_node = odd_nodes[random.randint(0, 1)] #there's ever only 2 initially
 edge, curr_node = find_edge_alg(search_graph, start_node, ddegrees)
 search_graph, path = update(graph, path, edge,
 curr_node ^ 1, curr_node)
 end_node = odd_nodes[odd_nodes.index(start_node) ^ 1]
 queue = Queue(maxsize=0)
 goals = [start_node, end_node]
 paths = [path[:], []]
 tag = ['a', 'b']
 for ii in range(2):
 walker = Graph_Traveler(queue, search_graph[:], paths[ii], goals[ii], tag[ii])
 walker.daemon = True
 walker.start()
 for edge in graph:
 queue.put(edge)
 else:
 path = []
 else:
 # Case: Simple Graph
 if is_simple_graph(ddegrees):
 search_graph, path = solve_simple_graph(search_graph, path)
 else:
 # Case: Looped Graph
 if is_looped(ddegrees):
 search_graph, path = solve_looped_graph(search_graph, path)
 else:
 current = path[-1]
 edge, curr_node = \
 find_edge_alg(search_graph, current, ddegrees)
 assert edge != None
 if len(path) < 1:
 both_idx = curr_node
 else:
 both_idx = None
 search_graph, path = \
 update(search_graph, path, edge, 
 curr_node ^ 1, both_idx)
 path = find_eulerian_tour(search_graph, path)
 return path
#
## Algorithm for solving looped graphs with a Depth-First Search
#
def is_looped(ddegrees):
 """Check if graph is looped.
 Any even node with degree greather than 2"""
 counts = list(set(get_counts_from_dict(ddegrees)))
 loop_num = [count for count in counts 
 if count % 2 == 0 and count not in [0, 2]]
 return len(loop_num) > 0
def solve_looped_graph(looped_graph, path):
 """Depth-First search through the state space of the looped graphs
 See is_looped for a definition
 Recursive implementation of the helper function dfs_traversal, where most
 of the work is done.
 Initialize the path list for initially even-node looped graphs and 
 is main level of interaction between dfs_traversal and other functions.
 Checks if graph has become a simple graph and uses that to solve the rest"""
 start = False
 search_graph = looped_graph[:]
 dfs_path = path[:]
 if len(path) > 0:
 curr_node = path[-1]
 else:
 start = True
 ddegrees = get_graph_info(search_graph, nodes = False)
 # Solve simple graph
 if is_simple_graph(ddegrees):
 search_graph, path = solve_simple_graph(search_graph, path)
 if start:
 #Initializes search for initially even looped graphs
 nodes = list(ddegrees.keys())
 random.shuffle(nodes)
 for node in nodes:
 search_graph, dfs_path = dfs_traversal(search_graph, node, 
 dfs_path, ddegrees, start)
 if check_path_issues(dfs_path):
 # Reset progress and move on to next node
 dfs_path = path[:]
 search_graph = looped_graph[:]
 continue
 else:
 # Search has succesfully from the starting point 'node'
 break
 else:
 #This generally initializes the dfs_traversal for all 
 #OEO graphs that are looped
 search_graph, dfs_path = dfs_traversal(search_graph, curr_node, 
 dfs_path, ddegrees)
 return search_graph, dfs_path 
def dfs_traversal(search_graph, curr_node, path, ddegrees, intialize = False):
 """Performs Breatdh-First Search traversal of the graph
 Most of the power of the looped graph solver lies here.
 Initializes the path list if it hasn't been initialized via [initialize]
 In each iteration, it produces a copy of the path and graph after checking 
 for the base case.
 Attempts to solve simple graph if possible
 Otherwise, randomly traverses through the nodes of the graph via the edges
 until it encounters an empty list of edges from that node or
 complete explores the nodes from that node without any success.
 The first is done by checking for an empty edge_list retrun.
 The second is done by checking if the whole list of nodes from the current 
 node were exhausted without adding anything to the path.
 (Additionally, could be done via a non-empty graph or identical graph - 
 as that is also part of the base case.)"""
 #Base case
 if path and not search_graph:
 return search_graph, path
 if check_path_issues(path):
 path.append(None)
 return [], path 
 #initialize temp vars to re-set progress
 dfs_graph = search_graph[:]
 dfs_path = path[:]
 if is_simple_graph(ddegrees): #Solve simple graph
 dfs_graph, dfs_path = solve_simple_graph(dfs_graph, dfs_path)
 else:
 edge_list = find_edge_alg(dfs_graph, curr_node, ddegrees, 
 return_edges = True)
 if len(edge_list) == 0:
 path.append(None) #No more arcs from here
 return [], path
 random.shuffle(edge_list)
 #indicies for the current node for all its edges
 curr_idx_list = [edge.index(curr_node) for edge in edge_list]
 #list of to_nodes to explore (this is the frontier)
 nodes = [edge[idx ^ 1] for idx, edge in zip(curr_idx_list, edge_list)]
 for idx, node in enumerate(nodes):
 edge = edge_list[idx]
 to_node_idx = curr_idx_list[idx] ^ 1
 #Initialize dfs_path
 if intialize:
 both_idx = curr_idx_list[idx]
 else:
 both_idx = None
 dfs_graph, dfs_path = update(dfs_graph, dfs_path, edge, 
 to_node_idx, both_idx)
 #Main recursive call to this function
 #If a graph or path make it past here, they are either the solution
 #or they mean we should move on
 dfs_graph, dfs_path = dfs_traversal(dfs_graph, node, 
 dfs_path, ddegrees)
 if check_path_issues(dfs_path):
 #Reinitialize graph and path list and re-start search
 dfs_path = path[:]
 dfs_graph = search_graph[:]
 continue
 else:
 #if the recursion ended and the path list has no issues,
 #we break out of the loop to check another condition
 #which will bring us out even further back up the graph
 break
 if dfs_path == path and idx == len(nodes) - 1:
 #if the loop over the frontier/nodes ended without the
 #path list changing, we should move up a level (or go back)
 path.append(None)
 return [], path
 return dfs_graph, dfs_path
def check_path_issues(path):
 """Helper function to test the condition of the path list
 Looks for a None in the path list to indicate that there is nothing
 more to explore from there"""
 return (path == None) or (type(path) == list and None in path)
#
## Algorithm for solving the simple graph
#
def is_simple_graph(ddegrees):
 """Check if the graph is simple from its dictionary of counts:
 All edges are of degree 2
 Input: ddegrees"""
 unique_counts = list(set(get_counts_from_dict(ddegrees)))
 return len(unique_counts) == 1 and 2 in unique_counts
def solve_simple_graph(graph, path):
 """Main algorith used to solve a simple graph
 Initializes the path list if it hasn't been initialized then moves
 from one edge to the next removing each edge from a temporary graph
 until the graph is empty"""
 # Base case
 if path and not graph:
 return graph, path
 simple_graph = graph[:] #Temp graph
 if len(path) > 0:
 curr_node = path[-1]
 else:
 rand_start = random.randint(0, len(graph)) - 1 #just to randomize
 curr_node = graph[rand_start][0]
 edge_idx = get_idx_of_edge(simple_graph, curr_node)
 if edge_idx == None: #unsure if needed
 return None
 edge = simple_graph[edge_idx]
 curr_node = edge.index(curr_node)
 if len(path) < 1:
 both_idx = curr_node #initializes path list to remove whole edge
 else:
 both_idx = None
 simple_graph, path = \
 update(simple_graph, path, edge, curr_node ^ 1, both_idx)
 return solve_simple_graph(simple_graph, path)
#
## General Helper functions - used by most of the algorithms
#
def find_edge_alg(graph, curr_node, ddegrees, return_edges = False):
 """Given the current node, it returns the edge to traverse next.
 Selects edge that leads to the largest degree node.
 Input: graph, index for current node, ddegrees (count dictionary),
 returning a list of edges
 Returns: tuple of edges and the idx of current node
 If no available edge found, returns empty list"""
 edge_list = []
 search_graph = graph[:]
 random.shuffle(search_graph)
 for edge in search_graph:
 if curr_node in edge:
 edge_list.append(edge)
 if edge_list:
 if return_edges:
 return edge_list
 return get_next(edge_list, curr_node, ddegrees)
 return []
def get_next(edge_list, node, ddegrees):
 "Given a list of possible edges, selects edge with highest degree node"
 idx_list = [edge.index(node) for edge in edge_list]
 count_list = [ddegrees[edge[idx]] 
 for edge, idx in zip(edge_list, idx_list)]
 highest = max(count_list)
 idx = count_list.index(highest)
 edge = edge_list[idx]
 idx = edge.index(node)
 return edge, idx
#
## BASIC HELPER FUNCTIONS
#
def get_idx_of_edge(graph, node):
 """Given a graph and a node, return the index of the first edge that node
 is found in. If not found, returns False.
 Used for solving the simple graph"""
 missing = True
 for ii, edge in enumerate(graph):
 if node in edge:
 idx = ii
 missing &= False
 break
 if not missing:
 return idx
def update(graph, path, edge, to_pos, from_pos = None):
 """Updates important vars and returns an updated graph and path.
 Input: to_pos is the idx of node moving to
 [from_pos] if passed, path is updated with both from and to nodes
 Used to initializing the path list"""
 if from_pos != None:
 path = [edge[from_pos], edge[to_pos]] #initialize path with both nodes
 else:
 path.append(edge[to_pos])
 graph.pop(graph.index(edge)) #remove edge from graph
 return graph, path
def get_counts_from_dict(ddegrees):
 """Returns the degrees of the nodes in a graph via the dictionary of counts
 Input: ddegrees"""
 return [ddegrees[node] for node in ddegrees.keys()]
def split_nodes(ddegrees):
 """Returns shuffled list of even and odd degree nodes in the most recently 
 counted graph
 Input: ddegrees
 """
 even_nodes = [node for node in ddegrees 
 if ddegrees[node] % 2 == 0 and ddegrees[node] != 0]
 odd_nodes = [node for node in ddegrees 
 if ddegrees[node] % 2 == 1 and ddegrees[node] != 0]
 random.shuffle(even_nodes)
 random.shuffle(odd_nodes)
 return even_nodes, odd_nodes
def get_degrees(graph):
 """Returns a dict with key value pairs {node: degree} 
 by counting the occurance of each node in the graph"""
 nodes_from = [node[0] for node in graph]
 nodes_to = [node[1] for node in graph]
 nodes = nodes_from[:]
 nodes.extend(nodes_to)
 nodes = list(set(nodes))
 ddegrees = {x: nodes_from.count(x) + nodes_to.count(x) for x in nodes}
 return ddegrees
def get_graph_info(graph, nodes = True):
 "Returns variables with information about the current graph"
 ddegrees = get_degrees(graph)
 if nodes:
 even_nodes, odd_nodes = split_nodes(ddegrees)
 return ddegrees, even_nodes, odd_nodes
 return ddegrees
def is_eulerian(ddegrees):
 """"Determines if a graph has Eularian path properties:
 All even nodes or exactly two odd nodes
 Input: ddegrees
 Used here only on the initial graph's count dict
 No node of degree 1 allowed at start"""
 odd_degree = sum([ddegrees[node] % 2 != 0 for node in ddegrees])
 counts = get_counts_from_dict(ddegrees)
 return odd_degree in [0, 2] and 1 not in counts
#
## Unrellated functions and tests
#
def test(graph, n = 100):
 """Test algorithm on a graph for n permutations"""
 matches = []
 print('Running the following graph {} times.\n{}.'.format(n, graph))
 for ii in range(n):
 matches.append(test_tour(find_eulerian_tour(graph[:]), graph[:]))
 print('The program successfully executed {} times.'.format(n))
 print()
 print('{} matches out of {}.\n\n'.format(sum(matches), n))
def make_graph(path):
 """Given a list of numbers that can represent a tour/path,
 will return a graph of the edges"""
 graph = []
 ii = len(path) - 1
 while ii > 0:
 graph.append((path[ii], path[ii - 1]))
 ii -= 1
 return graph
def test_tour(path, graph, ret_graph = False):
 """Test if your tour/path actually matches the graph it came from.
 It recreates a graph as a list of tuple edges from a list of numbers
 that represents a tour/path. Checks if each edge is in the original graph.
 It then removes that edge from the original graph and the reconstructed
 graph. The test is passed if both graphs are empty at the end.
 [ret_graph]: return a copy of the reconstructed graph"""
 original_graph = graph[:]
 temp_graph = make_graph(path)
 if ret_graph:
 reconstruced_graph = temp_graph[:]
 to_del = []
 for arc in temp_graph:
 if arc in graph:
 original_graph.pop(original_graph.index(arc))
 to_del.append(arc)
 else:
 edge = (arc[1], arc[0])
 if edge not in graph:
 return False
 else:
 original_graph.pop(original_graph.index(edge))
 to_del.append(arc)
 for arc in to_del:
 temp_graph.pop(temp_graph.index(arc))
 passed = (len(original_graph) == 0) & (len(temp_graph) == 0)
 if ret_graph:
 return passed, reconstruced_graph
 return passed
if __name__ == '__main__':
 if len(sys.argv) > 1:
 n = int(sys.argv[1])
 print('\nWill run {} tests on each graph.\n'.format(n))
 else:
 n = 100
 #Looped graph
 t1 = [(0, 1), (1, 5), (1, 7), (4, 5), (4, 8), (1, 6), (3, 7), (5, 9),
 (2, 4), (0, 4), (2, 5), (3, 6), (8, 9)]
 #Looped graph
 t2 = [(1, 13), (1, 6), (6, 11), (3, 13), (8, 13), (0, 6), (8, 9), (5, 9),
 (2, 6), (6, 10), (7, 9), (1, 12), (4, 12), (5, 14), (0, 1), (2, 3),
 (4, 11), (6, 9), (7, 14), (10, 13)]
 #Looped graph with OEO form
 t3 = t2[:]
 t3.append((1, 4))
 g = [(1, 2), (2, 3), (3, 2)] #Can't handle self loop with a temrinal yet
 s1 = [(1, 2), (2, 3), (1, 3)] #Simple Graph
 s2 = [(1, 2), (2, 3), (3, 4), (4, 1)] #Simple Graph
 s3 = [(1, 2), (2, 3), (3, 4), (4, 1), (5, 1), (5, 2)] #OEO
 #Big graph to test threading
 t4 = t3[:]
 for ii in range(6):
 t4.extend(t3[:])
 data = [t1, s1, s3, t3, s2, t2, g, t4]
 for graph in data:
 test(graph, n)

Question 2

(Among all the typing errors in the code, I can't decide whether Eularian bothers me more than not capitalising Euler.)

Question 3

There's a lot of code here (far too much to review), so I'm going to make some general observations and then demonstrate how I would tackle the problem.

1. Review

I found the code too complicated to understand. There doesn't seem to be any modularity — that is, there doesn't seem to be a way to break the code into parts that can be understood separately.
Although there are docstrings (good), the specifications of the functions are so complex that I simply couldn't follow the logic. Some of the functions have completely different behaviour depending on the values of the arguments: in particular find_eulerian_tour does one thing if path is None, and something completely different otherwise.
The control flow is not well organized. Normally one expects to see control flowing from complex high-level functions down to simple low-level functions, so that you can understand it bit-by-bit by working your way up from the lowest level. But the code in the post is a twisty maze of recursive calls — find_eulerian_tour calls itself recursively, and also creates two threads using the Graph_Traveler class, which call dfs_traversal, which calls itself recursively.
The use of multiple threads seems unwise to me given that you don't have everything working reliably. Multi-threaded programs are hard to debug, so it's best to start by making a single-threaded program that you are confident is correct, and then work on the multi-threaded case later.
It is not clear to me what Graph_Traveler class is supposed to do, or how it returns its results. (Does it in fact return any results?) It is also not clear why you make exactly two instances of this class, or why the two instances share the same queue.

2. Data structure

Choosing the right data structure is one of the most important steps in any program. A good choice of data structure is one that efficiently supports the operations that the program will need to carry out.

The code in the post represents a graph as a list of edges, each edge being a pair of nodes. The trouble with this representation is that the necessary operations are inconvenient. For example, in find_edge_alg you have curr_node and you want to return the neighbouring node with the highest degree. With the list-of-edges representation, you have to examine every edge in the graph to see if it is incident to curr_node, taking time proportional to the number of edges in the graph. Then to determine the neighbour with the highest degree requires further computation in get_next.

If you look at Wikipedia's article Graph (abstract data type) you'll see that there are three data structures commonly used to represent graphs: adjacency lists, adjacency matrices, and incidence matrices. Which of these best supports the operation of finding the neighbouring node with the highest degree? Read the descriptions of the data structures and think about how you would implement the operation in each. It should be clear that adjacency lists are the winner — they have efficient determination of the neighbours of a node, and efficient determination of the degree of a node (which is the length of the list of neighbours).

3. Using a class

When you have a long-lived data structure (like a graph) together with some operations (such as a finding the neighbours of a vertex), it's often a good idea to implement the data structure using a class, and the operations as methods.

In this case we could do something like this, making use of the built-in collections.defaultdict:

from collections import defaultdict
class Graph:
 """A directed graph whose nodes are any hashable objects."""
 def __init__(self, edges=()):
 """Create a directed graph from an iterable of edges."""
 self._nodes = set() # Set of nodes.
 self._out = defaultdict(set) # Map from node to set of out-neighbours.
 self._in = defaultdict(set) # Map from node to set of in-neighbours.
 for m, n in edges:
 self.add_edge(m, n)
 def __iter__(self):
 """Iterating over the graph yields its nodes."""
 return iter(self._nodes)
 def add_edge(self, m, n):
 """Add an edge from m to n."""
 self._nodes.add(m)
 self._nodes.add(n)
 self._out[m].add(n)
 self._in[n].add(m)
 def remove_edge(self, m, n):
 """Remove the edge from m to n."""
 self._out[m].remove(n)
 self._in[n].remove(m)
 def out_neighbours(self, node):
 """Return the set of out-neighbours of a node."""
 return self._out[node]
 def in_degree(self, node):
 """Return the number of edges ending at node."""
 return len(self._in[node])
 def out_degree(self, node):
 """Return the number of edges starting at node."""
 return len(self._out[node])
 def degree(self, node):
 """Return the number of edges incident to node in either direction."""
 return self.in_degree(node) + self.out_degree(node)

Notes:

This data structure can be read, understood, and tested on its own, without having to know anything about Eulerian paths.
The reason for implementing the graph as mappings from node to sets of its in- and out-neighbours is so that the remove_edge method (which we'll need later on) is efficient — if we used lists instead of sets then removing edges would take time proportional to the number of neighbours of each node in the edge.

Now it's easy to find the neighbouring node with the highest degree:

max(graph.neighbours(node), key=graph.degree)

4. Algorithm

It's always worth doing a bit of research to see whether good algorithms are already known for your problem. In this case, Wikipedia's article on Eulerian paths describes two algorithms. Fleury's algorithm sounds complicated and inefficient, so we want to implement Hierholzer's algorithm. Here's how Wikipedia describes it:

Choose any starting vertex \$v\,ドル and follow a trail of edges from that vertex until returning to \$v\$. It is not possible to get stuck at any vertex other than \$v\,ドル because the even degree of all vertices ensures that, when the trail enters another vertex \$w\$ there must be an unused edge leaving \$w\$. The tour formed in this way is a closed tour, but may not cover all the vertices and edges of the initial graph.

As long as there exists a vertex \$u\$ that belongs to the current tour but that has adjacent edges not part of the tour, start another trail from \$u\,ドル following unused edges until returning to \$u\,ドル and join the tour formed in this way to the previous tour.

This description is for the case of an Eulerian cycle — since we want to find an Eulerian path then we have to modify it slightly to handle the case where there are two odd nodes.

5. Implementation

Here's how I'd implement Hierholzer's algorithm:

from collections import deque
def pick_any(iterable):
 """Return any item from iterable, or raise StopIteration if empty."""
 return next(iter(iterable))
class NoEulerianPath(Exception):
 """Exception raised when there is no Eulerian path."""
def eulerian_path(edges):
 """Return an Eulerian path in the directed graph with the given
 iterable of edges, or raise NoEulerianPath if there is no such path.
 """
 graph = Graph(edges)
 # Mapping from surplus of out-edges over in-edges to list of nodes
 # with that surplus.
 surplus = defaultdict(list)
 for node in graph:
 in_degree = graph.in_degree(node)
 out_degree = graph.out_degree(node)
 degree_surplus = out_degree - in_degree
 if abs(degree_surplus) > 1:
 raise NoEulerianPath("Node {} has in-degree {} and out-degree {}."
 .format(node, in_degree, out_degree))
 surplus[degree_surplus].append(node)
 # Find the starting point for the path.
 if len(surplus[1]) == len(surplus[-1]) == 0:
 # Any starting point will do.
 start = pick_any(graph)
 elif len(surplus[1]) == len(surplus[-1]) == 1:
 # Must start at a node with more out- than in-neighbours.
 start = pick_any(surplus[1])
 else:
 raise NoEulerianPath("Graph has {} odd-degree nodes."
 .format(len(surplus[1]) + len(surplus[-1])))
 # We'll be following a series of edge-disjoint paths in the graph.
 # This is a mapping from a node to a deque of the paths starting
 # at that node, each path being a list of nodes. (The reason for
 # using a deque is so that we can process these paths in the order
 # that they were found.)
 paths = defaultdict(deque)
 # Nodes that have been visited but have unvisited out-neighbours.
 unfinished = set([start])
 while unfinished:
 # Pick any unfinished node to start the current path.
 node = pick_any(unfinished)
 path = [node]
 paths[node].append(path)
 # Keep choosing any neighbour and removing the edge just
 # traversed, until a dead end is reached.
 while graph.out_degree(node):
 neighbour = pick_any(graph.out_neighbours(node))
 graph.remove_edge(node, neighbour)
 if graph.out_degree(node) == 0:
 unfinished.remove(node)
 path.append(neighbour)
 unfinished.add(neighbour)
 node = neighbour
 else:
 unfinished.remove(node)
 # If there are any edges remaining, that means the graph was
 # disconnected.
 if any(graph.degree(node) for node in graph):
 raise NoEulerianPath("Graph is not connected.")
 # Concatenate the individual paths into one continuous path (using
 # the "stack of iterators" pattern).
 path = []
 stack = [iter(paths[start].popleft())]
 while stack:
 for node in stack[-1]:
 if paths[node]:
 stack.append(iter(paths[node].popleft()))
 break
 else:
 path.append(node)
 else:
 stack.pop()
 # Check that we concatenated all the paths.
 assert not any(paths.values())
 return path

(See here for an explanation of the "stack of iterators" pattern.)

Question 4

This is all great guidance! I was confused about parts of it, and I think you've definitely help point me in the right direction. Appreciate the feedback and suggestions!

Gareth Rees Gareth Rees 50.1k3 gold badges130 silver badges210 bronze badges · Accepted Answer · 2017-06-18 10:19:26Z

There's a lot of code here (far too much to review), so I'm going to make some general observations and then demonstrate how I would tackle the problem.

1. Review

I found the code too complicated to understand. There doesn't seem to be any modularity — that is, there doesn't seem to be a way to break the code into parts that can be understood separately.
Although there are docstrings (good), the specifications of the functions are so complex that I simply couldn't follow the logic. Some of the functions have completely different behaviour depending on the values of the arguments: in particular find_eulerian_tour does one thing if path is None, and something completely different otherwise.
The control flow is not well organized. Normally one expects to see control flowing from complex high-level functions down to simple low-level functions, so that you can understand it bit-by-bit by working your way up from the lowest level. But the code in the post is a twisty maze of recursive calls — find_eulerian_tour calls itself recursively, and also creates two threads using the Graph_Traveler class, which call dfs_traversal, which calls itself recursively.
The use of multiple threads seems unwise to me given that you don't have everything working reliably. Multi-threaded programs are hard to debug, so it's best to start by making a single-threaded program that you are confident is correct, and then work on the multi-threaded case later.
It is not clear to me what Graph_Traveler class is supposed to do, or how it returns its results. (Does it in fact return any results?) It is also not clear why you make exactly two instances of this class, or why the two instances share the same queue.

2. Data structure

Choosing the right data structure is one of the most important steps in any program. A good choice of data structure is one that efficiently supports the operations that the program will need to carry out.

The code in the post represents a graph as a list of edges, each edge being a pair of nodes. The trouble with this representation is that the necessary operations are inconvenient. For example, in find_edge_alg you have curr_node and you want to return the neighbouring node with the highest degree. With the list-of-edges representation, you have to examine every edge in the graph to see if it is incident to curr_node, taking time proportional to the number of edges in the graph. Then to determine the neighbour with the highest degree requires further computation in get_next.

If you look at Wikipedia's article Graph (abstract data type) you'll see that there are three data structures commonly used to represent graphs: adjacency lists, adjacency matrices, and incidence matrices. Which of these best supports the operation of finding the neighbouring node with the highest degree? Read the descriptions of the data structures and think about how you would implement the operation in each. It should be clear that adjacency lists are the winner — they have efficient determination of the neighbours of a node, and efficient determination of the degree of a node (which is the length of the list of neighbours).

3. Using a class

When you have a long-lived data structure (like a graph) together with some operations (such as a finding the neighbours of a vertex), it's often a good idea to implement the data structure using a class, and the operations as methods.

In this case we could do something like this, making use of the built-in collections.defaultdict:

from collections import defaultdict
class Graph:
 """A directed graph whose nodes are any hashable objects."""
 def __init__(self, edges=()):
 """Create a directed graph from an iterable of edges."""
 self._nodes = set() # Set of nodes.
 self._out = defaultdict(set) # Map from node to set of out-neighbours.
 self._in = defaultdict(set) # Map from node to set of in-neighbours.
 for m, n in edges:
 self.add_edge(m, n)
 def __iter__(self):
 """Iterating over the graph yields its nodes."""
 return iter(self._nodes)
 def add_edge(self, m, n):
 """Add an edge from m to n."""
 self._nodes.add(m)
 self._nodes.add(n)
 self._out[m].add(n)
 self._in[n].add(m)
 def remove_edge(self, m, n):
 """Remove the edge from m to n."""
 self._out[m].remove(n)
 self._in[n].remove(m)
 def out_neighbours(self, node):
 """Return the set of out-neighbours of a node."""
 return self._out[node]
 def in_degree(self, node):
 """Return the number of edges ending at node."""
 return len(self._in[node])
 def out_degree(self, node):
 """Return the number of edges starting at node."""
 return len(self._out[node])
 def degree(self, node):
 """Return the number of edges incident to node in either direction."""
 return self.in_degree(node) + self.out_degree(node)

Notes:

This data structure can be read, understood, and tested on its own, without having to know anything about Eulerian paths.
The reason for implementing the graph as mappings from node to sets of its in- and out-neighbours is so that the remove_edge method (which we'll need later on) is efficient — if we used lists instead of sets then removing edges would take time proportional to the number of neighbours of each node in the edge.

Now it's easy to find the neighbouring node with the highest degree:

max(graph.neighbours(node), key=graph.degree)

4. Algorithm

It's always worth doing a bit of research to see whether good algorithms are already known for your problem. In this case, Wikipedia's article on Eulerian paths describes two algorithms. Fleury's algorithm sounds complicated and inefficient, so we want to implement Hierholzer's algorithm. Here's how Wikipedia describes it:

Choose any starting vertex \$v\,ドル and follow a trail of edges from that vertex until returning to \$v\$. It is not possible to get stuck at any vertex other than \$v\,ドル because the even degree of all vertices ensures that, when the trail enters another vertex \$w\$ there must be an unused edge leaving \$w\$. The tour formed in this way is a closed tour, but may not cover all the vertices and edges of the initial graph.

As long as there exists a vertex \$u\$ that belongs to the current tour but that has adjacent edges not part of the tour, start another trail from \$u\,ドル following unused edges until returning to \$u\,ドル and join the tour formed in this way to the previous tour.

This description is for the case of an Eulerian cycle — since we want to find an Eulerian path then we have to modify it slightly to handle the case where there are two odd nodes.

5. Implementation

Here's how I'd implement Hierholzer's algorithm:

from collections import deque
def pick_any(iterable):
 """Return any item from iterable, or raise StopIteration if empty."""
 return next(iter(iterable))
class NoEulerianPath(Exception):
 """Exception raised when there is no Eulerian path."""
def eulerian_path(edges):
 """Return an Eulerian path in the directed graph with the given
 iterable of edges, or raise NoEulerianPath if there is no such path.
 """
 graph = Graph(edges)
 # Mapping from surplus of out-edges over in-edges to list of nodes
 # with that surplus.
 surplus = defaultdict(list)
 for node in graph:
 in_degree = graph.in_degree(node)
 out_degree = graph.out_degree(node)
 degree_surplus = out_degree - in_degree
 if abs(degree_surplus) > 1:
 raise NoEulerianPath("Node {} has in-degree {} and out-degree {}."
 .format(node, in_degree, out_degree))
 surplus[degree_surplus].append(node)
 # Find the starting point for the path.
 if len(surplus[1]) == len(surplus[-1]) == 0:
 # Any starting point will do.
 start = pick_any(graph)
 elif len(surplus[1]) == len(surplus[-1]) == 1:
 # Must start at a node with more out- than in-neighbours.
 start = pick_any(surplus[1])
 else:
 raise NoEulerianPath("Graph has {} odd-degree nodes."
 .format(len(surplus[1]) + len(surplus[-1])))
 # We'll be following a series of edge-disjoint paths in the graph.
 # This is a mapping from a node to a deque of the paths starting
 # at that node, each path being a list of nodes. (The reason for
 # using a deque is so that we can process these paths in the order
 # that they were found.)
 paths = defaultdict(deque)
 # Nodes that have been visited but have unvisited out-neighbours.
 unfinished = set([start])
 while unfinished:
 # Pick any unfinished node to start the current path.
 node = pick_any(unfinished)
 path = [node]
 paths[node].append(path)
 # Keep choosing any neighbour and removing the edge just
 # traversed, until a dead end is reached.
 while graph.out_degree(node):
 neighbour = pick_any(graph.out_neighbours(node))
 graph.remove_edge(node, neighbour)
 if graph.out_degree(node) == 0:
 unfinished.remove(node)
 path.append(neighbour)
 unfinished.add(neighbour)
 node = neighbour
 else:
 unfinished.remove(node)
 # If there are any edges remaining, that means the graph was
 # disconnected.
 if any(graph.degree(node) for node in graph):
 raise NoEulerianPath("Graph is not connected.")
 # Concatenate the individual paths into one continuous path (using
 # the "stack of iterators" pattern).
 path = []
 stack = [iter(paths[start].popleft())]
 while stack:
 for node in stack[-1]:
 if paths[node]:
 stack.append(iter(paths[node].popleft()))
 break
 else:
 path.append(node)
 else:
 stack.pop()
 # Check that we concatenated all the paths.
 assert not any(paths.values())
 return path

(See here for an explanation of the "stack of iterators" pattern.)

This is all great guidance! I was confused about parts of it, and I think you've definitely help point me in the right direction. Appreciate the feedback and suggestions!

Stack Exchange Network

Find Eulerian Tour Algorithm - with multithreading and random permutations

1 Answer 1

1. Review

2. Data structure

3. Using a class

4. Algorithm

5. Implementation

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Hot Network Questions

Find Eulerian Tour Algorithm - with multithreading and random permutations

1 Answer 1

1. Review

2. Data structure

3. Using a class

4. Algorithm

5. Implementation

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Related

Hot Network Questions