Finding the maximum distance from the starting node

Question 1

I've tried everything to make my program faster but with 250 nodes my code takes around 9 seconds to print the result, and with 5000 nodes it took around 260 seconds.

Is there a way to make my program faster?

I took the BFS function from here.

The site(connections) should consist of three towns each of which is connected to directly the other one by a road. The distance between two towns A and B is the minimum number of pairs of directly connected towns on the way from A to B. Two towns are connected directly by a road R if there is no other town on R between A and B. The distance of any possible site from the capital city is equal to the sum of the distances from each of the three towns representing the site from the capital city.

Output: the maximum possible distance of a site from the capital city, the number of all sites which are located at the maximum distance from the capital city.

enter image description here

there are two most distant sites from the capital in node 9, the sites are {0, 1, 3} and {2, 3, 5} and their distance from the capital is 10

Input first line (nodes_count, pairs_number, capital) Next all the pairs

Input

Output

10 2

My code

from collections import defaultdict
from queue import Queue
def read_nodes(pairs_number):
 for _ in range(pairs_number):
 yield map(int, input().split()) 
def parse_input(tree):
 nodes_count, pairs_number, capital = map(int, input().split())
 for node1, node2 in read_nodes(pairs_number):
 tree[node1].append(node2)
 tree[node2].append(node1)
 return tree, capital, nodes_count
def traverse_path(fromNode, toNode, nodes):
 def getNeighbours(current, nodes):
 return nodes[current] if current in nodes else []
 def make_path(toNode, graph):
 result = []
 while 'Root' != toNode:
 result.append(toNode)
 toNode = graph[toNode]
 result.reverse()
 return result
 q = Queue()
 q.put(fromNode)
 graph = {fromNode: 'Root'}
 while not q.empty():
 current = q.get()
 for neighbor in getNeighbours(current, nodes):
 if neighbor not in graph:
 graph[neighbor] = current
 q.put(neighbor)
 if current == toNode:
 return make_path(toNode, graph)
 return []
def distant_sites(graph_dic, capital, nodes_count):
 distance_with_connections = defaultdict(list)
 node_distance = {}
 for towns in range(nodes_count):
 towns_distance = len(traverse_path(capital, towns, graph_dic)) - 1
 node_distance[towns] = (towns_distance)
 for town1, neighbours in graph_dic.items():
 for town2 in neighbours:
 if town1 > town2:
 for neighbour_of_town2 in graph_dic[town2]:
 if town2 > neighbour_of_town2:
 if neighbour_of_town2 in neighbours:
 town_connection = (town1, town2, neighbour_of_town2)
 distance = (node_distance[town1]) + (node_distance[town2]) + (node_distance[neighbour_of_town2])
 distance_with_connections[distance].append(town_connection)
 max_distance = max(distance_with_connections)
 return max_distance, len(distance_with_connections[max_distance]) 
if __name__ == '__main__':
 graph_dic, capital, nodes_count = parse_input(defaultdict(list))
 max_distance, number_of_connections = distance_sites(graph_dic, capital, nodes_count)
 print(max_distance, number_of_connections)

Question 2

So you consider all non-capital cities, and for each of them you compute the maximum possible distance to the capital, is it correct?

Question 3

Yes and they should form a connection of three towns , also the number of sites(connections) in the maximum possible distance to the capital

Question 4

I am not quite sure about your algorithm, yet, in general, searching for longest path is NP-hard, which implies that there is no exact polytime algorithm for it unless \$P = NP\$.

Question 5

yeah but I'm also searching for all the possible connections between the towns(three towns), The site(connections) should consist of three towns each of which is connected to directly the other one by a road.

Question 6

and my implementation there takes more time

Question 7

The code does not work because of a typo — there is a call to distance_sites but there is no such function.
There are no docstrings. What do these functions do? How do I call them?

In order to improve the performance, we need to measure it, and in order to do that, it's helpful to be able to make test cases of arbitrary sizes. So let's write a test case generator:

from itertools import product
def test_case(n):
 """Construct a graph with n**2 nodes and O(n**2) triangles, and return
 a tuple (graph, capital, number of nodes).
 """
 graph = defaultdict(list)
 for i, j in product(range(n), repeat=2):
 k = i * n + j
 if i < n - 1:
 graph[k].append(k + n)
 graph[k + n].append(k)
 if j < n - 1:
 graph[k].append(k + 1)
 graph[k + 1].append(k)
 if i < n - 1 and j < n - 1:
 graph[k].append(k + n + 1)
 graph[k + n + 1].append(k)
 return graph, 0, n * n

Then we can easily measure the performance of the code using timeit.timeit:

>>> from timeit import timeit
>>> timeit(lambda:distant_sites(*test_case(50)), number=1)
51.7171316598542

with \$n=50\$ the graph has 2,500 nodes and 4,802 triangles.

queue.Queue is a thread-safe data structure intended for use by multi-threaded programs. It has to take and release a lock for every operation, so it is overkill to use this in a single-threaded program. It is more than ten times faster to use collections.deque instead:
```
>>> timeit(lambda:distant_sites(*test_case(50)), number=1)
4.451224883086979
```

The code computes the distance from the capital to each town by running a separate breadth-first search for each town. But this repeats a lot of work: in the course of finding the distance to town A, the breadth-first search will have to visit towns B, C, D, and so on. It would make sense to remember the distance to each town as we visit it, and so compute the distances from the capital to all the towns in one go:

from collections import deque
def distances(graph, origin):
 """Return a dictionary mapping each node in graph to its distance from
 the origin.
 """
 result = {origin: 0}
 visited = set([origin])
 queue = deque([origin])
 while queue:
 node = queue.popleft()
 distance = result[node] + 1
 for neighbour in graph[node]:
 if neighbour not in visited:
 result[neighbour] = distance
 visited.add(neighbour)
 queue.append(neighbour)
 return result

This gives a couple of orders of magnitude speedup on the test case:

>>> timeit(lambda:distant_sites(*test_case(50)), number=1)
0.030185375828295946

The code for finding the most distant sites looks at all triangles. But this likely involves a lot of wasted effort. For example, suppose we find a triangle whose nodes have distance 48, 49 and 50 from the origin (with sum 147). Now there is no need to look at any triangle unless it contains a node with distance \${147\over 3} = 49\$ or more. So if we sort the nodes in reverse order by their distance from the origin and just remember the best-scoring site, then we may not have to consider very many sites before we know that we have found the most distant one.

def distant_sites(graph, origin):
 """Return the pair (max_site_dist, site_count), where max_site_dist is
 the maximum distance of any site in the graph from the origin, and
 site_count is the number of sites at that distance. A "site" is a
 triangle of nodes, and its distance from the origin is the sum of
 the distances of the three nodes.
 """
 distance = distances(graph, origin)
 nodes = sorted(((d, n) for n, d in distance.items()), reverse=True)
 max_site_dist = 0
 site_count = 0
 for dist1, node1 in nodes:
 if dist1 * 3 < max_site_dist:
 break
 neighbours = graph[node1]
 for node2 in neighbours:
 dist2 = distance[node2]
 if (dist2, node2) >= (dist1, node1):
 continue
 for node3 in graph[node2]:
 if node3 not in neighbours:
 continue
 dist3 = distance[node3]
 if (dist3, node3) >= (dist2, node2):
 continue
 site_dist = dist1 + dist2 + dist3
 if site_dist > max_site_dist:
 max_site_dist = site_dist
 site_count = 1
 elif site_dist == max_site_dist:
 site_count += 1
 return max_site_dist, site_count

The speedup you get from this optimization depends on the kinds of graph you feed it (if there are few sites then it won't make much difference). For my test case we get about 40% speedup:

>>> timeit(lambda:distant_sites(*test_case(50)), number=1)
0.017997111193835735

Gareth Rees Gareth Rees 50.1k3 gold badges130 silver badges210 bronze badges · Accepted Answer · 2016-05-23 10:25:18Z

The code does not work because of a typo — there is a call to distance_sites but there is no such function.
There are no docstrings. What do these functions do? How do I call them?

In order to improve the performance, we need to measure it, and in order to do that, it's helpful to be able to make test cases of arbitrary sizes. So let's write a test case generator:

from itertools import product
def test_case(n):
 """Construct a graph with n**2 nodes and O(n**2) triangles, and return
 a tuple (graph, capital, number of nodes).
 """
 graph = defaultdict(list)
 for i, j in product(range(n), repeat=2):
 k = i * n + j
 if i < n - 1:
 graph[k].append(k + n)
 graph[k + n].append(k)
 if j < n - 1:
 graph[k].append(k + 1)
 graph[k + 1].append(k)
 if i < n - 1 and j < n - 1:
 graph[k].append(k + n + 1)
 graph[k + n + 1].append(k)
 return graph, 0, n * n

Then we can easily measure the performance of the code using timeit.timeit:

>>> from timeit import timeit
>>> timeit(lambda:distant_sites(*test_case(50)), number=1)
51.7171316598542

with \$n=50\$ the graph has 2,500 nodes and 4,802 triangles.

queue.Queue is a thread-safe data structure intended for use by multi-threaded programs. It has to take and release a lock for every operation, so it is overkill to use this in a single-threaded program. It is more than ten times faster to use collections.deque instead:
```
>>> timeit(lambda:distant_sites(*test_case(50)), number=1)
4.451224883086979
```

The code computes the distance from the capital to each town by running a separate breadth-first search for each town. But this repeats a lot of work: in the course of finding the distance to town A, the breadth-first search will have to visit towns B, C, D, and so on. It would make sense to remember the distance to each town as we visit it, and so compute the distances from the capital to all the towns in one go:

from collections import deque
def distances(graph, origin):
 """Return a dictionary mapping each node in graph to its distance from
 the origin.
 """
 result = {origin: 0}
 visited = set([origin])
 queue = deque([origin])
 while queue:
 node = queue.popleft()
 distance = result[node] + 1
 for neighbour in graph[node]:
 if neighbour not in visited:
 result[neighbour] = distance
 visited.add(neighbour)
 queue.append(neighbour)
 return result

This gives a couple of orders of magnitude speedup on the test case:

>>> timeit(lambda:distant_sites(*test_case(50)), number=1)
0.030185375828295946

The code for finding the most distant sites looks at all triangles. But this likely involves a lot of wasted effort. For example, suppose we find a triangle whose nodes have distance 48, 49 and 50 from the origin (with sum 147). Now there is no need to look at any triangle unless it contains a node with distance \${147\over 3} = 49\$ or more. So if we sort the nodes in reverse order by their distance from the origin and just remember the best-scoring site, then we may not have to consider very many sites before we know that we have found the most distant one.

def distant_sites(graph, origin):
 """Return the pair (max_site_dist, site_count), where max_site_dist is
 the maximum distance of any site in the graph from the origin, and
 site_count is the number of sites at that distance. A "site" is a
 triangle of nodes, and its distance from the origin is the sum of
 the distances of the three nodes.
 """
 distance = distances(graph, origin)
 nodes = sorted(((d, n) for n, d in distance.items()), reverse=True)
 max_site_dist = 0
 site_count = 0
 for dist1, node1 in nodes:
 if dist1 * 3 < max_site_dist:
 break
 neighbours = graph[node1]
 for node2 in neighbours:
 dist2 = distance[node2]
 if (dist2, node2) >= (dist1, node1):
 continue
 for node3 in graph[node2]:
 if node3 not in neighbours:
 continue
 dist3 = distance[node3]
 if (dist3, node3) >= (dist2, node2):
 continue
 site_dist = dist1 + dist2 + dist3
 if site_dist > max_site_dist:
 max_site_dist = site_dist
 site_count = 1
 elif site_dist == max_site_dist:
 site_count += 1
 return max_site_dist, site_count

The speedup you get from this optimization depends on the kinds of graph you feed it (if there are few sites then it won't make much difference). For my test case we get about 40% speedup:

>>> timeit(lambda:distant_sites(*test_case(50)), number=1)
0.017997111193835735

Stack Exchange Network

Finding the maximum distance from the starting node

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Finding the maximum distance from the starting node

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions