2
\$\begingroup\$

I have three lists of nodes.

  • sources
  • sinks
  • pipes

There is a directed weighted graph from sources to pipes to sinks. Sources are only connected to pipes and pipes only to sinks. But sources are not directly connected to sinks. Pipes are zero-sum, meaning that the sum of the weights that come to each pipe from sources is equal to the sum of the edges that go from that pipe to sinks.

I would like to add a number of edges from sinks back to sources so that sinks and sources also become zero-sum. While minimizing the maximum degree of the graph. I've written a sub-optimal solution that I'm posting here for review.

In simpler words: I have a list of sinks and sources. Each sink has a negative number and each source has a positive number so that the sum of all the numbers in the nodes of the graph are zero(no edges so far). I would like to add a number of edges to this graph so that the sum of the weights of the edges going out/in to each node becomes equal to the number on that node.

Here is a sample Code for testing if a graph summarizes another graph:

from functools import reduce
from collections import Counter
source_edges = {
 "a0": {"p0": 1, "p2": 5}, 
 "a1": {"p0": 2}, 
 "a2": {"p1": 3}
}
sink_edges = {
 "b0": {"p0": 1},
 "b1": {"p0": 1, "p1": 1},
 "b2": {"p0": 1, "p1": 2, "p2": 5},
}
res = {
 "a0": {"b0": 1, "b2": 5}, 
 "a1": {"b1": 2}, 
 "a2": {"b2": 3}
}
sink_degs1 = {k: sum(v.values()) for k, v in sink_edges.items()} 
sink_degs2 = dict(reduce(lambda x, y: x + y, (Counter(v) for v in res.values())))
source_degs1 ={k: sum(v.values()) for k, v in res.items()} 
source_degs2 ={k: sum(v.values()) for k, v in source_edges.items()}
if sink_degs1 == sink_degs2 and source_degs1 == source_degs2:
 print('res summerizes the graph')
else:
 print('res does not summerize this graph')

And a visualization of this graph:

graph image

What follows is a sub-optimum solution with less than n-1 edges.

from numpy.random import randint
from collections import defaultdict
import copy
def create_sample(source_count=5000, sink_count=200):
 diff = -1
 while diff < 0:
 sinks = [["b" + str(i), randint(source_count)] for i in range(sink_count)]
 sources = [["a" + str(i), randint(sink_count)] for i in range(source_count)]
 sink_sum = sum([x[1] for x in sinks])
 source_sum = sum([x[1] for x in sources])
 diff = sink_sum - source_sum
 avg_refill = diff // source_count + 1
 weights_match = False
 while not weights_match:
 for i in range(source_count):
 if not diff:
 break
 rnd = randint(avg_refill * 2.5) if diff > 10 * (avg_refill) else diff
 diff -= rnd
 sources[i][1] += rnd
 weights_match = sum([x[1] for x in sources]) == sum([x[1] for x in sinks])
 return sources, sinks
def solve(sources, sinks):
 src = sorted(copy.deepcopy(sources), key=lambda x: x[1])
 snk = sorted(copy.deepcopy(sinks), key=lambda x: x[1])
 res = []
 while snk:
 if src[0][1] > snk[0][1]:
 edge = (src[0][0], *snk[0])
 src[0][1] -= snk[0][1]
 del snk[0]
 elif src[0][1] < snk[0][1]:
 edge = (src[0][0], snk[0][0], src[0][1])
 snk[0][1] -= src[0][1]
 del src[0]
 else:
 edge = (src[0][0], *snk[0])
 del src[0], snk[0]
 res += [edge]
 return res
def test(sources, sinks):
 res = solve(sources, sinks)
 d_sources = defaultdict(int)
 d_sinks = defaultdict(int)
 w_sources = defaultdict(int)
 w_sinks = defaultdict(int)
 for a, b, c in res:
 d_sources[a] += 1
 d_sinks[b] += 1
 w_sources[a] += c
 w_sinks[b] += c
 print("source " + ("is" if dict(sources) == w_sources else "isn't") + " source")
 print("sink " + ("is" if dict(sinks) == w_sinks else "isn't") + " sink")
 print(
 f"source:\n \tdeg_sum = {sum(d_sources.values())}\n\tmax_deg = {max(d_sources.values())}"
 )
 print(
 f"sink:\n \tdeg_sum = {sum(d_sinks.values())}\n\tmax_deg = {max(d_sinks.values())}"
 )

Here is a sample run:

In [1]: %run solver.py
In [2]: test(*create_sample())
source is source
sink is sink
source:
 deg_sum = 5196
 max_deg = 3
sink:
 deg_sum = 5196
 max_deg = 56

Here is an illustration of how it works:

sources: 4,5,3,2
sinks: 2,7,2,2,1
sorted:
 55555|44|44|33|32|2
 77777|77|22|22|22|1
So we have 6 edges.

Here is a comparison between sorted and unsorted solution with this algorithm:

---------------------------------------------
| (1000,1000) |
---------------------------------------------
| criteria | sorted | random order |
| source degree sum | 1991 | 1999 |
| source max degree | 3 | 7 |
| sink degreee sum | 1991 | 1999 |
| sink max degree | 3 | 8 |
---------------------------------------------
---------------------------------------------
| (200,5000) |
---------------------------------------------
| criteria | sorted | random order |
| source degree sum | 5198 | 5198 |
| source max degree | 2 | 3 |
| sink degreee sum | 5198 | 5198 |
| sink max degree | 43 | 54 |
---------------------------------------------

I'm looking for improvements to the algorithm or implementation to better the performance of this solution. It can be assumed that the edges come from normal distribution.

asked Sep 28, 2019 at 13:23
\$\endgroup\$
4
  • \$\begingroup\$ @dfhwze But I already have written a working code. \$\endgroup\$ Commented Sep 28, 2019 at 13:36
  • \$\begingroup\$ @dfhwze I've seen an improvements section in many of the code reviews here, and I considered better performance as a possible improvement. Seems like It was a misunderstanding, I will edit my question nwo. \$\endgroup\$ Commented Sep 28, 2019 at 13:43
  • 1
    \$\begingroup\$ @dfhwze is this better? \$\endgroup\$ Commented Sep 28, 2019 at 13:49
  • 2
    \$\begingroup\$ seems to be on-topic now, good edit \$\endgroup\$ Commented Sep 28, 2019 at 13:51

1 Answer 1

2
\$\begingroup\$

Use f-strings

"b" + str(i)

can be

f'b{i}'

Don't make inner lists

sum([x[1] for x in sinks])

should be

sum(x[1] for x in sinks)

Avoid loop flags

weights_match = False
while not weights_match:
 # ...
 weights_match = sum([x[1] for x in sources]) == sum([x[1] for x in sinks])

should be:

while True:
 # ...
 if sum(x[1] for x in sources) == sum(x[1] for x in sinks):
 break

(C has a do-while, but Python doesn't.)

Replace your lambda

Don't repeat this lambda:

lambda x: x[1]

Create it once as:

from operator import itemgetter
second = itemgetter(1)

Add type hints

def create_sample(source_count=5000, sink_count=200):

becomes, at a guess,

def create_sample(source_count: int = 5000, sink_count: int = 200) ->
 (list, list):

You can get fancier with typed lists, but this is the bare minimum.

answered Sep 28, 2019 at 18:10
\$\endgroup\$
1
  • 2
    \$\begingroup\$ Thank you so much for your great review, I upvoted your answer for now, but I'm hoping to get a more functional and performance-wise review to have as the accepted answer. \$\endgroup\$ Commented Sep 28, 2019 at 18:46

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.