Finding minimum weighted matching in sink source graph

Question 1

I have three lists of nodes.

sources
sinks
pipes

There is a directed weighted graph from sources to pipes to sinks. Sources are only connected to pipes and pipes only to sinks. But sources are not directly connected to sinks. Pipes are zero-sum, meaning that the sum of the weights that come to each pipe from sources is equal to the sum of the edges that go from that pipe to sinks.

I would like to add a number of edges from sinks back to sources so that sinks and sources also become zero-sum. While minimizing the maximum degree of the graph. I've written a sub-optimal solution that I'm posting here for review.

In simpler words: I have a list of sinks and sources. Each sink has a negative number and each source has a positive number so that the sum of all the numbers in the nodes of the graph are zero(no edges so far). I would like to add a number of edges to this graph so that the sum of the weights of the edges going out/in to each node becomes equal to the number on that node.

Here is a sample Code for testing if a graph summarizes another graph:

from functools import reduce
from collections import Counter
source_edges = {
 "a0": {"p0": 1, "p2": 5}, 
 "a1": {"p0": 2}, 
 "a2": {"p1": 3}
}
sink_edges = {
 "b0": {"p0": 1},
 "b1": {"p0": 1, "p1": 1},
 "b2": {"p0": 1, "p1": 2, "p2": 5},
}
res = {
 "a0": {"b0": 1, "b2": 5}, 
 "a1": {"b1": 2}, 
 "a2": {"b2": 3}
}
sink_degs1 = {k: sum(v.values()) for k, v in sink_edges.items()} 
sink_degs2 = dict(reduce(lambda x, y: x + y, (Counter(v) for v in res.values())))
source_degs1 ={k: sum(v.values()) for k, v in res.items()} 
source_degs2 ={k: sum(v.values()) for k, v in source_edges.items()}
if sink_degs1 == sink_degs2 and source_degs1 == source_degs2:
 print('res summerizes the graph')
else:
 print('res does not summerize this graph')

And a visualization of this graph:

graph image

What follows is a sub-optimum solution with less than n-1 edges.

from numpy.random import randint
from collections import defaultdict
import copy
def create_sample(source_count=5000, sink_count=200):
 diff = -1
 while diff < 0:
 sinks = [["b" + str(i), randint(source_count)] for i in range(sink_count)]
 sources = [["a" + str(i), randint(sink_count)] for i in range(source_count)]
 sink_sum = sum([x[1] for x in sinks])
 source_sum = sum([x[1] for x in sources])
 diff = sink_sum - source_sum
 avg_refill = diff // source_count + 1
 weights_match = False
 while not weights_match:
 for i in range(source_count):
 if not diff:
 break
 rnd = randint(avg_refill * 2.5) if diff > 10 * (avg_refill) else diff
 diff -= rnd
 sources[i][1] += rnd
 weights_match = sum([x[1] for x in sources]) == sum([x[1] for x in sinks])
 return sources, sinks
def solve(sources, sinks):
 src = sorted(copy.deepcopy(sources), key=lambda x: x[1])
 snk = sorted(copy.deepcopy(sinks), key=lambda x: x[1])
 res = []
 while snk:
 if src[0][1] > snk[0][1]:
 edge = (src[0][0], *snk[0])
 src[0][1] -= snk[0][1]
 del snk[0]
 elif src[0][1] < snk[0][1]:
 edge = (src[0][0], snk[0][0], src[0][1])
 snk[0][1] -= src[0][1]
 del src[0]
 else:
 edge = (src[0][0], *snk[0])
 del src[0], snk[0]
 res += [edge]
 return res
def test(sources, sinks):
 res = solve(sources, sinks)
 d_sources = defaultdict(int)
 d_sinks = defaultdict(int)
 w_sources = defaultdict(int)
 w_sinks = defaultdict(int)
 for a, b, c in res:
 d_sources[a] += 1
 d_sinks[b] += 1
 w_sources[a] += c
 w_sinks[b] += c
 print("source " + ("is" if dict(sources) == w_sources else "isn't") + " source")
 print("sink " + ("is" if dict(sinks) == w_sinks else "isn't") + " sink")
 print(
 f"source:\n \tdeg_sum = {sum(d_sources.values())}\n\tmax_deg = {max(d_sources.values())}"
 )
 print(
 f"sink:\n \tdeg_sum = {sum(d_sinks.values())}\n\tmax_deg = {max(d_sinks.values())}"
 )

Here is a sample run:

In [1]: %run solver.py
In [2]: test(*create_sample())
source is source
sink is sink
source:
 deg_sum = 5196
 max_deg = 3
sink:
 deg_sum = 5196
 max_deg = 56

Here is an illustration of how it works:

sources: 4,5,3,2
sinks: 2,7,2,2,1
sorted:
 55555|44|44|33|32|2
 77777|77|22|22|22|1
So we have 6 edges.

Here is a comparison between sorted and unsorted solution with this algorithm:

---------------------------------------------
| (1000,1000) |
---------------------------------------------
| criteria | sorted | random order |
| source degree sum | 1991 | 1999 |
| source max degree | 3 | 7 |
| sink degreee sum | 1991 | 1999 |
| sink max degree | 3 | 8 |
---------------------------------------------
---------------------------------------------
| (200,5000) |
---------------------------------------------
| criteria | sorted | random order |
| source degree sum | 5198 | 5198 |
| source max degree | 2 | 3 |
| sink degreee sum | 5198 | 5198 |
| sink max degree | 43 | 54 |
---------------------------------------------

I'm looking for improvements to the algorithm or implementation to better the performance of this solution. It can be assumed that the edges come from normal distribution.

Question 2

@dfhwze But I already have written a working code.

Question 3

@dfhwze I've seen an improvements section in many of the code reviews here, and I considered better performance as a possible improvement. Seems like It was a misunderstanding, I will edit my question nwo.

Question 4

@dfhwze is this better?

Question 5

seems to be on-topic now, good edit

Question 6

Use f-strings

"b" + str(i)

can be

f'b{i}'

Don't make inner lists

sum([x[1] for x in sinks])

should be

sum(x[1] for x in sinks)

Avoid loop flags

weights_match = False
while not weights_match:
 # ...
 weights_match = sum([x[1] for x in sources]) == sum([x[1] for x in sinks])

should be:

while True:
 # ...
 if sum(x[1] for x in sources) == sum(x[1] for x in sinks):
 break

(C has a do-while, but Python doesn't.)

Replace your lambda

Don't repeat this lambda:

lambda x: x[1]

Create it once as:

from operator import itemgetter
second = itemgetter(1)

Add type hints

def create_sample(source_count=5000, sink_count=200):

becomes, at a guess,

def create_sample(source_count: int = 5000, sink_count: int = 200) ->
 (list, list):

You can get fancier with typed lists, but this is the bare minimum.

Question 7

Thank you so much for your great review, I upvoted your answer for now, but I'm hoping to get a more functional and performance-wise review to have as the accepted answer.

Reinderien Reinderien 70.9k5 gold badges76 silver badges256 bronze badges · Accepted Answer · 2019-09-28 18:10:40Z

Use f-strings

"b" + str(i)

can be

f'b{i}'

Don't make inner lists

sum([x[1] for x in sinks])

should be

sum(x[1] for x in sinks)

Avoid loop flags

weights_match = False
while not weights_match:
 # ...
 weights_match = sum([x[1] for x in sources]) == sum([x[1] for x in sinks])

should be:

while True:
 # ...
 if sum(x[1] for x in sources) == sum(x[1] for x in sinks):
 break

(C has a do-while, but Python doesn't.)

Replace your lambda

Don't repeat this lambda:

lambda x: x[1]

Create it once as:

from operator import itemgetter
second = itemgetter(1)

Add type hints

def create_sample(source_count=5000, sink_count=200):

becomes, at a guess,

def create_sample(source_count: int = 5000, sink_count: int = 200) ->
 (list, list):

You can get fancier with typed lists, but this is the bare minimum.

Thank you so much for your great review, I upvoted your answer for now, but I'm hoping to get a more functional and performance-wise review to have as the accepted answer.

Stack Exchange Network

Finding minimum weighted matching in sink source graph

1 Answer 1

Use f-strings

Don't make inner lists

Avoid loop flags

Replace your lambda

Add type hints

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Finding minimum weighted matching in sink source graph

1 Answer 1

Use f-strings

Don't make inner lists

Avoid loop flags

Replace your lambda

Add type hints

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions