I have three lists of nodes.
- sources
- sinks
- pipes
There is a directed weighted graph from sources to pipes to sinks. Sources are only connected to pipes and pipes only to sinks. But sources are not directly connected to sinks. Pipes are zero-sum, meaning that the sum of the weights that come to each pipe from sources is equal to the sum of the edges that go from that pipe to sinks.
I would like to add a number of edges from sinks back to sources so that sinks and sources also become zero-sum. While minimizing the maximum degree of the graph. I've written a sub-optimal solution that I'm posting here for review.
In simpler words: I have a list of sinks and sources. Each sink has a negative number and each source has a positive number so that the sum of all the numbers in the nodes of the graph are zero(no edges so far). I would like to add a number of edges to this graph so that the sum of the weights of the edges going out/in to each node becomes equal to the number on that node.
Here is a sample Code for testing if a graph summarizes another graph:
from functools import reduce
from collections import Counter
source_edges = {
"a0": {"p0": 1, "p2": 5},
"a1": {"p0": 2},
"a2": {"p1": 3}
}
sink_edges = {
"b0": {"p0": 1},
"b1": {"p0": 1, "p1": 1},
"b2": {"p0": 1, "p1": 2, "p2": 5},
}
res = {
"a0": {"b0": 1, "b2": 5},
"a1": {"b1": 2},
"a2": {"b2": 3}
}
sink_degs1 = {k: sum(v.values()) for k, v in sink_edges.items()}
sink_degs2 = dict(reduce(lambda x, y: x + y, (Counter(v) for v in res.values())))
source_degs1 ={k: sum(v.values()) for k, v in res.items()}
source_degs2 ={k: sum(v.values()) for k, v in source_edges.items()}
if sink_degs1 == sink_degs2 and source_degs1 == source_degs2:
print('res summerizes the graph')
else:
print('res does not summerize this graph')
And a visualization of this graph:
What follows is a sub-optimum solution with less than n-1 edges.
from numpy.random import randint
from collections import defaultdict
import copy
def create_sample(source_count=5000, sink_count=200):
diff = -1
while diff < 0:
sinks = [["b" + str(i), randint(source_count)] for i in range(sink_count)]
sources = [["a" + str(i), randint(sink_count)] for i in range(source_count)]
sink_sum = sum([x[1] for x in sinks])
source_sum = sum([x[1] for x in sources])
diff = sink_sum - source_sum
avg_refill = diff // source_count + 1
weights_match = False
while not weights_match:
for i in range(source_count):
if not diff:
break
rnd = randint(avg_refill * 2.5) if diff > 10 * (avg_refill) else diff
diff -= rnd
sources[i][1] += rnd
weights_match = sum([x[1] for x in sources]) == sum([x[1] for x in sinks])
return sources, sinks
def solve(sources, sinks):
src = sorted(copy.deepcopy(sources), key=lambda x: x[1])
snk = sorted(copy.deepcopy(sinks), key=lambda x: x[1])
res = []
while snk:
if src[0][1] > snk[0][1]:
edge = (src[0][0], *snk[0])
src[0][1] -= snk[0][1]
del snk[0]
elif src[0][1] < snk[0][1]:
edge = (src[0][0], snk[0][0], src[0][1])
snk[0][1] -= src[0][1]
del src[0]
else:
edge = (src[0][0], *snk[0])
del src[0], snk[0]
res += [edge]
return res
def test(sources, sinks):
res = solve(sources, sinks)
d_sources = defaultdict(int)
d_sinks = defaultdict(int)
w_sources = defaultdict(int)
w_sinks = defaultdict(int)
for a, b, c in res:
d_sources[a] += 1
d_sinks[b] += 1
w_sources[a] += c
w_sinks[b] += c
print("source " + ("is" if dict(sources) == w_sources else "isn't") + " source")
print("sink " + ("is" if dict(sinks) == w_sinks else "isn't") + " sink")
print(
f"source:\n \tdeg_sum = {sum(d_sources.values())}\n\tmax_deg = {max(d_sources.values())}"
)
print(
f"sink:\n \tdeg_sum = {sum(d_sinks.values())}\n\tmax_deg = {max(d_sinks.values())}"
)
Here is a sample run:
In [1]: %run solver.py
In [2]: test(*create_sample())
source is source
sink is sink
source:
deg_sum = 5196
max_deg = 3
sink:
deg_sum = 5196
max_deg = 56
Here is an illustration of how it works:
sources: 4,5,3,2
sinks: 2,7,2,2,1
sorted:
55555|44|44|33|32|2
77777|77|22|22|22|1
So we have 6 edges.
Here is a comparison between sorted and unsorted solution with this algorithm:
---------------------------------------------
| (1000,1000) |
---------------------------------------------
| criteria | sorted | random order |
| source degree sum | 1991 | 1999 |
| source max degree | 3 | 7 |
| sink degreee sum | 1991 | 1999 |
| sink max degree | 3 | 8 |
---------------------------------------------
---------------------------------------------
| (200,5000) |
---------------------------------------------
| criteria | sorted | random order |
| source degree sum | 5198 | 5198 |
| source max degree | 2 | 3 |
| sink degreee sum | 5198 | 5198 |
| sink max degree | 43 | 54 |
---------------------------------------------
I'm looking for improvements to the algorithm or implementation to better the performance of this solution. It can be assumed that the edges come from normal distribution.
-
\$\begingroup\$ @dfhwze But I already have written a working code. \$\endgroup\$yukashima huksay– yukashima huksay2019年09月28日 13:36:42 +00:00Commented Sep 28, 2019 at 13:36
-
\$\begingroup\$ @dfhwze I've seen an improvements section in many of the code reviews here, and I considered better performance as a possible improvement. Seems like It was a misunderstanding, I will edit my question nwo. \$\endgroup\$yukashima huksay– yukashima huksay2019年09月28日 13:43:07 +00:00Commented Sep 28, 2019 at 13:43
-
1\$\begingroup\$ @dfhwze is this better? \$\endgroup\$yukashima huksay– yukashima huksay2019年09月28日 13:49:34 +00:00Commented Sep 28, 2019 at 13:49
-
2\$\begingroup\$ seems to be on-topic now, good edit \$\endgroup\$dfhwze– dfhwze2019年09月28日 13:51:54 +00:00Commented Sep 28, 2019 at 13:51
1 Answer 1
Use f-strings
"b" + str(i)
can be
f'b{i}'
Don't make inner lists
sum([x[1] for x in sinks])
should be
sum(x[1] for x in sinks)
Avoid loop flags
weights_match = False
while not weights_match:
# ...
weights_match = sum([x[1] for x in sources]) == sum([x[1] for x in sinks])
should be:
while True:
# ...
if sum(x[1] for x in sources) == sum(x[1] for x in sinks):
break
(C has a do-while, but Python doesn't.)
Replace your lambda
Don't repeat this lambda:
lambda x: x[1]
Create it once as:
from operator import itemgetter
second = itemgetter(1)
Add type hints
def create_sample(source_count=5000, sink_count=200):
becomes, at a guess,
def create_sample(source_count: int = 5000, sink_count: int = 200) ->
(list, list):
You can get fancier with typed lists, but this is the bare minimum.
-
2\$\begingroup\$ Thank you so much for your great review, I upvoted your answer for now, but I'm hoping to get a more functional and performance-wise review to have as the accepted answer. \$\endgroup\$yukashima huksay– yukashima huksay2019年09月28日 18:46:24 +00:00Commented Sep 28, 2019 at 18:46
Explore related questions
See similar questions with these tags.