5
\$\begingroup\$

With huge help from Stack Overflow, I have created code which takes possible swap matches with given weights as an input and outputs a set of matches which gives the highest weight with the highest number of swaps.

A swap is a duty being transferred from one person to another. Every person needs to have exactly one duty.

The difference with a weighted graph is the fact that the highest number of swaps has higher priority.

To do this I multiply every swap with a high number and add the weight of that swap for the LpMaximize.

The code selects a set of graph cycles which gives the highest number of total swaps. The cycle length is limited by max_cycle_order. The selected cycles are added to the result array.

Questions

  1. The method of introducing the weights and prioritizing feels cumbersome. How could this be improved?
  2. It feels like formatting the output takes too many steps. How could this be improved?
  3. With a high max_cycle_order and a high number of matches (7+ and 700+) the code takes a long time to run. In my use case at the moment not a problem because I'm expecting to use max_cycle_order between 2 and 5 but I'm interested in any ideas to limit the runtime for a high max_cycle_order.
  4. Any other tips?
import networkx
import pulp
from datetime import datetime
def optimize(matches, max_steps=2):
 start_time = datetime.now()
 # find sum of all weights to be used to prioritize max swaps
 max_weight = sum(weight for (match_id, a, b, weight) in matches) + 1
 graph = networkx.DiGraph()
 for (match_id, a, b, weight) in matches:
 graph.add_edge(a, b)
 # default is one-on-one swaps (max_cycle_order = 2)
 max_cycle_order = 2
 if isinstance(max_steps, int):
 max_cycle_order = max_steps
 if max_steps < 2:
 max_cycle_order = 2
 if max_steps > 6:
 max_cycle_order = 6
 cycles = tuple(networkx.simple_cycles(graph, length_bound=max_cycle_order))
 cycle_names = ['_'.join(c) for c in cycles]
 # Create weight for each cycle to be used by LP solver
 cycle_orders = []
 for c in cycles:
 sum_weights = 0
 for vertex in c:
 for (match_id, v1, v2, weight) in matches:
 if vertex == v2:
 sum_weights += weight
 break
 cycle_orders.append(max_weight * len(c) + sum_weights)
 selectors = pulp.LpVariable.matrix(
 name='cycle', indices=cycle_names, cat=pulp.LpBinary,
 )
 prob = pulp.LpProblem(name='swaps', sense=pulp.LpMaximize)
 prob.setObjective(pulp.lpDot(selectors, cycle_orders))
 for vertex in graph.nodes:
 # create list of all cycles where vertex (request) is used
 group = [
 selector
 for selector, cycle in zip(selectors, cycles)
 if vertex in cycle
 ]
 # maximize total of these cycles to 1 or smaller to not double use a request
 prob.addConstraint(
 name=f'excl_{vertex}',
 constraint=pulp.lpSum(group) <= 1,
 )
 prob.solve()
 assert prob.status == pulp.LpStatusOptimal
 # process selected cycles for output
 print('Selected cycles:')
 swap_count = 0
 result = []
 for cycle, selector in zip(cycles, selectors):
 if selector.value() > 0.5:
 cycle_result = []
 swap_count += len(cycle)
 print(', '.join(cycle))
 for c in range(0, len(cycle)):
 d = c + 1
 if d > len(cycle) - 1:
 d = 0
 for (match_id, r1, r2, weight) in matches:
 if cycle[c] == r1 and cycle[d] == r2:
 cycle_result.append({'id': match_id, 'from': r1, 'to': r2})
 break
 result.append(cycle_result)
 end_time = datetime.now()
 runtime = (end_time - start_time).total_seconds()
 print('Duration: {}'.format(end_time - start_time))
 output = {
 "status": 1,
 "result": result,
 "swapCount": swap_count,
 "maxSteps": max_cycle_order,
 "runtime": runtime
 }
 return output
# test function call
print(optimize([
 ('id1', 'Amy', 'Blake', 2), ('id2', 'Blake', 'Claire', 5), ('id3', 'Claire', 'Drew', 5), ('id4', 'Drew', 'Emma', 7),
 ('id5', 'Emma', 'Flynn', 3), ('id6', 'Flynn', 'Gabi', 4), ('id7', 'Gabi', 'Hana', 5), ('id8', 'Hana', 'Izzy', 3),
 ('id16', 'Izzy', 'Jill', 6), ('id17', 'Jill', 'Amy', 3),
 # one on one
 ('id9', 'Blake', 'Amy', 3), ('id10', 'Claire', 'Blake', 2), ('id11', 'Emma', 'Drew', 5),
 # three point
 ('id12', 'Gabi', 'Emma', 7), ('id13', 'Drew', 'Blake', 7),
 # four point
 ('id14', 'Flynn', 'Claire', 1), ('id15', 'Jill', 'Gabi', 4)
], 4))
Reinderien
70.9k5 gold badges76 silver badges256 bronze badges
asked Apr 10, 2024 at 18:12
\$\endgroup\$
2
  • 1
    \$\begingroup\$ Re. long runtime, have you profiled? You'll need to determine whether the slow part is NetworkX or PuLP. \$\endgroup\$ Commented Apr 10, 2024 at 19:10
  • \$\begingroup\$ On my M2 MacBook Air a set of 730 matches resulting in 39 swaps with max_cycle_order 7 takes 23 seconds. networkx is done after 2. CBC Solver states it took 12 seconds but probably in total it's a bit more. \$\endgroup\$ Commented Apr 10, 2024 at 19:31

3 Answers 3

5
\$\begingroup\$

helpers

This optimize() is Too Long, plus it would benefit from a docstring. To read it one must vertically scroll. It defines nearly three dozen locals. Psychologists are not in complete agreement whether the Gentle Reader can simultaneously hold seven things in short-term memory or perhaps a few more, but past thirty we have definitely exceeded the limit.

Rely on Extract Helper to produce get_cycles(). Break out the for loops as (unit testable) helpers, too. A desirable side effect will be that local variables go out-of-scope when we return from helper, leaving the reader with fewer concerns to keep track of.

loss function

The big thing I was missing, from the zero-length docstring and from the Review Context, was this. Suppose we have six people listed, and they all change their duty. Would it be cheaper to perform three pair-wise swaps, or a pair of three-person swaps?

Describe the business context and spell out the loss function, so maintenance engineers will be less likely to introduce code defects when they make edits in the coming months.

test suite

The print() is very nice and helpful. But it requires one to ignore some solver debugs, and to eyeball a dict of graph edges when deciding if the output is correct or not. An automated test "knows" the right answer and can display a Green bar by itself. Consider hardcoding the expected answer into your test code. Consider doing import unitttest.

answered Apr 10, 2024 at 19:18
\$\endgroup\$
1
  • \$\begingroup\$ Before reading up on docstrings and unittest.. I have not considered a loss function. In the bussinesscontext it would be better to have three pair-wise swaps. I can probably introduce this in the cycle_orders array which is used in the LpMaximize problem. cycle_orders.append(max_weight * len(c) + sum_weights) \$\endgroup\$ Commented Apr 10, 2024 at 19:48
3
\$\begingroup\$

The following suggestions are mostly for style as opposed to performance.

Documentation

As the previous answer mentions, add a docstring to the function describing its purpose, inputs and outputs.

Intent

The 2 if statements are not independent of each other:

 if max_steps < 2:
 max_cycle_order = 2
 if max_steps > 6:
 max_cycle_order = 6

They should be combined in an if/else:

 if max_steps < 2:
 max_cycle_order = 2
 elif max_steps > 6:
 max_cycle_order = 6

This better conveys the intent, and it is more efficient.

Add whitespace around the assignment operator for better readability. Change:

 name=f'excl_{vertex}',

to:

 name = f'excl_{vertex}',

DRY

This code shows up 3 times:

 len(cycle)

Set it to a variable, then use the variable:

 num_cycles = len(cycle)

Simpler

There is no need for the 0 in: for c in range(0, len(cycle)):

It is simpler as:

 for c in range(len(cycle)):

It is simpler to use an f-string. Change:

print('Duration: {}'.format(end_time - start_time))

to:

print(f'Duration: {end_time - start_time}')
answered Apr 10, 2024 at 19:38
\$\endgroup\$
3
\$\begingroup\$

You should move your dict-rendering code out of the main solve function, along with runtime calculation and printing.

I recall that this was intended for JSON, so don't status=1; JSON has booleans so use True. Also, I think it would make more sense to perform an outer try/catch/assign status=False than to either throw-or-return. Otherwise, why is status there?

Don't use datetime.now() for performance timing. Use time.monotonic for long scales like this, or time.perf_counter() for shorter scales if you're measuring pieces of code in a more granular manner.

answered Apr 11, 2024 at 4:15
\$\endgroup\$
1
  • \$\begingroup\$ I planned to use status=0 for any errors, for example status=1 when the solver gets a result but it might not be optimal (future use) and status=2 for an optimal solution. status=0 is set in the catch block of the code which calls the optimize function. \$\endgroup\$ Commented Apr 11, 2024 at 7:22

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.