I am working on an optimization project where I have a series of dictionaries with tuples as keys and another dictionary (a decision variable with Gurobi) where the key is the first element of the tuples in the other dictionaries. I need to be able to do the following:
data1 = {(place, person): q}
data2 = {person: s}
x = {place: var}
qx = {k: x[k]*data1[k] for k in x}
total1 = {}
for key, value in qx.items():
    person = key[1]
    if person in total1:
        total1[person] = total1[person] + value
    else:
        total1[person] = value
total2 = {k: total1[k]/data2[k] for k in total1}
(Please note that the data1, data2, and x dictionaries are very large, 10,000+ distinct place/person pairs).
This same process works when I use the raw data in place of the decision variable, which uses the same (place, person) key. Unfortunately, my variable within the Gurobi model itself must be a dictionary and it cannot contain the person key value.
Is there any way to iterate over just the first value in the tuple key?
EDIT: Here are some sample values (sensitive data, so placeholder values):
data1 = {(1, a): 28, (1, c): 57, (2, b): 125}
data2 = {a: 7.8, b: 8.5, c: 8.4}
x = {1: 0.002, 2: 0.013}
Values in data1 are all integers, data2 are hours, and x are small decimals.
Outputs in total2 should look similar to the following (assuming there are many other rows for each person):
total2 = {a: 0.85, b: 1.2, c: 1.01}
This code is essentially calculating a "productivity score" for each person. The decision variable, x, is looking only at each individual place for business purposes, so it cannot include the person identifiers. Also, the Gurobi package is very limiting about how things can be formatted, so I have not found a way to even use the tuple key for x.
2 Answers
Generally, the most efficient way to aggregate values into bins is to use a for loop and store the values in a dictionary, as you did with total1 in your example. In the code below, I have fixed your qx line so it runs, but I don't know if this matches your intention. I also used total1.setdefault to streamline the code a little:
a, b, c = 'a', 'b', 'c'
data1 = {(1, a): 28, (1, c): 57, (2, b): 125}
data2 = {a: 7.8, b: 8.5, c: 8.4}
x = {1: 0.002, 2: 0.013}
qx = {(place, person): x[place] * value for (place, person), value in data1.items()}
total1 = {}
for (place, person), value in qx.items():
    total1.setdefault(person, 0.0)
    total1[person] += value
total2 = {k: total1[k] / data2[k] for k in total1}
print(total2)
# {'a': 0.0071794871794871795, 'c': 0.013571428571428571, 'b': 0.19117647058823528}
But this doesn't produce the result you asked for. I can't tell at a glance how you get the result you showed, but this may help you move in the right direction.
It might also be easier to read if you moved the qx logic into the loop, like this:
total1 = {}
for (place, person), value in data1.items():
    total1.setdefault(person, 0.0)
    total1[person] += x[place] * value
total2 = {k: total1[k] / data2[k] for k in total1}
Or, if you want to do this often, it might be worth creating a cross-reference between persons and their matching places, as @martijn-pieters suggested (note, you still need a for loop to do the initial cross-referencing):
# create a list of valid places for each person
places_for_person = {}
for place, person in data1:
    places_for_person.setdefault(person, [])
    places_for_person[person].append(place)
# now do the calculation
total2 = {
    person: sum(
        data1[place, person] * x[place]
        for place in places_for_person[person]
    ) / data2[person]
    for person in data2
}
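For reference, the single-pass aggregation above can also be written with collections.defaultdict, which removes the need for setdefault. This is only a sketch using the sample values from the question: the function name productivity_scores is made up for illustration, and x holds plain floats here rather than Gurobi variables.

```python
from collections import defaultdict


def productivity_scores(data1, data2, x):
    """Sum x[place] * q per person, then divide by that person's hours.

    Hypothetical helper for illustration; not part of any library.
    """
    totals = defaultdict(float)  # a missing person starts at 0.0
    for (place, person), q in data1.items():
        totals[person] += x[place] * q
    return {person: total / data2[person] for person, total in totals.items()}


# Sample values from the question (x is plain floats, not Gurobi variables)
data1 = {(1, "a"): 28, (1, "c"): 57, (2, "b"): 125}
data2 = {"a": 7.8, "b": 8.5, "c": 8.4}
x = {1: 0.002, 2: 0.013}

total2 = productivity_scores(data1, data2, x)
print(total2)
```

Because the loop walks data1 only once, the cost grows linearly with the number of (place, person) pairs.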
- Thanks! The streamlined bits are super helpful. At this point, I still can't get total1 to work with x, so I have a feeling it has more to do with the way Gurobi processes things than with the straight-up Python syntax. – Kerry, Dec 14, 2017 at 16:44
- Glad to help! If you do this a lot, you might also try total1 = collections.defaultdict(float) for the first two examples, or places_for_person = collections.defaultdict(list) for the cross-reference example (instead of using a standard dict). Then you don't even need the setdefault part. – Matthias Fripp, Dec 14, 2017 at 19:25
- By the way, if you're doing a lot of optimization in Python, you might consider using the Pyomo package. It's a robust, mature package and allows you to use pretty much any solver without changing your code (gurobi, cplex, glpk, cbc, etc.). It also has some nice extensions for stochastic programming. – Matthias Fripp, Dec 14, 2017 at 19:33
To create a new dictionary keyed by person alone (dropping the place from the tuple key):
a, b, c = "a", "b", "c"
data1 = {(1, a): 28, (1, c): 57, (2, b): 125}
total = []
for key in data1:
    # key[1] is the person; pair it with the value stored under the full tuple key
    total.append([key[1], data1[key]])
total = dict(total)  # convert the list of [person, value] pairs to a dictionary
print(total)
Output:
{'a': 28, 'c': 57, 'b': 125}
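One caveat with this approach: if a person appears under more than one place, dict() keeps only the last value seen for that person. Below is a minimal sketch of a summing alternative, assuming per-person totals are what is wanted. The extra (2, "a") entry is invented purely to show the difference and is not part of the question's data.

```python
# Sample data from the question, plus one made-up row, (2, "a"), for illustration
data1 = {(1, "a"): 28, (1, "c"): 57, (2, "b"): 125, (2, "a"): 10}

# Keeping only the person part of the key: later entries overwrite earlier ones.
last_value = {person: value for (place, person), value in data1.items()}

# Summing instead, so every (place, person) entry contributes.
summed = {}
for (place, person), value in data1.items():
    summed[person] = summed.get(person, 0) + value

print(last_value)  # {'a': 10, 'c': 57, 'b': 125}
print(summed)      # {'a': 38, 'c': 57, 'b': 125}
```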
- It doesn't seem very optimised. The author said the dataset is huge. – Marine Galantin, Jun 7, 2020 at 22:17