2

I am working on an optimization project where I have a series of dictionaries with tuples as keys and another dictionary (a decision variable with Gurobi) where the key is the first element of the tuples in the other dictionaries. I need to be able to do the following:

data1 = {(place, person): q}
data2 = {person: s}
x = {place: var}
qx = {k: x[k]*data1[k] for k in x}
total1 = {}
for key, value in qx.items():
    person = key[1]
    if person in total1:
        total1[person] = total1[person] + value
    else:
        total1[person] = value
total2 = {k: total1[k]/data2[k] for k in total1}

(Please note that the data1, data2, and x dictionaries are very large, 10,000+ distinct place/person pairs).

This same process works when I use the raw data in place of the decision variable, which uses the same (place, person) key. Unfortunately, my variable within the Gurobi model itself must be a dictionary and it cannot contain the person key value.

Is there any way to iterate over just the first value in the tuple key?

EDIT: Here are some sample values (sensitive data, so placeholder values):

data1 = {(1, a): 28, (1, c): 57, (2, b): 125}
data2 = {a: 7.8, b: 8.5, c: 8.4}
x = {1: 0.002, 2: 0.013}

Values in data1 are all integers, data2 are hours, and x are small decimals.

Outputs in total2 should look similar to the following (assuming there are many other rows for each person):

total2 = {a: 0.85, b: 1.2, c: 1.01}

This code is essentially calculating a "productivity score" for each person. The decision variable, x, is looking only at each individual place for business purposes, so it cannot include the person identifiers. Also, the Gurobi package is very limiting about how things can be formatted, so I have not found a way to even use the tuple key for x.

asked Dec 13, 2017 at 19:29
3
  • 5
    Please provide some short input and output data samples. Commented Dec 13, 2017 at 19:32
  • And how do you arrive at those numbers in total2? Commented Dec 13, 2017 at 20:42
  • Sorry, I didn't do the actual math for the sample data. Those are results that I would get using the entire set of data, which includes upwards of 100 rows for each person at each place. Commented Dec 13, 2017 at 21:03

2 Answers

4

Generally, the most efficient way to aggregate values into bins is to use a for loop and store the values in a dictionary, as you did with total1 in your example. In the code below, I have fixed your qx line so it runs, but I don't know if this matches your intention. I also used total1.setdefault to streamline the code a little:

a, b, c = 'a', 'b', 'c'
data1 = {(1, a): 28, (1, c): 57, (2, b): 125}
data2 = {a: 7.8, b: 8.5, c: 8.4}
x = {1: 0.002, 2: 0.013}
qx = {(place, person): x[place] * value for (place, person), value in data1.items()}
total1 = {}
for (place, person), value in qx.items():
    total1.setdefault(person, 0.0)
    total1[person] += value
total2 = {k: total1[k] / data2[k] for k in total1}
print(total2)
# {'a': 0.0071794871794871795, 'c': 0.013571428571428571, 'b': 0.19117647058823528}

But this doesn't produce the result you asked for. I can't tell at a glance how you get the result you showed, but this may help you move in the right direction.

It might also be easier to read if you moved the qx logic into the loop, like this:

total1 = {}
for (place, person), value in data1.items():
    total1.setdefault(person, 0.0)
    total1[person] += x[place] * value
total2 = {k: total1[k] / data2[k] for k in total1}
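
If you would rather drop the setdefault call, collections.defaultdict(float) gives each new person a starting total of 0.0 automatically. Below is a minimal sketch of the same loop, assuming x holds plain numbers as in the sample values from the question; I have not checked how it behaves with Gurobi variables in x.

```python
from collections import defaultdict

# Sample values from the question (placeholders, not real data)
data1 = {(1, 'a'): 28, (1, 'c'): 57, (2, 'b'): 125}
data2 = {'a': 7.8, 'b': 8.5, 'c': 8.4}
x = {1: 0.002, 2: 0.013}

# A person seen for the first time starts at float() == 0.0,
# so no setdefault call is needed before the +=
total1 = defaultdict(float)
for (place, person), value in data1.items():
    total1[person] += x[place] * value

# Divide each person's weighted total by their hours
total2 = {person: total1[person] / data2[person] for person in total1}
print(total2)
```

The result should match the setdefault version; only the bookkeeping for new keys changes.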

Or, if you want to do this often, it might be worth creating a cross-reference between persons and their matching places, as @martijn-pieters suggested (note, you still need a for loop to do the initial cross-referencing):

# create a list of valid places for each person
places_for_person = {}
for place, person in data1:
    places_for_person.setdefault(person, [])
    places_for_person[person].append(place)
# now do the calculation
total2 = {
    person:
        sum(
            data1[place, person] * x[place]
            for place in places_for_person[person]
        ) / data2[person]
    for person in data2
}
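
One thing to watch with the cross-reference: it iterates over data2, so a person who appears in data2 but has no rows in data1 would raise a KeyError. Here is a rough sketch of one way around that, using collections.defaultdict(list) for the cross-reference and .get for the lookup. The person 'd' is a hypothetical addition to the sample values, included only to show the empty case.

```python
from collections import defaultdict

# Sample values from the question, plus a hypothetical person 'd'
# who has hours in data2 but no rows in data1
data1 = {(1, 'a'): 28, (1, 'c'): 57, (2, 'b'): 125}
data2 = {'a': 7.8, 'b': 8.5, 'c': 8.4, 'd': 8.0}
x = {1: 0.002, 2: 0.013}

# Build the person -> places cross-reference in one pass over the keys
places_for_person = defaultdict(list)
for place, person in data1:
    places_for_person[person].append(place)

# .get returns an empty list for unknown persons without adding a key,
# so their sum is 0 and their score is 0.0
total2 = {
    person: sum(
        data1[place, person] * x[place]
        for place in places_for_person.get(person, [])
    ) / data2[person]
    for person in data2
}
print(total2)
```

Whether a score of 0.0 is the right answer for someone with no rows depends on your use case; skipping those persons entirely may be more appropriate.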
answered Dec 13, 2017 at 20:07
3
  • Thanks! The streamlined bits are super helpful. At this point, I still can't get total1 to work with x, so I have a feeling it is more to do with the way Gurobi processes things rather than the straight up Python syntax. Commented Dec 14, 2017 at 16:44
  • Glad to help! If you do this a lot, you might also try total1=collections.defaultdict(float) for the first two examples or places_for_person=collections.defaultdict(list) for the cross-reference example (instead of using a standard dict). Then you don't even need the setdefault part. Commented Dec 14, 2017 at 19:25
  • By the way, if you're doing a lot of optimization in Python, you might consider using the Pyomo package. It's a robust, mature package and allows you to use pretty much any solver without changing your code (gurobi, cplex, glpk, cbc, etc.). Also has some nice extensions for stochastic programming. Commented Dec 14, 2017 at 19:33
0

For creating a new dictionary removing the tuple:

a, b, c = "a", "b", "c"
data1 = {(1, a): 28, (1, c): 57, (2, b): 125}
total = list()
spot = 0
for key in data1:
    total.append([key[1]])  # Start a new [person] list; list(key[1]) would split a multi-character name
    total[spot].append(data1[key])  # Append the value, giving [person, value]
    spot += 1  # Advance to the next position in "total"
total = dict(total)  # Convert the [person, value] pairs to a dictionary
print(total)

Output:

{'a': 28, 'c': 57, 'b': 125}
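
The same mapping can probably be written more compactly as a dict comprehension. The sketch below also shows a limitation of dropping the place from the key: if a person appears at more than one place, the later value silently replaces the earlier one instead of being summed. The extra (2, 'a') row is a hypothetical addition used only to demonstrate that.

```python
# Sample values from the question (placeholders, not real data)
data1 = {(1, 'a'): 28, (1, 'c'): 57, (2, 'b'): 125}

# Keep only the person from each (place, person) key
by_person = {person: value for (place, person), value in data1.items()}
print(by_person)

# Hypothetical extra row: person 'a' also appears at place 2
data1_repeat = dict(data1)
data1_repeat[(2, 'a')] = 40

# The later entry for 'a' overwrites the earlier one (28 is lost, not added)
last_wins = {person: value for (place, person), value in data1_repeat.items()}
print(last_wins)
```

If the values need to be combined per person rather than overwritten, an accumulating loop is the safer choice.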
answered Dec 13, 2017 at 21:12
1
  • It doesn't seem very optimised. The author said the dataset is huge. Commented Jun 7, 2020 at 22:17
