Iterating across multiple columns in Pandas DF and slicing dynamically

Question 1

TLDR: How to iterate across all options of multiple columns in a pandas dataframe without specifying the columns or their values explicitly?

Long Version: I have a pandas dataframe that looks like this, only it has a lot more features or drug dose combinations than are listed here. Instead of just 3 types of features, it could have something like 70...:

> dosage_df
First Score Last Score A_dose B_dose C_dose
22 28 1 40 130
55 11 2 40 130
15 72 3 40 130
42 67 1 90 130
90 74 2 90 130
87 89 3 90 130
14 43 1 40 700
12 61 2 40 700
41 5 3 40 700

Along with my data frame, I also have a python dictionary with the relevant ranges for each feature. The keys are the feature names, and the different values which it can take are the keys:

> dict_of_dose_ranges = {'A_dose': [1, 2, 3], 'B_dose': [40, 90], 'C_dose': [130,700]}

For my purposes, I need to generate a particular combination (say A_dose = 1, B_dose = 90, and C_dose = 700), and based on those settings take the relevant slice out of my dataframe, and do relevant calculations from that smaller subset, and save the results somewhere.

I need to do this for ALL possible combinations of ALL of my features (far more than the 3 which are here, and which will be variable in the future).

In this case, I could easily pop this into SkLearn's Parameter grid, generate the options:

> from sklearn.grid_search import ParameterGrid
> all_options = list(ParameterGrid(dict_of_dose_ranges)) 
> all_options

and get:

[{'A_dose': 1, 'B_dose': 40, 'C_dose': 130},
 {'A_dose': 1, 'B_dose': 40, 'C_dose': 700},
 {'A_dose': 1, 'B_dose': 90, 'C_dose': 130},
 {'A_dose': 1, 'B_dose': 90, 'C_dose': 700},
 {'A_dose': 2, 'B_dose': 40, 'C_dose': 130},
 {'A_dose': 2, 'B_dose': 40, 'C_dose': 700},
 {'A_dose': 2, 'B_dose': 90, 'C_dose': 130},
 {'A_dose': 2, 'B_dose': 90, 'C_dose': 700},
 {'A_dose': 3, 'B_dose': 40, 'C_dose': 130},
 {'A_dose': 3, 'B_dose': 40, 'C_dose': 700},
 {'A_dose': 3, 'B_dose': 90, 'C_dose': 130},
 {'A_dose': 3, 'B_dose': 90, 'C_dose': 700}]

This is where I run into problems:

Problem #1) I can now iterate across all_options, but I'm not sure how to now SELECT out of my dosage_df from each of the dictionary options (i.e. {'A_dose': 1, 'B_dose': 40, 'C_dose': 130}) WITHOUT doing it explicitly.

In the past, I could do something like:

dosage_df[(dosage_df.A_dose == 1) & (dosage_df.B_dose == 40) & (dosage_df.C_dose == 130)]
First Score Last Score A_dose B_dose C_dose
0 22 28 140 130

But now I'm not sure what to put inside the brackets to slice it dynamically...

dosage_df[?????]

Problem #2) When I actually enter in my full dictionary of features with their respective ranges, I get an error because it deems it as having too many options...

from sklearn.grid_search import ParameterGrid
all_options = list(ParameterGrid(dictionary_of_features_and_ranges)) 
all_options
---------------------------------------------------------------------------
OverflowError Traceback (most recent call last)
<ipython-input-138-7b73d5e248f5> in <module>()
 1 from sklearn.grid_search import ParameterGrid
----> 2 all_options = list(ParameterGrid(dictionary_of_features_and_ranges))
 3 all_options
OverflowError: long int too large to convert to int

I tried a number of alternate approaches including using double while loops, a tree / recursion method from here, another recursion method from here, but it wasn't coming together.... Any help is much appreciated.

Question 2

You can use itertools.product to generate all possible dosage combinations, and DataFrame.query to do the selection:

from itertools import product
for dosage_comb in product(*dict_of_dose_ranges.values()):
 dosage_items = zip(dict_of_dose_ranges.keys(), dosage_comb)
 query_str = ' & '.join('{} == {}'.format(*x) for x in dosage_items)
 sub_df = dosage_df.query(query_str)
 # Do Stuff...

Question 3

What about using the underlying numpy array and some boolean logic to build an array containing only the lines you want ?

dosage_df = pd.DataFrame((np.random.rand(40000,10)*100).astype(np.int))
dict_of_dose_ranges={3:[10,11,12,13,15,20],4:[20,22,23,24]}
#combined_doses will be bool array that will select all the lines that match the wanted combinations of doses
combined_doses=np.ones(dosage_df.shape[0]).astype(np.bool)
for item in dict_of_dose_ranges.items():
 #item[0] is the kind of dose
 #item[1] are the values of that kind of dose
 next_dose=np.zeros(dosage_df.shape[0]).astype(np.bool)
 #we then iterate over the wanted values
 for value in item[1]:
 # we select and "logical or" all lines matching the values
 next_dose|=(dosage_df[item[0]] == value)
 # we "logical and" all the kinds of dose
 combined_doses&=next_dose
print(dosage_df[combined_doses])

Question 4

Were you able to get this to work? I got an error: IndexError: only integers, slices (:), ellipsis (...), numpy.newaxis (None) and integer or boolean arrays are valid indices

Question 5

I've edited a bit the code to correct a mistake arr[:,item[0]] and swapped arthmetic for bitwise operators and added example values so that it is copy/pastable. As it is now, it runs fine for me.

Question 6

I think it only works for you b/c the columns of dosage_df and dict_of_dose_ranges has keys which are integers.

Question 7

@Afflatus I fixed it, it should work with all column names now.

root root 34k6 gold badges77 silver badges89 bronze badges · Accepted Answer · 2016-09-07 23:47:49Z

You can use itertools.product to generate all possible dosage combinations, and DataFrame.query to do the selection:

from itertools import product
for dosage_comb in product(*dict_of_dose_ranges.values()):
 dosage_items = zip(dict_of_dose_ranges.keys(), dosage_comb)
 query_str = ' & '.join('{} == {}'.format(*x) for x in dosage_items)
 sub_df = dosage_df.query(query_str)
 # Do Stuff...

CollectivesTM on Stack Overflow

Iterating across multiple columns in Pandas DF and slicing dynamically

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Hot Network Questions

CollectivesTM on Stack Overflow

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Related