61

I often find myself having to check whether a column or row exists in a dataframe before trying to reference it. For example, I end up adding a lot of code like:

if 'mycol' in df.columns and 'myindex' in df.index:
    x = df.loc['myindex', 'mycol']
else:
    x = mydefault

Is there any way to do this more nicely? For example on an arbitrary object I can do x = getattr(anobject, 'id', default) - is there anything similar to this in pandas? Really any way to achieve what I'm doing more gracefully?

wjandrea
asked May 1, 2014 at 6:50

5 Answers

62

Series has a get method that returns a default value when the key is missing, so you could do:

df.mycol.get(myindex, np.nan)

Example:

In [117]:
import numpy as np
import pandas as pd
df = pd.DataFrame({'mycol': np.arange(5), 'dummy': np.arange(5)})
df
Out[117]:
   dummy  mycol
0      0      0
1      1      1
2      2      2
3      3      3
4      4      4
[5 rows x 2 columns]
In [118]:
print(df.mycol.get(2, np.nan))
print(df.mycol.get(5, np.nan))
2
nan
wjandrea
answered May 1, 2014 at 8:49

1 Comment

I was also able to get it to work when the index is known to exist: df.loc['myindex'].get('mycol', NaN). It's a shame that you still need to be sure that either the index or the column exists, but nonetheless this will be useful in a lot of scenarios. Thank you!
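A minimal sketch of the two orderings discussed here. The frame and labels are made up for illustration; the point is that Series.get supplies the default only for the last lookup, so the first lookup has to be known to succeed.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with string row labels, for illustration only.
df = pd.DataFrame({"mycol": [10, 20]}, index=["myindex", "other"])

# Column known to exist, row may be missing.
present = df["mycol"].get("myindex", np.nan)
missing_row = df["mycol"].get("nope", np.nan)

# Row known to exist, column may be missing.
missing_col = df.loc["myindex"].get("nocol", np.nan)

# The first lookup is unprotected: a missing column still raises KeyError.
try:
    df["nocol"].get("myindex", np.nan)
    raised = False
except KeyError:
    raised = True

print(present, missing_row, missing_col, raised)
```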
27

Python encourages asking for forgiveness rather than permission, and you'll find many posts on the subject.

In Python, catching exceptions is relatively inexpensive, so you're encouraged to use them. This is called the EAFP ("easier to ask for forgiveness than permission") approach.

For example:

try:
    x = df.loc['myindex', 'mycol']
except KeyError:
    x = mydefault
fantabolous
answered May 1, 2014 at 8:10

2 Comments

Perhaps I should use more EAFP, but my personal preference is to save try/excepts for when there's no other easy choice. Thanks though.
@Foobar: according to this link, only the try: is inexpensive; the except: seems to be expensive. The moral of the story seems to be that the caller is left to decide between testing for existence and wrapping the lookup in try:/except:. The performance trade-off depends on your use case, i.e. how long it takes to test for existence versus how often the untested lookup will raise. Nevertheless, it would be nice if pandas offered syntactic sugar that made this choice argument-driven. As far as I can tell, it does not.
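To measure that trade-off on your own data, a rough timing sketch along these lines may help. The helper names, the frame, and the labels are invented for illustration, and the timings will vary with pandas version and frame size, so no numbers are quoted here.

```python
import timeit

import pandas as pd

# Hypothetical frame; substitute your own data.
df = pd.DataFrame({"mycol": range(1000)})


def lookup_check_first(df, row, col, default=None):
    """Test for existence, then index (look before you leap)."""
    if col in df.columns and row in df.index:
        return df.loc[row, col]
    return default


def lookup_eafp(df, row, col, default=None):
    """Index directly and fall back on KeyError (EAFP)."""
    try:
        return df.loc[row, col]
    except KeyError:
        return default


# One lookup that succeeds and one whose row label does not exist.
cases = {"hit": (5, "mycol"), "miss": (5000, "mycol")}
for label, (row, col) in cases.items():
    for func in (lookup_check_first, lookup_eafp):
        seconds = timeit.timeit(lambda: func(df, row, col), number=200)
        print(f"{label:>4} {func.__name__}: {seconds:.4f}s")
```

Which variant wins depends mostly on how often the lookup misses, which is the trade-off described in the comment above.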
1

There is the get method for DataFrame to get a column and another get for Series to get an item. So you can chain them together to get a single value:

   A  B
0  0  2
1  1  3

df.get('B', default=pd.Series(dtype=object)).get(1, default='[unknown]')

Output:

3

If the index or column is missing:

df.get('B', default=pd.Series(dtype=object)).get(2, default='[unknown]')
# or
df.get('C', default=pd.Series(dtype=object)).get(1, default='[unknown]')

Output:

'[unknown]'
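If this chain is used often, it could be wrapped in a small helper. The name get_cell is hypothetical and not part of pandas, and the frame below uses made-up string row labels. Passing an explicit dtype to the empty fallback Series avoids the warning that some pandas versions emit for a bare pd.Series().

```python
import pandas as pd


def get_cell(df, row, col, default=None):
    """Return the value at (row, col), or default if either label is missing."""
    # DataFrame.get returns the fallback Series when the column is absent;
    # Series.get then returns default because the empty Series has no rows.
    column = df.get(col, default=pd.Series(dtype=object))
    return column.get(row, default)


# Hypothetical frame for illustration.
df = pd.DataFrame({"A": [0, 1], "B": [2, 3]}, index=["r0", "r1"])

print(get_cell(df, "r1", "B", "[unknown]"))  # row and column both exist
print(get_cell(df, "r9", "B", "[unknown]"))  # missing row
print(get_cell(df, "r1", "C", "[unknown]"))  # missing column
```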
answered Sep 14, 2023 at 20:09


0

Use reindex:

df.reindex(index=['row1', 'row2'], columns=['col1', 'col2'], fill_value=mydefault)

What's great here is using lists for the rows and columns, where some of them exist and some of them don't, and you get the fallback value whenever either the row or column is missing.

Example:

In[1]:
df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [5, 3, 7],
})
df
Out[1]:
   A  B
0  1  5
1  2  3
2  3  7
In[2]:
df.reindex(index=[0, 1, 100], columns=['A', 'C'], fill_value='FV')
Out[2]:
      A   C
0     1  FV
1     2  FV
100  FV  FV
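As a sketch of how this might be used for lookups: reindex returns a new frame and leaves the original untouched, so individual cells can then be read with .at without any existence checks. The labels and the fill value -1 below are arbitrary choices for illustration.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [5, 3, 7]})

# Request some labels that exist and some that do not.
wanted = df.reindex(index=[0, 1, 100], columns=["A", "C"], fill_value=-1)

print(wanted.at[0, "A"])    # existing row and column
print(wanted.at[100, "A"])  # missing row, so the fill value
print(wanted.at[0, "C"])    # missing column, so the fill value

# The original frame is unchanged.
print(df.shape)
```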
answered Mar 9, 2023 at 18:44

2 Comments

Always good to have alternatives, but this would be very slow on a df of any significant size. It creates an entirely new df just to get one value.
@fantabolous well the point of this is that you can get more than one value, in which case you are creating a new df anyway
0

Define a function:

def getvalue(df, index, column_key, default_value):
    # Return the cell value, or default_value if the row or column is missing
    try:
        return df.loc[index, column_key]
    except KeyError:
        return default_value

Example:

import pandas as pd

# define dictionary
thisdict = {
    "brand": ["Ford", "Honda", "Toyota"],
    "model": ["Mustang", "CRV", "Camry"],
    "year": [1964, 2004, 1892],
}
# create dataframe
df = pd.DataFrame(thisdict)
# print dataframe
print(df)
print()
# Test all 4 scenarios
colNotFound = getvalue(df, 1, 'name', "ColNotFound")
print(colNotFound + '\n')
indexNotFound = getvalue(df, 4, 'model', "indexNotFound")
print(indexNotFound + '\n')
colandindexNotFound = getvalue(df, 4, 'name', "colandindexNotFound")
print(colandindexNotFound + '\n')
keyandcolindf = getvalue(df, 1, 'model', "Nothing")
print(keyandcolindf + '\n')

Output:

    brand    model  year
0    Ford  Mustang  1964
1   Honda      CRV  2004
2  Toyota    Camry  1892

ColNotFound

indexNotFound

colandindexNotFound

CRV
answered May 29, 2024 at 21:00

2 Comments

I like the idea, but I tried and neither method actually works. Did you test these? I may be wrong but I don't think KeyError can be used as a boolean (first method) or index key (second method). Also, in the first method it evaluates the conditional BEFORE it evaluates df.loc[] so the KeyError wouldn't have raised yet.
I have revised my answer.
