count number of rows given column names and values - pandas

Question 1

I am currently switching for R to Python so please be patient with me. Is the following a good way to count the number of rows given column names and values?

import pandas as pd
df = pd.DataFrame([["1", "2"], ["2", "4"], ["1", "4"]], columns=['A', 'B'])
cn1 = "A"
cn2 = "B"
cv1 = "1"
cv2 = "2"
no_rows = len(df[(df[cn1]==cv1) & (df[cn2]==cv2)].index)
print(no_rows)

Question 2

First, it's a bad idea to input your numerics as strings in your dataframe. Use plain ints instead.

Your code currently forms a predicate, performs a slice on the frame and then finds the size of the frame. This is more work than necessary - the predicate itself is a series of booleans, and running a .sum() on it produces the number of matching values.

That, plus your current code is not general-purpose. A general-purpose implementation could look like

from typing import Dict, Any
import pandas as pd
def match_count(df: pd.DataFrame, **criteria: Any) -> int:
 pairs = iter(criteria.items())
 column, value = next(pairs)
 predicate = df[column] == value
 for column, value in pairs:
 predicate &= df[column] == value
 return predicate.sum()
def test() -> None:
 df = pd.DataFrame(
 [[1, 2],
 [2, 4],
 [1, 4]],
 columns=['A', 'B'],
 )
 print(match_count(df, A=1, B=2))
if __name__ == '__main__':
 test()

Question 3

I usually use shape[0] because it's more readable, so in your case it would be:

no_rows = df[(df[cn1]==cv1) & (df[cn2]==cv2)].shape[0]

Question 4

While this specific example can be completely refactored into Reinderien's top-notch functions, we don't always need something so elaborate (e.g., quick exploratory analysis).

Masking and counting come up very often in one form or another, so I think it's still worth reviewing how to do them idiomatically in pandas.

Revised code

Maintaining the spirit of the original code, I would use something like:

matches = df[cn1].eq(cv1) & df[cn2].eq(cv2)
len(df[matches]) # but remember that matches.sum() is faster

Comments on the original code

len(df[(df[cn1] == cv1) & (df[cn2] == cv2)].index)
^ ^ ^ ^
3 2 4 1

No need to use .index explicitly since DataFrame.__len__ does it automatically:

class DataFrame(NDFrame, OpsMixin):
 ...
 def __len__(self) -> int:
 return len(self.index)

DataFrame.eq can sometimes be useful over ==:
- supports axis / level broadcasting
- arguably more readable when joining multiple tests
```
df[cn1].eq(cv1) & df[cn2].eq(cv2) # (df[cn1] == cv1) & (df[cn2] == cv2)
```
- arguably more readable when chaining methods (e.g., when comparing shifted columns)
```
df[cn1].shift().eq(cv1).cumsum() # (df[cn1].shift() == cv1).cumsum()
```
If speed is important, len(df) is faster than df.shape[0] (h/t @root):
If you have a lot of conditions to join (e.g., generated via comprehension), consider np.logical_and.reduce:
```
df[np.logical_and.reduce([
 df[cn1] == cv1,
 # ...
 df[cn2] == cv2,
])]
```

Reinderien Reinderien 70.9k5 gold badges76 silver badges256 bronze badges · Accepted Answer · 2021-12-06 17:34:04Z

First, it's a bad idea to input your numerics as strings in your dataframe. Use plain ints instead.

Your code currently forms a predicate, performs a slice on the frame and then finds the size of the frame. This is more work than necessary - the predicate itself is a series of booleans, and running a .sum() on it produces the number of matching values.

That, plus your current code is not general-purpose. A general-purpose implementation could look like

from typing import Dict, Any
import pandas as pd
def match_count(df: pd.DataFrame, **criteria: Any) -> int:
 pairs = iter(criteria.items())
 column, value = next(pairs)
 predicate = df[column] == value
 for column, value in pairs:
 predicate &= df[column] == value
 return predicate.sum()
def test() -> None:
 df = pd.DataFrame(
 [[1, 2],
 [2, 4],
 [1, 4]],
 columns=['A', 'B'],
 )
 print(match_count(df, A=1, B=2))
if __name__ == '__main__':
 test()

Stack Exchange Network

count number of rows given column names and values - pandas

3 Answers 3

Revised code

Comments on the original code

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Hot Network Questions

count number of rows given column names and values - pandas

3 Answers 3

Revised code

Comments on the original code

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Related

Hot Network Questions