2
\$\begingroup\$

In my real case, I have a set of time series for different IDs stored in a single DataFrame. Some consist of 400 samples, some of 1000, and some of 2000.

I would like to drop all the IDs made up of time series shorter than a custom length.

I wrote the following code, but I think it is very ugly and inefficient.

import pandas as pd
import numpy as np
dict={"samples":[1,2,3,4,5,6,7,8,9],"id":["a","b","c","b","b","b","c","c","c"]}
df=pd.DataFrame(dict)
df_id=pd.DataFrame()
for i in set(df.id):
    df_filtered=df[df.id==i]
    len_id=len(df_filtered.samples)
    if len_id>3: #3 is just a random choice for this example
        df_id=df_id.append(df_filtered)
print(df_id)

Output:

   samples id
2        3  c
6        7  c
7        8  c
8        9  c
1        2  b
3        4  b
4        5  b
5        6  b

How can I improve it in a more Pythonic way? Thanks.

asked Apr 1, 2021 at 16:25
\$\endgroup\$

2 Answers

3
\$\begingroup\$

Good answer by Juho. Another option is a groupby-filter:

df.groupby('id').filter(lambda group: len(group) > 3)
#    samples id
# 1        2  b
# 2        3  c
# 3        4  b
# 4        5  b
# 5        6  b
# 6        7  c
# 7        8  c
# 8        9  c

To match your output order exactly, add a descending id sort: .sort_values('id', ascending=False)
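Putting the filter and the sort together, here is a minimal sketch. The helper name `drop_short_ids` and its `min_len` parameter are made up for illustration, and `kind="mergesort"` is my own addition, intended to keep the original row order within each id:

```python
import pandas as pd


def drop_short_ids(df, min_len, id_col="id"):
    """Keep only the rows whose id occurs more than `min_len` times.

    `drop_short_ids` is a hypothetical helper, not a pandas function.
    """
    return df.groupby(id_col).filter(lambda group: len(group) > min_len)


df = pd.DataFrame({
    "samples": [1, 2, 3, 4, 5, 6, 7, 8, 9],
    "id": ["a", "b", "c", "b", "b", "b", "c", "c", "c"],
})

# Rows come back in their original order, with id "a" removed.
kept = drop_short_ids(df, min_len=3)

# Descending sort on id puts the "c" rows before the "b" rows.
# mergesort is a stable algorithm, chosen here so rows within an id
# are not shuffled relative to each other.
kept_sorted = kept.sort_values("id", ascending=False, kind="mergesort")
print(kept_sorted)
```

Wrapping the filter in a function makes the threshold a parameter instead of a hard-coded 3.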

answered Apr 2, 2021 at 2:52
\$\endgroup\$
1
  • 1
    \$\begingroup\$ This is neat as well. \$\endgroup\$ Commented Apr 2, 2021 at 6:19
2
\$\begingroup\$

There are many solutions. For example, you can use a groupby-transform and drop the "small" samples. Which solution is most appropriate depends on your exact requirements, e.g., whether you will do the preprocessing once and then drop different samples afterwards.

Anyway, consider:

import pandas as pd
df = pd.DataFrame({"samples": [1,2,3,4,5,6,7,8,9], "id": ["a","b","c","b","b","b","c","c","c"]})
df["counts"] = df.groupby("id")["samples"].transform("count")
df[df["counts"] > 3]
# Or if you want:
df[df["counts"] > 3].drop(columns="counts")
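If you do want to compute the group sizes once and then try several thresholds, a variation on the same idea is to keep the sizes in a separate Series rather than as a column. This is only a sketch: the names `group_sizes`, `longer_than_3`, and `longer_than_4` are mine, and it uses `"size"` (rows per group) where the code above uses `"count"` (non-missing values per group):

```python
import pandas as pd

df = pd.DataFrame({
    "samples": [1, 2, 3, 4, 5, 6, 7, 8, 9],
    "id": ["a", "b", "c", "b", "b", "b", "c", "c", "c"],
})

# One value per row of df: the number of rows sharing that row's id.
group_sizes = df.groupby("id")["id"].transform("size")

# The same Series can be reused as a boolean mask for any threshold,
# so the grouping is done only once and df itself stays unchanged.
longer_than_3 = df[group_sizes > 3]
longer_than_4 = df[group_sizes > 4]

print(longer_than_3)
print(len(longer_than_4))
```

Because the mask lives outside the DataFrame, there is no helper column to drop afterwards.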

By the way, avoid using dict as a variable name, since it shadows the built-in dict type.
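A short illustration of what can go wrong when the name is reused. The function `shadowing_demo` is hypothetical and exists only to keep the shadowing local to this example:

```python
def shadowing_demo():
    """Show that rebinding `dict` hides the built-in type in this scope."""
    # After this line, `dict` refers to a dictionary object, not the type.
    dict = {"samples": [1, 2, 3]}
    try:
        # This would normally build a new dictionary, but now it fails.
        dict(id=["a", "b", "c"])
    except TypeError as exc:
        return str(exc)
    return "no error"


print(shadowing_demo())
```

A name such as `data` or `raw` avoids the problem.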

answered Apr 1, 2021 at 19:14
\$\endgroup\$
