I have a data frame df with a column called Rule_ID. It contains data like this:
Rule_ID
[u'2c78g',u'df567',u'5ty78']
[u'2c78g',u'd67gh',u'df890o']
[u'd67gh',u'df890o',u'5ty78']
[u'2c78g',u'5ty78',u'df890o']
I want to count all unique rule IDs within the column and create a new data frame df1 with two columns: the first containing the unique rule ID and the second containing the count of that ID. For example, in the above case df1 would contain:
Rule_ID Count
u'2c78g' 3
u'df567' 1
u'5ty78' 3
u'd67gh' 2
u'df890o' 3
asked Jun 27, 2017 at 15:34
ComplexData
Attempt at a solution? – EFT, Jun 27, 2017 at 16:18
1 Answer
Option 1
df.Rule_ID.apply(pd.Series).stack().value_counts()
df890o 3
5ty78 3
2c78g 3
d67gh 2
df567 1
dtype: int64
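As a variation on Option 1, here is a minimal sketch that also builds the two-column df1 the question asks for. It assumes the cells hold real Python lists and that pandas 0.25 or later is available, since it relies on Series.explode, which is not part of the original answer. The sample frame is rebuilt from the question's data.

```python
import pandas as pd

# Sample data rebuilt from the question (cells are real lists here).
df = pd.DataFrame({
    "Rule_ID": [
        ["2c78g", "df567", "5ty78"],
        ["2c78g", "d67gh", "df890o"],
        ["d67gh", "df890o", "5ty78"],
        ["2c78g", "5ty78", "df890o"],
    ]
})

# explode() gives one row per list element; value_counts() tallies them.
counts = df["Rule_ID"].explode().value_counts()

# Turn the counts Series into a frame with the two requested columns.
df1 = counts.rename_axis("Rule_ID").reset_index(name="Count")
print(df1)
```

The rename_axis and reset_index(name=...) pair is used so the column names come out as Rule_ID and Count regardless of how a given pandas version names the result of value_counts.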
Option 2
# pd.np was removed in pandas 2.0, so import NumPy directly
import numpy as np
pd.Series(np.concatenate(df.Rule_ID.values)).value_counts()
df890o 3
5ty78 3
2c78g 3
d67gh 2
df567 1
dtype: int64
If the values are strings that merely look like lists, parse them with literal_eval first:
from ast import literal_eval
import numpy as np
pd.Series(np.concatenate([literal_eval(x) for x in df.Rule_ID.values])).value_counts()
# or
# df.Rule_ID.apply(literal_eval).apply(pd.Series).stack().value_counts()
df890o 3
5ty78 3
2c78g 3
d67gh 2
df567 1
dtype: int64
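For the string case, a sketch along the same lines that ends with one rule ID per row and its count next to it. The helper to_list is a hypothetical name introduced here, and the use of Series.explode (pandas 0.25 or later) is an assumption that goes beyond the original answer. The helper leaves real lists untouched, so it should be safe to run when it is unclear which form the column holds.

```python
from ast import literal_eval

import pandas as pd

# Sample data rebuilt from the question, stored as strings that look like lists.
df = pd.DataFrame({
    "Rule_ID": [
        "[u'2c78g',u'df567',u'5ty78']",
        "[u'2c78g',u'd67gh',u'df890o']",
        "[u'd67gh',u'df890o',u'5ty78']",
        "[u'2c78g',u'5ty78',u'df890o']",
    ]
})


def to_list(value):
    """Hypothetical helper: parse list-like strings, pass real lists through."""
    return literal_eval(value) if isinstance(value, str) else value


# Parse each cell, then count one ID per row.
parsed = df["Rule_ID"].apply(to_list)
df1 = (
    parsed.explode()
    .value_counts()
    .rename_axis("Rule_ID")
    .reset_index(name="Count")
)
print(df1)
```

A quick way to tell which case applies is type(df.Rule_ID.iloc[0]): str means the cells need parsing, list means they do not.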
answered Jun 27, 2017 at 16:21
piRSquared
10 Comments
ComplexData
TypeError: unbound method nunique() must be called with Series instance as first argument (got str instance instead)
piRSquared
@Dreamer it should be fixed now.
ComplexData
It is giving me results like ['df890o','5ty78','2c78g'] 3, because all three of these IDs have the same count. Can you split it up, so that each cell holds only one rule ID with its corresponding count next to it? Thank you so much! You have always been so helpful.
piRSquared
That's unfortunate... it's because the values are strings that look like lists.
ComplexData
:( What is the solution?