Manipulating data frames

Question 1

I have a data frame df in which one of the columns is called Rule_ID. Its data looks like this:

Rule_ID
[u'2c78g',u'df567',u'5ty78']
[u'2c78g',u'd67gh',u'df890o']
[u'd67gh',u'df890o',u'5ty78']
[u'2c78g',u'5ty78',u'df890o']

I want to count all unique rule IDs within the column and create a new data frame df1 with two columns: the first containing the unique rule ID and the second containing the count of that ID. For example, in the case above df1 would contain:

Rule_ID Count
u'2c78g' 3
u'df567' 1
u'5ty78' 3
u'd67gh' 2
u'df890o' 3

Question 2

Attempt at a solution?


piRSquared · Accepted Answer · 2017-06-27 16:21:59Z

Option 1

df.Rule_ID.apply(pd.Series).stack().value_counts()
df890o 3
5ty78 3
2c78g 3
d67gh 2
df567 1
dtype: int64
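Option 1 expands every list into its own row of values before counting. As a rough sketch of the same idea, assuming pandas 0.25 or later (where Series.explode is available) and a sample frame built here only for illustration, the expansion can be done without creating an intermediate wide frame:

```python
import pandas as pd

# Sample data mirroring the question: each cell holds a list of rule IDs.
df = pd.DataFrame({
    "Rule_ID": [
        ["2c78g", "df567", "5ty78"],
        ["2c78g", "d67gh", "df890o"],
        ["d67gh", "df890o", "5ty78"],
        ["2c78g", "5ty78", "df890o"],
    ]
})

# explode() yields one row per list element; value_counts() then tallies them.
counts = df["Rule_ID"].explode().value_counts()
print(counts)
```

This only works when the cells hold real lists. If they hold strings that look like lists, they need to be parsed first.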

Option 2

import numpy as np  # pd.np was removed in pandas 2.0, so import NumPy directly
pd.Series(np.concatenate(df.Rule_ID.values)).value_counts()
df890o 3
5ty78 3
2c78g 3
d67gh 2
df567 1
dtype: int64
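Option 2 flattens all the lists into one sequence and counts the result. The same idea can be sketched with the standard library instead of NumPy; this is an alternative, not the answerer's method, and the sample frame is built here only for illustration:

```python
from collections import Counter
from itertools import chain

import pandas as pd

# Sample data mirroring the question: each cell holds a list of rule IDs.
df = pd.DataFrame({
    "Rule_ID": [
        ["2c78g", "df567", "5ty78"],
        ["2c78g", "d67gh", "df890o"],
        ["d67gh", "df890o", "5ty78"],
        ["2c78g", "5ty78", "df890o"],
    ]
})

# chain.from_iterable flattens the lists lazily; Counter tallies each ID.
tally = Counter(chain.from_iterable(df["Rule_ID"]))

# Wrap the tally in a Series, highest counts first, to match the pandas options.
counts = pd.Series(tally).sort_values(ascending=False)
print(counts)
```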

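If the values are strings that only look like lists, parse them with literal_eval first:

from ast import literal_eval
import numpy as np  # pd.np was removed in pandas 2.0, so import NumPy directly
pd.Series(np.concatenate([literal_eval(x) for x in df.Rule_ID.values])).value_counts()
# or
# df.Rule_ID.apply(literal_eval).apply(pd.Series).stack().value_counts()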
df890o 3
5ty78 3
2c78g 3
d67gh 2
df567 1
dtype: int64
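The options above return a Series, while the question asks for a data frame df1 with a Rule_ID column and a Count column. A minimal sketch of that last step follows, assuming pandas 0.25 or later and string-encoded lists as described in the comments. The helper to_list is a hypothetical name introduced here, not part of pandas:

```python
from ast import literal_eval

import pandas as pd

# Sample data mirroring the question, with each list stored as a string.
df = pd.DataFrame({
    "Rule_ID": [
        "[u'2c78g',u'df567',u'5ty78']",
        "[u'2c78g',u'd67gh',u'df890o']",
        "[u'd67gh',u'df890o',u'5ty78']",
        "[u'2c78g',u'5ty78',u'df890o']",
    ]
})


def to_list(value):
    """Parse a string that looks like a list; pass real lists through unchanged."""
    return literal_eval(value) if isinstance(value, str) else value


df1 = (
    df["Rule_ID"]
    .apply(to_list)               # strings -> real Python lists
    .explode()                    # one row per rule ID
    .value_counts()               # count each ID
    .rename_axis("Rule_ID")       # name the index so it becomes a column
    .reset_index(name="Count")    # Series -> two-column data frame
)
print(df1)
```

Each row of df1 then holds a single rule ID next to its count, which is the layout requested in the comments.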


edited Jun 27, 2017 at 17:02

answered Jun 27, 2017 at 16:21


10 Comments

ComplexData · 2017-06-27 16:33:10Z

TypeError: unbound method nunique() must be called with Series instance as first argument (got str instance instead)

piRSquared · 2017-06-27 16:39:52Z

@Dreamer it should be fixed now.

ComplexData · 2017-06-27 16:52:26Z

It is giving me results like ['df890o','5ty78','2c78g'] 3, because all three of these IDs have the same count. Can you break it up, so that there is only one Rule ID in each cell with its corresponding count next to it? Thank you so much! You have always been so helpful.

piRSquared · 2017-06-27 16:59:02Z

That's unfortunate... it's because the values are strings that look like lists.

ComplexData · 2017-06-27 16:59:34Z

:( What is the solution?
