4
\$\begingroup\$

I have a .csv file of 8k+ rows which looks like this:

                       state                   assembly           candidate                               party  votes
0  Andaman & Nicobar Islands  Andaman & Nicobar Islands     BISHNU PADA RAY              Bharatiya Janata Party  90969
1  Andaman & Nicobar Islands  Andaman & Nicobar Islands  KULDEEP RAI SHARMA            Indian National Congress  83157
2  Andaman & Nicobar Islands  Andaman & Nicobar Islands      SANJAY MESHACK                     Aam Aadmi Party   3737
3  Andaman & Nicobar Islands  Andaman & Nicobar Islands        ANITA MONDAL        All India Trinamool Congress   2283
4  Andaman & Nicobar Islands  Andaman & Nicobar Islands             K.G.DAS  Communist Party of India (Marxist)   1777

The dataframe I want to end up with has one row per state and two columns: the votes received by a particular party ("Bharatiya Janata Party", in this case) in that state, and the total votes cast in that state. Like this:

 State Total Votes BJP Votes
Andaman & Nicobar Islands 190328 90969.0
Andhra Pradesh 48358545 4091876.0
Arunachal Pradesh 596956 275344.0
Assam 15085883 5507152.0
Bihar 35885366 10543023.0

My code works but I'm pretty sure there's a much better way to get this done using fewer lines of code and without creating too many dataframes. Here's my code:

dff = df.groupby(['party'])[['votes']].agg('sum')
dff = dff.sort_values('votes')
BJP_df = df[df["party"]=="Bharatiya Janata Party"]
#print(BJP_df.head())
group = BJP_df.groupby(['state'])[['votes']].agg('sum')
state = df.groupby(['state'])[['votes']].agg('sum')
result = pd.concat([state, group], axis = 1, sort=False)
result.columns = ["Total Votes","BJP Votes"]

Any tips, suggestions, pointers would be very much appreciated.

Peilonrayz
asked May 20, 2019 at 17:01
\$\endgroup\$

2 Answers

1
\$\begingroup\$

Here is one way using df.pivot_table():

Relabel every party other than Bharatiya Janata Party as Others with np.where(), pivot with pivot_table() so each label becomes a column, and finally sum across axis=1 to get the total votes per state.

df1 = (
    df.assign(party=np.where(df.party.ne('Bharatiya Janata Party'), 'Others', df.party))
    .pivot_table(index='state', columns='party', values='votes', aggfunc='sum')
)

Another method with crosstab() similar to pivot_table:

df1 = pd.crosstab(
    df.state,
    np.where(df.party.ne('Bharatiya Janata Party'), 'Others', df.party),
    values=df.votes,
    aggfunc='sum',
)

Finally, getting the Total and reset_index():

df1 = df1.assign(Total=df1.sum(axis=1)).reset_index().rename_axis(None, axis=1)

Output (note: I added dummy Andhra Pradesh rows for testing):

 state Bharatiya Janata Party Others Total
0 Andaman & Nicobar Islands 90969 90954 181923
1 Andhra Pradesh 100 85 185

You can drop the Others column afterwards: df1 = df1.drop(columns='Others')
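For anyone who wants to run these steps end to end, here is a minimal, self-contained sketch. The sample frame holds the five Andaman & Nicobar rows from the question plus two made-up Andhra Pradesh rows (an assumption for illustration, not real data), and the names `TARGET`, `party_label` and `summary` are hypothetical.

```python
import numpy as np
import pandas as pd

# Five rows from the question plus two invented Andhra Pradesh rows (illustrative only).
df = pd.DataFrame(
    {
        "state": ["Andaman & Nicobar Islands"] * 5 + ["Andhra Pradesh"] * 2,
        "party": [
            "Bharatiya Janata Party",
            "Indian National Congress",
            "Aam Aadmi Party",
            "All India Trinamool Congress",
            "Communist Party of India (Marxist)",
            "Bharatiya Janata Party",
            "Indian National Congress",
        ],
        "votes": [90969, 83157, 3737, 2283, 1777, 100, 85],
    }
)

TARGET = "Bharatiya Janata Party"  # hypothetical name for the party of interest

# Collapse every other party into a single "Others" label.
party_label = np.where(df["party"].ne(TARGET), "Others", df["party"])

# One row per state, one column per label, summed votes in the cells.
df1 = df.assign(party=party_label).pivot_table(
    index="state", columns="party", values="votes", aggfunc="sum"
)

# Row-wise total, then turn the state index back into a regular column.
df1 = df1.assign(Total=df1.sum(axis=1)).reset_index().rename_axis(None, axis=1)

# Drop the helper column by keyword.
summary = df1.drop(columns="Others")
print(summary)
```

Note that a state with no rows for the target party would show NaN in that party's column, so a `fillna(0)` may be worth adding on real data.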

answered Jun 24, 2019 at 9:40
\$\endgroup\$
2
  • 1
    \$\begingroup\$ Almost thought this question was lost in the depths of Code Review. Thanks for the answer! \$\endgroup\$ Commented Jun 24, 2019 at 11:55
  • \$\begingroup\$ @Abhishek My pleasure. :) I started contributing to this community today. :) \$\endgroup\$ Commented Jun 24, 2019 at 12:08
2
\$\begingroup\$

All in all, your code was not too bad. You can group by two columns at once and unstack the party level into columns:

votes_per_state = df.groupby(["state", "party"])["votes"].sum().unstack(fill_value=0)

state                      Aam Aadmi Party  All India Trinamool Congress  Bharatiya Janata Party  Communist Party of India (Marxist)  Indian National Congress  other
Andaman & Nicobar Islands             3737                          2283                   90969                                1777                     83157      0
Andhra Pradesh                           0                             0                      85                                   0                         0    100

Then you can define which party you're interested in and assemble the result DataFrame by hand:

party_of_interest = "Bharatiya Janata Party"
result = pd.DataFrame(
    {
        party_of_interest: votes_per_state[party_of_interest],
        "total": votes_per_state.sum(axis=1),
    }
)

state                      Bharatiya Janata Party   total
Andaman & Nicobar Islands                   90969  181923
Andhra Pradesh                                 85     185
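If this is needed for more than one party, the two steps can be wrapped in a small helper. The following is only a sketch: the function name `party_summary` is hypothetical, and the sample frame uses the question's five rows plus two invented Andhra Pradesh rows.

```python
import pandas as pd

# Five rows from the question plus two invented Andhra Pradesh rows (illustrative only).
df = pd.DataFrame(
    {
        "state": ["Andaman & Nicobar Islands"] * 5 + ["Andhra Pradesh"] * 2,
        "party": [
            "Bharatiya Janata Party",
            "Indian National Congress",
            "Aam Aadmi Party",
            "All India Trinamool Congress",
            "Communist Party of India (Marxist)",
            "Bharatiya Janata Party",
            "other",
        ],
        "votes": [90969, 83157, 3737, 2283, 1777, 85, 100],
    }
)


def party_summary(df: pd.DataFrame, party: str) -> pd.DataFrame:
    """Return one row per state with the party's votes and the state total."""
    # Wide table: states as rows, parties as columns, 0 where a party did not run.
    votes_per_state = (
        df.groupby(["state", "party"])["votes"].sum().unstack(fill_value=0)
    )
    return pd.DataFrame(
        {
            party: votes_per_state[party],
            "total": votes_per_state.sum(axis=1),
        }
    )


result = party_summary(df, "Bharatiya Janata Party")
print(result)
```

A party name that does not occur anywhere in the data raises a `KeyError` here, which is probably the behaviour you want for a typo.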

If you want you can even add a percentage:

result = pd.DataFrame(
    {
        party_of_interest: votes_per_state[party_of_interest],
        "total": votes_per_state.sum(axis=1),
        "pct": (
            votes_per_state[party_of_interest]
            / votes_per_state.sum(axis=1)
            * 100
        ).round(1),
    }
)

state                      Bharatiya Janata Party   total   pct
Andaman & Nicobar Islands                   90969  181923  50.0
Andhra Pradesh                                 85     185  45.9
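The percentage version sums the rows twice. One possible variation, sketched here with chained `assign` calls, computes the total once and derives the percentage from it. The sample frame again uses the question's five rows plus two invented Andhra Pradesh rows, so treat the data as illustrative.

```python
import pandas as pd

# Five rows from the question plus two invented Andhra Pradesh rows (illustrative only).
df = pd.DataFrame(
    {
        "state": ["Andaman & Nicobar Islands"] * 5 + ["Andhra Pradesh"] * 2,
        "party": [
            "Bharatiya Janata Party",
            "Indian National Congress",
            "Aam Aadmi Party",
            "All India Trinamool Congress",
            "Communist Party of India (Marxist)",
            "Bharatiya Janata Party",
            "other",
        ],
        "votes": [90969, 83157, 3737, 2283, 1777, 85, 100],
    }
)

party_of_interest = "Bharatiya Janata Party"
votes_per_state = df.groupby(["state", "party"])["votes"].sum().unstack(fill_value=0)

result = (
    votes_per_state[[party_of_interest]]  # double brackets keep a DataFrame
    .assign(total=votes_per_state.sum(axis=1))  # state totals, computed once
    .assign(pct=lambda d: (d[party_of_interest] / d["total"] * 100).round(1))
    .rename_axis(None, axis=1)  # drop the leftover "party" columns label
)
print(result)
```

The lambda in the second `assign` receives the intermediate frame, which is what lets `pct` reuse the `total` column created one step earlier.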
answered Jun 24, 2019 at 12:21
\$\endgroup\$
1
  • \$\begingroup\$ I know that my code worked. I was just looking for something to improve efficiency as well as be more Pythonic. Seems like every project I work on ends up with me creating over 10-12 different dataframes. Don't know if that's just me. Thank you for your answer. \$\endgroup\$ Commented Jun 24, 2019 at 13:04
