\$\begingroup\$

I'm a data science newbie looking to improve my code. I am trying to calculate the following:

  • The total number of births in each month
  • The total number of births on each day of the week

Sample dataset from CSV:

year, month, date_of_month, day_of_week, births
1994, 1, 1, 6, 8096
1994, 1, 2, 7, 7772
1994, 1, 3, 1, 10142
1994, 1, 4, 2, 11248
1994, 1, 5, 3, 11053
...

It led me into this implementation:

def weekly_births(lst):
    mon = birth_counter(lst, 3, 1, 4)
    tue = birth_counter(lst, 3, 2, 4)
    wed = birth_counter(lst, 3, 3, 4)
    thu = birth_counter(lst, 3, 4, 4)
    fri = birth_counter(lst, 3, 5, 4)
    sat = birth_counter(lst, 3, 6, 4)
    sun = birth_counter(lst, 3, 7, 4)
    births_per_week = {
        1: mon,
        2: tue,
        3: wed,
        4: thu,
        5: fri,
        6: sat,
        7: sun
    }
    return births_per_week

def monthly_births(lst):
    jan_births = birth_counter(lst, 1, 1, 4)
    feb_births = birth_counter(lst, 1, 2, 4)
    mar_births = birth_counter(lst, 1, 3, 4)
    apr_births = birth_counter(lst, 1, 4, 4)
    may_births = birth_counter(lst, 1, 5, 4)
    jun_births = birth_counter(lst, 1, 6, 4)
    jul_births = birth_counter(lst, 1, 7, 4)
    aug_births = birth_counter(lst, 1, 8, 4)
    sep_births = birth_counter(lst, 1, 9, 4)
    oct_births = birth_counter(lst, 1, 10, 4)
    nov_births = birth_counter(lst, 1, 11, 4)
    dec_births = birth_counter(lst, 1, 12, 4)
    births_per_month = {
        1: jan_births,
        2: feb_births,
        3: mar_births,
        4: apr_births,
        5: may_births,
        6: jun_births,
        7: jul_births,
        8: aug_births,
        9: sep_births,
        10: oct_births,
        11: nov_births,
        12: dec_births
    }
    return births_per_month

The birth_counter function:

def birth_counter(lst, index, head, tail):
    sum = 0
    for each in lst:
        if each[index] == head:
            sum = sum + each[tail]
    return sum

The parameters:

  • lst - The dataset, as a list of rows
  • index - The index of the column to filter on
  • head - The value that each row's filter column is compared with
  • tail - The index of the column whose values are summed

Example usage:

# column: [0]  [1] [2] [3] [4]
lst = [
    [1994, 1, 1, 6, 8096],
    [1994, 1, 2, 7, 7772],
    # ... remaining rows
]
# For every row where row[1] == 1, add row[4] to the total.
sample_births = birth_counter(lst, 1, 1, 4)  # 8096 + 7772 = 15868

Questions regarding weekly_births and monthly_births:

  1. As you can see, I listed every weekday and month by hand and computed the total births for each. Is there a way to iterate over the weekdays and months instead, to avoid this lengthy, repetitive code?
200_success
asked Jan 27, 2018 at 6:00
\$\endgroup\$

1 Answer

\$\begingroup\$

If you want to do data analysis in Python, you should learn about NumPy and pandas. NumPy implements efficient numeric calculations on whole arrays. pandas builds on NumPy and introduces the DataFrame, which is a bit like a table that can be manipulated in many ways: you can sort it by one or more columns, transform columns, and group rows by one or more columns and perform operations on each group (which is what you want to do here).
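To make those three operations concrete, here is a minimal sketch that sorts, transforms, and groups the five sample rows from the question. The data is inlined through `io.StringIO` only so the snippet is self-contained, and the column name `births_thousands` is made up for illustration.

```python
import io

import pandas as pd

# The five sample rows from the question, inlined so the sketch runs on its own.
SAMPLE_CSV = """year, month, date_of_month, day_of_week, births
1994, 1, 1, 6, 8096
1994, 1, 2, 7, 7772
1994, 1, 3, 1, 10142
1994, 1, 4, 2, 11248
1994, 1, 5, 3, 11053
"""

# skipinitialspace=True drops the blank that follows each comma.
df = pd.read_csv(io.StringIO(SAMPLE_CSV), skipinitialspace=True)

# Sort by a column: the day with the most births comes first.
busiest_first = df.sort_values("births", ascending=False)

# Transform a column: births in thousands (hypothetical column name).
df["births_thousands"] = df["births"] / 1000

# Group by a column and aggregate each group.
births_per_weekday = df.groupby("day_of_week")["births"].sum()

print(busiest_first.head(1))
print(births_per_weekday)
```

Each call returns a new object rather than changing `df` in place, except for the explicit column assignment.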

Your current code boils down to very few lines with pandas:

import pandas as pd
df = pd.read_csv("us_birth_statistics.csv", skipinitialspace=True)
birth_per_month = df.groupby("month").births.sum()
birth_per_weekday = df.groupby("day_of_week").births.sum()
print(birth_per_month)
print()
print(birth_per_weekday)
#month
#1 48311
#Name: births, dtype: int64
#day_of_week
#1 10142
#2 11248
#3 11053
#6 8096
#7 7772
#Name: births, dtype: int64
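If pandas is not an option, the same grouping idea can be sketched in plain Python with a single pass over the rows. This is only an illustration, not part of the original code: the helper name `births_by_column` and its parameters are hypothetical, and it assumes the rows have already been converted to integers.

```python
from collections import defaultdict


def births_by_column(rows, key_index, value_index=4):
    """Sum rows[value_index] for each distinct value found in rows[key_index]."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[key_index]] += row[value_index]
    return dict(totals)


# The five sample rows from the question, already converted to integers.
rows = [
    [1994, 1, 1, 6, 8096],
    [1994, 1, 2, 7, 7772],
    [1994, 1, 3, 1, 10142],
    [1994, 1, 4, 2, 11248],
    [1994, 1, 5, 3, 11053],
]

monthly = births_by_column(rows, key_index=1)  # column 1 is the month
weekly = births_by_column(rows, key_index=3)   # column 3 is the day of the week

print(monthly)  # {1: 48311}
print(weekly)
```

Unlike `weekly_births` and `monthly_births`, this only produces keys that actually occur in the data, so a month with no rows is missing rather than mapped to 0. To keep the existing `birth_counter`, a dict comprehension such as `{day: birth_counter(lst, 3, day, 4) for day in range(1, 8)}` also removes the repetition, at the cost of scanning the list once per key.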
answered Jan 27, 2018 at 14:13
\$\endgroup\$
  • \$\begingroup\$ Wow! I didn't know about that. What a good time to be alive. Thanks @Graipher! \$\endgroup\$ Commented Jan 28, 2018 at 1:11
