I'm trying to parse a CSV file in Python and print the sum of order_total for each day. Below is a sample of the file:

order_total created_datetime
24.99 2015-06-01 00:00:12
0 2015-06-01 00:03:15
164.45 2015-06-01 00:04:05
24.99 2015-06-01 00:08:01
0 2015-06-01 00:08:23
46.73 2015-06-01 00:08:51
0 2015-06-01 00:08:58
47.73 2015-06-02 00:00:25
101.74 2015-06-02 00:04:11
119.99 2015-06-02 00:04:35
38.59 2015-06-02 00:05:26
73.47 2015-06-02 00:06:50
34.24 2015-06-02 00:07:36
27.24 2015-06-03 00:01:40
82.2 2015-06-03 00:12:21
23.48 2015-06-03 00:12:35

My objective is to print sum(order_total) for each day. For example, the result should be:

2015-06-01 -> 261.16
2015-06-02 -> 415.76
2015-06-03 -> 132.92

I have written the code below. It does not compute the sums yet; for now I'm only printing some sample statements to check that it parses the file and loops as required.

def sum_orders_test(self,start_date,end_date):
    initial_date = datetime.date(int(start_date.split('-')[0]),int(start_date.split('-')[1]),int(start_date.split('-')[2]))
    final_date = datetime.date(int(end_date.split('-')[0]),int(end_date.split('-')[1]),int(end_date.split('-')[2]))
    day = datetime.timedelta(days=1)
    with open("file1.csv", 'r') as data_file:
        next(data_file)
        reader = csv.reader(data_file, delimiter=',')
        if initial_date <= final_date:
            for row in reader:
                if str(initial_date) in row[1]:
                    print 'initial_date : ' + str(initial_date)
                    print 'Date : ' + row[1]
                else:
                    print 'Else'
                    initial_date = initial_date + day

With my current logic I'm running into this issue:

  1. As you can see in the sample CSV, there are 7 rows for 2015-06-01, 6 rows for 2015-06-02 and 3 rows for 2015-06-03.
  2. The code above prints 7 values for 2015-06-01, 5 for 2015-06-02 and 2 for 2015-06-03.
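
The counts in the list above are what one would expect if the row that triggers the else branch is consumed without being counted. Below is a minimal sketch of that reading on in-memory data. The `dates` list and both helper functions are hypothetical stand-ins written for Python 3, not the code from the question. The first helper mimics the loop in the question; the second counts each row under its own date.

```python
import datetime
from collections import Counter

# Hypothetical in-memory stand-in for the date column of the sample file.
dates = (["2015-06-01 00:00:12"] * 7
         + ["2015-06-02 00:00:25"] * 6
         + ["2015-06-03 00:01:40"] * 3)


def count_matches(dates, start):
    """Mimic the loop in the question: the day advances in the else branch."""
    current = datetime.date(*map(int, start.split("-")))
    day = datetime.timedelta(days=1)
    matched = Counter()
    for value in dates:
        if str(current) in value:
            matched[str(current)] += 1
        else:
            # This row belongs to the next day, but it is never counted.
            current += day
    return dict(matched)


def count_by_row_date(dates):
    """Count each row under its own date, so no row is dropped."""
    return dict(Counter(value[:10] for value in dates))


print(count_matches(dates, "2015-06-01"))
print(count_by_row_date(dates))
```

In this sketch the first call loses one row per date change, while the second keeps all 16 rows.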

I call the function with sum_orders_test('2015-06-01','2015-06-03').

I know there is some silly logical issue, but being new to programming and Python I'm unable to figure it out.

asked Sep 3, 2017 at 8:18
  • delimiter=',')... Please tell me where the commas in the file are. Commented Sep 3, 2017 at 8:22
  • It's a CSV file, hence I used ',', but there are no commas in the file. Commented Sep 3, 2017 at 8:24
  • Have you tried using pandas? Commented Sep 3, 2017 at 8:24
  • That's exactly your problem... Python does not care about file extensions. Change the delimiter so you can actually read the data correctly. Commented Sep 3, 2017 at 8:25
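
The point made in the comments above can be checked quickly. This is a small sketch using one hypothetical tab-separated row (the real file's delimiter is not confirmed anywhere in the thread): with the wrong delimiter, csv.reader returns the whole line as a single field, so there is no second column to read.

```python
import csv
import io

# Hypothetical tab-separated row, modelled on the sample in the question.
line = "24.99\t2015-06-01 00:00:12\n"

# Parsed with the wrong delimiter: the whole line stays one field.
as_comma = next(csv.reader(io.StringIO(line), delimiter=","))
# Parsed with the matching delimiter: two separate fields.
as_tab = next(csv.reader(io.StringIO(line), delimiter="\t"))

print(len(as_comma), as_comma)
print(len(as_tab), as_tab)
```

Printing one parsed row like this before writing any summing logic is a cheap way to confirm which delimiter the file really uses.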

3 Answers

I've re-read the question. If your data really is tab-separated, the following code does the job (using pandas):

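import pandas as pd

# header=0 consumes the file's header row; names= replaces its labels.
df = pd.read_csv('file.csv', sep='\t', header=0,
                 names=['order_total', 'created_datetime'])
# Keep only the calendar date so that rows from the same day group together.
df['created_datetime'] = pd.to_datetime(df['created_datetime']).dt.date
df = df.groupby('created_datetime').sum()
print(df)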

This gives the following result:

                  order_total
created_datetime
2015-06-01             261.16
2015-06-02             415.76
2015-06-03             132.92

It is less code, and probably less complex as well.

answered Sep 3, 2017 at 8:27

3 Comments

It looks much easier, but my file is a CSV file, although there isn't any tab or comma in the file; it's a normal Excel file saved as CSV. When I replace the '\t' with ',' and run it, I get the error below:
df['created_datetime'] = pd.to_datetime(df.created_datetime).dt.date
File "/Library/Python/2.7/site-packages/pandas/core/tools/datetimes.py", line 509, in to_datetime
 values = _convert_listlike(arg._values, False, format)
File "/Library/Python/2.7/site-packages/pandas/core/tools/datetimes.py", line 447, in _convert_listlike
 raise e
ValueError: Unknown string format
@Abien
Will you please give a link to a sample of your data?
It certainly is :)

This one should do the job.

The csv module has DictReader, which accepts fieldnames, so instead of accessing columns by index (row[0]) you can use predefined column names (row['date']).
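
A minimal sketch of that idea on its own, using a hypothetical comma-separated sample held in memory (the delimiter of the real file may differ, and the field names `orders` and `date` are just labels chosen here):

```python
import csv
import io

# Hypothetical comma-separated sample; the real file's delimiter may differ.
sample = io.StringIO(
    "order_total,created_datetime\n"
    "24.99,2015-06-01 00:00:12\n"
    "47.73,2015-06-02 00:00:25\n"
)

next(sample)  # skip the header line, because fieldnames are supplied below
reader = csv.DictReader(sample, fieldnames=["orders", "date"])
rows = list(reader)

# Columns are now addressed by name rather than by position.
print(rows[0]["orders"], rows[0]["date"])
```

Every value comes back as a string, so amounts still need float() before they can be added up.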

import csv
from collections import defaultdict
from datetime import datetime

def sum_orders_test(self, start_date, end_date):
    FIELDNAMES = ['orders', 'date']
    sum_of_orders = defaultdict(float)  # amounts such as 24.99 are not ints
    initial_date = datetime.strptime(start_date, '%Y-%m-%d').date()
    final_date = datetime.strptime(end_date, '%Y-%m-%d').date()
    with open("file1.csv", 'r') as data_file:
        next(data_file)  # Skip the headers
        # DictReader splits on ',' by default; pass delimiter='\t' if the
        # file turns out to be tab-separated.
        reader = csv.DictReader(data_file, fieldnames=FIELDNAMES)
        for row in reader:
            # Key each row by its own date instead of advancing a counter,
            # so the first row of a new day is not skipped.
            row_date = datetime.strptime(row['date'].strip(), '%Y-%m-%d %H:%M:%S').date()
            if initial_date <= row_date <= final_date:
                sum_of_orders[str(row_date)] += float(row['orders'])
    return sum_of_orders
answered Sep 3, 2017 at 8:31

2 Comments

How does defaultdict work? When I try to print sum_of_orders it shows defaultdict(<type 'int'>, {}) @Pythonist
Simply put, it lets you add new keys to a dictionary, with a default value of the given type, without first checking whether they are already there. The docs explain it better than I can.
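
As a small illustration of the behaviour described in the comment above (a sketch only; the `totals` name and the three sample amounts are picked here for the example): the first access to a missing key creates it with the default value, so `+=` works without any membership check. An empty `defaultdict(<type 'int'>, {})` simply means that no key was ever assigned.

```python
from collections import defaultdict

totals = defaultdict(float)  # missing keys are created as float(), i.e. 0.0

# Illustrative (date, amount) pairs taken from the sample data.
orders = [("2015-06-03", 27.24), ("2015-06-03", 82.2), ("2015-06-03", 23.48)]
for date, amount in orders:
    totals[date] += amount   # no "if date in totals" check is needed

print(round(totals["2015-06-03"], 2))
print(len(totals))
```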

Your file might have a .csv extension, but it actually seems to be tab-separated.

You can still load it as a pandas DataFrame by specifying the separator:

import pandas as pd
data = pd.read_csv('file.csv', sep='\t')

Then split the datetime column into date and time:

# Assign the pieces as new columns so that order_total is kept.
data[['date', 'time']] = (data['created_datetime'].str.strip()
                          .str.split(' ', n=1, expand=True))

Then, for each unique date, compute its order_total sum:

for i in data.date.unique():
    total = data[data['date'] == i].order_total.sum()
    print('%s -> %.2f' % (i, total))
answered Sep 3, 2017 at 8:43
