I'm trying to parse a CSV file in Python and print the sum of order_total for each day. Below is the sample CSV file:
order_total created_datetime
24.99 2015-06-01 00:00:12
0 2015-06-01 00:03:15
164.45 2015-06-01 00:04:05
24.99 2015-06-01 00:08:01
0 2015-06-01 00:08:23
46.73 2015-06-01 00:08:51
0 2015-06-01 00:08:58
47.73 2015-06-02 00:00:25
101.74 2015-06-02 00:04:11
119.99 2015-06-02 00:04:35
38.59 2015-06-02 00:05:26
73.47 2015-06-02 00:06:50
34.24 2015-06-02 00:07:36
27.24 2015-06-03 00:01:40
82.2 2015-06-03 00:12:21
23.48 2015-06-03 00:12:35
My objective here is to print sum(order_total) for each day. For example, the result should be:
2015-06-01 -> 261.16
2015-06-02 -> 415.76
2015-06-03 -> 132.92
I have written the code below. It does not perform the summing logic yet; for now I'm only printing some sample statements to check whether it parses and loops as required.
    def sum_orders_test(self,start_date,end_date):
        initial_date = datetime.date(int(start_date.split('-')[0]),int(start_date.split('-')[1]),int(start_date.split('-')[2]))
        final_date = datetime.date(int(end_date.split('-')[0]),int(end_date.split('-')[1]),int(end_date.split('-')[2]))
        day = datetime.timedelta(days=1)
        with open("file1.csv", 'r') as data_file:
            next(data_file)
            reader = csv.reader(data_file, delimiter=',')
            if initial_date <= final_date:
                for row in reader:
                    if str(initial_date) in row[1]:
                        print 'initial_date : ' + str(initial_date)
                        print 'Date : ' + row[1]
                    else:
                        print 'Else'
                        initial_date = initial_date + day
Based on my current logic, I'm running into this issue:
- As you can see in the sample CSV, there are 7 rows for 2015-06-01, 6 rows for 2015-06-02 and 3 rows for 2015-06-03.
- The output of the above code prints 7 values for 2015-06-01, 5 for 2015-06-02 and 2 for 2015-06-03.
Calling the function using sum_orders_test('2015-06-01','2015-06-03');
I know there is some silly logical issue, but being new to programming and python I'm unable to figure it out.
3 Answers
I've re-read the question, and if your data is really tab-separated, here is code that does the job (using pandas):
    import pandas as pd

    # header=0 discards the file's own header row so that `names` replaces it
    df = pd.read_csv('file.csv', sep='\t', header=0,
                     names=['order_total', 'created_datetime'])
    # Keep only the date part of each timestamp
    df['created_datetime'] = pd.to_datetime(df['created_datetime']).dt.date
    df = df.groupby('created_datetime').sum()
    print(df)
Gives the following result:
                      order_total
    created_datetime
    2015-06-01             261.16
    2015-06-02             415.76
    2015-06-03             132.92
Less code, and probably lower algorithmic complexity.
This one should do the job.
The csv module has DictReader, which accepts fieldnames, so instead of accessing columns by index (row[0]) you can predefine column names and access them by name (row['date']).
    import csv
    from collections import defaultdict
    from datetime import datetime

    def sum_orders_test(self, start_date, end_date):
        FIELDNAMES = ['orders', 'date']
        # float, not int: order totals such as 24.99 are not integers
        sum_of_orders = defaultdict(float)
        initial_date = datetime.strptime(start_date, '%Y-%m-%d').date()
        final_date = datetime.strptime(end_date, '%Y-%m-%d').date()
        with open("file1.csv", 'r') as data_file:
            next(data_file)  # Skip the headers
            reader = csv.DictReader(data_file, fieldnames=FIELDNAMES)
            for row in reader:
                # Key on each row's own date so that the first row of a
                # new day is counted instead of being skipped
                row_date = datetime.strptime(row['date'].strip(), '%Y-%m-%d %H:%M:%S').date()
                if initial_date <= row_date <= final_date:
                    sum_of_orders[str(row_date)] += float(row['orders'])
        return sum_of_orders
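For what it's worth, the missing rows in the question (5 instead of 6, 2 instead of 3) appear to come from the `else` branch: the row that triggers the date change is used only to advance `initial_date` and is never counted itself. Here is a minimal sketch of that behaviour, using a few in-memory rows copied from the sample instead of `file1.csv`. The `rows` list and the `count_rows_buggy` / `count_rows_fixed` helpers are hypothetical names made up for illustration.

```python
import datetime
from collections import Counter

# A few (order_total, created_datetime) rows copied from the sample data.
rows = [
    ("46.73", "2015-06-01 00:08:51"),
    ("0", "2015-06-01 00:08:58"),
    ("47.73", "2015-06-02 00:00:25"),
    ("101.74", "2015-06-02 00:04:11"),
    ("27.24", "2015-06-03 00:01:40"),
    ("82.2", "2015-06-03 00:12:21"),
]


def count_rows_buggy(rows, start):
    """Mimic the question's loop: a non-matching row only advances the date."""
    current = start
    counts = Counter()
    for _, stamp in rows:
        if str(current) in stamp:
            counts[str(current)] += 1
        else:
            # The row that caused the mismatch is never counted.
            current += datetime.timedelta(days=1)
    return dict(counts)


def count_rows_fixed(rows):
    """Count each row under its own date (the first 10 characters)."""
    return dict(Counter(stamp[:10] for _, stamp in rows))


buggy = count_rows_buggy(rows, datetime.date(2015, 6, 1))
fixed = count_rows_fixed(rows)
print(buggy)
print(fixed)
```

With two rows per day in this reduced sample, the first loop counts only one row for the second and third day, while counting each row under its own date sees all of them.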
Your file may have a .csv extension, but it actually appears to be tab-separated.
You can load it as a pandas DataFrame by specifying the separator.
    import pandas as pd
    data = pd.read_csv('file.csv', sep='\t')
Then split the datetime column into date and time:
    # expand=True returns the two parts as columns; order_total is kept
    data[['date', 'time']] = data.created_datetime.str.split(' ', n=1, expand=True)
Then, for each unique date, compute its order_total sum:
    for i in data.date.unique():
        print(i, data[data['date'] == i].order_total.sum())
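The same split-and-sum idea can also be sketched without pandas, which may help show what the two steps are doing. This assumes the columns are separated by a tab, as this answer supposes. The `sample` string is a stand-in for the file contents (a few rows copied from the question), and `sums` is just an illustrative name.

```python
import csv
import io
from collections import defaultdict

# Stand-in for the file contents; the tab separator is an assumption.
sample = (
    "order_total\tcreated_datetime\n"
    "24.99\t2015-06-01 00:00:12\n"
    "164.45\t2015-06-01 00:04:05\n"
    "47.73\t2015-06-02 00:00:25\n"
    "101.74\t2015-06-02 00:04:11\n"
    "27.24\t2015-06-03 00:01:40\n"
)

sums = defaultdict(float)
reader = csv.DictReader(io.StringIO(sample), delimiter="\t")
for row in reader:
    # Split "YYYY-MM-DD HH:MM:SS" once and keep only the date part.
    date, _time = row["created_datetime"].split(" ", 1)
    sums[date] += float(row["order_total"])

# Print one line per day, rounded to cents.
for date in sorted(sums):
    print(date, round(sums[date], 2))
```

To read from a real file, `io.StringIO(sample)` would be replaced by an open file handle.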
delimiter=',')... Please tell me where the commas in the file are.