I'm asking for help once more. My base code is ready: it first converts all negative values to 0, and then calculates the daily sums and the cumulative values of the CSV data:
import csv
from collections import defaultdict, OrderedDict

def convert(data):
    try:
        return int(data)
    except ValueError:
        return 0

with open('MonthData1.csv', 'r') as file1:
    read_file = csv.reader(file1, delimiter=';')
    delheader = next(read_file)
    data = defaultdict(int)
    for line in read_file:
        valuedata = max(0, sum([convert(i) for i in line[1:5]]))
        data[line[0].split()[0]] += valuedata

for key in OrderedDict(sorted(data.items())):
    print('{};{}'.format(key, data[key]))
print("")
previous_values = []
for key, value in OrderedDict(sorted(data.items())).items():
    print('{};{}'.format(key, value + sum(previous_values)))
    previous_values.append(value)
This code prints:
1.5.2018 245
2.5.2018 105
4.5.2018 87
1.5.2018 245
2.5.2018 350
4.5.2018 437
That's how I want it to print the data. First the sum of each day, and then the cumulative value. My question is, how can I format this data so it can be written to a new csv file with the same format as it prints it? So the new csv file should look like this:
[Image: the desired layout of the new CSV file]
I have tried to do it myself (with datetime) and searched for answers, but I just can't find a way. I hope to get a solution this time; I'd appreciate it massively.
The data file as csv: https://files.fm/u/2vjppmgv
Data file in pastebin https://pastebin.com/Tw4aYdPc
Hope this can be done with default libraries
3 Answers
Writing a CSV is simply a matter of writing values separated by commas (or semicolons, in this case). A CSV is a plain text file (a .txt, if you will). You can read and write it with Python's open() function if you'd like to.
You could actually drop the csv module altogether if you wish. I included an example of this at the end.
This version uses only the libraries that were available in your original code.
import csv
from collections import defaultdict, OrderedDict

def convert(data):
    try:
        return int(data)
    except ValueError:
        return 0

file1 = open('MonthData1.csv', 'r')
file2 = open('result.csv', 'w')
read_file = csv.reader(file1, delimiter=';')
delheader = next(read_file)  # skip the header row
data = defaultdict(int)
for line in read_file:
    valuedata = max(0, sum([convert(i) for i in line[1:5]]))
    data[line[0].split()[0]] += valuedata

# first block: the sum of each day
for key in OrderedDict(sorted(data.items())):
    file2.write('{};{}\n'.format(key, data[key]))
file2.write('\n')

# second block: the cumulative values
previous_values = []
for key, value in OrderedDict(sorted(data.items())).items():
    file2.write('{};{}\n'.format(key, value + sum(previous_values)))
    previous_values.append(value)
file1.close()
file2.close()
A note on line endings: I used the characters \n to end each line. Because result.csv is opened in text mode, Python translates \n into the platform's line ending when writing (\r\n on Windows), so this works under Linux, Mac and Windows alike. Do not write os.linesep or \r\n yourself to a file opened in text mode: on Windows that would end each line with \r\r\n.
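For comparison, here is a minimal sketch that lets csv.writer take care of both the delimiter and the line endings. It assumes Python 3. The function name write_rows, the temporary file, and the sample numbers (copied from the printed output in the question) are only there for illustration. The csv documentation recommends opening the output file with newline='' so that the writer's own line terminator (\r\n by default) reaches the file unchanged on every platform.

```python
import csv
import os
import tempfile

def write_rows(path, daily, cumulative):
    """Write two blocks of (date, value) rows separated by an empty line."""
    # newline='' stops Python from translating the line endings csv.writer emits
    with open(path, 'w', newline='') as out:
        writer = csv.writer(out, delimiter=';')
        writer.writerows(daily)
        writer.writerow([])  # empty row between the two blocks
        writer.writerows(cumulative)

# sample values copied from the printed output in the question
daily = [('1.5.2018', 245), ('2.5.2018', 105), ('4.5.2018', 87)]
cumulative = [('1.5.2018', 245), ('2.5.2018', 350), ('4.5.2018', 437)]

# hypothetical output location, used only for this demonstration
path = os.path.join(tempfile.mkdtemp(), 'result.csv')
write_rows(path, daily, cumulative)

# read the raw bytes back so the line endings are visible
with open(path, 'rb') as check:
    raw = check.read()
print(raw)
```

In the real script you would pass the sorted items of data as the first block and the running totals as the second.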
As a side note, this is an example of how you could read your CSV without the csv module:
data = [i.split(";") for i in open('MonthData1.csv').read().split('\n')]
If you had a more complex CSV file, especially one whose strings could contain semicolons, you'd be better off with the csv module.
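A small sketch of that difference, assuming Python 3. The sample row is made up for illustration and does not come from the data file in the question: its first field contains a semicolon inside quotes.

```python
import csv
import io

# a made-up row whose first field contains the delimiter inside quotes
sample = '"Shop A; north";12;30\n'

# naive approach: split on every semicolon, quoted or not
naive = sample.strip().split(';')

# csv module: understands the quotes and keeps the field together
parsed = next(csv.reader(io.StringIO(sample), delimiter=';'))

print(naive)   # the quoted field is cut in two
print(parsed)  # the quoted field stays in one piece
```

The plain split yields four fragments, while csv.reader returns the three fields the row was meant to have.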
The pandas library, mentioned in other answers, is a great tool. It will most certainly be able to handle any need you might have when dealing with CSV data.
This code creates a new CSV file with the same format as what's printed.
import pandas as pd #added
import csv
from collections import defaultdict, OrderedDict

def convert(data):
    try:
        return int(data)
    except ValueError:
        return 0

keys = [] #added
data_keys = [] #added
with open('MonthData1.csv', 'r') as file1:
    read_file = csv.reader(file1, delimiter=';')
    delheader = next(read_file)
    data = defaultdict(int)
    for line in read_file:
        valuedata = max(0, sum([convert(i) for i in line[1:5]]))
        data[line[0].split()[0]] += valuedata

for key in OrderedDict(sorted(data.items())):
    print('{} {}'.format(key, data[key]))
    keys.append(key) #added
    data_keys.append(data[key]) #added
print("")
keys.append("") #added
data_keys.append("") #added
previous_values = []
for key, value in OrderedDict(sorted(data.items())).items():
    print('{} {}'.format(key, value + sum(previous_values)))
    keys.append(key) #added
    data_keys.append(value + sum(previous_values)) #added
    previous_values.append(value)
df = pd.DataFrame(data_keys,keys) #added
df.to_csv('new_csv_file.csv', header=False) #added
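The loop above recomputes sum(previous_values) on every iteration. As a possible alternative using only the standard library, itertools.accumulate produces the running totals in one pass. This is just a sketch, assuming Python 3; the dictionary daily stands in for the data dictionary of the script, filled with the numbers from the printed output in the question.

```python
from itertools import accumulate

# daily sums copied from the printed output in the question
daily = {'1.5.2018': 245, '2.5.2018': 105, '4.5.2018': 87}

dates = sorted(daily)
# running total over the daily sums, in the same order as dates
running = list(accumulate(daily[d] for d in dates))
cumulative = list(zip(dates, running))

for date, total in cumulative:
    print('{} {}'.format(date, total))
```

The pairs in cumulative can be appended to keys and data_keys in place of the second loop.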
This is a version that does not use any imports at all:
def convert(data):
    try:
        out = int(data)
    except ValueError:
        out = 0
    return out ### try to avoid multiple return statements

# text mode ('r'), so every line is a str that can be split on ';'
with open('MonthData1.csv', 'r') as file1:
    lines = file1.readlines()
data = [ [ d.strip() for d in l.split(';')] for l in lines[ 1 : : ] ]
myDict = dict()
for d in data:
    key = d[0].split()[0]
    value = max(0, sum([convert(i) for i in d[1:5]]))
    try:
        myDict[key] += value
    except KeyError:
        myDict[key] = value
s1=""
s2=""
accu = 0
for key in sorted( myDict.keys() ):
    accu += myDict[key]
    s1 += '{} {}\n'.format( key, myDict[key] )
    s2 += '{} {}\n'.format( key, accu )
# text mode ('w'), because s1 and s2 are str
with open( 'out.txt', 'w') as fPntr:
    fPntr.write( s1 + "\n" + s2 )
The keys here are plain strings, though, so sorted() orders them lexicographically rather than by date (for example, '10.5.2018' comes before '2.5.2018'). So you actually might want to use datetime, giving, e.g.:
import datetime

# convert() is the same function as in the version above
with open('MonthData1.csv', 'r') as file1:
    lines = file1.readlines()
data = [ [ d.strip() for d in l.split(';')] for l in lines[ 1 : : ] ]
myDict = dict()
for d in data:
    # parse the date part so that the keys sort chronologically
    key = datetime.datetime.strptime( d[0].split()[0], '%d.%m.%Y' )
    value = max(0, sum([convert(i) for i in d[1:5]]))
    try:
        myDict[key] += value
    except KeyError:
        myDict[key] = value
s1=""
s2=""
accu = 0
for key in sorted( myDict.keys() ):
    accu += myDict[key]
    s1 += '{} {}\n'.format( key.strftime('%d.%m.%y'), myDict[key] )
    s2 += '{} {}\n'.format( key.strftime('%d.%m.%y'), accu )
with open( 'out.txt', 'w') as fPntr:
    fPntr.write( s1 + "\n" + s2 )
Note that I changed to a two-digit year by using %y instead of %Y in the output. This formatting also zero-pads the day and the month.
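To illustrate both points, here is a short sketch, assuming Python 3. The date strings are made up in the same day.month.year style as the data file. It compares a plain string sort with a sort on parsed dates, and shows the zero-padded strftime output next to one way of rebuilding the original unpadded style from the day, month and year attributes.

```python
from datetime import datetime

# made-up date strings in the same day.month.year style as the data file
dates = ['2.5.2018', '10.5.2018', '1.5.2018']

# plain string sort: compares character by character
as_text = sorted(dates)

# date sort: parse each string first, then compare the datetime objects
parsed = sorted(datetime.strptime(s, '%d.%m.%Y') for s in dates)

# %d and %m zero-pad, %y gives a two-digit year
padded = [d.strftime('%d.%m.%y') for d in parsed]

# building the string by hand keeps the unpadded style of the input
unpadded = ['{}.{}.{}'.format(d.day, d.month, d.year) for d in parsed]

print(as_text)
print(padded)
print(unpadded)
```

With these three dates, the string sort puts the 10th before the 2nd, while the parsed dates come out in calendar order.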
import pandas as pd
df.to_csv("\\path\\output.csv")