Python formatting data to csv file

Question 1

I'll try to look for help once more, so my base code is ready, in the very beginning, it converts all the negative values to 0, and after that, it does calculate the sum and cumulative values of the csv data:

import csv
from collections import defaultdict, OrderedDict
def convert(data):
 try:
 return int(data)
 except ValueError:
 return 0
with open('MonthData1.csv', 'r') as file1:
 read_file = csv.reader(file1, delimiter=';')
 delheader = next(read_file)
 data = defaultdict(int)
 for line in read_file:
 valuedata = max(0, sum([convert(i) for i in line[1:5]]))
 data[line[0].split()[0]] += valuedata
 for key in OrderedDict(sorted(data.items())):
 print('{};{}'.format(key, data[key]))
 print("")
 previous_values = []
 for key, value in OrderedDict(sorted(data.items())).items():
 print('{};{}'.format(key, value + sum(previous_values)))
 previous_values.append(value)

This code prints:

1.5.2018 245
2.5.2018 105
4.5.2018 87
1.5.2018 245
2.5.2018 350
4.5.2018 437

That's how I want it to print the data. First the sum of each day, and then the cumulative value. My question is, how can I format this data so it can be written to a new csv file with the same format as it prints it? So the new csv file should look like this:
enter image description here

I have tried to do it myself (with dateime), and searched for answers but I just can't find a way. I hope to get a solution this time, I'd appreciate it massively.
The data file as csv: https://files.fm/u/2vjppmgv
Data file in pastebin https://pastebin.com/Tw4aYdPc Hope this can be done with default libraries

Question 2

I may not have understood your question perfectly, but it seems that you simply need to change the two occurrences of '{} {}' for '{};{}'. In my test, the resulting CSV file looks exactly like the second image. If this was the issue, then it was not a matter of formatting the date, but of formatting the columns.

Question 3

Yeah, thanks. Do you know how should I write the data to a csv file? I have no idea on that part

Question 4

if you data is in a dataframe called df then simply import pandas as pd df.to_csv("\\path\\output.csv")

Question 5

I have the whole code with the default libraries, do you have ideas how this should be done without external libraries?

Question 6

Pandas is not external lib.

Question 7

Writing a CSV is simply a matter of writing values separated by commas (or semi-colons in this case. A CSV is a plain text file (a .txt if you will). You can read it and write using python's open() function if you'd like to.

You could actually get rid of the CSV module if you wish. I included an example of this in the end.

This version uses only the libraries that were available in your original code.

import csv
from collections import defaultdict, OrderedDict
def convert(data):
 try:
 return int(data)
 except ValueError:
 return 0 
file1 = open('Monthdata1.csv', 'r')
file2 = open('result.csv', 'w')
read_file = csv.reader(file1, delimiter=';')
delheader = next(read_file)
data = defaultdict(int)
for line in read_file:
 valuedata = max(0, sum([convert(i) for i in line[1:5]]))
 data[line[0].split()[0]] += valuedata
for key in OrderedDict(sorted(data.items())):
 file2.write('{};{}\n'.format(key, data[key]))
file2.write('\n')
previous_values = []
for key, value in OrderedDict(sorted(data.items())).items():
 file2.write('{};{}\n'.format(key, value + sum(previous_values)))
 previous_values.append(value)
file1.close()
file2.close()

There is a gotcha here, though. As I didn't import the os module (that is a default library) I used the characters \n to end the line. This will work fine under Linux and Mac, but under windows you should use \r\n. To avoid this issue you should import the os module and use os.linesep instead of \n.

import os
(...)
 file2.write('{};{}{}'.format(key, data[key], os.linesep))
(...)
 file2.write('{};{}{}'.format(key, value + sum(previous_values), os.linesep))

As a sidenote this is an example of how you could read your CSV without the need for the CSV module:

 data = [i.split(";") for i in open('MonthData1.csv').read().split('\n')]

If you had a more complex CSV file, especially if it had strings that could have semi-colons within, you'd better go for the CSV module.

The pandas library, mentioned in other answers is a great tool. It will most certainly be able to handle any need you might have to deal with CSV data.

Question 8

I can't open the csv file after running your code, it says that the file is being used. I used the windows code, as I'm a windows user

Question 9

I forgot to close the files. Perhaps that is the issue. You may need to close the software you are using to work with python and open it again to release the file. I updated the code to include the closing method.

Question 10

Yeah I figured it out too after commenting:D One edit if it's possible, could the first three be in a row after each other, and the cumulative too? Now there is a extra row after each date. If you don't understand me, I'd love to have it the same as the picture in the OP, if this is possible

Question 11

Sure. I edited the script in the answer for that. It is simply a matter of writing a blank line '\n' or '\r\n' in the file2. That is, file2.write('\r\n')

Question 12

Omg! I have tried to search for this solution for so long! I am so happy now, thank you, thank you a million times!

Question 13

This code creates a new csv file with the same format as what's printed.

import pandas as pd #added
import csv
from collections import defaultdict, OrderedDict
def convert(data):
 try:
 return int(data)
 except ValueError:
 return 0
keys = [] #added
data_keys = [] #added
with open('MonthData1.csv', 'r') as file1:
 read_file = csv.reader(file1, delimiter=';')
 delheader = next(read_file)
 data = defaultdict(int)
 for line in read_file:
 valuedata = max(0, sum([convert(i) for i in line[1:5]]))
 data[line[0].split()[0]] += valuedata
 for key in OrderedDict(sorted(data.items())):
 print('{} {}'.format(key, data[key]))
 keys.append(key) #added
 data_keys.append(data[key]) #added
 print("")
 keys.append("") #added
 data_keys.append("") #added
 previous_values = []
 for key, value in OrderedDict(sorted(data.items())).items():
 print('{} {}'.format(key, value + sum(previous_values)))
 keys.append(key) #added
 data_keys.append(value + sum(previous_values)) #added
 previous_values.append(value)
df = pd.DataFrame(data_keys,keys) #added
df.to_csv('new_csv_file.csv', header=False) #added

Question 14

Thanks for the reply, I have a version done with pandas, I'm looking for a solution done with default libraries. Do you think it's possible?

Question 15

I'm not familiar with what you mean by default libraries. Is numpy included?

Question 16

With default I mean something that I don't have to install separately. I don't think that numpy is included

Question 17

It would've been nice if you had specified this in the question. Unfortunately, I can't help.

Question 18

Sorry, my bad. I upvoted your post, and will mark is as solution of I don't get other comments

Question 19

This is the version that does not use any imports at all

def convert(data):
 try:
 out = int(data)
 except ValueError:
 out = 0
 return out ### try to avoid multiple return statements
with open('Monthdata1.csv', 'rb') as file1:
 lines = file1.readlines()
data = [ [ d.strip() for d in l.split(';')] for l in lines[ 1 : : ] ]
myDict = dict()
for d in data:
 key = d[0].split()[0]
 value = max(0, sum([convert(i) for i in d[1:5]]))
 try:
 myDict[key] += value
 except KeyError:
 myDict[key] = value
s1=""
s2=""
accu = 0
for key in sorted( myDict.keys() ):
 accu += myDict[key]
 s1 += '{} {}\n'.format( key, myDict[key] )
 s2 += '{} {}\n'.format( key, accu )
with open( 'out.txt', 'wb') as fPntr:
 fPntr.write( s1 + "\n" + s2 )

This uses non-ordered dictionaries, though, such that sorted() may result in problems. So you actually might want to use datetime giving, e.g.:

import datetime
with open('Monthdata1.csv', 'rb') as file1:
 lines = file1.readlines()
data = [ [ d.strip() for d in l.split(';')] for l in lines[ 1 : : ] ]
myDict = dict()
for d in data:
 key = datetime.datetime.strptime( d[0].split()[0], '%d.%m.%Y' )
 value = max(0, sum([convert(i) for i in d[1:5]]))
 try:
 myDict[key] += value
 except KeyError:
 myDict[key] = value
s1=""
s2=""
accu = 0
for key in sorted( myDict.keys() ):
 accu += myDict[key]
 s1 += '{} {}\n'.format( key.strftime('%d.%m.%y'), myDict[key] )
 s2 += '{} {}\n'.format( key.strftime('%d.%m.%y'), accu )
with open( 'out.txt', 'wb') as fPntr:
 fPntr.write( s1 + "\n" + s2 )

Note that I changed to the 2 digit year by using %y instead of %Y in the output. This formatting also adds a 0 to day and month.

Question 20

If on Windows, look at the os.linesep in ndvo's answer.

ndvo 98912 silver badges18 bronze badges · Accepted Answer · 2018-11-22 17:41:28Z

Writing a CSV is simply a matter of writing values separated by commas (or semi-colons in this case. A CSV is a plain text file (a .txt if you will). You can read it and write using python's open() function if you'd like to.

You could actually get rid of the CSV module if you wish. I included an example of this in the end.

This version uses only the libraries that were available in your original code.

import csv
from collections import defaultdict, OrderedDict
def convert(data):
 try:
 return int(data)
 except ValueError:
 return 0 
file1 = open('Monthdata1.csv', 'r')
file2 = open('result.csv', 'w')
read_file = csv.reader(file1, delimiter=';')
delheader = next(read_file)
data = defaultdict(int)
for line in read_file:
 valuedata = max(0, sum([convert(i) for i in line[1:5]]))
 data[line[0].split()[0]] += valuedata
for key in OrderedDict(sorted(data.items())):
 file2.write('{};{}\n'.format(key, data[key]))
file2.write('\n')
previous_values = []
for key, value in OrderedDict(sorted(data.items())).items():
 file2.write('{};{}\n'.format(key, value + sum(previous_values)))
 previous_values.append(value)
file1.close()
file2.close()

There is a gotcha here, though. As I didn't import the os module (that is a default library) I used the characters \n to end the line. This will work fine under Linux and Mac, but under windows you should use \r\n. To avoid this issue you should import the os module and use os.linesep instead of \n.

import os
(...)
 file2.write('{};{}{}'.format(key, data[key], os.linesep))
(...)
 file2.write('{};{}{}'.format(key, value + sum(previous_values), os.linesep))

As a sidenote this is an example of how you could read your CSV without the need for the CSV module:

 data = [i.split(";") for i in open('MonthData1.csv').read().split('\n')]

If you had a more complex CSV file, especially if it had strings that could have semi-colons within, you'd better go for the CSV module.

The pandas library, mentioned in other answers is a great tool. It will most certainly be able to handle any need you might have to deal with CSV data.

I can't open the csv file after running your code, it says that the file is being used. I used the windows code, as I'm a windows user
I forgot to close the files. Perhaps that is the issue. You may need to close the software you are using to work with python and open it again to release the file. I updated the code to include the closing method.
Yeah I figured it out too after commenting:D One edit if it's possible, could the first three be in a row after each other, and the cumulative too? Now there is a extra row after each date. If you don't understand me, I'd love to have it the same as the picture in the OP, if this is possible
Sure. I edited the script in the answer for that. It is simply a matter of writing a blank line '\n' or '\r\n' in the file2. That is, file2.write('\r\n')
Omg! I have tried to search for this solution for so long! I am so happy now, thank you, thank you a million times!

CollectivesTM on Stack Overflow

Python formatting data to csv file

3 Answers 3

5 Comments

5 Comments

1 Comment

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

CollectivesTM on Stack Overflow

3 Answers 3

5 Comments

5 Comments

1 Comment

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related