Split data from single CSV file into several CSV files by column value

Question 1

Okay, so I have been teaching myself for about seven months. This past week someone in accounting at work said it'd be nice if she could get reports split up, so I made that my first program, because I could never come up with something useful to make. I finished it last night, and so far it does what I expected, but I'm sure some things could be done in a better way.

I wanted to finish it without help first, but now I'd like someone to take a look and tell me what I could have done better, or whether there is a better way to get the same results.

What it does: it opens a CSV file (the file I've been practicing with has 27K lines of data) and loops through it, creating a separate file for each billing number, using the billing number as the filename and writing the header as the first line. Each iteration overwrites the file if it already exists; this is an area I'm sure I could have done better. After that, it loops through the data again, appending each line of data to the correct file.

import os
import csv
#currentdirpath = os.getcwd()
#filename = 'argos.csv'
#file_path = os.path.join(os.getcwd(), filename) #filepath to open
def get_file_path(filename):
    ''' - This gets the full path...file and terminal need to be in
same directory - '''
    file_path = os.path.join(os.getcwd(), filename)
    return file_path
pathOfFile = get_file_path('argos.csv')
''' - Below opens and reads the csv file,
 then going to try to loop and write the rows out in files sorted
by Billing Number - '''
with open(pathOfFile, 'rU') as csvfile:
    reader = csv.reader(csvfile)
    header = next(reader)
    for row in reader:
        new_file_name = row[5][:5] + '.csv'
        ''' Create file named by billing number, and print the header to
each file '''
        fb = open(new_file_name, 'w+')
        fb.write(str(header) + '\n')
        #fb.close()
with open(pathOfFile, 'rU') as csvfile:
    reader = csv.reader(csvfile)
    for row in reader:
        new_file_name = row[5][:5] + '.csv'
        ab = open(new_file_name, 'a')
        ab.write(str(row) + '\n')

I've left in a few things that I had at one point but commented out; I thought they might give you a better idea of what I was thinking. Any advice is appreciated!

Question 2

Please verify your indentation: ''' Create file named by billing number, and print the header to each file ''' looks illegal.

Question 3

Sorry, I apparently don't get emails for replies. Those comments were really just for me, so I could start and stop on this without having to figure out where I was each time I picked it up again. I was asking more about whether I went about getting the result the best way. By illegal, I'm hoping you mean code-wise.

Question 4

Don't use string literals as comments. PEP 8 explains how to use comments.
Docstrings should use """ rather than ''', as described in PEP 257. Also, your docstring doesn't need the "-", and should probably be rephrased slightly to fit on one line.
Close your files; the commented-out #fb.close() shows you went out of your way to make bad code. Without fb.close() or wrapping open in a with, the file is not guaranteed to be closed. I personally prefer with to fb.close().
Personally, rather than overwriting your files n times, I'd use collections.defaultdict to group the rows by the file they belong in.
You may want to change get_file_path to be based off __file__. Or leave your path relative, as it'll default to that behavior.
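
To illustrate the point about closing files, here is a minimal sketch, separate from the full rewrite. The scratch file demo.csv and the simulated ValueError are made up for the demonstration; they are not part of the original program.

```python
import os
import tempfile

# Hypothetical scratch file, used only for this demonstration.
demo_path = os.path.join(tempfile.mkdtemp(), 'demo.csv')

# Manual close: if write() raised an exception, close() would never run.
fb = open(demo_path, 'w')
fb.write('header\n')
fb.close()

# Context manager: the file is closed when the block exits,
# whether it exits normally or through an exception.
try:
    with open(demo_path, 'a') as ab:
        ab.write('row\n')
        raise ValueError('simulated failure while writing')
except ValueError:
    pass

print(ab.closed)  # True
```

The with block closes the file even though the body was interrupted, which is why it is usually the safer of the two forms.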

import os
import csv
from collections import defaultdict
FILE_DIR = os.path.dirname(os.path.abspath(__file__))
def get_file_path(filename):
    return os.path.join(FILE_DIR, filename)
file_path = get_file_path('argos.csv')
with open(file_path, 'rU') as csvfile:
    reader = csv.reader(csvfile)
    header = next(reader)
    data = defaultdict(lambda:[header])
    _ = data[header[5][:5]]
    for row in reader:
        data[row[5][:5]].append(row)
    for file_name, rows in data.items():
        with open(file_name, 'w+') as f:
            for row in rows:
                f.write(str(row) + '\n')
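
One thing the code above carries over from the question is f.write(str(row) + '\n'), which writes the Python list representation (brackets and quotes included) rather than a CSV line. A possible variant, offered as a sketch rather than a drop-in replacement, passes the rows through csv.writer instead. The function name split_csv_by_column, its parameters, and the sample data are assumptions for illustration. The sketch also restores the .csv suffix used in the question, does not reproduce the extra file keyed on the header row, and opens files with newline='' instead of 'rU', since the 'U' mode flag was removed from open() in Python 3.11.

```python
import csv
import os
import tempfile
from collections import defaultdict


def split_csv_by_column(src_path, out_dir, column=5, prefix_len=5):
    """Write one CSV file per key, where the key is a prefix of one column."""
    groups = defaultdict(list)
    # The csv module documentation recommends newline='' for files
    # passed to csv.reader and csv.writer.
    with open(src_path, newline='') as src:
        reader = csv.reader(src)
        header = next(reader)
        for row in reader:
            # Rows shorter than column + 1 fields raise IndexError here.
            groups[row[column][:prefix_len]].append(row)

    for key, rows in groups.items():
        out_path = os.path.join(out_dir, key + '.csv')
        with open(out_path, 'w', newline='') as out:
            writer = csv.writer(out)
            writer.writerow(header)   # header once per output file
            writer.writerows(rows)    # csv.writer handles quoting
    return sorted(groups)


# Small demonstration on made-up data in a temporary directory.
work_dir = tempfile.mkdtemp()
sample_path = os.path.join(work_dir, 'sample.csv')
with open(sample_path, 'w', newline='') as f:
    csv.writer(f).writerows([
        ['date', 'name', 'qty', 'unit', 'total', 'billing'],
        ['2017-07-01', 'Smith, J', '2', 'ea', '10.00', '12345-01'],
        ['2017-07-02', 'Jones', '1', 'ea', '5.00', '67890-01'],
        ['2017-07-03', 'Brown', '4', 'ea', '20.00', '12345-02'],
    ])
print(split_csv_by_column(sample_path, work_dir))  # ['12345', '67890']
```

Like the defaultdict version, this holds every row in memory before writing, which should be fine for a file of about 27K lines.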

Peilonrayz ♦ · Accepted Answer · 2017-07-17 09:27:15Z
