I have made a CSV file slicer. This is my first and biggest piece of Python code. It takes one .csv
file from the current folder, slices it into n parts, and adds a first column, if provided.
import csv
import math
import os
import re
import sys


# Reading and returning files in current directory
def read_files():
    __location__ = os.path.realpath(
        os.path.join(os.getcwd(), os.path.dirname(__file__)))
    return [f for f in os.listdir(__location__) if os.path.isfile(f)]


# Removing files which match certain pattern
def remove_old_files():
    for f in read_files():
        if re.search("^[0-9]+\.csv$", str(f)) is not None:
            os.remove(f)


# Getting file to split in current directory
def get_file_to_split():
    for f in read_files():
        if re.search(".*\.csv$", str(f)) is not None:
            split(f, int(sys.argv[1]))


# Split file into n pieces
def split(csv_file, pieces):
    first_col = None
    if len(sys.argv) > 2:
        first_col = sys.argv[2]
    with open(csv_file, 'r') as c:
        reader = csv.reader(c)
        data = list(reader)
    cols_to_write = math.ceil(data.__len__() / pieces)
    chunks = [data[x:x + cols_to_write] for x in range(0, len(data), cols_to_write)]
    for num_file in range(pieces):
        filename = str(num_file) + ".csv"
        with open(filename, 'w') as f:
            w = csv.writer(f)
            for i in range(cols_to_write):
                try:
                    if first_col is not None and i == 0:
                        w.writerow([first_col])
                    w.writerow(chunks[num_file][i])
                except IndexError:
                    pass
    print("Done")


if __name__ == "__main__":
    if int(sys.argv[1]) <= 0:
        raise SystemExit("Piece count must be natural number greater than zero.")
    remove_old_files()
    get_file_to_split()
1 Answer
Here are some of the things I've noticed:
- switching to `argparse` might make the argument parsing a bit more readable
- the `read_files()` + `remove_old_files()` functions could make use of the `glob` module with the `**` + `recursive` mode:

      for filename in glob.iglob('./**/[0-9]+.csv', recursive=True):
          os.remove(filename)

- avoid calling "magic" methods like `__len__()` when not necessary - you can use the `len()` function directly
- you can define `first_col` in one line: `first_col = sys.argv[2] if len(sys.argv) > 2 else None`
- `c` and `f` are not good variable names, think of something more descriptive - `input_file` and `output_file`?
- you can use an f-string to define the "filename" for a chunk
- move the comments before the functions into proper docstrings
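As a rough illustration of the `argparse` suggestion, the script's two positional arguments could be declared as in the sketch below. The names `positive_int`, `build_parser`, `pieces` and `first_column` are my own choices for this example and do not come from the original script.

```python
import argparse


def positive_int(value):
    """Convert a command-line string to an int and reject values below 1."""
    number = int(value)  # a ValueError here is reported by argparse as a usage error
    if number <= 0:
        raise argparse.ArgumentTypeError("piece count must be greater than zero")
    return number


def build_parser():
    """Build a parser mirroring the script's two positional arguments."""
    parser = argparse.ArgumentParser(description="Slice a CSV file into pieces.")
    parser.add_argument("pieces", type=positive_int,
                        help="number of pieces to split the file into")
    # nargs="?" makes the second argument optional, like the len(sys.argv) > 2 check
    parser.add_argument("first_column", nargs="?", default=None,
                        help="optional value to add to each output file")
    return parser
```

In the `__main__` block, `args = build_parser().parse_args()` would then replace the manual `sys.argv` handling, with the values available as `args.pieces` and `args.first_column`.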
Also, what if you sliced the CSV iteratively, something along these lines (other improvements applied):
import csv
import glob
import os
import sys
from itertools import islice


def remove_old_files():
    """Removing files which match certain pattern."""
    for filename in glob.iglob('./**/[0-9]+.csv', recursive=True):
        os.remove(filename)


def chunks(it, size):
    it = iter(it)
    return iter(lambda: tuple(islice(it, size)), ())


def split(csv_file, number_of_slices, first_column):
    """Split file into number_of_slices pieces."""
    with open(csv_file, 'r') as input_file:
        reader = csv.reader(input_file)
        for num_file, chunk in enumerate(chunks(reader, number_of_slices)):
            with open(f"{num_file}.csv", 'w') as output_file:
                writer = csv.writer(output_file)
                if first_column:
                    for row in chunk:
                        writer.writerow([first_column] + row)
                else:
                    writer.writerows(chunk)


if __name__ == "__main__":
    # TODO: argparse?
    if int(sys.argv[1]) <= 0:
        raise SystemExit("Piece count must be natural number greater than zero.")
    number_of_slices = int(sys.argv[1])
    first_column = sys.argv[2] if len(sys.argv) > 2 else None
    remove_old_files()
    for filename in glob.iglob('./**/*.csv', recursive=True):
        split(filename, number_of_slices, first_column)
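One thing to watch in the code above: `chunks(reader, number_of_slices)` yields groups of `number_of_slices` rows, so the argument effectively acts as rows per file rather than as a file count. If a fixed number of output files is wanted, as in the question, one option is to count the rows in a first pass and derive the chunk size from that. The following is only a sketch under that assumption; `split_into_pieces` and its `output_dir` parameter are hypothetical names, and it reads the input twice instead of loading it into memory.

```python
import csv
import math
import os
from itertools import islice


def chunks(iterable, size):
    """Yield successive tuples of at most `size` items from `iterable`."""
    iterator = iter(iterable)
    # iter(callable, sentinel) keeps calling the lambda until it returns ()
    return iter(lambda: tuple(islice(iterator, size)), ())


def split_into_pieces(csv_file, pieces, output_dir="."):
    """Write at most `pieces` numbered CSV files and return their paths."""
    # First pass: count the rows so the chunk size can be derived.
    with open(csv_file, newline="") as input_file:
        row_count = sum(1 for _ in csv.reader(input_file))
    if row_count == 0:
        return []
    rows_per_file = math.ceil(row_count / pieces)

    written = []
    # Second pass: stream the rows out chunk by chunk.
    with open(csv_file, newline="") as input_file:
        reader = csv.reader(input_file)
        for num_file, chunk in enumerate(chunks(reader, rows_per_file)):
            path = os.path.join(output_dir, f"{num_file}.csv")
            with open(path, "w", newline="") as output_file:
                csv.writer(output_file).writerows(chunk)
            written.append(path)
    return written
```

With 10 rows and `pieces=3`, this writes three files holding 4, 4 and 2 rows. Opening the files with `newline=""` follows the `csv` module's recommendation and avoids blank lines between rows on Windows.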
- I personally prefer docopt, but either is good. – Oscar Smith, Oct 16, 2017 at 16:13
- Thank you for the suggestions. I'm now applying them to the code, but I ran into some issues with glob. It returns 'path' as a result, and `os.remove()` does not remove the files. Additionally, `'./**/[0-9]+.csv'` matches all files, which could remove all .csv files in the current directory. – Katka, Oct 17, 2017 at 7:15
- I misspoke in my previous comment. The pattern does not match any of the files, which is why none of them are removed. – Katka, Oct 17, 2017 at 7:30
- @eddga alright, quick check - what Python version are you using? Thanks. – alecxe, Oct 17, 2017 at 11:44
- @alecxe I tried to run this on Python 3.6.3 and 3.5.0. – Katka, Oct 17, 2017 at 11:59
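On the pattern discussed in these comments: `glob` patterns are shell-style wildcards, not regular expressions, so the `+` in `[0-9]+.csv` is a literal plus sign and the pattern only matches names such as `7+.csv`. That would explain why no files were removed, independent of the Python version. One possible workaround, sketched below, is to glob for `*.csv` and filter the file names with the regular expression from the question; `find_numbered_csv` is a hypothetical helper name, and the `directory` parameter is my own addition.

```python
import glob
import os
import re

# Same idea as the question's "^[0-9]+\.csv$", applied with fullmatch below.
NUMBERED_CSV = re.compile(r"[0-9]+\.csv")


def find_numbered_csv(directory="."):
    """Return paths of files named like 0.csv, 12.csv, ... under `directory`."""
    pattern = os.path.join(glob.escape(directory), "**", "*.csv")
    return sorted(
        path
        for path in glob.iglob(pattern, recursive=True)
        # fullmatch applies the regex to the whole file name, not the path
        if NUMBERED_CSV.fullmatch(os.path.basename(path))
    )


def remove_old_files(directory="."):
    """Delete the numbered CSV files found under `directory`."""
    for path in find_numbered_csv(directory):
        os.remove(path)
```

Note that `**` with `recursive=True` also descends into subdirectories, whereas the code in the question only looked at a single directory; dropping the `"**"` component would restrict the search to `directory` itself.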