\$\begingroup\$
\$\endgroup\$
I needed to make a small script to read a csv file and store 2 columns in Redis. Can you guys take a look over the code and let me know how I can improve my code style?
#!/usr/bin/env/ python
"""Proces csv file and store key-value pair of columns in redis"""
import csv, redis, json
from sys import argv, exit
REDIS_HOST = 'localhost'
conn = redis.Redis(REDIS_HOST)
def get_csv_data(csv_file, **columns):
col1, col2 = columns['columns']
with open(csv_file, encoding='utf-8') as csvf:
csv_data = csv.reader(csvf)
return [(i[col1], i[col2]) for i in csv_data]
def store_data(data):
for i in data:
if i[1] not in conn:
conn.set(i[0], i[1])
return data
def main():
if len(argv) < 2:
exit("usage: python {file} file.csv [column1, column2]".format(file=__file__))
csv_file = argv[1]
columns = list(map(int,argv[2:4])) or [1, 3]
data = get_csv_data(csv_file, columns=columns)
print (json.dumps(store_data(data)))
if __name__ == '__main__':
main()
asked Nov 1, 2014 at 18:41
1 Answer 1
\$\begingroup\$
\$\endgroup\$
Here are my suggestions:
import csv, redis, json
import sys
REDIS_HOST = 'localhost'
def read_csv_data(csv_file, ik, iv):
with open(csv_file, encoding='utf-8') as csvf:
csv_data = csv.reader(csvf)
return [(r[ik], r[iv]) for r in csv_data]
def store_data(conn, data):
for i in data:
conn.setnx(i[0], i[1])
return data
def main():
if len(sys.argv) < 2:
sys.exit(
"Usage: %s file.csv [key_column_index, value_column_index]"
% __file__
)
columns = (0, 1) if len(sys.argv) < 4 else (int(x) for x in sys.argv[2:4])
data = read_csv_data(sys.argv[1], *columns)
conn = redis.Redis(REDIS_HOST)
print (json.dumps(store_data(conn, data)))
if '__main__' == __name__:
main()
Feel free to ask questions on any of the changes. Some notes:
- In general, I don't like
from x import y
. It clutters your global namespace and conflicts can become an issue. - In your original code, I guess you meant
if i[0] not in conn
, to test whether the key already exists, not the value right? That being the case, it's better to use the built in SETNX feature in Redis. - I changed your
**kwargs
to*args
. Passing a dictionary containing a single tuple was a bit confusing. - Your
list(map(...)) or
was pretty clever, but I find this way more readable. Just personal preference. Tuple is preferable to list for immutable data. - I only initialize the connection after parsing the command line arguments and loading the CSV file. In general, only constants should be module scope.
- I swapped
__name__
and'__main__'
. I prefer putting the constant term in a comparison first. That way the interpreter will catch the sometimes hard to spot typo==
->=
lang-py