I am trying to convert every file with the .csv extension in a given directory to JSON with this Python script.
Is there a better, more efficient way to do this?
Here is my code:
import csv
import json
import glob
import os
for filename in glob.glob('//path/to/file/*.csv'):
    csvfile = os.path.splitext(filename)[0]
    jsonfile = csvfile + '.json'
    with open(csvfile+'.csv') as f:
        reader = csv.DictReader(f)
        rows = list(reader)
    with open(jsonfile, 'w') as f:
        json.dump(rows, f)
EDIT:
Here is the sample input:
Here is the sample output:
[{
"username": "lanky",
"user_id": "4",
"firstname": "Joan",
"middlename": "Agetha",
"lastname": "Lanke",
"age": "36",
"usertype": "admin",
"email": "[email protected]"
}, {
"username": "masp",
"user_id": "56",
"firstname": "Mark",
"middlename": "Patrick",
"lastname": "Aspir",
"age": "25",
"usertype": "member",
"email": "[email protected]"
}]
3 Answers
Looks good to me. It's a perfectly sensible approach.
There's just one line I'm going to criticize:
csvfile = os.path.splitext(filename)[0]
Picking off element zero is fine, no need to change it. Other idiomatic approaches would be name, ext = ... or, as an explicit discard, name, _ = ... instead.
What I really do advocate changing is the identifier csvfile, since the .csv extension has been stripped from it. Better to simply call it name. Then an expression like name + '.csv' can be clearly seen as a csvfile expression.
You could elide the jsonfile assignment, if you like, and similarly rows, replacing each reference with the expression.
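A minimal sketch of the renaming suggested above. The loop is wrapped in a hypothetical helper, convert_directory, which is not part of the original script; it only exists so the directory can be passed in. The newline='' argument is an addition that follows the csv module's documented recommendation for opening CSV files.

```python
import csv
import glob
import json
import os


def convert_directory(directory):
    """Write a .json file next to every .csv file in `directory` (hypothetical helper)."""
    for filename in glob.glob(os.path.join(directory, '*.csv')):
        # `name` is the path without its extension; the discarded part is '.csv'.
        name, _ = os.path.splitext(filename)
        # name + '.csv' now reads clearly as "the CSV file".
        with open(name + '.csv', newline='') as f:
            rows = list(csv.DictReader(f))
        with open(name + '.json', 'w') as f:
            json.dump(rows, f)
```

Because filename already holds the full CSV path, open(filename, newline='') would work just as well as rebuilding it from name.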
- Going further... what is the point of keeping name + '.csv'? Using filename instead would be fine too. This can also lead to renaming the filename variable to csvfile. – 301_Moved_Permanently, Nov 14, 2017 at 8:33
Explicit imports
I am not a fan of importing whole packages and modules if you only need a few functions or classes from them:
import csv
import json
import glob
import os
There are two reasons for this:
- If you explicitly import the functions and classes, you can see directly which members of the packages / modules are used in your code.
- You don't need to write out the whole namespace each time you use a member "deep" in the package.
You may also want to order your standard library imports alphabetically:
from csv import DictReader
from glob import glob
from json import dump
from os.path import splitext
Misleading names
You use the iteration variable filename, which represents your actual CSV files.
Later you set a variable csvfile, which no longer describes the actual CSV file, but its stem.
for csvfile in glob('//path/to/file/*.csv'):
    stem, _ = splitext(csvfile)
    jsonfile = stem + '.json'
Reduced verbosity
You can put the read and write operations on the two files into one common context.
You can also get rid of names that you only use once, if it does not impact code clarity.
with open(csvfile) as csv, open(jsonfile, 'w') as json:
    dump(list(DictReader(csv)), json)
Check for __main__
Finally, consider putting your running code into an
if __name__ == '__main__':
    ...
block, so that it is not executed if you import the module.
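Putting the points of this answer together, the whole script might look roughly like the sketch below. The functions convert and main are hypothetical names introduced only for illustration, and the file handles are called csv_in and json_out (rather than csv and json) to avoid reusing module names; newline='' is an addition that follows the csv module's documented recommendation.

```python
from csv import DictReader
from glob import glob
from json import dump
from os.path import splitext


def convert(csvfile):
    """Convert one CSV file to a JSON file with the same stem (hypothetical helper)."""
    stem, _ = splitext(csvfile)
    jsonfile = stem + '.json'
    # Both files share one context, so they are closed together.
    with open(csvfile, newline='') as csv_in, open(jsonfile, 'w') as json_out:
        dump(list(DictReader(csv_in)), json_out)
    return jsonfile


def main(pattern='//path/to/file/*.csv'):
    """Convert every file matching `pattern` (placeholder path from the question)."""
    for csvfile in glob(pattern):
        convert(csvfile)


if __name__ == '__main__':
    main()
```

With this layout, importing the module makes convert available to other code without converting anything as a side effect.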
You made the right decision by writing this line: rows = list(reader)
That way you make proper use of I/O buffering, performing one system call for each JSON file. What I mean is that, instead of looping over the rows of each CSV file and writing them one by one to the corresponding JSON file, you found a simple yet efficient way to write everything at once.
My contribution is not to suggest something new, but to give you confidence in what you have already intelligently implemented.
- I agree that this is good practice for small and medium-sized files. However, on huge files, expanding the whole file into a list might raise a MemoryError. – Richard Neumann, Nov 14, 2017 at 14:15
- codereview.stackexchange.com/questions/176862/… – Billal BEGUERADJ, Nov 14, 2017 at 17:51
rows.pop(0))?
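For the huge-file concern raised in the comments, one possible sketch streams rows to the output instead of building a list first. It writes the array brackets and separators by hand and serializes one row at a time. The name stream_csv_to_json is hypothetical, and this is only one way to do it under the assumption that the output should stay a single JSON array of objects.

```python
import csv
import json


def stream_csv_to_json(csv_path, json_path):
    """Convert CSV to a JSON array one row at a time (hypothetical helper)."""
    with open(csv_path, newline='') as src, open(json_path, 'w') as dst:
        dst.write('[')
        for index, row in enumerate(csv.DictReader(src)):
            if index:
                dst.write(', ')  # separator before every row except the first
            json.dump(row, dst)  # only the current row is held in memory
        dst.write(']')
```

Python file objects are buffered by default, so these per-row writes are still batched before they reach the operating system. For small and medium-sized files, the list-based version remains simpler.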