Convert all CSV files in a given directory to JSON using Python

Question 1

I am attempting to convert all files with the .csv extension in a given directory to JSON with this Python script.

Is there a better or more efficient way to do this?

Here is my code:

import csv
import json
import glob
import os

for filename in glob.glob('//path/to/file/*.csv'):
    csvfile = os.path.splitext(filename)[0]
    jsonfile = csvfile + '.json'
    with open(csvfile+'.csv') as f:
        reader = csv.DictReader(f)
        rows = list(reader)
    with open(jsonfile, 'w') as f:
        json.dump(rows, f)

EDIT:

Here is the sample input:

CSV Data Example

Here is the sample output:

[{
"username": "lanky",
"user_id": "4",
"firstname": "Joan",
"middlename": "Agetha",
"lastname": "Lanke",
"age": "36",
"usertype": "admin",
"email": "[email protected]"
}, {
"username": "masp",
"user_id": "56",
"firstname": "Mark",
"middlename": "Patrick",
"lastname": "Aspir",
"age": "25",
"usertype": "member",
"email": "[email protected]"
}]

Question 2

Can you tell us why you discard the first row of each CSV file (rows.pop(0))?

Question 3

I only just realised that I left that code in there, my bad. A previous version of my code produced a JSON record for the header row, so I was removing it.
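For reference, csv.DictReader reads the first line as field names, so no header record ends up in the list and a rows.pop(0) would drop real data. A minimal sketch, assuming a made-up two-column sample held in memory (the data is hypothetical, trimmed from the fields in the sample output):

```python
import csv
import io

# Hypothetical sample: one header line followed by two data rows.
sample = io.StringIO("username,age\nlanky,36\nmasp,25\n")

# DictReader uses the first line as keys; it is not returned as a row.
rows = list(csv.DictReader(sample))
print(len(rows))  # two data rows, no header record
print(rows[0])
```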

Question 4

Looks good to me. It's a perfectly sensible approach.

There's just one line I'm going to criticize:

 csvfile = os.path.splitext(filename)[0]

Picking off element zero is fine, no need to change it. Other idiomatic approaches would be name, ext = ..., or an explicit discard of name, _ = ....

What I really do advocate changing is the identifier csvfile, since the .csv extension has been stripped from it. Better to simply call it name. Then an expression like name + '.csv' can be clearly seen as a csvfile expression.

You could elide the jsonfile assignment, if you like, and similarly rows, replacing each reference with the expression.
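Put together, the renaming might look like the sketch below. The convert_directory wrapper and its directory parameter are hypothetical, added only so the loop can be run against any folder; newline='' is what the csv module documentation recommends when opening CSV files and is not part of the original code. The jsonfile assignment is elided, while rows is kept because the two files are opened in separate with blocks.

```python
import csv
import glob
import json
import os


def convert_directory(directory):
    """Write a .json file next to every .csv file in directory."""
    for filename in glob.glob(os.path.join(directory, '*.csv')):
        # name is the path without its extension.
        name, _ = os.path.splitext(filename)
        with open(name + '.csv', newline='') as f:
            rows = list(csv.DictReader(f))
        # name + '.json' replaces the separate jsonfile variable.
        with open(name + '.json', 'w') as f:
            json.dump(rows, f)
```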

Question 5

Going further... what is the point of keeping name + '.csv'? Using filename directly would be fine too. That could in turn lead to renaming the filename variable to csvfile.

Question 6

Explicit imports

I am not a fan of importing whole packages and modules if you only need a few functions or classes from them:

import csv
import json
import glob
import os

There are two reasons for this:

If you explicitly import the functions and classes, you can see directly which members of the packages / modules are used in your code.
You don't need to write out the whole namespace each time you use a member "deep" in the package.

You may also want to order your standard library imports alphabetically:

from csv import DictReader
from glob import glob
from json import dump
from os.path import splitext

Misleading names

You use the iteration variable filename which represents your actual CSV files.
Later you set a variable csvfile which no longer describes the actual CSV file, but its stem.

for csvfile in glob('//path/to/file/*.csv'):
    stem, _ = splitext(csvfile)
    jsonfile = stem + '.json'

Reduced verbosity

You can put the read and write operations on the two files into one common context.
You can also get rid of names that you only use once, if it does not impact code clarity.

with open(csvfile) as csv, open(jsonfile, 'w') as json:
    dump(list(DictReader(csv)), json)

Check for __main__

Finally, consider putting your running code into an

if __name__ == '__main__':
    ...

block, so that it is not executed if you import the module.
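Assembled into one script, the suggestions above could look roughly like this. It is a sketch, not a definitive version: the function names convert and main, the PATTERN constant, and the handle names src and dst are assumptions of this example (src and dst are used instead of csv and json to avoid names that look like the modules), and newline='' follows the csv module documentation rather than the original code.

```python
from csv import DictReader
from glob import glob
from json import dump
from os.path import splitext

# Placeholder path taken from the question; adjust to a real directory.
PATTERN = '//path/to/file/*.csv'


def convert(csvfile):
    """Convert one CSV file to a JSON file with the same stem."""
    stem, _ = splitext(csvfile)
    jsonfile = stem + '.json'
    # Both files share one context, so both are closed on exit.
    with open(csvfile, newline='') as src, open(jsonfile, 'w') as dst:
        dump(list(DictReader(src)), dst)
    return jsonfile


def main(pattern):
    """Convert every file matching the glob pattern."""
    return [convert(csvfile) for csvfile in glob(pattern)]


if __name__ == '__main__':
    main(PATTERN)
```

Importing this module from elsewhere only defines convert and main; the conversion runs only when the file is executed directly.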

Question 7

You made the right decision with this line: rows = list(reader). That way you make good use of I/O buffering, because each JSON file is written with a single json.dump call. What I mean is that, instead of looping over the rows of each CSV file and writing them one by one to the corresponding JSON file, you found a simple yet efficient way to write everything at once.

My contribution is not to suggest something new, but to give you confidence in what you have already implemented intelligently.

Question 8

I agree that this is good practice for small and medium-sized files. However, on huge files, expanding the whole file into a list might raise a MemoryError.
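For very large inputs, one possible alternative is to write the JSON array incrementally so that only one row is held in memory at a time. The sketch below is an assumption-laden illustration, not something proposed in the thread: the function name stream_csv_to_json and its parameters are hypothetical. The output file is still buffered by Python, so the row-by-row writes do not each hit the disk.

```python
import csv
import json


def stream_csv_to_json(csvfile, jsonfile):
    """Convert row by row instead of building a full list first."""
    with open(csvfile, newline='') as src, open(jsonfile, 'w') as dst:
        dst.write('[')
        for index, row in enumerate(csv.DictReader(src)):
            if index:
                # Separate records; no comma before the first one.
                dst.write(', ')
            json.dump(row, dst)
        dst.write(']')
```

The result is still a single JSON array, so consumers of the file should not need to change.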

Question 9

codereview.stackexchange.com/questions/176862/…

Accepted answer by J_H, 2017-11-13 23:07:03Z

