This script tries to parse .csv
files and returns an iterable object to fetch one row at a time to do some process. It works fine but I think it needs improvements to make it efficient.
import os
from csv import DictReader
class CSVHandler:
DELIMIT = ',' # delimiter for csv
FIELDS = set(list(['foo', 'bar'])) # fields that I want
def __init__(self, file_path: str):
self.file_path = file_path
self.file_obj = None
self.reader = DictReader([])
def read_file(self):
if self._is_file_ok:
try:
self.file_obj = open(self.file_path, newline='')
self.reader = DictReader(self.file_obj)
unmatched = self._is_fields_ok
if isinstance(unmatched, set):
print(f"Warning : field set's unmatched! {unmatched}")
except IOError:
print(f'Unable to open/read file : "{self.file_path}"')
else:
print(f'Invalid file : "{self.file_path}"')
return self
@property
def _is_file_ok(self):
if os.path.isfile(self.file_path):
if os.path.exists(self.file_path):
if self.file_path.endswith('.csv'):
return True
return False
@property
def _is_fields_ok(self):
if self.reader or not self.FIELDS:
return self.FIELDS - set(self.reader.fieldnames)
return False
def _trim_row(self, row: dict):
trimmed_row = dict.fromkeys(self.FIELDS, None)
for field in self.FIELDS:
if field in row:
trimmed_row[field] = row[field]
return trimmed_row
def __enter__(self):
return self
def __iter__(self):
return (self._trim_row(dict(row)) for row in self.reader)
def __next__(self):
return self._trim_row(dict(next(self.reader)))
def __len__(self):
return len(list(self.reader))
def __exit__(self, exc_type, exc_val, exc_tb):
if self.file_obj is not None:
self.file_obj.close()
if __name__ == '__main__':
with CSVHandler('file_name.csv').read_file() as csv_h:
for rows in csv_h:
pass
```
-
\$\begingroup\$ Is using pandas an option? \$\endgroup\$Mast– Mast ♦2020年03月07日 10:40:38 +00:00Commented Mar 7, 2020 at 10:40
-
2\$\begingroup\$ Please consider adding your real code and telling us more about what it's doing and why you wrote it. \$\endgroup\$Mast– Mast ♦2020年03月07日 10:41:35 +00:00Commented Mar 7, 2020 at 10:41
-
\$\begingroup\$ Could you also remove the request for help in your title. Asking for code imporvment is implied in all questions. \$\endgroup\$user211881– user2118812020年03月07日 13:23:30 +00:00Commented Mar 7, 2020 at 13:23
2 Answers 2
Welcome to CR!
As others recommended, you need to be more clear. I can give you some stylistic critique.
I like that you are using type hinting in the parameters. I recommend you also hint the return types for your functions like:
def _trim_row(self, row: dict) -> list:
trimmed_row = dict.fromkeys(self.FIELDS, None)
for field in self.FIELDS:
if field in row:
trimmed_row[field] = row[field]
return trimmed_row
Now, more importantly I recommend you create strong documentation for your code so that we can understand the idea behind it, what bugs it has, as well as its efficiency failures (you can use codetags such as # BUG:
or # FIXME:
).
For your convenience, you can also edit your code in an IDE that enables pylinting, where you can get suggestions about how stylistically good your code is, I recommend VS Code with the Python and autoDocstring extensions.
By putting your code in VS Code, I got the following suggestions for improvements:
""" [INSERT MODULE DESCRIPTION] """
import os
from csv import DictReader
class CSVHandler:
"""[INSERT CLASS DESCRIPTION]
"""
DELIMIT = ',' # delimiter for csv
FIELDS = set(list(['foo', 'bar'])) # fields that I want
def __init__(self, file_path: str):
"""[summary]
Arguments:
file_path {str} -- [description]
"""
self.file_path = file_path
self.file_obj = None
self.reader = DictReader([])
def read_file(self):
"""[summary]
Returns:
[type] -- [description]
"""
if self._is_file_ok:
try:
self.file_obj = open(self.file_path, newline='')
self.reader = DictReader(self.file_obj)
unmatched = self._is_fields_ok
if isinstance(unmatched, set):
print(f"Warning : field set's unmatched! {unmatched}")
except IOError:
print(f'Unable to open/read file : "{self.file_path}"')
else:
print(f'Invalid file : "{self.file_path}"')
return self
@property
def _is_file_ok(self):
"""[summary]
Returns:
[type] -- [description]
"""
if os.path.isfile(self.file_path):
if os.path.exists(self.file_path):
if self.file_path.endswith('.csv'):
return True
return False
@property
def _is_fields_ok(self):
"""[summary]
Returns:
[type] -- [description]
"""
if self.reader or not self.FIELDS:
return self.FIELDS - set(self.reader.fieldnames)
return False
def _trim_row(self, row: dict):
"""[summary]
Arguments:
row {dict} -- [description]
Returns:
[type] -- [description]
"""
trimmed_row = dict.fromkeys(self.FIELDS, None)
for field in self.FIELDS:
if field in row:
trimmed_row[field] = row[field]
return trimmed_row
def __enter__(self):
return self
def __iter__(self):
return (self._trim_row(dict(row)) for row in self.reader)
def __next__(self):
return self._trim_row(dict(next(self.reader)))
def __len__(self):
return len(list(self.reader))
def __exit__(self, exc_type, exc_val, exc_tb):
if self.file_obj is not None:
self.file_obj.close()
if __name__ == '__main__':
with CSVHandler('file_name.csv').read_file() as csv_h:
for rows in csv_h:
pass
Another suggestion that I can give you is to know what you are importing and import just what you need. So, import os
should be replaced with from os import path
and remove os.
from if os.path.isfile(self.file_path):
and if os.path.exists(self.file_path):
.
Following this, you will understand your code better, and you will be able to ask more specific questions about it.
In a fully-fledged class, DELIMIT
and FIELDS
could have been defined as parameters rather than left hardcoded like this:
DELIMIT = ',' # delimiter for csv
FIELDS = set(list(['foo', 'bar'])) # fields that I want
Regarding exception handling, I recommend that you keep the full details including the stack trace - you don't have to display it to the user but it's good to have a log somewhere to investigate errors. Debugging can be difficult if you have to rely only on what your program outputs. For example, if an except IOError
is raised it will not be immediately apparent what happened: was it a permission issue or the file does not exist or something else ?
Using the logger
module you can output messages to multiple destinations at the same time, for example show a simplified message on the console but log more details to a file, also the logging levels can be different for each destination.
In this function os.path.exists
is redundant, isfile
will suffice:
def _is_file_ok(self):
if os.path.isfile(self.file_path):
if os.path.exists(self.file_path):
if self.file_path.endswith('.csv'):
return True
return False
From the doc:
os.path.isfile(path)
Return True if path is an existing regular file. This follows symbolic links, so both islink() and isfile() can be true for the same path.
isfile
and endswith
are boolean functions, so you can directly return the result of the function. In a slightly more concise way:
def _is_file_ok(self):
if os.path.isfile(self.file_path):
return self.file_path.endswith('.csv')
else:
return False
On a final note, I also agree that Pandas would do the job fine with much less code.
-
\$\begingroup\$ I meant to ask if this approach is right to read .csv file and return an iterable. If an error occurs in any part of the script it should log and return an empty iterable \$\endgroup\$Gautham– Gautham2020年03月09日 04:15:00 +00:00Commented Mar 9, 2020 at 4:15