Read .csv files and convert the rows into dict for looping

Question 1

This script tries to parse .csv files and returns an iterable object to fetch one row at a time to do some process. It works fine but I think it needs improvements to make it efficient.

 import os
 from csv import DictReader
 class CSVHandler:
 DELIMIT = ',' # delimiter for csv
 FIELDS = set(list(['foo', 'bar'])) # fields that I want
 def __init__(self, file_path: str):
 self.file_path = file_path
 self.file_obj = None
 self.reader = DictReader([])
 def read_file(self):
 if self._is_file_ok:
 try:
 self.file_obj = open(self.file_path, newline='')
 self.reader = DictReader(self.file_obj)
 unmatched = self._is_fields_ok
 if isinstance(unmatched, set):
 print(f"Warning : field set's unmatched! {unmatched}")
 except IOError:
 print(f'Unable to open/read file : "{self.file_path}"')
 else:
 print(f'Invalid file : "{self.file_path}"')
 return self
 @property
 def _is_file_ok(self):
 if os.path.isfile(self.file_path):
 if os.path.exists(self.file_path):
 if self.file_path.endswith('.csv'):
 return True
 return False
 @property
 def _is_fields_ok(self):
 if self.reader or not self.FIELDS:
 return self.FIELDS - set(self.reader.fieldnames)
 return False
 def _trim_row(self, row: dict):
 trimmed_row = dict.fromkeys(self.FIELDS, None)
 for field in self.FIELDS:
 if field in row:
 trimmed_row[field] = row[field]
 return trimmed_row
 def __enter__(self):
 return self
 def __iter__(self):
 return (self._trim_row(dict(row)) for row in self.reader)
 def __next__(self):
 return self._trim_row(dict(next(self.reader)))
 def __len__(self):
 return len(list(self.reader))
 def __exit__(self, exc_type, exc_val, exc_tb):
 if self.file_obj is not None:
 self.file_obj.close()
 if __name__ == '__main__':
 with CSVHandler('file_name.csv').read_file() as csv_h:
 for rows in csv_h:
 pass
```

Question 2

Is using pandas an option?

Question 3

Please consider adding your real code and telling us more about what it's doing and why you wrote it.

Question 4

Could you also remove the request for help in your title. Asking for code imporvment is implied in all questions.

Question 5

Welcome to CR!

As others recommended, you need to be more clear. I can give you some stylistic critique.

I like that you are using type hinting in the parameters. I recommend you also hint the return types for your functions like:

def _trim_row(self, row: dict) -> list:
 trimmed_row = dict.fromkeys(self.FIELDS, None)
 for field in self.FIELDS:
 if field in row:
 trimmed_row[field] = row[field]
 return trimmed_row

Now, more importantly I recommend you create strong documentation for your code so that we can understand the idea behind it, what bugs it has, as well as its efficiency failures (you can use codetags such as # BUG: or # FIXME:).

For your convenience, you can also edit your code in an IDE that enables pylinting, where you can get suggestions about how stylistically good your code is, I recommend VS Code with the Python and autoDocstring extensions.

By putting your code in VS Code, I got the following suggestions for improvements:

""" [INSERT MODULE DESCRIPTION] """
import os
from csv import DictReader
class CSVHandler:
 """[INSERT CLASS DESCRIPTION]
 """
 DELIMIT = ',' # delimiter for csv
 FIELDS = set(list(['foo', 'bar'])) # fields that I want
 def __init__(self, file_path: str):
 """[summary]
 Arguments:
 file_path {str} -- [description]
 """
 self.file_path = file_path
 self.file_obj = None
 self.reader = DictReader([])
 def read_file(self):
 """[summary]
 Returns:
 [type] -- [description]
 """
 if self._is_file_ok:
 try:
 self.file_obj = open(self.file_path, newline='')
 self.reader = DictReader(self.file_obj)
 unmatched = self._is_fields_ok
 if isinstance(unmatched, set):
 print(f"Warning : field set's unmatched! {unmatched}")
 except IOError:
 print(f'Unable to open/read file : "{self.file_path}"')
 else:
 print(f'Invalid file : "{self.file_path}"')
 return self
 @property
 def _is_file_ok(self):
 """[summary]
 Returns:
 [type] -- [description]
 """
 if os.path.isfile(self.file_path):
 if os.path.exists(self.file_path):
 if self.file_path.endswith('.csv'):
 return True
 return False
 @property
 def _is_fields_ok(self):
 """[summary]
 Returns:
 [type] -- [description]
 """
 if self.reader or not self.FIELDS:
 return self.FIELDS - set(self.reader.fieldnames)
 return False
 def _trim_row(self, row: dict):
 """[summary]
 Arguments:
 row {dict} -- [description]
 Returns:
 [type] -- [description]
 """
 trimmed_row = dict.fromkeys(self.FIELDS, None)
 for field in self.FIELDS:
 if field in row:
 trimmed_row[field] = row[field]
 return trimmed_row
 def __enter__(self):
 return self
 def __iter__(self):
 return (self._trim_row(dict(row)) for row in self.reader)
 def __next__(self):
 return self._trim_row(dict(next(self.reader)))
 def __len__(self):
 return len(list(self.reader))
 def __exit__(self, exc_type, exc_val, exc_tb):
 if self.file_obj is not None:
 self.file_obj.close()
if __name__ == '__main__':
 with CSVHandler('file_name.csv').read_file() as csv_h:
 for rows in csv_h:
 pass

Another suggestion that I can give you is to know what you are importing and import just what you need. So, import os should be replaced with from os import path and remove os. from if os.path.isfile(self.file_path): and if os.path.exists(self.file_path):.

Following this, you will understand your code better, and you will be able to ask more specific questions about it.

Question 6

In a fully-fledged class, DELIMIT and FIELDS could have been defined as parameters rather than left hardcoded like this:

DELIMIT = ',' # delimiter for csv
FIELDS = set(list(['foo', 'bar'])) # fields that I want

Regarding exception handling, I recommend that you keep the full details including the stack trace - you don't have to display it to the user but it's good to have a log somewhere to investigate errors. Debugging can be difficult if you have to rely only on what your program outputs. For example, if an except IOError is raised it will not be immediately apparent what happened: was it a permission issue or the file does not exist or something else ?

Using the logger module you can output messages to multiple destinations at the same time, for example show a simplified message on the console but log more details to a file, also the logging levels can be different for each destination.

In this function os.path.exists is redundant, isfile will suffice:

def _is_file_ok(self):
 if os.path.isfile(self.file_path):
 if os.path.exists(self.file_path):
 if self.file_path.endswith('.csv'):
 return True
 return False

From the doc:

os.path.isfile(path)

Return True if path is an existing regular file. This follows symbolic links, so both islink() and isfile() can be true for the same path.

isfile and endswith are boolean functions, so you can directly return the result of the function. In a slightly more concise way:

def _is_file_ok(self):
 if os.path.isfile(self.file_path):
 return self.file_path.endswith('.csv')
 else:
 return False

On a final note, I also agree that Pandas would do the job fine with much less code.

Question 7

I meant to ask if this approach is right to read .csv file and return an iterable. If an error occurs in any part of the script it should log and return an empty iterable

oamandawi oamandawi 1416 bronze badges · Answer 1 · 2020-03-08 01:28:25Z

Welcome to CR!

As others recommended, you need to be more clear. I can give you some stylistic critique.

I like that you are using type hinting in the parameters. I recommend you also hint the return types for your functions like:

def _trim_row(self, row: dict) -> list:
 trimmed_row = dict.fromkeys(self.FIELDS, None)
 for field in self.FIELDS:
 if field in row:
 trimmed_row[field] = row[field]
 return trimmed_row

Now, more importantly I recommend you create strong documentation for your code so that we can understand the idea behind it, what bugs it has, as well as its efficiency failures (you can use codetags such as # BUG: or # FIXME:).

For your convenience, you can also edit your code in an IDE that enables pylinting, where you can get suggestions about how stylistically good your code is, I recommend VS Code with the Python and autoDocstring extensions.

By putting your code in VS Code, I got the following suggestions for improvements:

""" [INSERT MODULE DESCRIPTION] """
import os
from csv import DictReader
class CSVHandler:
 """[INSERT CLASS DESCRIPTION]
 """
 DELIMIT = ',' # delimiter for csv
 FIELDS = set(list(['foo', 'bar'])) # fields that I want
 def __init__(self, file_path: str):
 """[summary]
 Arguments:
 file_path {str} -- [description]
 """
 self.file_path = file_path
 self.file_obj = None
 self.reader = DictReader([])
 def read_file(self):
 """[summary]
 Returns:
 [type] -- [description]
 """
 if self._is_file_ok:
 try:
 self.file_obj = open(self.file_path, newline='')
 self.reader = DictReader(self.file_obj)
 unmatched = self._is_fields_ok
 if isinstance(unmatched, set):
 print(f"Warning : field set's unmatched! {unmatched}")
 except IOError:
 print(f'Unable to open/read file : "{self.file_path}"')
 else:
 print(f'Invalid file : "{self.file_path}"')
 return self
 @property
 def _is_file_ok(self):
 """[summary]
 Returns:
 [type] -- [description]
 """
 if os.path.isfile(self.file_path):
 if os.path.exists(self.file_path):
 if self.file_path.endswith('.csv'):
 return True
 return False
 @property
 def _is_fields_ok(self):
 """[summary]
 Returns:
 [type] -- [description]
 """
 if self.reader or not self.FIELDS:
 return self.FIELDS - set(self.reader.fieldnames)
 return False
 def _trim_row(self, row: dict):
 """[summary]
 Arguments:
 row {dict} -- [description]
 Returns:
 [type] -- [description]
 """
 trimmed_row = dict.fromkeys(self.FIELDS, None)
 for field in self.FIELDS:
 if field in row:
 trimmed_row[field] = row[field]
 return trimmed_row
 def __enter__(self):
 return self
 def __iter__(self):
 return (self._trim_row(dict(row)) for row in self.reader)
 def __next__(self):
 return self._trim_row(dict(next(self.reader)))
 def __len__(self):
 return len(list(self.reader))
 def __exit__(self, exc_type, exc_val, exc_tb):
 if self.file_obj is not None:
 self.file_obj.close()
if __name__ == '__main__':
 with CSVHandler('file_name.csv').read_file() as csv_h:
 for rows in csv_h:
 pass

Another suggestion that I can give you is to know what you are importing and import just what you need. So, import os should be replaced with from os import path and remove os. from if os.path.isfile(self.file_path): and if os.path.exists(self.file_path):.

Following this, you will understand your code better, and you will be able to ask more specific questions about it.

Kate Kate 8,0789 silver badges21 bronze badges · Answer 2 · 2020-03-08 11:57:48Z

In a fully-fledged class, DELIMIT and FIELDS could have been defined as parameters rather than left hardcoded like this:

DELIMIT = ',' # delimiter for csv
FIELDS = set(list(['foo', 'bar'])) # fields that I want

Regarding exception handling, I recommend that you keep the full details including the stack trace - you don't have to display it to the user but it's good to have a log somewhere to investigate errors. Debugging can be difficult if you have to rely only on what your program outputs. For example, if an except IOError is raised it will not be immediately apparent what happened: was it a permission issue or the file does not exist or something else ?

Using the logger module you can output messages to multiple destinations at the same time, for example show a simplified message on the console but log more details to a file, also the logging levels can be different for each destination.

In this function os.path.exists is redundant, isfile will suffice:

def _is_file_ok(self):
 if os.path.isfile(self.file_path):
 if os.path.exists(self.file_path):
 if self.file_path.endswith('.csv'):
 return True
 return False

From the doc:

os.path.isfile(path)

Return True if path is an existing regular file. This follows symbolic links, so both islink() and isfile() can be true for the same path.

isfile and endswith are boolean functions, so you can directly return the result of the function. In a slightly more concise way:

def _is_file_ok(self):
 if os.path.isfile(self.file_path):
 return self.file_path.endswith('.csv')
 else:
 return False

On a final note, I also agree that Pandas would do the job fine with much less code.

I meant to ask if this approach is right to read .csv file and return an iterable. If an error occurs in any part of the script it should log and return an empty iterable

Stack Exchange Network

Read .csv files and convert the rows into dict for looping

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Read .csv files and convert the rows into dict for looping

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions