6
\$\begingroup\$

I am trying to read a file using pandas and then process it. For opening the file I use the following function:

import os
import pandas as pd
def read_base_file(data_folder, base_file):
    files = map(lambda x: os.path.join(data_folder, x), os.listdir(data_folder))
    if base_file in files:
        try:
            df = pd.read_csv(base_file, na_values=["", " ", "-"])
        except Exception, e:
            print "Error in reading", base_file
            print e
            df = pd.DataFrame()
    else:
        print "File Not Found."
        df = pd.DataFrame()
    return df

My main concerns are the if statement and what I should return if there is an error.

Jamal
asked Jul 22, 2015 at 18:22
\$\endgroup\$

2 Answers

4
\$\begingroup\$

Generator expression

I advise using a generator expression instead of map:

map(lambda x: os.path.join(data_folder, x), os.listdir(data_folder))

should become:

(os.path.join(data_folder, x) for x in os.listdir(data_folder))

Also x should be renamed to something more expressive.
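
As a rough illustration of the suggestion above, here is a minimal sketch in Python 3 syntax (the question's code is Python 2, where map builds a full list). The helper name list_paths, the variable file_name, and the temporary directory are assumptions made for the example and are not part of the original post.

```python
import os
import tempfile


def list_paths(data_folder):
    """Lazily yield the full path of every entry in data_folder."""
    return (os.path.join(data_folder, file_name)
            for file_name in os.listdir(data_folder))


# Demonstration on a throwaway directory containing a single file.
with tempfile.TemporaryDirectory() as data_folder:
    base_file = os.path.join(data_folder, "base.csv")
    with open(base_file, "w") as handle:
        handle.write("a,b\n1,2\n")

    # The membership test consumes the generator only until a match is found.
    found = base_file in list_paths(data_folder)
    missing = os.path.join(data_folder, "other.csv") in list_paths(data_folder)
```

Note that the membership test only succeeds when base_file is already the joined path, which is also how the original function compares it.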

Separation of concerns

You both print and return values. If the printing is for debugging purposes, use the logging module (e.g. logger.log) instead of print.
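
A minimal sketch of what that could look like with the standard logging module, in Python 3 syntax. The logger name "base_file_reader" and the helper report_read_error are hypothetical names chosen for the example.

```python
import logging

# Hypothetical logger name; any dotted name would work.
logger = logging.getLogger("base_file_reader")


def report_read_error(path, error):
    """Record a read failure through logging rather than print."""
    logger.warning("Error in reading %s: %s", path, error)


# The calling application, not the function, decides where messages go.
logging.basicConfig(level=logging.WARNING)
report_read_error("base.csv", IOError("permission denied"))
```

The function that reads the file then only returns data, and whoever runs the program can silence, redirect, or reformat the messages without touching it.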

Specific Exception

If you write:

except Exception, e:

any exception will be caught. I suggest catching IOError instead.
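
To show the difference a narrow except clause makes, here is a small sketch in Python 3 syntax, where IOError is an alias of OSError. It uses plain open() rather than pandas so that it has no third-party dependency; the function name load and its parse parameter are assumptions made for the example.

```python
import os
import tempfile


def load(path, parse):
    """Read path and parse its text; only I/O failures yield None."""
    try:
        with open(path) as handle:
            text = handle.read()
    except OSError:
        # Missing or unreadable file (IOError is an alias of OSError in Python 3).
        return None
    # Errors raised by parse are deliberately not caught here.
    return parse(text)


with tempfile.TemporaryDirectory() as folder:
    # A file that does not exist is handled by the except clause.
    missing_result = load(os.path.join(folder, "missing.csv"), int)

    # A file with bad content raises ValueError, which still propagates.
    bad_file = os.path.join(folder, "bad.csv")
    with open(bad_file, "w") as handle:
        handle.write("not a number")
    try:
        load(bad_file, int)
        parse_error_propagated = False
    except ValueError:
        parse_error_propagated = True
```

With except Exception, the ValueError would have been swallowed as well, hiding a problem that has nothing to do with the file being missing.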

answered Jul 22, 2015 at 19:02
\$\endgroup\$
2
  • \$\begingroup\$ Should I let try/except handle if the file exists or not ? \$\endgroup\$ Commented Jul 22, 2015 at 19:43
  • \$\begingroup\$ @evil_inside sure, just remove the printing. IOError will catch file not existing or unreadable. \$\endgroup\$ Commented Jul 22, 2015 at 19:46
1
\$\begingroup\$

Ideally you should ask forgiveness, not permission.

The check that base_file is among the files in data_folder is not helping. The function already falls back to an empty DataFrame in that case, and a missing file would raise an error inside the try block anyway. If you mean to check that the file is in that folder and not in another, you can do this with an assertion. In this case you are simply asserting that the data folder's path is included in base_file (you're not stitching folder + file together anywhere...), so you can achieve this with a check like: assert 'abc' in 'abcde'. That will ensure your base_file isn't coming from the wrong folder.

The assignment of df = pd.DataFrame() is also redundant, since you don't do anything with the df object before returning it, and you seem to default to returning an empty dataframe. Something like this could do the trick:

import pandas as pd
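def read_base_file(data_folder, base_file):
    # Guard against a base_file path that does not include data_folder.
    assert data_folder in base_file
    try:
        return pd.read_csv(base_file, na_values=["", " ", "-"])
    except IOError:
        # Raised when the file is missing or unreadable.
        print "File Not Found."
    except Exception, e:
        print "Error in reading", base_file
        print e
    # Fall back to an empty DataFrame on any failure.
    return pd.DataFrame()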
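
One caveat with the assertion: data_folder in base_file is a substring test, so a folder named data also matches a path under metadata. If that matters, a path-based check is one alternative. The sketch below, in Python 3 syntax, swaps in os.path.commonpath for the substring test; the helper name is_inside is hypothetical, and it assumes both paths are on the same drive (on Windows, commonpath raises ValueError otherwise).

```python
import os


def is_inside(data_folder, base_file):
    """Return True if base_file lies under data_folder (by path, not substring)."""
    folder = os.path.abspath(data_folder)
    target = os.path.abspath(base_file)
    return os.path.commonpath([folder, target]) == folder


# The substring test accepts a file from a differently named folder...
substring_match = "data" in os.path.join("metadata", "x.csv")
# ...while the path-based test does not.
path_match = is_inside("data", os.path.join("metadata", "x.csv"))
```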
answered Nov 7, 2016 at 1:54
\$\endgroup\$
