I am reading a DataFrame directly from a database using pandas.io.sql.read_frame:
cnx = pandas.io.sql.connect(host='srv',user='me',password='pw',database='db')
df = pandas.io.sql.read_frame('sql_query',cnx)
It works nicely in retrieving the data. But I would like to parse one of the columns as a datetime64, akin to what can be done when reading from a CSV file, e.g.:
df2 = pandas.io.read_csv(csv_file, parse_dates=[0])
But there is no parse_dates flag for read_frame. What alternative approach is recommended?
The same question applies to the index_col argument of read_csv, which indicates which column should be the index. Is there a recommended way to do this with read_frame?
3 Answers
This question is very old by now, and so is pandas 0.10. In pandas 0.16, the newest version, the read_frame method has been deprecated in favour of read_sql. Even so, the documentation (Pandas 0.16 read_frame) says that, just like the read_csv function, it takes a parse_dates argument.
It seems the parse_dates argument appeared in 0.14, at the same time as read_frame was deprecated. The read_sql function seems to be a rename of read_frame, so updating pandas to 0.14 or higher and renaming your function call will give you access to this argument. Here is the doc for the read_sql function: Pandas 0.16 read_sql
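As a rough illustration of the renamed call, here is a minimal sketch that passes both parse_dates and index_col to read_sql. The in-memory SQLite database, the readings table, and its ts and value columns are assumptions made up for this example; they stand in for the real server and query from the question.

```python
import sqlite3

import pandas as pd

# Assumption: an in-memory SQLite database stands in for the real server.
cnx = sqlite3.connect(":memory:")
cnx.execute("CREATE TABLE readings (ts TEXT, value REAL)")
cnx.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [("2013-01-01 00:00:00", 1.5), ("2013-01-02 00:00:00", 2.5)],
)
cnx.commit()

# parse_dates converts the named column to datetimes;
# index_col then makes that column the index of the result.
df = pd.read_sql(
    "SELECT ts, value FROM readings",
    cnx,
    index_col="ts",
    parse_dates=["ts"],
)

print(type(df.index))
```

With a different database driver the connection object would change, but the read_sql arguments should stay the same.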
Comments
df = pandas.io.sql.read_frame('sql_query', index_col=['date_column_name'], con=cnx)
where date_column_name is the name of the column in the database that contains date elements. sql_query should then be of the form select date_column_name, data_column_name from ...
Pandas (as of 0.13+) will then automatically parse it to a date format if it resembles a date string.
In [34]: df.index
Out[34]:
<class 'pandas.tseries.index.DatetimeIndex'>
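If the column still comes back as plain strings, one possible fallback is to convert it after reading. The sketch below assumes a small hand-built frame in place of the database result, reusing the placeholder column names date_column_name and data_column_name; it is not taken from the pandas.io.sql code itself.

```python
import pandas as pd

# Assumption: a hand-built frame standing in for the result of the SQL query,
# with the date column still stored as strings.
df = pd.DataFrame(
    {
        "date_column_name": ["2013-01-01", "2013-01-02"],
        "data_column_name": [1.0, 2.0],
    }
)

# Convert the string column to datetimes, then use it as the index.
df["date_column_name"] = pd.to_datetime(df["date_column_name"])
df = df.set_index("date_column_name")

print(type(df.index))
```

This does the parsing by hand, so it should work regardless of whether the reader function accepts parse_dates.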
pandas.io.sql [...] to pandas, and it is still a work in progress, particularly detection of specific datatypes. I expect an upcoming version will contain big improvements. You can catch up on some recent discussion here: github.com/pydata/pandas/issues/1662 and here: github.com/pydata/pandas/issues/2717

[...] pd.tslib.Timestamp objects. And there is an index_col argument for read_frame. Are you using the latest stable release of pandas?

[...] index_col=[0], as I do with pandas.io.read_csv, and it failed: KeyError: u'no item named 0'. After reading your comment, I tried index_col=[key_name_string] instead, and it worked. Also, as the required column index is a datetime, pandas now correctly identifies the DataFrame as having a DatetimeIndex. So my problem is solved, thank you! However, before I set the col. as index, the DateTime type was not parsed correctly, so a parse_dates argument for pandas.io.sql.read_frame would be great.