
I have a list of stock market data pulled from Yahoo into a pandas DataFrame (see format below). The date serves as the index of the DataFrame. I want to write the data (including the index) out to a SQLite database.

            AAPL     GE
Date
2009-01-02  89.95  14.76
2009-01-05  93.75  14.38
2009-01-06  92.20  14.58
2009-01-07  90.21  13.93
2009-01-08  91.88  13.95

Based on my reading of the write_frame code in pandas, it does not currently support writing the index. I attempted to use to_records instead, but ran into a known issue with NumPy 1.6.2 and datetimes. Now I'm trying to write tuples using .itertuples, but SQLite throws an error that the data type isn't supported (see code and result below). I'm relatively new to Python, pandas, and NumPy, so it's entirely possible I'm missing something obvious. The problem seems to be writing a datetime to SQLite, but I may be overcomplicating this.

I may be able to fix the issue by upgrading to NumPy 1.7 or to the development version of pandas, which has a fix posted on GitHub, but I'd prefer to develop against release versions of software: I'm new to this, and I don't want stability issues confusing matters further.

Is there a way to accomplish this using Python 2.7.2, pandas 0.10.0, and NumPy 1.6.2? Perhaps by cleaning the datetimes somehow (see the sketch after the traceback below)? I'm in a bit over my head; any help would be appreciated.

Code:

import numpy as np
import pandas as pd
from pandas import DataFrame, Series
import sqlite3 as db
# download data from yahoo
all_data = {}
for ticker in ['AAPL', 'GE']:
    all_data[ticker] = pd.io.data.get_data_yahoo(ticker, '1/1/2009', '12/31/2012')
# create a data frame
price = DataFrame({tic: data['Adj Close'] for tic, data in all_data.iteritems()})
# get output ready for database export
output = price.itertuples()
data = tuple(output)
# connect to a test DB with one three-column table titled "Demo"
con = db.connect('c:/Python27/test.db')
wildcards = ','.join(['?'] * 3)
insert_sql = 'INSERT INTO Demo VALUES (%s)' % wildcards
con.executemany(insert_sql, data)

Result:

---------------------------------------------------------------------------
InterfaceError Traceback (most recent call last)
<ipython-input-15-680cc9889c56> in <module>()
----> 1 con.executemany(insert_sql, data)
InterfaceError: Error binding parameter 0 - probably unsupported type.
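For reference, a minimal sketch (not from the original post) of the "cleaning the datetimes" idea: sqlite3 cannot bind pandas Timestamp objects, so convert each index value to a plain string, and each value to a plain float, before calling executemany.

# Sketch only: reuses price, con, and insert_sql from the code above.
data = [(str(date),) + tuple(float(v) for v in row)
        for date, row in price.iterrows()]
con.executemany(insert_sql, data)
con.commit()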
asked Jan 21, 2013 at 2:19
1 Comment
  • If you just want the index as a column in the table, can't you just give your DataFrame a column that duplicates the index? Commented Jan 21, 2013 at 2:21

5 Answers


In recent pandas, the index is saved in the database (you used to have to call reset_index first).
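For older versions, a hedged sketch of that reset_index workaround, written against the current to_sql signature: promote the index to an ordinary column, then skip writing the new integer index.

# 'Date' becomes an ordinary column; index=False skips the fresh RangeIndex.
price2.reset_index().to_sql(name='price2', con=cnx, index=False)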

Following the docs (creating an in-memory SQLite connection):

import sqlite3
# Create your connection.
cnx = sqlite3.connect(':memory:')

Note: You can also pass a SQLAlchemy engine here (see end of answer).

We can save price2 to cnx:

price2.to_sql(name='price2', con=cnx)

We can retrieve via read_sql:

p2 = pd.read_sql('select * from price2', cnx)

However, when stored (and retrieved) dates are unicode rather than Timestamp. To convert back to what we started with we can use pd.to_datetime:

p2.Date = pd.to_datetime(p2.Date)
p2 = p2.set_index('Date')

We get back the same DataFrame we started with:

In [11]: p2
Out[11]: 
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 1006 entries, 2009-01-02 00:00:00 to 2012-12-31 00:00:00
Data columns:
AAPL 1006 non-null values
GE 1006 non-null values
dtypes: float64(2)

You can also use a SQLAlchemy engine:

from sqlalchemy import create_engine
e = create_engine('sqlite://') # pass your db url
price2.to_sql(name='price2', con=e)

This allows you to use read_sql_table (which can only be used with SQLAlchemy):

pd.read_sql_table(table_name='price2', con=e)
#          Date   AAPL     GE
# 0  2009-01-02  89.95  14.76
# 1  2009-01-05  93.75  14.38
# 2  2009-01-06  92.20  14.58
# 3  2009-01-07  90.21  13.93
# 4  2009-01-08  91.88  13.95
answered Jan 21, 2013 at 5:07

1 Comment

Note that to_sql here must use the engine (con=e) rather than the earlier in-memory cnx; otherwise the data would never reach the database behind the engine that read_sql_table queries.

Unfortunately, pandas.io.write_frame no longer exists in more recent versions of pandas; I'm using pandas 0.19.2. You can instead do something like

from sqlalchemy import create_engine
disk_engine = create_engine('sqlite:///my_lite_store.db')
price.to_sql('stock_price', disk_engine, if_exists='append')

And then in turn preview your table with the following:

df = pd.read_sql_query('SELECT * FROM stock_price LIMIT 3', disk_engine)
df.head()
answered Apr 6, 2017 at 15:56

1 Comment

if_exists='append' helps prevent ValueError: Table 'abc' already exists; very useful, thanks.
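For context, if_exists accepts 'fail' (the default, which raises ValueError when the table already exists), 'replace' (drop and recreate the table), and 'append' (insert into the existing table). A minimal sketch with a throwaway table:

import pandas as pd
from sqlalchemy import create_engine

engine = create_engine('sqlite://')  # in-memory SQLite
df = pd.DataFrame({'x': [1, 2]})
df.to_sql('demo', engine, index=False, if_exists='fail')     # creates the table
df.to_sql('demo', engine, index=False, if_exists='append')   # table now has 4 rows
df.to_sql('demo', engine, index=False, if_exists='replace')  # dropped and rewritten: 2 rows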

Below is the code that worked for me to write a DataFrame to a SQLite DB.

import pandas as pd
import sqlite3 as sq
data = ...  # your pandas DataFrame goes here
sql_data = 'D:\\SA.sqlite'  # path to the SQLite DB file (created if it doesn't exist)
conn = sq.connect(sql_data)
cur = conn.cursor()
cur.execute('''DROP TABLE IF EXISTS SA''')  # redundant with if_exists='replace', but harmless
data.to_sql('SA', conn, if_exists='replace', index=False)  # writes the DataFrame to the SQLite DB
pd.read_sql('select * from SA', conn)
conn.commit()
conn.close()
answered Dec 7, 2018 at 19:30



Minimal example with sqlite3

Based on Keertesh Kumar's answer.

Writing df to sqlite

import pandas as pd
import sqlite3 as sq
df = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})
table_name = "test" # table and file name
conn = sq.connect('{}.sqlite'.format(table_name)) # creates file
df.to_sql(table_name, conn, if_exists='replace', index=False) # writes to file
conn.close() # good practice: close connection

Reading sqlite to df

conn = sq.connect('{}.sqlite'.format(table_name))
df = pd.read_sql('select * from {}'.format(table_name), conn)
conn.close()
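Since the original question was about persisting the index, note that dropping index=False (or passing index_label explicitly) writes the index as a column. A hedged round-trip sketch with a date index:

import pandas as pd
import sqlite3 as sq

df = pd.DataFrame({'col1': [1, 2]},
                  index=pd.to_datetime(['2009-01-02', '2009-01-05']))
df.index.name = 'Date'
conn = sq.connect(':memory:')
df.to_sql('test', conn, if_exists='replace', index=True, index_label='Date')
out = pd.read_sql('select * from test', conn,
                  index_col='Date', parse_dates=['Date'])  # restores the DatetimeIndex
conn.close()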
answered Feb 24, 2022 at 9:26



Pandas read_sql can set the index for you.

Taking the example from Andy Hayden's answer above, instead of:

p2 = pd.read_sql('select * from price2', cnx)
# When stored (and retrieved), dates are unicode rather than Timestamp.
# To convert back to what we started with, use pd.to_datetime:
p2.Date = pd.to_datetime(p2.Date)
p2 = p2.set_index('Date')

You can simply do this:

p2 = pd.read_sql('select * from price2', cnx, index_col='Date', parse_dates=['Date'])

This will automatically set the Date column as the index and simultaneously convert it to Timestamps.

answered Oct 4, 2022 at 0:44
