I'm trying to store a numpy array of about 1000 floats in a sqlite3 database but I keep getting the error "InterfaceError: Error binding parameter 1 - probably unsupported type".
I was under the impression that a BLOB data type could hold anything, but it definitely doesn't work with a NumPy array. Here's what I tried:
import sqlite3 as sql
import numpy as np
con = sql.connect('test.db', isolation_level=None)
cur = con.cursor()
cur.execute("CREATE TABLE foobar (id INTEGER PRIMARY KEY, array BLOB)")
cur.execute("INSERT INTO foobar VALUES (?,?)", (None,np.arange(0,500,0.5)))
con.commit()
Is there another module I can use to get the numpy array into the table? Or can I convert the numpy array into another form in Python (like a list or string I can split) that sqlite will accept? Performance isn't a priority. I just want it to work!
Thanks!
- Don't know, but try to convert to a list? np.arange(1000).tolist() – reptilicus
- Or probably json.dumps(np.arange(1000).tolist()) – reptilicus
6 Answers
You could register a new array data type with sqlite3:
import sqlite3
import numpy as np
import io
def adapt_array(arr):
"""
http://stackoverflow.com/a/31312102/190597 (SoulNibbler)
"""
out = io.BytesIO()
np.save(out, arr)
out.seek(0)
return sqlite3.Binary(out.read())
def convert_array(blob):
    # blob is the raw BLOB (bytes) produced by adapt_array
    out = io.BytesIO(blob)
    out.seek(0)
    return np.load(out)

# Converts an np.ndarray to a BLOB when inserting
sqlite3.register_adapter(np.ndarray, adapt_array)

# Converts a BLOB back to an np.ndarray when selecting a column
# declared with type "array"
sqlite3.register_converter("array", convert_array)
x = np.arange(12).reshape(2,6)
con = sqlite3.connect(":memory:", detect_types=sqlite3.PARSE_DECLTYPES)
cur = con.cursor()
cur.execute("create table test (arr array)")
With this setup, you can simply insert the NumPy array with no change in syntax:
cur.execute("insert into test (arr) values (?)", (x, ))
And retrieve the array directly from sqlite as a NumPy array:
cur.execute("select arr from test")
data = cur.fetchone()[0]
print(data)
# [[ 0 1 2 3 4 5]
# [ 6 7 8 9 10 11]]
print(type(data))
# <class 'numpy.ndarray'>
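As a quick follow-up sanity check (my addition, not part of the original answer): np.save serializes the dtype and shape along with the data, so the round trip is exact. Continuing from the snippet above:

# The retrieved array should match the original in dtype, shape, and contents
assert data.dtype == x.dtype
assert data.shape == x.shape
assert (data == x).all()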
Comments:
- buffer breaks 3.x compatibility (which is a weird thing to do in code that uses the io module and print as a function), and it doesn't seem to be necessary in my 2.7.6 or 2.7.9. Maybe older versions of sqlite3 had a problem with it, but if 2.6+ works without it, you should probably remove it. See also this question.
- Would bytearray solve the problem? Because that would be portable to 3.x (without having to do a hack like defining try: buffer except NameError: buffer = bytes, or def buffer(x): return x, or something).

I think the Matlab format is a really convenient way to store and retrieve NumPy arrays. It's really fast, and the disk and memory footprint are about the same.
[Figure: Load / Save / Disk comparison of array-storage methods (image from mverleg's benchmarks)]
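For reference, a minimal sketch of that Matlab-format approach using scipy.io (the file name 'arrays.mat' and the key 'x' here are just examples, not from the original answer):

import numpy as np
from scipy.io import savemat, loadmat

x = np.random.rand(100, 100)

# Store the array in a Matlab v5 .mat file under the key 'x'
savemat('arrays.mat', {'x': x})

# Load it back; loadmat returns a dict mapping keys to arrays
x2 = loadmat('arrays.mat')['x']
assert (x == x2).all()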
But if for any reason you need to store NumPy arrays in SQLite, I suggest adding some compression capability.
The extra lines relative to unutbu's code are pretty simple:
import zlib, bz2

compressor = zlib  # zlib or bz2

def adapt_array(arr):
    """
    http://stackoverflow.com/a/31312102/190597 (SoulNibbler)
    """
    # zlib gives roughly the same disk size as Matlab v5 .mat files;
    # bz2 compresses ~4x better than zlib, but storing is ~20x slower.
    out = io.BytesIO()
    np.save(out, arr)
    return sqlite3.Binary(compressor.compress(out.getvalue()))

def convert_array(blob):
    return np.load(io.BytesIO(compressor.decompress(blob)))
The results of testing with the MNIST database were:
$ ./test_MNIST.py
[69900]: 99% remain: 0 secs
Storing 70000 images in 379.9 secs
Retrieve 6990 images in 9.5 secs
$ ls -lh example.db
-rw-r--r-- 1 agp agp 69M sep 22 07:27 example.db
$ ls -lh mnist-original.mat
-rw-r--r-- 1 agp agp 53M sep 20 17:59 mnist-original.mat
using zlib, and
$ ./test_MNIST.py
[69900]: 99% remain: 12 secs
Storing 70000 images in 8536.2 secs
Retrieve 6990 images in 37.4 secs
$ ls -lh example.db
-rw-r--r-- 1 agp agp 19M sep 22 03:33 example.db
$ ls -lh mnist-original.mat
-rw-r--r-- 1 agp agp 53M sep 20 17:59 mnist-original.mat
using bz2
Comparing the Matlab v5 format with bz2-compressed SQLite, the bz2 compression ratio is around 2.8 (53M vs 19M), but the access time is quite long compared to the Matlab format (almost instantaneous vs. more than 30 secs). It's probably only worthwhile for really huge databases, where the learning process takes far more time than the access time, or where the database footprint needs to be as small as possible.
Finally, note that the bz2/zlib size ratio is around 3.7 (69M vs 19M), and the zlib database requires about 30% more space than the Matlab file (69M vs 53M).
The full code, if you want to play with it yourself, is:
import os
import sys
import time
import io
import zlib, bz2
import sqlite3
import numpy as np
# fetch_mldata was removed in recent scikit-learn; fetch_openml is its successor
from sklearn.datasets import fetch_mldata

compressor = zlib  # zlib or bz2

def adapt_array(arr):
    """
    http://stackoverflow.com/a/31312102/190597 (SoulNibbler)
    """
    # zlib gives roughly the same disk size as Matlab v5 .mat files;
    # bz2 compresses ~4x better than zlib, but storing is ~20x slower.
    out = io.BytesIO()
    np.save(out, arr)
    return sqlite3.Binary(compressor.compress(out.getvalue()))

def convert_array(blob):
    return np.load(io.BytesIO(compressor.decompress(blob)))

sqlite3.register_adapter(np.ndarray, adapt_array)
sqlite3.register_converter("array", convert_array)

dbname = 'example.db'

def test_save_sqlite_arrays():
    "Load MNIST database (70000 samples) and store in a compressed SQLite db"
    os.path.exists(dbname) and os.unlink(dbname)
    con = sqlite3.connect(dbname, detect_types=sqlite3.PARSE_DECLTYPES)
    cur = con.cursor()
    cur.execute("create table test (idx integer primary key, X array, y integer);")

    mnist = fetch_mldata('MNIST original')
    X, y = mnist.data, mnist.target
    m = X.shape[0]

    t0 = time.time()
    for i, x in enumerate(X):
        # store the sample x (not the label vector y) in the array column
        cur.execute("insert into test (idx, X, y) values (?,?,?)",
                    (i, x, int(y[i])))
        if not i % 100 and i > 0:
            elapsed = time.time() - t0
            remain = float(m - i) / i * elapsed
            print("\r[%5d]: %3d%% remain: %d secs"
                  % (i, 100 * i // m, remain), end='')
            sys.stdout.flush()

    con.commit()
    con.close()
    elapsed = time.time() - t0
    print()
    print("Storing %d images in %0.1f secs" % (m, elapsed))

def test_load_sqlite_arrays():
    "Query MNIST SQLite database and load some samples"
    con = sqlite3.connect(dbname, detect_types=sqlite3.PARSE_DECLTYPES)
    cur = con.cursor()

    # select all images labeled as '2'
    t0 = time.time()
    cur.execute('select idx, X, y from test where y = 2')
    data = cur.fetchall()
    elapsed = time.time() - t0
    print("Retrieve %d images in %0.1f secs" % (len(data), elapsed))

if __name__ == '__main__':
    test_save_sqlite_arrays()
    test_load_sqlite_arrays()
Comments:
- In Python 3 the compression must be done with zlib.compress(out.read()); the str.encode('zlib') trick only works in Python 2.

This works for me:
import sqlite3 as sql
import numpy as np
import json
con = sql.connect('test.db', isolation_level=None)
cur = con.cursor()
cur.execute("DROP TABLE FOOBAR")
cur.execute("CREATE TABLE foobar (id INTEGER PRIMARY KEY, array BLOB)")
cur.execute("INSERT INTO foobar VALUES (?,?)", (None, json.dumps(np.arange(0,500,0.5).tolist())))
con.commit()
cur.execute("SELECT * FROM FOOBAR")
data = cur.fetchall()
print(data)
my_list = json.loads(data[0][1])
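If you want the result back as an array rather than a list, a one-line follow-up (my addition, not in the original answer):

# Rebuild the array from the JSON-decoded list; the dtype is inferred
# (float64 here), and nested lists restore multi-dimensional shapes
arr = np.array(my_list)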
Happy Leap Second has it close, but I kept getting an automatic cast to string. Also, if you check out this other post (a fun debate on using buffer or Binary to push non-text data into sqlite), you'll see that the documented approach is to avoid buffer altogether and use this chunk of code:
import io
import sqlite3
import numpy as np

def adapt_array(arr):
    # serialize with np.save so dtype and shape survive the round trip
    out = io.BytesIO()
    np.save(out, arr)
    out.seek(0)
    return sqlite3.Binary(out.read())
I haven't heavily tested this in Python 3, but it seems to work in Python 2.7.
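For completeness, the matching converter and registration, mirroring unutbu's answer above (a sketch; not part of this answer as posted):

def convert_array(blob):
    # inverse of adapt_array: np.load restores dtype and shape
    return np.load(io.BytesIO(blob))

sqlite3.register_adapter(np.ndarray, adapt_array)
sqlite3.register_converter("array", convert_array)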
The other methods specified didn't work for me. There is now a numpy.ndarray.tobytes method, and numpy.fromstring (which works on byte strings) still exists but is deprecated; the recommended replacement is numpy.frombuffer.
import sqlite3
import numpy as np
sqlite3.register_adapter(np.ndarray, lambda arr: arr.tobytes())
sqlite3.register_converter("array", np.frombuffer)
I've tested it in my application, and it works well for me on Python 3.7.3 and numpy 1.16.2.
numpy.fromstring gives the same output, along with DeprecationWarning: The binary mode of fromstring is deprecated, as it behaves surprisingly on unicode inputs. Use frombuffer instead.
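A short usage sketch (my example, not from the answer): np.frombuffer returns a flat float64 array by default, so this round trip is exact only for 1-D float64 data:

import sqlite3
import numpy as np

sqlite3.register_adapter(np.ndarray, lambda arr: arr.tobytes())
sqlite3.register_converter("array", np.frombuffer)

con = sqlite3.connect(":memory:", detect_types=sqlite3.PARSE_DECLTYPES)
con.execute("create table t (arr array)")
con.execute("insert into t (arr) values (?)", (np.arange(5.0),))
print(con.execute("select arr from t").fetchone()[0])  # [0. 1. 2. 3. 4.]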
Comments:
- To fix the dtype explicitly, use arr.astype('float32').tobytes() and np.frombuffer(text, dtype='float32')

Ready-to-use code based on @unutbu's answer (cleaned up a bit; no need to seek, etc.), tested with a 2D ndarray:
import sqlite3, numpy as np, io
def adapt_array(arr):
out = io.BytesIO()
np.save(out, arr)
return sqlite3.Binary(out.getvalue())
sqlite3.register_adapter(np.ndarray, adapt_array)
sqlite3.register_converter("array", lambda x: np.load(io.BytesIO(x)))
x = np.random.rand(100, 100)
con = sqlite3.connect(":memory:", detect_types=sqlite3.PARSE_DECLTYPES)
con.execute("create table test (arr array)")
con.execute("insert into test (arr) values (?)", (x, ))
for r in con.execute("select arr from test"):
print(r[0])
You can use this instead (see @gavin's answer), but only if you work exclusively with 1D float64 arrays, since np.frombuffer defaults to float64 and discards shape information:
sqlite3.register_adapter(np.ndarray, lambda arr: arr.tobytes())
sqlite3.register_converter("array", np.frombuffer)