Issue 13299: namedtuple row factory for sqlite3

➜

This issue tracker has been migrated to GitHub , and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/57508

classification

Title:	namedtuple row factory for sqlite3
Type:	enhancement	Stage:	patch review
Components:	Versions:	Python 3.5

process

Dependencies:	Superseder:
Status:	open	Resolution:
Assigned To:	ghaering	Nosy List:	Russell.Sim, YoSTEALTH, dlenski, eric.araujo, erlendaasland, ghaering, intellimath, ncoghlan, petri.lehtinen, rhettinger, serhiy.storchaka
Priority:	normal	Keywords:	patch

Created on 2011年10月31日 00:32 by ncoghlan, last changed 2022年04月11日 14:57 by admin.

Files
File name	Uploaded	Description	Edit
issue_13299.patch	Russell.Sim, 2012年08月20日 03:20	review
issue_13299.1.patch	Russell.Sim, 2012年08月21日 12:16	review
issue_13299.2.patch	Russell.Sim, 2012年08月21日 12:49	review
sqlite_namedtuplerow.patch	serhiy.storchaka, 2015年01月11日 08:05	review

Messages (19)
msg146670 - (view)	Author: Alyssa Coghlan (ncoghlan) * (Python committer)	Date: 2011年10月31日 00:32
Currently, sqlite3 allows rows to be easily returned as ordinary tuples (default) or sqlite3.Row objects (which allow dict-style access). collections.namedtuple provides a much nicer interface than sqlite3.Row for accessing ordered data which uses valid Python identifiers for field names, and can also tolerate field names which are not valid identifiers. It would be convenient if sqlite3 provided a row factory along the lines of the one posted here: http://peter-hoffmann.com/2010/python-sqlite-namedtuple-factory.html (except with smarter caching on the named tuples)
msg146738 - (view)	Author: Raymond Hettinger (rhettinger) * (Python committer)	Date: 2011年10月31日 19:44
+1
msg147474 - (view)	Author: Éric Araujo (eric.araujo) * (Python committer)	Date: 2011年11月12日 11:20
> collections.namedtuple provides a much nicer interface than sqlite3.Row Definitely!
msg168617 - (view)	Author: Russell Sim (Russell.Sim)	Date: 2012年08月20日 03:20
Hi, Here is an implementation using lru_cache to prevent regeneration of the named tuple each time. Cheers, Russell
msg168748 - (view)	Author: Raymond Hettinger (rhettinger) * (Python committer)	Date: 2012年08月21日 07:20
Caching based on the cursor going to be problematic because a single cursor can be used multiple times with different descriptions: c = conn.cursor() c.execute('select symbol from stocks') print c.description c.execute('select price from stocks') print c.description # same cursor, different layout, needs a new named tuple It might make more sense to cache the namedtuple() factory itself: sql_namedtuple = lru_cache(maxsize=20)(namedtuple) Also, the example in the docs is too lengthy and indirect. Cut-out the step for creating an populating the database -- just assume db created in the example at the top of the page: For example:: >>> conn.row_factory = sqlite3.NamedTupleRow >>> c = conn.cursor() >>> for record in c.execute('select * from stocks'): print record Row(date='2006-01-05', trans='BUY', symbol='RHAT', qty=100.0, price=35.14) Row(date='2006-01-05', trans='BUY', symbol='RHAT', qty=100, price=35.14) Row(date='2006-03-28', trans='BUY', symbol='IBM', qty=1000, price-45.0) No need to go into a further lesson on how to use named tuples. Also, the patch uses star-unpacking: _namedtuple_row(cursor)(*row) Instead, it should use _make: _namedtuple_row(cursor)._make(row) (u'2006-04-05', u'BUY', u'MSFT', 1000, 72.0)
msg168761 - (view)	Author: Russell Sim (Russell.Sim)	Date: 2012年08月21日 12:16
Raymond, Thanks for the comprehensive feedback! It's fantastic! I have updated the patch with most of you feedback... but there was one part that I couldn't follow entirely. I am now using the _make method but I have had to use star unpacking to allow the method to be cached, lru_cache won't allow a key to be a list because they aren't hash-able.
msg168762 - (view)	Author: Alyssa Coghlan (ncoghlan) * (Python committer)	Date: 2012年08月21日 12:35
You should be able to just use "tuple(col[0] for col in cursor.description)" instead of the current list comprehension in order to make the argument hashable.
msg168763 - (view)	Author: Russell Sim (Russell.Sim)	Date: 2012年08月21日 12:49
Nick, Thanks for the tip. I have removed the star unpacking.
msg218657 - (view)	Author: Glenn Langford (glangford) *	Date: 2014年05月16日 12:32
In abstract, I like the namedtuple interface for sqlite3 as well. One caution is that the approach suggested at http://peter-hoffmann.com/2010/python-sqlite-namedtuple-factory.html can have a dramatic impact on performance. For one DB-intensive application, I experienced 20+ seconds run time with the row factory (under 3.4), versus sub second without (identified with cProfile). Many thousands of calls to namedtuple_factory were not good. :)
msg218668 - (view)	Author: Glenn Langford (glangford) *	Date: 2014年05月16日 15:19
...if I understand the proposed caching scheme, then repeated executions of the query SELECT a,b,c FROM table would result in cache hits, since the column names remain the same. I'm guessing this would resolve the performance problem in the app I saw, but it would be good to verify that performance is broadly similar with/without named tuples.
msg223704 - (view)	Author: Mark Lawrence (BreamoreBoy) *	Date: 2014年07月22日 22:11
I'd like to see this in 3.5 as I often use sqlite so what needs doing here?
msg223725 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer)	Date: 2014年07月23日 05:51
There is significant overhead. Microbenchmark results: $ ./python -m timeit -s "import sqlite3; con = sqlite3.connect(':memory:')" "con.execute('select 1 as a, 2 as b').fetchall()" 10000 loops, best of 3: 35.8 usec per loop $ ./python -m timeit -s "import sqlite3; con = sqlite3.connect(':memory:'); con.row_factory = sqlite3.Row" "con.execute('select 1 as a, 2 as b').fetchall()" 10000 loops, best of 3: 37.3 usec per loop $ ./python -m timeit -s "import sqlite3; con = sqlite3.connect(':memory:'); con.row_factory = sqlite3.NamedTupleRow" "con.execute('select 1 as a, 2 as b').fetchall()" 10000 loops, best of 3: 92.1 usec per loop It would be easier to add __getattr__ to sqlite3.Row.
msg224702 - (view)	Author: Daniel Lenski (dlenski) *	Date: 2014年08月04日 09:28
Serhiy, 52 usec/loop doesn't seem like much overhead. This is not 52 usec per row fetched, but just 52 usec per cursor.execute(). An example where >1 row is fetched for each cursor would show this more clearly. The advantage of namedtuple is that it's a very well-known interface to most Python programmers. Other db-api modules have taken a similar approach; psycopg2 has a dict-like cursor similar to Row, but has added NameTupleCursor in recent versions. (http://initd.org/psycopg/docs/extras.html#namedtuple-cursor)
msg224707 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer)	Date: 2014年08月04日 10:37
Yes, above microbenchmarks measure the time of execute() + the time of fetching one row. Here is more precise microbenchmarks. $ ./python -m timeit -s "import sqlite3; con = sqlite3.connect(':memory:'); con.execute('create table t (a, b)')" -s "for i in range(100): con.execute('insert into t values (1, 2)')" -- "con.execute('select * from t').fetchall()" 1000 loops, best of 3: 624 usec per loop $ ./python -m timeit -s "import sqlite3; con = sqlite3.connect(':memory:'); con.row_factory = sqlite3.Row; con.execute('create table t (a, b)')" -s "for i in range(100): con.execute('insert into t values (1, 2)')" -- "con.execute('select * from t').fetchall()" 1000 loops, best of 3: 915 usec per loop $ ./python -m timeit -s "import sqlite3; con = sqlite3.connect(':memory:'); con.row_factory = sqlite3.NamedTupleRow; con.execute('create table t (a, b)')" -s "for i in range(100): con.execute('insert into t values (1, 2)')" -- "con.execute('select * from t').fetchall()" 100 loops, best of 3: 6.21 msec per loop Here sqlite3.Row is about 1.5 times slower than tuple, but sqlite3.NamedTupleRow is about 7 times slower than sqlite3.Row. With C implementation of lru_cache() (issue14373) the result is much better: 100 loops, best of 3: 3.16 msec per loop And it will be even more faster (up to 1.7x) when add to the Cursor class a method which returns a tuple of field names.
msg233839 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer)	Date: 2015年01月11日 08:05
Here is faster implementation. $ ./python -m timeit -s "import sqlite3; con = sqlite3.connect(':memory:'); con.row_factory = sqlite3.NamedTupleRow; con.execute('create table t (a, b)')" -s "for i in range(100): con.execute('insert into t values (1, 2)')" -- "con.execute('select * from t').fetchall()" 100 loops, best of 3: 2.74 msec per loop But it is still 3 times slower than sqlite3.Row.
msg234424 - (view)	Author: (YoSTEALTH) *	Date: 2015年01月21日 04:43
note: sqlite_namedtuplerow.patch _cache method conflicts with attached database with say common table.column name like "id" Using namedtuple method over sqlite3.Row was a terrible idea for me. I thought namedtuple is like tuple so should be faster then dict! wrong. I wasted 2 days change my work to namedtuple and back to sqlite3.Row, the speed difference on my working project was: namedtuple 0.035s/result sqlite3.Rows 0.0019s/result for(speed test) range: 10000 namedtuple 17.3s sqlite3.Rows 0.4s My solution was to use sqlite3.Row (for speed) but to get named like usage by convert dict keys() with setattr names: class dict2named(dict): def __init__(self, args, kwargs): super(dict2named, self).__init__(args, *kwargs) self.__dict__ = self Usage: for i in con.execute('SELECT FROM table'): yield dict2named(i) Now i can use: print(i.title) and handy dict methods for dash column names: print(i['my-title']) print(i.get('my-title', 'boo')) Now working project speed: sqlite3.Rows 0.0020s/result for(speed test) range: 10000 sqlite3.Rows 0.8s with dict2named converting This i can work with, tiny compromise in speed with better usage.
msg394912 - (view)	Author: Erlend E. Aasland (erlendaasland) * (Python triager)	Date: 2021年06月02日 14:18
See also bpo-39170
msg395065 - (view)	Author: Raymond Hettinger (rhettinger) * (Python committer)	Date: 2021年06月04日 04:33
FWIW, namedtuple speed improved considerably since these posts were made. When I last checked, their lookup speed was about the same as a dict lookup. See: https://docs.python.org/3/whatsnew/3.9.html#optimizations
msg398970 - (view)	Author: Zaur Shibzukhov (intellimath)	Date: 2021年08月05日 07:15
Instead of using cache, maybe better to use mutable default argument? For example: def make_row_factory(cls_factory, kw): def row_factory(cursor, row, cls=[None]): rf = cls[0] if rf is None: fields = [col[0] for col in cursor.description] cls[0] = cls_factory("Row", fields, kw) return cls[0](row) return rf(row) return row_factory namedtuple_row_factory = make_row_factory(namedtuple) Seem it should add less overhead.

History
Date	User	Action	Args
2022年04月11日 14:57:23	admin	set	github: 57508
2021年08月05日 07:15:53	intellimath	set	nosy: + intellimath messages: + msg398970
2021年06月04日 04:33:57	rhettinger	set	messages: + msg395065
2021年06月02日 14:18:26	erlendaasland	set	messages: + msg394912
2020年05月25日 12:20:47	erlendaasland	set	nosy: + erlendaasland
2019年03月15日 23:10:05	BreamoreBoy	set	nosy: - BreamoreBoy
2015年01月21日 04:43:49	YoSTEALTH	set	nosy: + YoSTEALTH messages: + msg234424
2015年01月11日 08:05:11	serhiy.storchaka	set	files: + sqlite_namedtuplerow.patch messages: + msg233839
2015年01月11日 01:58:33	ghaering	set	assignee: ghaering
2014年08月04日 10:37:41	serhiy.storchaka	set	messages: + msg224707
2014年08月04日 09:28:05	dlenski	set	nosy: + dlenski messages: + msg224702
2014年07月23日 05:51:21	serhiy.storchaka	set	nosy: + serhiy.storchaka messages: + msg223725
2014年07月22日 22:11:39	BreamoreBoy	set	nosy: + BreamoreBoy messages: + msg223704 versions: + Python 3.5, - Python 3.4
2014年07月18日 16:45:09	glangford	set	nosy: - glangford
2014年05月16日 15:19:15	glangford	set	messages: + msg218668
2014年05月16日 12:32:42	glangford	set	nosy: + glangford messages: + msg218657
2012年10月02日 05:27:19	ezio.melotti	set	stage: needs patch -> patch review versions: + Python 3.4, - Python 3.3
2012年08月21日 12:49:55	Russell.Sim	set	files: + issue_13299.2.patch messages: + msg168763
2012年08月21日 12:35:55	ncoghlan	set	messages: + msg168762
2012年08月21日 12:16:26	Russell.Sim	set	files: + issue_13299.1.patch messages: + msg168761
2012年08月21日 07:20:33	rhettinger	set	messages: + msg168748
2012年08月20日 03:20:32	Russell.Sim	set	files: + issue_13299.patch nosy: + Russell.Sim messages: + msg168617 keywords: + patch
2012年02月03日 09:19:39	petri.lehtinen	set	nosy: + petri.lehtinen
2011年11月12日 11:20:05	eric.araujo	set	nosy: + eric.araujo messages: + msg147474
2011年10月31日 19:44:37	rhettinger	set	nosy: + rhettinger messages: + msg146738
2011年10月31日 05:12:23	ned.deily	set	nosy: + ghaering
2011年10月31日 00:32:25	ncoghlan	create

homepage