SQLite and Python: commit once every 10 seconds maximum, and not after every client request

Question 1

In a Python Bottle server using SQLite, I noticed that doing a DB commit after each INSERT is not efficient: it can use 100ms after each client request. Thus I wanted to improve this in How to commit DB changes in a Flask or Bottle app efficiently? .

I finally came to this solution, which is more or less a "debounce"-like method: if multiple SQL INSERT happen during a 10-second timeframe, group all of them in a single DB commit.

What do you think of the following code? Is it safe to do like this?

(Note: I know that the use of a global variable should be avoided, and replaced by a class / object with attributes, etc. I'll do this, but this part is not really on topic here).

import bottle, sqlite3, random, threading, time
@bottle.route('/')
def index():
 global committhread
 c = db.cursor()
 c.execute('INSERT INTO test VALUES (?)', (random.randint(0, 10000),))
 c.close()
 if not committhread: 
 print('Calling commit()...')
 committhread = threading.Thread(target=commit)
 committhread.start()
 else:
 print('A commit is already planned.')
 return 'hello'
def commit():
 global committhread
 print("We'll commit in 10 seconds.")
 time.sleep(10) # I hope this doesn't block/waste CPU here?
 db.commit()
 print('Committed.')
 committhread = None 
db = sqlite3.connect('test.db', check_same_thread=False)
db.execute("CREATE TABLE IF NOT EXISTS test (a int)")
committhread = None
bottle.run(port=80)

If you run this code, opening http://localhost/ once will plan a commit 10 seconds later. If you reopen http://localhost/ multiple times, less than 10 seconds later, you will see that it will be grouped in the same commit, as desired.

Note: this method is not "Do it every 10 seconds" (this would be a classic timer), but rather: "Do it 10 seconds later; if another INSERT comes in the meantime, do all of them together". If there is no INSERT during 60 minutes, with my method it won't do anything at all. With a timer it would still periodically call a function (and notice there's nothing to do).

Worth reading too: How to improve SQLite insert performance in Python 3.6?

Question 2

Your link says commits may be slow because they go to disk. Is your disk really the bottleneck in your program? Disks are a lot slower than CPUs, but I imagine for most servers they can keep up with connections especially if you are not writing a lot of data

Question 3

It's probably both CPU and disk. Since losing 100ms after each request is not an option (I spent days optimizing the requests/page generation to be fast, less than 30ms, so I don't want just a logging in the DB to ruin all these efforts :)), I'm looking for a solution to do a commit only every few INSERTs.

Question 4

Have you measured it taking 100ms? Even in abysmal conditions, I would expect sqlite to rarely hit 5ms using a disk from the last 5 ish year (doing small inserts)

Question 5

@sudorm-rfslash I analyzed timing for many lines of my code, and I measured numbers like 79ms 60ms 84ms for the db.commit() part (low-end server) so the order of magnitude is 100ms.

Question 6

That's surprising to me but I guess if you measured it then I'm wrong

Question 7

What do you think of the following code? Is it safe to do like this?

In my opinion, it is not. So many things could go wrong.

One example: in this code there is no exception handling. If your program crashes for any reason, your routine may not be triggered. Fix: add an exception handler that does some cleanup, commits and closes the DB. Or better yet, just do that commit in the finally block.

By the way the doc says this about the close function:

This closes the database connection. Note that this does not automatically call commit(). If you just close your database connection without calling commit() first, your changes will be lost!

So it is a good idea to commit systematically.

time.sleep(10) # I hope this doesn't block/waste CPU here?

time.sleep blocks the calling thread. It is useless anyway because what you want here is a timer, not a thread. You can have a timer routine that runs every n seconds to perform a given task. But you should still have a commit in the finally block, so that all pending changes are written to the DB when the program ends, even after an exception.

Now to discuss the functionality more in depth:
You say:

In a Python Bottle server using SQLite, I noticed that doing a DB commit after each INSERT is not efficient: it can use 100ms after each client request.

That may not be 'efficient' but if I have to choose between slow and safe there is no hesitation. Have you actually measured how long it takes on your own environment ?

On Stack Overflow you wrote:

I optimized my server to serve pages very fast (10ms), and it would be a shame to lose 100ms because of DB commit.

While I applaud your obsession with performance, does 100 ms really make a difference to your users ? It normally takes more than 100 ms to load a page or even refresh a portion of it using Ajax or a websocket. The latency resides in the network transport. I don't know how your application is structured but my priority would be to deliver as little traffic as possible to the users. Websocket + client-side JS should do.

Perhaps using a different storage medium could improve IO performance. If you are not using a SSD drive, maybe you could consider it or at least test it.

Before writing code like this I would really try to exhaust all possibilities, but it is better (more reliable) to let SQLite handle things using the options that already exist. What have you tried so far ?

Would this be acceptable to you ?

PRAGMA schema.synchronous = OFF

With synchronous OFF (0), SQLite continues without syncing as soon as it has handed data off to the operating system. If the application running SQLite crashes, the data will be safe, but the database might become corrupted if the operating system crashes or the computer loses power before that data has been written to the disk surface. On the other hand, commits can be orders of magnitude faster with synchronous OFF.

Source: PRAGMA Statements

There is a risk of corruption in case of power loss but this is no worse than what you are doing. If on the other hand data integrity is more important you should stick to a full commit after every operation.

You should also have a look at Write-Ahead Logging. This may interest you if there are concurrent writes to your database. Otherwise opening the DB in EXCLUSIVE mode may bring some benefits (see the PRAGMA page for details).

More detailed discussions:

Last but not least: transactions. SQLite starts an implicit transaction automatically every time you run a SQL statement and commits it after execution. You could initiate the BEGIN & COMMIT TRANSACTION statements yourself. So if you have a number or related writes, regroup them under one single transaction. Thus you do one commit for the whole transaction instead of one transaction per statement (there is more consistency too: in case an error occurs in the middle of the process you won't be left with orphaned records).

There are quite many things you can try until you find the mix that is right for you.

Question 8

Thank you very much for your detailed answer. You're right: there are many other things to try before doing my do it in 10 seconds, and group all further INSERT in the same commit method. I just tried your PRAGMA suggestion, and this c.execute('PRAGMA journal_mode = OFF') gives a massive improvement. c.execute('PRAGMA synchronous = OFF'); makes it asynchronous so when we measure it's 0 ;) So it's probably done "when the current process has some spare time" and it's perfect for my needs.

Question 9

A little remark about time.sleep blocks the calling thread. It is useless anyway because what you want here is a timer, not a thread. You can have a timer routine that runs every n seconds to perform a given task. My method is slightly different though: it's not "do it every 10 seconds" (this would be a classic timer), but rather: "do it 10 sec later; if another INSERT comes in the meantime, do all of them together". If there is no INSERT during 60 minutes, with my method it won't do anything at all. With a timer it would still periodically call a function (and notice there's nothing to do).

Question 10

Last thing: with synchronous = OFF, will the OS (Linux) write to disk as soon it has time to do it, or are there cases in which the OS will wait several minutes before doing it?

Question 11

Looks like PRAGMA synchronous = OFF works, but for those wondering about PRAGMA <schema>.synchronous = OFF, the default schema in main, as in PRAGMA main.synchronous = OFF

Kate Kate 8,0889 silver badges21 bronze badges · Accepted Answer · 2020-05-05 17:00:24Z

What do you think of the following code? Is it safe to do like this?

In my opinion, it is not. So many things could go wrong.

One example: in this code there is no exception handling. If your program crashes for any reason, your routine may not be triggered. Fix: add an exception handler that does some cleanup, commits and closes the DB. Or better yet, just do that commit in the finally block.

By the way the doc says this about the close function:

This closes the database connection. Note that this does not automatically call commit(). If you just close your database connection without calling commit() first, your changes will be lost!

So it is a good idea to commit systematically.

time.sleep(10) # I hope this doesn't block/waste CPU here?

time.sleep blocks the calling thread. It is useless anyway because what you want here is a timer, not a thread. You can have a timer routine that runs every n seconds to perform a given task. But you should still have a commit in the finally block, so that all pending changes are written to the DB when the program ends, even after an exception.

Now to discuss the functionality more in depth:
You say:

In a Python Bottle server using SQLite, I noticed that doing a DB commit after each INSERT is not efficient: it can use 100ms after each client request.

That may not be 'efficient' but if I have to choose between slow and safe there is no hesitation. Have you actually measured how long it takes on your own environment ?

On Stack Overflow you wrote:

I optimized my server to serve pages very fast (10ms), and it would be a shame to lose 100ms because of DB commit.

While I applaud your obsession with performance, does 100 ms really make a difference to your users ? It normally takes more than 100 ms to load a page or even refresh a portion of it using Ajax or a websocket. The latency resides in the network transport. I don't know how your application is structured but my priority would be to deliver as little traffic as possible to the users. Websocket + client-side JS should do.

Perhaps using a different storage medium could improve IO performance. If you are not using a SSD drive, maybe you could consider it or at least test it.

Before writing code like this I would really try to exhaust all possibilities, but it is better (more reliable) to let SQLite handle things using the options that already exist. What have you tried so far ?

Would this be acceptable to you ?

PRAGMA schema.synchronous = OFF

With synchronous OFF (0), SQLite continues without syncing as soon as it has handed data off to the operating system. If the application running SQLite crashes, the data will be safe, but the database might become corrupted if the operating system crashes or the computer loses power before that data has been written to the disk surface. On the other hand, commits can be orders of magnitude faster with synchronous OFF.

Source: PRAGMA Statements

There is a risk of corruption in case of power loss but this is no worse than what you are doing. If on the other hand data integrity is more important you should stick to a full commit after every operation.

You should also have a look at Write-Ahead Logging. This may interest you if there are concurrent writes to your database. Otherwise opening the DB in EXCLUSIVE mode may bring some benefits (see the PRAGMA page for details).

More detailed discussions:

Last but not least: transactions. SQLite starts an implicit transaction automatically every time you run a SQL statement and commits it after execution. You could initiate the BEGIN & COMMIT TRANSACTION statements yourself. So if you have a number or related writes, regroup them under one single transaction. Thus you do one commit for the whole transaction instead of one transaction per statement (there is more consistency too: in case an error occurs in the middle of the process you won't be left with orphaned records).

There are quite many things you can try until you find the mix that is right for you.

Thank you very much for your detailed answer. You're right: there are many other things to try before doing my do it in 10 seconds, and group all further INSERT in the same commit method. I just tried your PRAGMA suggestion, and this c.execute('PRAGMA journal_mode = OFF') gives a massive improvement. c.execute('PRAGMA synchronous = OFF'); makes it asynchronous so when we measure it's 0 ;) So it's probably done "when the current process has some spare time" and it's perfect for my needs.
A little remark about time.sleep blocks the calling thread. It is useless anyway because what you want here is a timer, not a thread. You can have a timer routine that runs every n seconds to perform a given task. My method is slightly different though: it's not "do it every 10 seconds" (this would be a classic timer), but rather: "do it 10 sec later; if another INSERT comes in the meantime, do all of them together". If there is no INSERT during 60 minutes, with my method it won't do anything at all. With a timer it would still periodically call a function (and notice there's nothing to do).
Last thing: with synchronous = OFF, will the OS (Linux) write to disk as soon it has time to do it, or are there cases in which the OS will wait several minutes before doing it?
Looks like PRAGMA synchronous = OFF works, but for those wondering about PRAGMA <schema>.synchronous = OFF, the default schema in main, as in PRAGMA main.synchronous = OFF

Stack Exchange Network

SQLite and Python: commit once every 10 seconds maximum, and not after every client request

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

SQLite and Python: commit once every 10 seconds maximum, and not after every client request

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions