My code acquires data from a sensor and performs some operations based on the last N minutes of data.
At the moment I just initialize a list at the beginning of my code:
```python
x = []
while running:
    new_value = acquire_new_point()
    x.append(new_value)
    do_something_with_x(x)
```
Although this works, it has some intrinsic flaws that make it sub-optimal. The most important are:
- If the code crashes or restarts, the whole time-history is lost
- There is no record or log of the past time-history
- Memory consumption could grow without bound and exceed the available memory
Some obvious solutions exist, such as:
- log each new element to a CSV file and read it back when the code starts
- divide the data into N-minute chunks and drop from memory any chunk older than N minutes
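The two workarounds above can be sketched together: append each point to a CSV log for persistence, and keep only a rolling window in memory with a `deque`. This is a minimal sketch; the file name and the 5-minute window are assumed values, not anything from a specific library.

```python
import csv
import time
from collections import deque

WINDOW_SECONDS = 5 * 60  # assumed retention window: N = 5 minutes


def append_to_log(path, timestamp, value):
    """Append one (timestamp, value) row to a CSV log for persistence."""
    with open(path, "a", newline="") as f:
        csv.writer(f).writerow([timestamp, value])


def trim_window(window, now):
    """Drop points older than WINDOW_SECONDS from the left of the deque."""
    while window and window[0][0] < now - WINDOW_SECONDS:
        window.popleft()


# Usage: one stale point and one recent point; trimming keeps only the latter.
window = deque()
now = time.time()
window.append((now - 400, 1.0))  # older than the 5-minute window
window.append((now - 10, 2.0))   # recent
trim_window(window, now)
```

The main drawback is that the CSV log and the in-memory window are two separate mechanisms that must be kept consistent by hand, which is exactly what a small database would do for you.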
I have the feeling, however, that a more specific solution to this problem may already exist. My first thought was HDF5, but I'm not sure it's the best candidate for this problem.
So, what is the best strategy/format for a database that needs to be appended to at run time and read partially (only the last N minutes)? Is there a tool designed specifically for a task like this?
1 Answer
I'd simply use SQLite with a two-column table (timestamp, value) and an index on the timestamp.
This has the additional bonus that you can use SQL for data analysis, which will likely be faster than doing it by hand in Python.
If you don't need the historical data, you can periodically run DELETE FROM data WHERE timestamp < ....
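The periodic pruning could look like the following sketch (same assumed two-column schema as above; the 5-minute retention window is an example value):

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (timestamp REAL, value REAL)")
conn.execute("CREATE INDEX idx_timestamp ON data (timestamp)")

now = time.time()
conn.executemany(
    "INSERT INTO data VALUES (?, ?)",
    [(now - 900, 1.0), (now - 30, 2.0)],  # one 15-minute-old point, one recent
)


def prune(window_minutes):
    # Delete rows older than the retention window; the timestamp index
    # lets SQLite locate the stale rows without a full table scan.
    conn.execute(
        "DELETE FROM data WHERE timestamp < ?",
        (time.time() - window_minutes * 60,),
    )
    conn.commit()


prune(5)
remaining = conn.execute("SELECT value FROM data").fetchall()
```

Calling `prune` on a timer (or every few iterations of the acquisition loop) keeps the database size bounded, which addresses the memory-growth concern from the question.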