Related: #544
Problem
From #544:
- Long-term, the goal is for OpenRosa clients to download changes to the entity list progressively rather than re-download the list each time. Once that's in place, it's not clear that this will still be much of an issue. Maybe clients will re-download the individual resolved entities, but they won't be re-downloading the entire entity list.
Currently, none of our collections (submissions, entities) are properly incrementally syncable!
Pagination based on timestamps or even PKs doesn't work for this. Only the newly introduced eventstamps give you the guarantee that you'll never see newer information denoted with a lower cursor value than the max cursor value you've observed so far. AFAIK it is the only sound candidate cursor primitive we have in the DB, currently.
In effect, any analysis pipeline that must not miss mutations needs to download and diff the whole collection (or at least the index, e.g. /v1/projects/{projectId}/forms/{xmlFormId}/submissions) every time to determine whether anything has changed.
It becomes especially unwieldy for large oft-updated entity lists that are being downloaded, potentially over very crappy connections, by Collect.
Proposal for a solution
So, a sound (but simple) incremental synchronization protocol based on an eventstamp cursor seems like the most straightforward way to address these problems.
It could work as follows:
Client would query for everything that happened-after the highest event stamp they saw in their previous download. That collection would thus contain both new elements as well as elements which are an updated version (including deletions) of elements received earlier. Example:
first download
Response to GET /some/entities:
| eventID | elementID | payload |
| --- | --- | --- |
| 3 | A | W |
| 8 | B | X |
| 4 | C | Y |
- Client notes max(eventID), 8 in this case, and saves it on its side as the cursor state. (It also saves ("caches") the whole collection, of course.)
second download
This time, the client supplies the cursor state. It could be a request header, but using the URL is less obscure.
Response to GET /some/entities?after_event=8:
| eventID | elementID | payload |
| --- | --- | --- |
| 9 | D | Z |
| 10 | A | W' |
- Client notes max(eventID), 10 in this case, and saves it on its side as the cursor state.
- To incorporate updates, the client should add D to its collection, and replace the payload of A (which was W) with W'. Simple!
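The two downloads above can be sketched as client-side code. This is only an illustration under assumptions: the `ClientCache` class and its field names are hypothetical, rows are modelled as `(eventID, elementID, payload)` tuples, and each response is assumed to carry at most one row (the latest version) per element.

```python
from dataclasses import dataclass, field


@dataclass
class ClientCache:
    """Hypothetical client-side state: the cursor plus the cached collection."""
    cursor: int = 0  # highest eventID seen so far; 0 means nothing downloaded yet
    elements: dict = field(default_factory=dict)  # elementID -> payload

    def apply(self, rows):
        """Merge (eventID, elementID, payload) rows and advance the cursor."""
        for event_id, element_id, payload in rows:
            # New elements are added; known elements get their payload replaced.
            self.elements[element_id] = payload
            self.cursor = max(self.cursor, event_id)


cache = ClientCache()

# First download: GET /some/entities
cache.apply([(3, "A", "W"), (8, "B", "X"), (4, "C", "Y")])
first_cursor = cache.cursor  # the client would send this as after_event next time

# Second download: GET /some/entities?after_event=8
cache.apply([(9, "D", "Z"), (10, "A", "W'")])
```

After the second call, the cache holds A through D with A's payload replaced by W', and the cursor has advanced from 8 to 10.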
A new requirement
For this to work we'll have a new requirement: eternal tombstones. Currently pseudo-tombstones (e.g. for soft-deleted submissions) are kept around for a certain amount of time, but eventually they'll be garbage-collected (hard-deleted). Once that happens, the deletion becomes unpropagatable, and clients that didn't update in the interval between soft-deletion and hard-deletion will never hear of the deletion. The common solution to that is to have proper tombstones that really only carry minimal information. In our case, for this protocol, it would suffice to have (eventID, elementID) tombstones.
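A minimal sketch of how a client might consume such tombstones, under assumptions: the function name `apply_changes` is hypothetical, and a tombstone is modelled here as a row whose payload is `None`, which is one possible encoding of an (eventID, elementID) pair without a payload.

```python
def apply_changes(elements, cursor, rows):
    """Apply (eventID, elementID, payload) rows to a cached collection.

    A payload of None is treated as a tombstone (an assumed encoding).
    Returns the new cursor value.
    """
    for event_id, element_id, payload in rows:
        if payload is None:
            # Tombstone: drop the element. A tombstone for an element this
            # client never saw is harmless, hence the default argument.
            elements.pop(element_id, None)
        else:
            elements[element_id] = payload
        cursor = max(cursor, event_id)
    return cursor


# Cached state after two earlier downloads, with the cursor at 10.
elements = {"A": "W'", "B": "X", "C": "Y", "D": "Z"}

# B was deleted at event 11; Q was created and deleted while the client was away.
new_cursor = apply_changes(elements, 10, [(11, "B", None), (12, "Q", None)])
```

Because the tombstone rows carry event IDs like any other change, a client that syncs long after the deletion still receives them, provided the server never garbage-collects them.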