Eventstamp based synchronization protocols for OpenRosa / REST APIs #1615

Status: Open
Labels: backend (Requires a change to the API server), performance (Performance, benchmarking)
Opened by @brontolosone

Description

Related: #544

Problem

From #544:

  • Long-term, the goal is for OpenRosa clients to download changes to the entity list progressively rather than re-download the list each time. Once that's in place, it's not clear that this will still be much of an issue. Maybe clients will re-download the individual resolved entities, but they won't be re-downloading the entire entity list.

Currently, none of our collections (submissions, entities) are properly incrementally syncable!
Pagination based on timestamps or even PKs doesn't work for this. Only the newly introduced eventstamps give you the guarantee that you'll never see newer information denoted with a lower cursor value than the max cursor value you've observed so far; AFAIK they are currently the only sound candidate cursor primitive we have in the DB.

In effect, that means that any analysis pipeline, in order not to miss mutations, needs to download and diff the whole collection (or at least the index, e.g. /v1/projects/{projectId}/forms/{xmlFormId}/submissions) every time to determine whether anything has changed, as sketched below.
It becomes especially unwieldy for large, oft-updated entity lists that are being downloaded, potentially over very crappy connections, by Collect.
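For illustration, a minimal TypeScript sketch of that workaround; the response shape, field names, and caching scheme here are assumptions for the sake of the example, not Central's exact API:

```ts
// Hypothetical index row; the actual schema may differ.
type IndexRow = { instanceId: string; updatedAt: string | null };

async function detectChanges(
  baseUrl: string,
  projectId: number,
  xmlFormId: string,
  cached: Map<string, IndexRow>,
): Promise<IndexRow[]> {
  // Re-download the entire index on every poll...
  const res = await fetch(
    `${baseUrl}/v1/projects/${projectId}/forms/${xmlFormId}/submissions`,
  );
  const rows: IndexRow[] = await res.json();
  // ...then diff it against the cached copy to find what changed.
  const changed: IndexRow[] = [];
  for (const row of rows) {
    const prev = cached.get(row.instanceId);
    if (!prev || prev.updatedAt !== row.updatedAt) changed.push(row);
    cached.set(row.instanceId, row);
  }
  // Elements that vanished from the index (hard deletions) would need a
  // second pass over `cached`; without a cursor, the full diff is unavoidable.
  return changed;
}
```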

Proposal for a solution

So, a sound (but simple) incremental synchronization protocol based on an eventstamp cursor seems like the most straightforward way to address these problems.

It could work as follows:

The client would query for everything that happened-after the highest eventstamp it saw in its previous download. The response would thus contain both new elements and elements that are an updated version (including deletions) of elements received earlier. Example:

first download

Response to GET /some/entities:

eventID  elementID  payload
3        A          W
8        B          X
4        C          Y
  • Client notes max(eventID), 8 in this case, and saves that on its side as the cursor state (and it saves ("caches") the whole collection too, of course); see the sketch below.
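A minimal sketch of the client side of this first download, assuming a JSON response shaped like the table above (the field names and fetch plumbing are illustrative):

```ts
// Assumed wire format for one change row, mirroring the table above.
type Change = { eventId: number; elementId: string; payload: string };

async function firstSync(url: string) {
  const changes: Change[] = await (await fetch(url)).json();
  // Cache the whole collection, keyed by elementID.
  const collection = new Map<string, string>();
  for (const c of changes) collection.set(c.elementId, c.payload);
  // The cursor is simply the highest eventID observed: max(3, 8, 4) = 8 here.
  const cursor = changes.reduce((m, c) => Math.max(m, c.eventId), 0);
  return { collection, cursor }; // persist both client-side
}
```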

second download

This time, the client supplies the cursor state. It could be a request header, but using the URL is less obtuse.
Response to GET /some/entities?after_event=8:

eventID  elementID  payload
9        D          Z
10       A          W'
  • Client notes max(eventID), 10 in this case, and saves that on its side as the cursor state.
  • To incorporate updates, the client should add D to its collection and replace the payload of A (which was W) with W'. Simple! (See the merge sketch below.)
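A matching sketch of the incremental step, under the same assumed wire format: the merge is a plain upsert keyed by elementID.

```ts
// Same assumed Change shape as in the previous sketch.
type Change = { eventId: number; elementId: string; payload: string };

async function incrementalSync(
  url: string,
  collection: Map<string, string>,
  cursor: number,
): Promise<number> {
  const changes: Change[] =
    await (await fetch(`${url}?after_event=${cursor}`)).json();
  for (const c of changes) {
    collection.set(c.elementId, c.payload); // inserts D; overwrites A's W with W'
    cursor = Math.max(cursor, c.eventId);   // advances from 8 to 10 in the example
  }
  return cursor; // persist alongside the updated collection
}
```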

A new requirement

For this to work we'll have a new requirement: eternal tombstones. Currently pseudo-tombstones (e.g. for soft-deleted submissions) are kept around for a certain amount of time, but eventually they'll be garbage-collected (hard-deleted). Once that happens, the deletion becomes unpropagatable: clients that didn't sync in the interval between soft-deletion and hard-deletion will never hear of the deletion. The common solution is proper tombstones that really only carry minimal information. In our case, for this protocol, (eventID, elementID) tombstones would suffice.
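To make that concrete, here's how a client might apply such a tombstone, assuming (purely for illustration) that the change feed marks deletions with a deleted flag and omits the payload:

```ts
// Assumed tombstone shape: a change row carrying only (eventID, elementID)
// plus a deleted flag. Field names are illustrative, not a settled design.
type LiveRow = { eventId: number; elementId: string; deleted?: false; payload: string };
type Tombstone = { eventId: number; elementId: string; deleted: true };
type Row = LiveRow | Tombstone;

function applyChange(collection: Map<string, string>, row: Row): void {
  if (row.deleted) {
    collection.delete(row.elementId); // tombstone: propagate the deletion
  } else {
    collection.set(row.elementId, row.payload); // insert or update as before
  }
}
```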
