Eventstamp based synchronization protocols for OpenRosa / REST APIs #1615

Status: Open
Labels: backend (Requires a change to the API server), performance (Performance, benchmarking)
Opened by @brontolosone

Description

Related: #544

Problem

From #544:

  • Long-term, the goal is for OpenRosa clients to download changes to the entity list progressively rather than re-download the list each time. Once that's in place, it's not clear that this will still be much of an issue. Maybe clients will re-download the individual resolved entities, but they won't be re-downloading the entire entity list.

Currently, none of our collections (submissions, entities) are properly incrementally syncable!
Pagination based on timestamps or even PKs doesn't work for this. Only the newly introduced eventstamps give you the guarantee that you'll never see newer information denoted with a lower cursor value than the max cursor value you've observed so far; AFAIK they are currently the only sound candidate cursor primitive we have in the DB.

In effect, that means that any analysis pipeline, in order not to miss mutations, needs to download and diff the whole collection (or at least the index, e.g. /v1/projects/{projectId}/forms/{xmlFormId}/submissions) every time to determine whether anything has changed, as sketched below.
It becomes especially unwieldy for large, oft-updated entity lists that are being downloaded, potentially over very crappy connections, by Collect.
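For illustration, a minimal TypeScript sketch of that workaround; the response shape, field names, and caching scheme here are assumptions for the sake of the example, not Central's exact API:

```ts
// Hypothetical index row; the actual schema may differ.
type IndexRow = { instanceId: string; updatedAt: string | null };

async function detectChanges(
  baseUrl: string,
  projectId: number,
  xmlFormId: string,
  cached: Map<string, IndexRow>,
): Promise<IndexRow[]> {
  // Re-download the entire index on every poll...
  const res = await fetch(
    `${baseUrl}/v1/projects/${projectId}/forms/${xmlFormId}/submissions`,
  );
  const rows: IndexRow[] = await res.json();
  // ...then diff it against the cached copy to find what changed.
  const changed: IndexRow[] = [];
  for (const row of rows) {
    const prev = cached.get(row.instanceId);
    if (!prev || prev.updatedAt !== row.updatedAt) changed.push(row);
    cached.set(row.instanceId, row);
  }
  // Elements that vanished from the index (hard deletions) would need a
  // second pass over `cached`; without a cursor, the full diff is unavoidable.
  return changed;
}
```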

Proposal for a solution

So, a sound (but simple) incremental synchronization protocol based on an eventstamp cursor seems like the most straightforward way to address these problems.

It could work as follows:

The client would query for everything that happened-after the highest eventstamp it saw in its previous download. The response would thus contain both new elements and elements that are an updated version (including deletions) of elements received earlier. Example:

first download

Response to GET /some/entities:

eventID  elementID  payload
3        A          W
8        B          X
4        C          Y
  • Client notes max(eventID), 8 in this case, and saves that on its side as the cursor state (and it saves ("caches") the whole collection too, of course); see the sketch below.
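A minimal sketch of the client side of this first download, assuming a JSON response shaped like the table above (the field names and fetch plumbing are illustrative):

```ts
// Assumed wire format for one change row, mirroring the table above.
type Change = { eventId: number; elementId: string; payload: string };

async function firstSync(url: string) {
  const changes: Change[] = await (await fetch(url)).json();
  // Cache the whole collection, keyed by elementID.
  const collection = new Map<string, string>();
  for (const c of changes) collection.set(c.elementId, c.payload);
  // The cursor is simply the highest eventID observed: max(3, 8, 4) = 8 here.
  const cursor = changes.reduce((m, c) => Math.max(m, c.eventId), 0);
  return { collection, cursor }; // persist both client-side
}
```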

second download

This time, the client supplies the cursor state. It could be a request header, but using the URL is less obtuse.
Response to GET /some/entities?after_event=8:

eventID  elementID  payload
9        D          Z
10       A          W'
  • Client notes max(eventID), 10 in this case, and saves that on its side as the cursor state.
  • To incorporate updates, the client should add D to its collection and replace the payload of A (which was W) with W'. Simple! (See the merge sketch below.)
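A matching sketch of the incremental step, under the same assumed wire format: the merge is a plain upsert keyed by elementID.

```ts
// Same assumed Change shape as in the previous sketch.
type Change = { eventId: number; elementId: string; payload: string };

async function incrementalSync(
  url: string,
  collection: Map<string, string>,
  cursor: number,
): Promise<number> {
  const changes: Change[] =
    await (await fetch(`${url}?after_event=${cursor}`)).json();
  for (const c of changes) {
    collection.set(c.elementId, c.payload); // inserts D; overwrites A's W with W'
    cursor = Math.max(cursor, c.eventId);   // advances from 8 to 10 in the example
  }
  return cursor; // persist alongside the updated collection
}
```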

A new requirement

For this to work we'll have a new requirement: eternal tombstones. Currently pseudo-tombstones (e.g. for soft-deleted submissions) are kept around for a certain amount of time, but eventually they'll be garbage-collected (hard-deleted). Once that happens, the deletion becomes unpropagatable: clients that didn't sync in the interval between soft-deletion and hard-deletion will never hear of the deletion. The common solution is proper tombstones that really only carry minimal information. In our case, for this protocol, (eventID, elementID) tombstones would suffice.
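To make that concrete, here's how a client might apply such a tombstone, assuming (purely for illustration) that the change feed marks deletions with a deleted flag and omits the payload:

```ts
// Assumed tombstone shape: a change row carrying only (eventID, elementID)
// plus a deleted flag. Field names are illustrative, not a settled design.
type LiveRow = { eventId: number; elementId: string; deleted?: false; payload: string };
type Tombstone = { eventId: number; elementId: string; deleted: true };
type Row = LiveRow | Tombstone;

function applyChange(collection: Map<string, string>, row: Row): void {
  if (row.deleted) {
    collection.delete(row.elementId); // tombstone: propagate the deletion
  } else {
    collection.set(row.elementId, row.payload); // insert or update as before
  }
}
```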
