How to handle "Optimistic Locking" on a collection with ETag headers?

Question 1

Consider endpoint /projects that returns a list of projects with the following headers:

HTTP/1.1 200 OK
Etag: "superEtag"

The ETag value represents a hash of the entire collection, so a client cannot use it to update a single resource such as /projects/1.

Fetching the resources individually makes no sense, so how can I handle optimistic locking with a collection?

Question 2

Possibly related: stackoverflow.com/questions/28518623/etags-and-collections

Question 3

Yes, I came across this question and its answers before posting, and none of the recommendations satisfied me. Also, I thought "softwareengineering" was a better place to post this.

Question 4

I'm not clear on what is being asked. How would it be different from locking a non-collection resource?

Question 5

@imel96 When doing GET /projects, the ETag corresponds to the hash of the collection. Now if I want to do PUT /projects/1, I need the hash of this specific resource for the conditional request (If-Match) to be successful. I could do GET /projects/{id} to get the individual hash for each resource, but it makes no sense; the collection service would become useless. I hope this is clearer.

Question 6

When doing GET /projects, the ETag corresponds to the hash of the collection. Now if I want to do PUT /projects/1, I need the hash of this specific resource for the conditional request (If-Match) to be successful. I could do GET /projects/{id} to get the individual hash for each resource, but it makes no sense; the collection service would become useless.

I think the problem is that HTTP doesn't mean what you want it to mean.

Fundamentally, the semantics of HTTP are that the resources are stored in a flat key value store. Although /collection and /collection/item are hierarchical identifiers (we can use relative resolution to get from one to the other), the resources that they identify are not hierarchical. There's no relationship inferred from the similar spelling of the identifiers.

This is why DELETE /collection doesn't do anything to your locally cached copy of /collection/item.

Because there is no inferred relationship between the collection and the item, there is no generic vector available for communicating the ETag of the item(s) in the metadata for the collection.

You can certainly do either of

GET /collection
Conditional PUT /collection
GET /collection/item
Conditional PUT /collection/item

and the origin server can, at its discretion, also change the representation of the other resource as a side effect.

This isn't to say that you can't communicate the information by hand - there's nothing against the rules about returning a representation of the collection that communicates the appropriate representations of the member items, along with their validators, so that a "smart" client can create the correct requests without needing to get the individual items.
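As a rough illustration of that last point, here is a sketch of a client turning a collection representation into a conditional request. The "etag" field on each member, the /projects/{id} URL shape, and the helper name build_conditional_put are all assumptions made for this example; HTTP itself defines none of them.

```python
import json

# Hypothetical collection representation: each member carries its own
# validator in an "etag" field (an assumption of this sketch, not an HTTP rule).
collection_body = json.dumps([
    {"id": "1", "name": "project1", "etag": "abc"},
    {"id": "2", "name": "project2", "etag": "def"},
])


def build_conditional_put(member, changes):
    """Return (method, url, headers, body) for a conditional PUT of one member."""
    # PUT replaces the whole representation, so merge the changes into the
    # member's current state; the validator itself is not part of the payload.
    payload = {k: v for k, v in member.items() if k != "etag"}
    payload.update(changes)
    headers = {
        "Content-Type": "application/json",
        # Entity tags are sent as quoted strings in If-Match.
        "If-Match": '"{}"'.format(member["etag"]),
    }
    return "PUT", "/projects/{}".format(member["id"]), headers, json.dumps(payload)


members = json.loads(collection_body)
method, url, headers, body = build_conditional_put(members[0], {"name": "renamed"})
print(method, url, headers["If-Match"], body)
```

The client never fetches /projects/1 on its own; everything it needs for the If-Match header came from the collection's body.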

What do you mean by "the resources that they identify are not hierarchical"?

Disclaimer: all analogies are non-normative; what's real is what's described in the specifications.

The semantics of HTTP resources are not quite like those of a file system. For example, if we issue the following command on Linux:

rm -rf /collection

then one of the effects that we would expect is the removal of /collection/item. But that's not true of HTTP!

DELETE /collection

doesn't say anything at all about the resource /collection/item. It might be that when the server processes this request, the side effects might affect other resources. But HTTP isn't describing implementations, it is only assigning meaning to the messages. The meaning of the request message is constrained by the target resource only.

Another way of saying the same thing: as far as HTTP is concerned, none of these identifiers is "wrong" for an item in a collection.

/collection/item
/item/collection
/f5add126-65ef-4122-8657-03e672f159c4

Some of the server frameworks we use to implement our servers care; for instance, Rails has opinions on spelling. But those are really just implementation details behind the uniform interface.

So yes, in your domain model the project entities and the tracks entities may form a hierarchy, and you might choose spellings for the resource identifiers that reflect that hierarchy, but the semantics of HTTP are those of a flat key value store.

# Example #1: hierarchical key value store
echo ; cat <<EOF | python
d={}
d["/collection"]={}
d["/collection"]["/item"]=456
d.pop("/collection")
print(d)
EOF
{}
# Example #2: flat key value store
echo ; cat <<EOF | python
d={}
d["/collection"]={}
d["/collection/item"]=456
d.pop("/collection")
print(d)
EOF
{'/collection/item': 456}

HTTP acts like the second example.

Question 7

I think I don't have enough knowledge to understand your answer. There are multiple things I don't understand, but let's start with one. What do you mean by "the resources that they identify are not hierarchical"? In this URL, are the project resource and tracks resource hierarchical: "/projects/1/tracks/1"? In the application and in the database the track and project resources are hierarchical. Or are you simply saying that the relationship between collection and item is not hierarchical (e.g. track 1 is hierarchical to project 1, but track 1 is not hierarchical to the tracks collection/resource)?

Question 8

See if the edit helps.

Question 9

I think I understand, but please correct me if I’m wrong. You are saying that I have 2 main options: 1. Use the ETag of GET /collection and do a batch update of all the resources with PUT /collection or 2. "communicate the information by hand", which means that each item in the collection is returned with an "etag" attribute that can be used for individual requests. What do you think of @imel96's answer? I'm trying really hard to only use the ETag HTTP header (without changing the representation) and at the same time avoid doing a batch update. I might be asking for too much hehe.

Question 10

Really, it's up to you how to use ETags. It's fine if you want to treat the resources as a tree in which all leaves on a branch share the branch's ETag.

E.g. when you do GET /projects/small/ you get ETag: "xyzzy". Then when you do PUT /projects/small/1 you send that same value, "xyzzy", in the If-Match header. Just remember to update the branch's ETag whenever any of the leaves is modified. So,

GET /projects/small/ --> ETag: "xyzzy"
PUT /projects/small/1 If-Match: "xyzzy" --> OK
PUT /projects/small/2 If-Match: "xyzzy" --> Conflict (412 Precondition Failed)

The second PUT must fail because the first PUT has already changed the branch's ETag.

I should say, sharing an ETag this way is only useful if there are only a few updates. If you expect more than a few, it's better for GET /projects/small/ to return links to the individual projects (instead of all the entities under that URL), so the client has to fetch an individual entity when it wants to update it.
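To make the shared-ETag idea concrete, here is a minimal in-memory sketch. The Branch class, its method names, and the use of a random UUID as the validator are assumptions for illustration only; a real server would hang this logic off its own storage and request handling.

```python
import uuid


class Branch:
    """In-memory stand-in for /projects/small/ where every leaf shares one ETag."""

    def __init__(self, leaves):
        self.leaves = dict(leaves)
        self.etag = uuid.uuid4().hex  # opaque validator for the whole branch

    def get(self):
        # GET on the branch returns all leaves plus the shared validator.
        return dict(self.leaves), self.etag

    def put_leaf(self, leaf_id, value, if_match):
        # Any leaf update is checked against the branch's validator.
        if if_match != self.etag:
            return 412, self.etag  # Precondition Failed
        self.leaves[leaf_id] = value
        self.etag = uuid.uuid4().hex  # any change invalidates the shared ETag
        return 200, self.etag


branch = Branch({"1": "first", "2": "second"})
_, etag = branch.get()
first_status, _ = branch.put_leaf("1", "first, edited", if_match=etag)
second_status, _ = branch.put_leaf("2", "second, edited", if_match=etag)
print(first_status, second_status)
```

The second update is rejected even though it touches a different leaf, which is exactly the trade-off: one validator is cheap to maintain, but unrelated edits on the same branch collide.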

Question 11

I definitely want to avoid fetching individual projects in order to update them. You offer a good solution, though. I wonder about the performance cost of reading the whole branch (instead of the leaf) from the DB and hashing the whole branch (instead of just the leaf) on each PUT and GET to set the ETag.

Question 12

@maximedupre It also depends on how strong you want the hash to be. I think the RFC mentions that ETags were needed because HTTP timestamps only have one-second precision. So, instead of hashing the resource (which is large), you can just hash the last-updated timestamp in microseconds together with the IP address of the client that made the update and the URL.
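Something along these lines, as a sketch only. The function name cheap_etag, the choice of SHA-256, the 16-character truncation, and the sample values are all assumptions; the point is just that the input to the hash is a few small fields rather than the whole resource.

```python
import hashlib


def cheap_etag(last_updated_us, client_ip, url):
    """Derive a validator from update metadata rather than the resource body."""
    material = "{}|{}|{}".format(last_updated_us, client_ip, url)
    digest = hashlib.sha256(material.encode("utf-8")).hexdigest()
    # Truncated for brevity; entity tags are sent as quoted strings.
    return '"{}"'.format(digest[:16])


# Two updates one microsecond apart yield different validators.
before = cheap_etag(1539811453000001, "203.0.113.7", "/projects/1")
after = cheap_etag(1539811453000002, "203.0.113.7", "/projects/1")
print(before != after)
```

This does require storing a last-updated timestamp (and the updater's address) next to each resource.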

Question 13

Right, except I don't have a "last updated timestamp" on my resources. It could be an option to add it...

Question 14

Not sure how you handled this, but I think what you'd want to do is have your list of projects return ETags for each item in your collection. Then when you do a PUT, you would PUT with that ETag. First your collection GET:

GET /projects
HTTP/1.1 200 OK
ETag: unused
Body:
[
 {"id": "1", "name": "project1", "description": "fun porject", "etag": "abc"},
 {"id": "2", "name": "project2", "description": "another project", "etag": "def"}
]

The key point is that on a collection, the ETag on the HTTP response header is not applicable. It's only applicable on single items.

A GET on a single item at this point would return something like this.

GET /projects/1
HTTP/1.1 200 OK
ETag: abc
Body:
{"id": "1", "name": "project1", "description": "fun porject", "etag": "abc"}

And when you need to do a PUT, you would do something like this (to fix spelling error), using the If-Match header:

PUT /projects/1
If-Match: abc
Body:
{"id": "1", "name": "project1", "description": "fun project"}
HTTP/1.1 200 OK
ETag: abc-v2

And now if you did a GET on that single item, it would look like this. Notice the ETag has changed because the underlying resource is different.

GET /projects/1
HTTP/1.1 200 OK
ETag: abc-v2
Body:
{"id": "1", "name": "project1", "description": "fun project", "etag": "abc-v2"}

And to round out the collection response:

GET /projects
HTTP/1.1 200 OK
ETag: unused
Body:
[
 {"id": "1", "name": "project1", "description": "fun project", "etag": "abc-v2"},
 {"id": "2", "name": "project2", "description": "another project", "etag": "def"}
]

Does this make sense?
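On the server side, the check behind that PUT could look roughly like the sketch below. It is an in-memory toy: the store layout, the function names, and the way new ETag values are generated are assumptions for the example, not how any particular framework does it.

```python
import itertools

_versions = itertools.count(1)

# Hypothetical in-memory store: id -> (etag, representation).
store = {
    "1": ("abc", {"id": "1", "name": "project1", "description": "fun porject"}),
    "2": ("def", {"id": "2", "name": "project2", "description": "another project"}),
}


def get_collection():
    """Return the list of projects, each carrying its own validator."""
    return [dict(rep, etag=etag) for etag, rep in store.values()]


def put_project(project_id, if_match, representation):
    """Replace a project only if the caller's validator is still current."""
    current_etag, _ = store[project_id]
    if if_match != current_etag:
        return 412, current_etag  # Precondition Failed: the item has changed
    new_etag = "{}-v{}".format(project_id, next(_versions))
    store[project_id] = (new_etag, dict(representation))
    return 200, new_etag


stale = get_collection()[0]["etag"]  # validator read from the collection
fixed = {"id": "1", "name": "project1", "description": "fun project"}
ok_status, fresh = put_project("1", stale, fixed)
retry_status, _ = put_project("1", stale, fixed)  # same validator, now stale
print(ok_status, retry_status)
```

The first PUT succeeds with the validator taken from the collection; repeating it with the old validator is refused, and the client has to re-read the collection (or the item) to pick up the new one.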

See this link on PUT

score 3 · Accepted Answer · 2018-10-17 21:24:13Z

