Consider an endpoint /projects that returns a list of projects with the following headers:
HTTP/1.1 200 OK
Etag: "superEtag"
The ETag value represents a hash of the entire collection, so it cannot be used to update a single resource such as /projects/1.
Fetching the resources individually makes no sense, so how can I handle optimistic locking with a collection?
3 Answers
When doing GET /projects, the ETag corresponds to the hash of the collection. Now if I want to do PUT /projects/1, I need the hash of this specific resource for the conditional request (If-Match) to be successful. I could do GET /projects/{id} to get the individual hash for each resource, but it makes no sense; the collection service would become useless.
I think the problem is that HTTP doesn't mean what you want it to mean.
Fundamentally, the semantics of HTTP are that the resources are stored in a flat key value store. Although /collection and /collection/item are hierarchical identifiers (we can use relative resolution to get from one to the other), the resources that they identify are not hierarchical. There's no relationship inferred from the similar spelling of the identifiers.
This is why DELETE /collection doesn't do anything to your locally cached copy of /collection/item.
Because there is no inferred relationship between the collection and the item, there is no generic vector available for communicating the eTag of the item(s) in the meta-data for the collection.
You can certainly do either of
GET /collection
Conditional PUT /collection
or
GET /collection/item
Conditional PUT /collection/item
and the origin server can, at its discretion, also change the representation of the other resource as a side effect.
This isn't to say that you can't communicate the information by hand - there's nothing against the rules about returning a representation of the collection that communicates the appropriate representations of the member items, along with their validators, so that a "smart" client can create the correct requests without needing to get the individual items.
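As a rough illustration of such a "smart" client, here is a minimal sketch. It assumes the collection representation carries a per-item "etag" field; that field name, the sample data, and the build_conditional_put helper are all hypothetical, not something HTTP defines. The client turns each member into a conditional PUT without fetching the member first.

```python
import json

# Hypothetical collection representation: each member carries its own
# validator in an "etag" field (the field name is an assumption).
collection_body = json.loads("""
[
  {"id": "1", "name": "project1", "etag": "abc"},
  {"id": "2", "name": "project2", "etag": "def"}
]
""")


def build_conditional_put(item, changes):
    """Describe a conditional PUT for one member without a prior GET on it."""
    # The validator is metadata, so it is left out of the new representation.
    representation = {k: v for k, v in item.items() if k != "etag"}
    representation.update(changes)
    return {
        "method": "PUT",
        "path": f"/collection/{item['id']}",
        # Entity tags are quoted strings on the wire.
        "headers": {"If-Match": f'"{item["etag"]}"'},
        "body": json.dumps(representation),
    }


requests_to_send = [
    build_conditional_put(item, {"name": item["name"].upper()})
    for item in collection_body
]
```

Nothing here is sent over the network; the sketch only shows that the collection body alone is enough to construct the per-item requests.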
What do you mean by "the resources that they identify are not hierarchical"?
Disclaimer: all analogies are non-normative; what's real is what's described in the specifications.
The semantics of HTTP resources are not quite like those of a file system. For example, if we issue the following command on Linux
rm -rf /collection
then one of the effects that we would expect is the removal of /collection/item. But that's not true of HTTP!
DELETE /collection
doesn't say anything at all about the resource /collection/item. The server may well touch other resources as a side effect of processing this request, but HTTP isn't describing implementations; it only assigns meaning to the messages. The meaning of the request message is constrained by the target resource only.
Another way of saying the same thing: as far as HTTP is concerned, none of these identifiers is "wrong" for an item in a collection.
/collection/item
/item/collection
/f5add126-65ef-4122-8657-03e672f159c4
Some of the server frameworks we use to implement our servers care; for instance, Rails has opinions on spelling. But those are really just implementation details behind the uniform interface.
So yes, in your domain model the project entities and the tracks entities may form a hierarchy, and you might choose spellings for the resource identifiers that reflect that hierarchy, but the semantics of HTTP are those of a flat key value store.
# Example #1: hierarchical key value store
echo ; cat <<EOF | python
d={}
d["/collection"]={}
d["/collection"]["/item"]=456
d.pop("/collection")
print(d)
EOF
{}
# Example #2: flat key value store
echo ; cat <<EOF | python
d={}
d["/collection"]={}
d["/collection/item"]=456
d.pop("/collection")
print(d)
EOF
{'/collection/item': 456}
HTTP acts like the second example.
-
I think I don't have enough knowledge to understand your answer. There are multiple things I don't understand, but let's start with one. What do you mean by "the resources that they identify are not hierarchical"? In the URL "/projects/1/tracks/1", are the project resource and the tracks resource hierarchical? In the application and in the database, the track and project resources are hierarchical. Or are you simply saying that the relationship between collection and item is not hierarchical (e.g. track 1 is hierarchical to project 1, but track 1 is not hierarchical to the tracks collection/resource)? – Maxime Dupré, Oct 18, 2018 at 0:05
-
See if the edit helps. – VoiceOfUnreason, Oct 18, 2018 at 13:51
-
I think I understand, but please correct me if I'm wrong. You are saying that I have 2 main options: 1. Use the ETag of GET /collection and do a batch update of all the resources with PUT /collection, or 2. "communicate the information by hand", which means that each item in the collection is returned with an "etag" attribute that can be used for individual requests. What do you think of @imel96's answer? I'm trying really hard to only use the ETag HTTP header (without changing the representation) and at the same time avoid doing a batch update. I might be asking for too much, hehe. – Maxime Dupré, Oct 18, 2018 at 19:58
Really, it's up to you how to use the ETag. It's fine if you want to treat the resource as a tree where all leaves on the same branch have the same ETag as the branch.
E.g. when you do GET /projects/small/ you get ETag: "xyzzy". Then when you do PUT /projects/small/1 you send that same value back as If-Match: "xyzzy". Just remember to update the branch's ETag whenever any of the leaves is modified. So,
GET /projects/small/ --> ETag: "xyzzy"
PUT /projects/small/1 If-Match: "xyzzy" --> OK
PUT /projects/small/2 If-Match: "xyzzy" --> 412 Precondition Failed
The second PUT must fail because the branch's ETag was already updated by the first one.
I should say, sharing an ETag this way is only useful if there are only a few updates. If you are expecting more than a few, it's better to return links to the individual projects for GET /projects/small/ (instead of all entities under that URL), so the client will need to fetch the individual entity when it wants to do an update.
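The shared-ETag behaviour can be sketched with an in-memory stand-in for the branch. This is only an illustration under assumptions: the Branch class, its method names, and the choice to hash the whole branch with SHA-256 are hypothetical, and a failed precondition is reported as status 412.

```python
import hashlib
import json


class Branch:
    """In-memory stand-in for /projects/small/ (hypothetical, not a real server)."""

    def __init__(self, items):
        self.items = dict(items)

    def etag(self):
        # One validator for the whole branch: a hash over every leaf.
        payload = json.dumps(self.items, sort_keys=True).encode()
        return '"' + hashlib.sha256(payload).hexdigest()[:16] + '"'

    def put(self, item_id, representation, if_match):
        # Returns an HTTP-like status code.
        if if_match != self.etag():
            return 412  # Precondition Failed: the branch changed in the meantime
        self.items[item_id] = representation
        return 200


branch = Branch({"1": {"name": "a"}, "2": {"name": "b"}})
seen = branch.etag()                            # GET /projects/small/
first = branch.put("1", {"name": "a2"}, seen)   # accepted, branch ETag changes
second = branch.put("2", {"name": "b2"}, seen)  # rejected, validator is stale
```

The second update is refused even though it touches a different leaf, which is the cost of sharing one validator across the branch.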
-
I definitely want to avoid fetching individual projects in order to update them. You offer a good solution, though. I wonder about the performance cost of reading the whole branch (instead of the leaf) in the DB and hashing the whole branch (instead of just the leaf) on each PUT and GET to set the ETag. – Maxime Dupré, Oct 17, 2018 at 23:53
-
@maximedupre It also depends on how strong you want the hash to be. I think the RFC mentions that the ETag was needed because the HTTP timestamp only has 1-second precision. So, instead of hashing the resource (which is large), you can just hash the last-updated timestamp in microseconds together with the IP address of the client that made the update and the URL. – imel96, Oct 18, 2018 at 1:47
-
Right, except I don't have a "last updated timestamp" on my resources. It could be an option to add it... – Maxime Dupré, Oct 18, 2018 at 19:45
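The cheaper validator suggested in the comments above might look roughly like this. It is a sketch that assumes the resource does record a last-updated timestamp in microseconds (which, as noted, would have to be added); the cheap_etag name, the sample values, and the use of SHA-256 are illustrative choices.

```python
import hashlib


def cheap_etag(updated_at_us, client_ip, url):
    """Derive a validator from update metadata instead of the full representation."""
    material = f"{updated_at_us}|{client_ip}|{url}".encode()
    return '"' + hashlib.sha256(material).hexdigest()[:16] + '"'


# Hypothetical values: two updates one microsecond apart from the same client.
before = cheap_etag(1539827279000000, "203.0.113.7", "/projects/small/")
after = cheap_etag(1539827279000001, "203.0.113.7", "/projects/small/")
```

The hashed input stays a few dozen bytes no matter how large the branch is, so computing the validator no longer requires reading every leaf.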
Not sure how you handled this, but I think what you'd want to do is have your list of projects return ETags for each item in your collection. Then when you do a PUT, you would PUT with that ETag. First, your collection GET:
GET /projects
HTTP/1.1 200 OK
ETag: "unused"
Body:
[
{"id": "1", "name": "project1", "description": "fun porject", "etag": "abc"},
{"id": "2", "name": "project2", "description": "another project", "etag": "def"}
]
The key point is that the ETag in the HTTP response header of a collection validates only the collection representation itself, so it is not applicable to updates of single items. For those, only the per-item ETags are applicable.
A GET on a single item at this point would return something like this.
GET /projects/1
HTTP/1.1 200 OK
ETag: "abc"
Body:
{"id": "1", "name": "project1", "description": "fun porject", "etag": "abc"}
And when you need to do a PUT (here, to fix the spelling error), you would send the ETag in the If-Match header. Since PUT replaces the whole representation, the body carries the full project:
PUT /projects/1
If-Match: "abc"
Body:
{"id": "1", "name": "project1", "description": "fun project"}
HTTP/1.1 200 OK
ETag: "abc-v2"
And now if you did a GET on that single item, it would look like this. Notice the ETag has changed because the underlying resource is different.
GET /projects/1
HTTP/1.1 200 OK
ETag: "abc-v2"
Body:
{"id": "1", "name": "project1", "description": "fun project", "etag": "abc-v2"}
And to round out the collection response:
GET /projects
HTTP/1.1 200 OK
ETag: "unused"
Body:
[
{"id": "1", "name": "project1", "description": "fun project", "etag": "abc-v2"},
{"id": "2", "name": "project2", "description": "another project", "etag": "def"}
]
Does this make sense?
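For what it's worth, the server side of this flow could be sketched as below. It is an in-memory illustration under assumptions: the function names, the hash-based item_etag, and the bare status-code return values are hypothetical, and the quotes that entity tags carry in real headers are omitted for brevity.

```python
import hashlib
import json

# In-memory stand-in for the project store (hypothetical data).
projects = {
    "1": {"id": "1", "name": "project1", "description": "fun porject"},
    "2": {"id": "2", "name": "project2", "description": "another project"},
}


def item_etag(project):
    """Validator for one item, derived from that item only."""
    payload = json.dumps(project, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()[:12]


def list_projects():
    """Body of GET /projects: every item carries its own validator."""
    return [dict(p, etag=item_etag(p)) for p in projects.values()]


def put_project(project_id, representation, if_match):
    """Conditional PUT /projects/{id}; returns an HTTP-like status code."""
    if if_match != item_etag(projects[project_id]):
        return 412  # Precondition Failed
    projects[project_id] = representation
    return 200


listing = {p["id"]: p for p in list_projects()}
fixed = {"id": "1", "name": "project1", "description": "fun project"}
status_1 = put_project("1", fixed, listing["1"]["etag"])          # accepted
status_stale = put_project("1", fixed, listing["1"]["etag"])      # stale validator
status_2 = put_project("2", projects["2"], listing["2"]["etag"])  # still valid
```

Because each validator covers a single item, the update to project 1 does not invalidate the ETag the client already holds for project 2.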