Is it a good idea for an API to return only ids from objects?

Question 1

I have this URL:

 /api/pallets/list

Which returns a JSON array that looks like this:

 [{
 palletId: 333,
 code: 'J050000081',
 grower: {
 growerId: 35,
 name: 'Grower Of Blueberries Inc'
 },
 species: {
 speciesId: 1,
 name: 'Blueberries'
 },
 caliber: {
 caliberId: 5,
 name: '10-12'
 },
 }, ...]

Names are often long, and if the list contains 5000 pallets, that is a lot of bytes spent on names.

By the time the client app calls api/pallets/list, it has already downloaded the lists of growers, species and calibers by calling api/growers/list, api/species/list and api/calibers/list.

Because of that, I'm wondering if it is a good idea for the server to return only the ids of things, i.e.:

 [{
 palletId: 333,
 code: 'J050000081',
 grower: {
 growerId: 35,
 },
 species: {
 speciesId: 1,
 },
 caliber: {
 caliberId: 5,
 },
 }, ...]

And then the client app will have the responsibility of completing the JSON, by doing something like this:

 // Pseudocode
 // Just after fetching from api/pallets/list
 foreach pallet in clientApp.pallets {
 pallet.grower = clientApp.growers[pallet.growerId]
 pallet.species = clientApp.species[pallet.speciesId]
 pallet.caliber = clientApp.calibers[pallet.caliberId]
 }
 // growers is a dictionary with all the growers already downloaded from the server
 // species is a dictionary with all the species already downloaded from the server
 // calibers idem

I want to know if this is a good or bad idea for improving performance. Is there a name for this practice?

The code would be much cleaner without a change like this, but this 5000-pallet JSON array is too heavy. In the example I'm only showing 3 fields (grower, species, caliber), but in reality there are about 10. All of them have id + name + other subfields...

Looks like you need pagination here. Whether the pallets return just IDs or the whole representation, 5k rows are still too many rows (IMO).
As an aside, instead of using plain IDs, it fits the REST ideas better to communicate URIs. Then the client doesn't need to know that api/growers/list and api/growers/35 are related and can be derived from each other, or that api/growers/ needs to be added in front of a grower ID to access that resource.
If the client side doesn't need to know all the info of the nested objects immediately, that's a good approach too.

Laiv · Answer 1 · 2017-05-14 11:10:25Z

"I want to know if this is a good or bad idea for improving performance. Is there a name for this practice?"

It depends on your requirements and on the performance issues you need to fix. Ask yourself this question: do I really have an issue with the response size?

If the data changes on the server, the client remains unaware of the change. So you have two options here:

Periodic synchronisations
Reload the locally stored data and iterate over all 5k rows again to resolve the nested objects.

But if you have to reload all the data, where are the savings?

Unless you have real bandwidth or data-plan constraints, I would not worry prematurely about the size of the response. Instead, I would enhance the REST API itself:

Pagination

/api/pallets/list?page=0&pageSize=500
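
A rough sketch of what could sit behind such an endpoint, assuming the pallets are already available as an in-memory array (in a real API the slicing would more likely happen in the SQL query). The helper name `paginate` and the response fields `items`, `total` and `pageCount` are made up for illustration; only `page` and `pageSize` come from the URL above.

```javascript
// Hypothetical helper: returns one zero-based page of a list.
function paginate(rows, page, pageSize) {
  if (!Number.isInteger(page) || page < 0) {
    throw new RangeError('page must be an integer >= 0');
  }
  if (!Number.isInteger(pageSize) || pageSize < 1) {
    throw new RangeError('pageSize must be an integer >= 1');
  }
  const start = page * pageSize;
  return {
    page,
    pageSize,
    total: rows.length,                            // lets the client know how much is left
    pageCount: Math.ceil(rows.length / pageSize),
    items: rows.slice(start, start + pageSize),    // empty array when past the end
  };
}

// Example data: 1200 fake pallets (assumption, not real data).
const pallets = Array.from({ length: 1200 }, (_, i) => ({ palletId: i + 1 }));
const firstPage = paginate(pallets, 0, 500);
const lastPage = paginate(pallets, 2, 500);
```

Returning `total` along with the items lets the client decide whether it wants to fetch the remaining pages or stop at the first one.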

Dynamic representations

/api/pallets/list?fields=id,name,growers.name,species.name

(There are battle-tested solutions such as GraphQL or OData for this purpose.)
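
If you want to roll a small version of this yourself, the core is a projection step that keeps only the requested paths. Below is a minimal sketch, assuming a comma-separated `fields` value with dotted paths as in the URL above. The helper name `pickFields` is hypothetical, and the sample object uses the field names from the question (`palletId`, `grower.name`) rather than the ones in the URL.

```javascript
// Hypothetical helper: keeps only the requested (possibly dotted) paths of an object.
function pickFields(source, fieldsParam) {
  const hasOwn = (obj, key) => Object.prototype.hasOwnProperty.call(obj, key);
  const result = {};
  for (const path of fieldsParam.split(',')) {
    const keys = path.trim().split('.');

    // Walk the source first; silently skip paths that do not exist.
    let value = source;
    let found = true;
    for (const key of keys) {
      if (value === null || typeof value !== 'object' || !hasOwn(value, key)) {
        found = false;
        break;
      }
      value = value[key];
    }
    if (!found) continue;

    // Then rebuild the same nesting in the result.
    let target = result;
    for (const key of keys.slice(0, -1)) {
      if (target[key] === null || typeof target[key] !== 'object') target[key] = {};
      target = target[key];
    }
    target[keys[keys.length - 1]] = value;
  }
  return result;
}

const pallet = {
  palletId: 333,
  code: 'J050000081',
  grower: { growerId: 35, name: 'Grower Of Blueberries Inc' },
  species: { speciesId: 1, name: 'Blueberries' },
};
const slim = pickFields(pallet, 'palletId,code,grower.name,species.name');
```

Unknown fields are skipped here instead of rejected; an API might prefer to answer with a 400 so that typos in `fields` do not go unnoticed.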

Both combined

/api/pallets/list?fields=id,name,growers.name,species.name&page=0&pageSize=500

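ETag

We could further enhance the solution with ETag, so that a client whose copy is still current receives a 304 Not Modified instead of the full body.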
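
A minimal sketch of the conditional GET idea, using Node's built-in `crypto` module. The helper names `etagFor` and `respond` are hypothetical, hashing the body with SHA-1 is just one possible way to derive the tag, and the comparison only handles a single value in `If-None-Match` (the real header may carry a list, `*`, or weak validators).

```javascript
const crypto = require('crypto');

// Hypothetical helper: derives an ETag from the response body.
function etagFor(body) {
  return '"' + crypto.createHash('sha1').update(body).digest('hex') + '"';
}

// Hypothetical handler: returns what the server would send for a GET.
// `ifNoneMatch` is the value of the request's If-None-Match header, or undefined.
function respond(body, ifNoneMatch) {
  const etag = etagFor(body);
  if (ifNoneMatch === etag) {
    // The client's copy is still current: send no body at all.
    return { status: 304, headers: { ETag: etag }, body: '' };
  }
  return { status: 200, headers: { ETag: etag }, body };
}

const body = JSON.stringify([{ palletId: 333, code: 'J050000081' }]);
const first = respond(body, undefined);                 // first visit: full response
const second = respond(body, first.headers.ETag);       // unchanged data: 304
const afterChange = respond(JSON.stringify([{ palletId: 334 }]), first.headers.ETag);
```

Note that in this sketch the server still builds the whole body in order to hash it, so the request is not cheaper to process; only the transfer is avoided.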

If you can't afford pagination, dynamic representations may help. ETag is a plus in any scenario.

All of the above approaches improve client-side performance, but the load on the server increases ¹. However, it is cheaper and easier to scale the server up or out than the client.

¹ ETag is meant to save bandwidth, not to reduce the number of calls to the server.

I've thought about the fields idea in the past, but I'm not sure how to implement it the right way. Pagination is also a possibility, but I'd lose some features on the client side (these pallets are displayed in a grid that lets the user do client-side pivoting, aggregation, filtering, sorting, etc.).
I see. I have had to deal with the same requirement: pagination, filtering and sorting on the client side. Maybe working with 5k rows is not the best thing for the client. Today it is 5k, tomorrow who knows. Evaluate the possibility of revamping the way you work with the pallets. I assure you that the client side is also sensitive to such heavy arrays. For instance, if you use Chrome, heavy processes can block the whole PC :-) or slow down everything else. Browsers are resource eaters these days.
Today I'm using ag-grid (ag-grid.com/example.php) for displaying the pallets, so I'm quite relaxed about client-side performance (in the grid demo you can try 100,000 rows and 22 columns and it works super fine). But as you said, I need to ask myself the question: "Do I really have an issue with the response data size?". Today my API implementation is slow, partly because of the data size and maybe also because of the SQL queries :-/
Well, that's a good place to start: querying and backend performance. stackoverflow.com/q/6302131/5934037
You were right about this: "Do I really have an issue with the response data size?" The answer is: I don't. The JSON is gzip compressed and I wasn't aware of that. It seems that my main performance issue is between the API and the SQL Server...
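
For anyone who wants to check the effect of gzip on this kind of payload themselves, here is a small sketch using Node's built-in `zlib`. The 5000 pallets are fake data shaped like the JSON in the question (the second grower name is invented), so the numbers it prints only indicate the trend; the real ratio depends on the real data.

```javascript
const zlib = require('zlib');

// Build 5000 fake pallets that keep repeating a handful of long names.
const growerNames = ['Grower Of Blueberries Inc', 'Another Fairly Long Grower Name Ltd'];
const pallets = Array.from({ length: 5000 }, (_, i) => ({
  palletId: i + 1,
  code: 'J' + String(50000000 + i).padStart(9, '0'),
  grower: { growerId: i % growerNames.length, name: growerNames[i % growerNames.length] },
  species: { speciesId: 1, name: 'Blueberries' },
  caliber: { caliberId: 5, name: '10-12' },
}));

// Compare the size of the JSON before and after gzip.
const raw = Buffer.from(JSON.stringify(pallets), 'utf8');
const gzipped = zlib.gzipSync(raw);

console.log('raw bytes:    ', raw.length);
console.log('gzipped bytes:', gzipped.length);
```

Repeated strings such as grower and species names are exactly what gzip handles well, which is why stripping them by hand tends to save less on the wire than the uncompressed size suggests.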

gnasher729 · Answer 2 · 2017-05-14 14:41:58Z

To be honest, I'd be totally annoyed with you as a user of that API.

Every single request to an API can fail. If it fails, I have to do something about it. With the original request, there are two possible outcomes: either I have all the data that I want, or I have nothing. That's very easy to handle. With your second approach, ANY subset of the information that I want may be missing. That is an order of magnitude more difficult to handle.

And Laiv's recommendation of letting the user choose which fields they want is quite useful.

What I found absolutely weird was this:

 grower: {
 growerId: 35,
 },

which should have been

 growerId: 35,

and nothing else.

It might look weird, but if the client side is going to "fill" the sub-object "grower" with the entire "grower", then it is not that weird. I'm talking about this line: pallet.grower = clientApp.growers[pallet.growerId]
That's what I said. As it is, the client-side has to write "pallet.grower = clientApp.growers[pallet.grower.growerId]".
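
To make the difference concrete, here is a minimal sketch of the client-side fill-in step with the flat shape (`growerId: 35` directly on the pallet). The function name `hydrate`, the `missing` list and the use of `Map` for the lookup tables are assumptions for illustration. It also records ids that cannot be resolved, since the lookup lists were fetched by separate requests and may be stale or incomplete.

```javascript
// Hypothetical lookup tables, as if already loaded from api/growers/list etc.
const growers = new Map([[35, { growerId: 35, name: 'Grower Of Blueberries Inc' }]]);
const species = new Map([[1, { speciesId: 1, name: 'Blueberries' }]]);

// Expects the flat ids-only shape: { palletId, code, growerId, speciesId }.
function hydrate(pallet, lookups) {
  const missing = [];
  const resolve = (table, id, label) => {
    const found = table.get(id);
    if (found === undefined) missing.push(label + ' ' + id); // remember what could not be resolved
    return found ?? null;
  };
  const hydrated = {
    ...pallet,
    grower: resolve(lookups.growers, pallet.growerId, 'grower'),
    species: resolve(lookups.species, pallet.speciesId, 'species'),
  };
  return { hydrated, missing };
}

const lookups = { growers, species };
const ok = hydrate({ palletId: 333, code: 'J050000081', growerId: 35, speciesId: 1 }, lookups);
// Grower 99 is not in the lookup table, e.g. because it was added after the list was fetched.
const stale = hydrate({ palletId: 334, code: 'J050000082', growerId: 99, speciesId: 1 }, lookups);
```

With the nested shape from the question, the lookup key would be `pallet.grower.growerId` instead, and the assignment would overwrite the very object it just read the id from.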
