Regarding data modeling in a NoSQL use case

Question 1

I am reading the following link to learn about the design of various systems. (It is a paid link, so I am including the relevant explanations below.) To explain the system design of Instagram, the link lists the following requirements:

Users should be able to upload/download/view photos.
Users can perform searches based on photo/video titles.
Users can follow other users.
The system should be able to generate and display a user’s News Feed consisting of top photos from all the people the user follows.

It suggests that we have 3 schemas:

[Image: the three suggested schemas]

These schemas suit an RDBMS well.

The link says that if we want to scale up, we need to tap into the benefits of NoSQL. For this, we would create another table, UserPhoto, in which the ‘key’ is the ‘UserID’ and the ‘value’ is the list of ‘PhotoIDs’ the user owns, stored in different columns.

The link suggests using the Cassandra NoSQL database for this use case.

To generate a user's news feed by aggregating the top 100 photos from all the users that the current user follows, the link suggests making the PhotoID a combination of a number and the photo's upload timestamp (epoch time), because the Photo table has a primary index on PhotoID. This is the point where I got confused.

How would we use the above schema to get the 100 latest photos of all the users followed by the current user in a NoSQL database?

Question 2

I'm unsure what would constitute a best case, but what they are describing is a compound identifier.

For example, each photo is added by one user at a unique (or near enough) date and time. So you could theoretically refer to a photo as "User X's photo uploaded at A/B/C d:e:f", or as a short string such as "623456_20190403174521" if the user id was "623456" and the upload time was April 3rd, 2019 at 5:45:21 pm.
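As a minimal sketch of building and splitting such an identifier (the names `make_photo_id`, `parse_photo_id`, and `PhotoKey` are made up for illustration, and the fixed-width `YYYYMMDDhhmmss` layout is simply the one used in the example string above):

```python
from datetime import datetime
from typing import NamedTuple

# Fixed-width timestamp, so comparing the strings also compares the times.
TIMESTAMP_FORMAT = "%Y%m%d%H%M%S"


class PhotoKey(NamedTuple):
    user_id: str
    uploaded_at: datetime


def make_photo_id(user_id: str, uploaded_at: datetime) -> str:
    """Join the user id and the upload time into one sortable key."""
    return f"{user_id}_{uploaded_at.strftime(TIMESTAMP_FORMAT)}"


def parse_photo_id(photo_id: str) -> PhotoKey:
    """Split a compound id back into its two parts."""
    user_id, stamp = photo_id.rsplit("_", 1)
    return PhotoKey(user_id, datetime.strptime(stamp, TIMESTAMP_FORMAT))


photo_id = make_photo_id("623456", datetime(2019, 4, 3, 17, 45, 21))
print(photo_id)                  # 623456_20190403174521
print(parse_photo_id(photo_id))
```

Note that two uploads by the same user within the same second would collide under this layout, which is presumably why the scheme in the question also includes a number.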

A compound identifier is beneficial in that it packs a lot of information into a single field or key. If that field is in a sorted index, you can quickly identify the relevant records without first opening them up. E.g. finding all of a user's photos from one day could be as simple as retrieving all records between "623456_20190403000000" and "623456_20190403235959". In your case, getting a user's last 100 photos is as simple as selecting the top 100 entries, newest first, whose id matches "623456_*".
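To illustrate the idea without tying it to any particular database, here is a rough sketch that uses a sorted Python list as a stand-in for the sorted index. The sample ids and the helper names `ids_between` and `latest_for_user` are assumptions for the example; a real engine would run the equivalent range scan itself.

```python
from bisect import bisect_left, bisect_right

# Stand-in for a sorted index: compound ids kept in ascending order.
index = sorted([
    "623456_20190402091500",
    "623456_20190403080000",
    "623456_20190403174521",
    "623456_20190404120000",
    "700001_20190403101010",
])


def ids_between(sorted_ids, low, high):
    """Return every id in the inclusive range [low, high] using binary search."""
    start = bisect_left(sorted_ids, low)
    stop = bisect_right(sorted_ids, high)
    return sorted_ids[start:stop]


def latest_for_user(sorted_ids, user_id, limit=100):
    """Return up to `limit` ids for one user, newest first."""
    start = bisect_left(sorted_ids, f"{user_id}_")
    # Largest possible 14-digit timestamp marks the end of this user's range.
    stop = bisect_right(sorted_ids, f"{user_id}_99999999999999")
    newest_first = sorted_ids[max(start, stop - limit):stop]
    newest_first.reverse()
    return newest_first


print(ids_between(index, "623456_20190403000000", "623456_20190403235959"))
print(latest_for_user(index, "623456", limit=2))
```

Because the index is ascending, the newest entries sit at the end of the user's range, so the "top 100" has to be read from that end (or the index kept in descending order).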

A compound identifier is also problematic. If other systems rely on it, you can't change the format. Also, if the data is stored only in the identifier, it's harder to access the individual pieces. Conversely, if the data in the identifier is also duplicated in separate fields inside the record and the two don't match, which one is correct?

Is this the right way to go? Probably not.

Is it the only reasonable way to go? That depends on the database engine and on whether there is a better way to achieve the same result.

Question 3

I think you are right, except that if you have that extra NoSQL document UserPhotos, you will be able to sort by date and merge with other UserPhotos just using the key from the doc. Not sure why this is better than just adding more metadata, i.e. a date field, to the same doc though.
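A rough sketch of that merge, assuming each PhotoID leads with the upload epoch time (modeled here as an `(epoch_seconds, photo_number)` tuple) and each UserPhoto entry keeps its ids newest first. The `user_photo` and `follows` dictionaries and the `news_feed` helper are hypothetical stand-ins, not anything taken from the linked material:

```python
import heapq
from itertools import islice

# Hypothetical UserPhoto rows: UserID -> that user's PhotoIDs, newest first.
# Each PhotoID is (upload_epoch_seconds, photo_number), so plain tuple
# comparison orders photos by upload time.
user_photo = {
    "alice": [(1554313521, 7), (1554200000, 3)],
    "bob": [(1554310000, 9), (1554100000, 1)],
    "carol": [(1554320000, 4)],
}

# Hypothetical follow lists: UserID -> the users they follow.
follows = {"dave": ["alice", "bob", "carol"]}


def news_feed(user_id, limit=100):
    """Merge the followed users' newest-first lists and keep the first `limit`."""
    lists = [user_photo.get(followee, []) for followee in follows.get(user_id, [])]
    # reverse=True tells heapq.merge that the inputs are sorted descending.
    merged = heapq.merge(*lists, reverse=True)
    return list(islice(merged, limit))


print(news_feed("dave", limit=3))
# [(1554320000, 4), (1554313521, 7), (1554310000, 9)]
```

The merge is lazy, so only about `limit` ids are pulled from the lists no matter how many photos each followed user has. It works on the key alone only because the timestamp comes first; with a "userid_timestamp" key, the ids from different users would sort by user id rather than by date.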

Kain0_0 · Answer 1 · 2019-01-11 03:48:19Z
