How to design a relational database for user following other users?

Question

I want to design my relational schema to store information about users following other users. So if user1 is following user2, the table would look like this:

user_follow
------------
|user1|user2| -> Column names
------------
|U_Nm1|U_Nm2| -> tuple
------------

So with this design, if user1 is following 100 other users, I will have 100 rows for that single user. If I have lots of users following lots of other users, my table will grow enormously. Is there any other way I can design this? I know it is better to use NoSQL for such requirements, but I am restricted to using a relational DB.

Comment

There's no need to use NoSQL for this. Using a second table to store relationships between entities in the first table (as Thomas' answer explains) is a classic example of database normalization, and normalization is almost always what you want in a relational DB.

Comment

The overhead is quite limited in this example. The design stores 2 user IDs instead of the 1 that is conceptually necessary. I wouldn't worry about it.
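Here is a back-of-the-envelope sketch of that overhead in Python. It assumes 8-byte integer IDs and counts only the bytes in the two ID columns. Real tables also add row headers, page overhead and indexes, and these vary by engine. The scale used below (1 million users, each following 100 others) is hypothetical.

```python
def raw_follow_bytes(num_users: int, follows_per_user: int, id_bytes: int = 8) -> int:
    """Bytes taken by the two ID columns of a follow table.

    Assumes fixed-size integer IDs and ignores per-row, page and index
    overhead, so a real table will be larger than this figure.
    """
    rows = num_users * follows_per_user  # one row per follow relationship
    return rows * 2 * id_bytes           # follower ID + followed ID


if __name__ == "__main__":
    # Hypothetical scale: 1,000,000 users who each follow 100 others.
    total = raw_follow_bytes(1_000_000, 100)
    print(f"{total:,} bytes (about {total / 1e9:.1f} GB)")  # 1,600,000,000 bytes (about 1.6 GB)
```

Storing one ID per row instead of two would only halve that raw figure. That is the limited overhead the comment refers to.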


score 13 · Answer 1 · 2015-07-11 18:34:34Z


I know it is better to use NoSQL for such requirements

I'm not sure why you'd think that. After all, the data in question is very relational in nature.

Here's what I'd do:

User table

UserId (primary key)
UserName
... etc ...

Following table (join table)

FollowingUserId (foreign key to UserTable.UserId)
FollowedUserId (foreign key to UserTable.UserId)

With this join table you can create a "following" relationship between users. And yes, if a user follows 100 users, there will be 100 rows in the join table for that single user. That is small in reality, though, and with optimizations the impact would be negligible.


edited Jul 11, 2015 at 19:21

answered Jul 11, 2015 at 18:34

Thomas Stringer
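Below is a minimal sketch of the User / Following design above. It uses Python's built-in sqlite3 module with an in-memory SQLite database; the choice of SQLite is an assumption, and the DDL is close to standard SQL. Table and column names follow the answer, and the sample users are hypothetical.

- The names are double-quoted because USER and FOLLOWING are keywords in some SQL dialects.
- The composite primary key stops the same follow from being stored twice.
- The CHECK against self-follows is an extra that is not in the answer.

```python
import sqlite3

# Names follow the answer; double quotes because USER and FOLLOWING are
# keywords in some SQL dialects (FOLLOWING is one in SQLite itself).
SCHEMA = """
CREATE TABLE "User" (
    UserId   INTEGER PRIMARY KEY,
    UserName TEXT NOT NULL UNIQUE
);

-- Join table: one row per (follower, followed) pair.
CREATE TABLE "Following" (
    FollowingUserId INTEGER NOT NULL REFERENCES "User"(UserId),
    FollowedUserId  INTEGER NOT NULL REFERENCES "User"(UserId),
    PRIMARY KEY (FollowingUserId, FollowedUserId),  -- no duplicate follows
    CHECK (FollowingUserId <> FollowedUserId)       -- no self-follows (extra)
);
"""


def connect() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript(SCHEMA)
    return conn


def follow(conn: sqlite3.Connection, follower_id: int, followed_id: int) -> None:
    # OR IGNORE skips a row that would duplicate the primary key, so following
    # someone twice is a no-op. Unknown user IDs still raise
    # sqlite3.IntegrityError, because OR IGNORE does not cover foreign keys.
    conn.execute(
        'INSERT OR IGNORE INTO "Following" (FollowingUserId, FollowedUserId) VALUES (?, ?)',
        (follower_id, followed_id),
    )


def followed_names(conn: sqlite3.Connection, user_id: int) -> list:
    """Names of the users that user_id follows, via a join back to "User"."""
    rows = conn.execute(
        """
        SELECT u.UserName
        FROM "Following" AS f
        JOIN "User" AS u ON u.UserId = f.FollowedUserId
        WHERE f.FollowingUserId = ?
        ORDER BY u.UserName
        """,
        (user_id,),
    )
    return [name for (name,) in rows]


if __name__ == "__main__":
    conn = connect()
    conn.executemany('INSERT INTO "User" (UserId, UserName) VALUES (?, ?)',
                     [(1, "alice"), (2, "bob"), (3, "carol")])
    follow(conn, 1, 2)
    follow(conn, 1, 3)
    follow(conn, 1, 2)  # duplicate, ignored
    print(followed_names(conn, 1))  # ['bob', 'carol']
```

The join back to User is the normalized lookup the answer describes. A user following 100 others simply yields 100 rows from the `WHERE f.FollowingUserId = ?` filter.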

Thanks Thomas! Can you elaborate on the optimization? Actually, I am thinking of caching the data to avoid hitting the DB all the time. Please let me know if there is anything else I can do to improve performance.

– karan ratnaparkhi, commented Jul 11, 2015 at 20:50

@karanratnaparkhi: Don't start trying to do optimizations until you can point at something being a problem. Design your schema to take advantage of the database's features and let it do its job. The people who wrote most of the big commercial and open source RDBMSes are very good at what they do.

– Blrfl, commented Jul 11, 2015 at 21:29

Which family of NoSQL do you mean? Most of them don't support joins and tell you to denormalise, which you could also do in SQL while maintaining software correctness.

– Sleiman Jneidi, commented Jul 11, 2015 at 22:36
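One reading of this comment: if you do want a denormalised value, such as a per-user follower count, a relational database can keep it consistent with triggers. You do not have to rely on application code for this. This sketch assumes SQLite, whose trigger syntax differs from other databases, and adds a hypothetical FollowerCount column to User. Updates that change an existing Following row are not handled.

```python
import sqlite3

# Hypothetical denormalised column FollowerCount, kept in sync by triggers
# so application code cannot forget to update it.
SCHEMA = """
CREATE TABLE "User" (
    UserId        INTEGER PRIMARY KEY,
    UserName      TEXT NOT NULL,
    FollowerCount INTEGER NOT NULL DEFAULT 0
);

CREATE TABLE "Following" (
    FollowingUserId INTEGER NOT NULL REFERENCES "User"(UserId),
    FollowedUserId  INTEGER NOT NULL REFERENCES "User"(UserId),
    PRIMARY KEY (FollowingUserId, FollowedUserId)
);

-- A new follow row raises the followed user's count...
CREATE TRIGGER following_after_insert AFTER INSERT ON "Following"
BEGIN
    UPDATE "User" SET FollowerCount = FollowerCount + 1
    WHERE UserId = NEW.FollowedUserId;
END;

-- ...and removing one lowers it again.
CREATE TRIGGER following_after_delete AFTER DELETE ON "Following"
BEGIN
    UPDATE "User" SET FollowerCount = FollowerCount - 1
    WHERE UserId = OLD.FollowedUserId;
END;
"""


def connect() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def follower_count(conn: sqlite3.Connection, user_id: int) -> int:
    row = conn.execute('SELECT FollowerCount FROM "User" WHERE UserId = ?', (user_id,)).fetchone()
    return row[0]


if __name__ == "__main__":
    conn = connect()
    conn.executemany('INSERT INTO "User" (UserId, UserName) VALUES (?, ?)',
                     [(1, "alice"), (2, "bob"), (3, "carol")])
    conn.executemany('INSERT INTO "Following" VALUES (?, ?)', [(1, 3), (2, 3)])
    conn.execute('DELETE FROM "Following" WHERE FollowingUserId = 1 AND FollowedUserId = 3')
    print(follower_count(conn, 3))  # 1
```

Reads of the count become a single-row lookup. The database, not the application, guarantees that the count matches the Following rows.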

The immediate optimizations I'm talking about are things like indexing the foreign key columns: since you know you'll be joining on them, you'll benefit from indexes there. Don't go crazy adding useless indexes without proof, but there are some that are good "by default".

– Thomas Stringer, commented Jul 12, 2015 at 12:46
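Here is a sketch of the "index the foreign keys" advice, again assuming SQLite via Python's sqlite3 module. SQLite does not index foreign key columns automatically. The composite primary key (FollowingUserId, FollowedUserId) already covers lookups by follower. The one extra index therefore targets the reverse lookup by FollowedUserId; its name, idx_following_followed, is made up. EXPLAIN QUERY PLAN checks whether each query actually uses an index, instead of guessing.

```python
import sqlite3

SCHEMA = """
CREATE TABLE "User" (UserId INTEGER PRIMARY KEY, UserName TEXT NOT NULL);

CREATE TABLE "Following" (
    FollowingUserId INTEGER NOT NULL REFERENCES "User"(UserId),
    FollowedUserId  INTEGER NOT NULL REFERENCES "User"(UserId),
    -- The primary key's index starts with FollowingUserId, so it already
    -- serves "whom does X follow?" lookups.
    PRIMARY KEY (FollowingUserId, FollowedUserId)
);

-- SQLite does not index foreign key columns automatically; this extra
-- (hypothetically named) index serves "who follows X?" lookups.
CREATE INDEX idx_following_followed ON "Following"(FollowedUserId);
"""

FOLLOWS_QUERY = 'SELECT FollowedUserId FROM "Following" WHERE FollowingUserId = ?'
FOLLOWERS_QUERY = 'SELECT FollowingUserId FROM "Following" WHERE FollowedUserId = ?'


def query_plan(conn: sqlite3.Connection, sql: str) -> str:
    """Join the detail text of EXPLAIN QUERY PLAN (its last column in every SQLite version)."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, (1,)).fetchall()
    return " | ".join(str(row[-1]) for row in rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    print(query_plan(conn, FOLLOWS_QUERY))
    print(query_plan(conn, FOLLOWERS_QUERY))
```

The exact wording of the plan output varies between SQLite versions. Both lines should name an index rather than report a plain SCAN of the table.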


score 1 · Answer 2 · 2015-07-12 03:58:00Z · Asad Imtiaz Butt

With respect to good design, I would suggest using two subclasses of User: Follower and Followed. If you draw an ER diagram, this approach is semantically richer.

So the schemas would be:

User (UserName, Email, etc)
FollowerUser (FollowerName, FollowerEmail, etc)
FollowedUser (FollowedName, FollowedEmail, etc)
Following (FollowerEmail, FollowedEmail)

One advantage of such a design is that users who are not following, or being followed by, anyone are easier to access.

Moreover, database normalization or optimization is required when you want to remove redundancy. In this case:

"User1 follows User2" is different from "User1 follows User3", since each row represents a distinct relationship. Hence the data is not redundant and normalization is not required.

This is a terribly denormalized setup. Why are you duplicating User twice (FollowerUser and FollowedUser)? Also, using a VARCHAR column as a foreign key (and therefore as a primary key) tanks performance. It's also a potentially volatile field, which you don't want for a PK. Since there is no difference between a user who follows and one who is followed, duplicating that information is not "semantically richer". Also, finding users who are not following or not being followed is an easy WHERE EXISTS or WHERE NOT EXISTS query away.
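Here is the WHERE NOT EXISTS approach from that comment, sketched against the single User table plus the Following join table. SQLite via Python's sqlite3 is again an assumption, and the sample data is hypothetical. No separate FollowerUser or FollowedUser tables are needed to find users with no follow relationship in either direction.

```python
import sqlite3

SCHEMA = """
CREATE TABLE "User" (UserId INTEGER PRIMARY KEY, UserName TEXT NOT NULL);
CREATE TABLE "Following" (
    FollowingUserId INTEGER NOT NULL REFERENCES "User"(UserId),
    FollowedUserId  INTEGER NOT NULL REFERENCES "User"(UserId),
    PRIMARY KEY (FollowingUserId, FollowedUserId)
);
"""

# Users with no Following row on either side: they follow nobody
# and nobody follows them.
UNCONNECTED_USERS = """
SELECT u.UserName
FROM "User" AS u
WHERE NOT EXISTS (SELECT 1 FROM "Following" AS f WHERE f.FollowingUserId = u.UserId)
  AND NOT EXISTS (SELECT 1 FROM "Following" AS f WHERE f.FollowedUserId = u.UserId)
ORDER BY u.UserName
"""


def unconnected_users(conn: sqlite3.Connection) -> list:
    return [name for (name,) in conn.execute(UNCONNECTED_USERS)]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany('INSERT INTO "User" VALUES (?, ?)',
                     [(1, "alice"), (2, "bob"), (3, "carol"), (4, "dave")])
    conn.execute('INSERT INTO "Following" VALUES (1, 2)')  # alice follows bob
    print(unconnected_users(conn))  # ['carol', 'dave']
```

Dropping one of the two NOT EXISTS conditions changes the result. You get "users who follow nobody" or "users nobody follows" instead.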
