6

I want to design my relational schema to store information about users following other users. So if user1 is following user2 the schema will be as below

user_follow
------------
|user1|user2| -> Column names
------------
|U_Nm1|U_Nm2| -> tuple
------------

So in this way, if user1 is following 100 other users, I will have 100 rows just for a single user. If I have lots of users following lots of other users, my table size will grow enormously. Is there any other way I can design this? I know it is better to use NoSQL for such requirements but I am restricted to using a relational DB.

asked Jul 11, 2015 at 18:23
2
  • 3
    There's no need to use NoSQL for this. Using a second table to store relationships between entities in the first table (as Thomas' answer explains) is a classic example of database normalization, and normalization is almost always what you want in a relational DB. Commented Jul 11, 2015 at 18:47
  • 1
    The overhead is quite limited in this example. The design stores 2 user IDs instead of the 1 that is conceptually necessary. I wouldn't worry about it. Commented Jul 11, 2015 at 18:56

2 Answers

13

I know it is better to use NoSQL for such requirements

I'm not sure why you'd think that. After all, the data in question is very relational in nature.

Here's what I'd do:

User table

  • UserId (primary key)
  • UserName
  • ... etc ...

Following table (join table)

  • FollowingUserId (foreign key to UserTable.UserId)
  • FollowedUserId (foreign key to UserTable.UserId)

Utilizing this join table you'll be able to create a "following" relationship between users. And sure, if a user follows 100 users, then yes there will be 100 rows in the join table for that single user. That is small in reality, though, and with optimizations there would be negligible impact.
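As a rough sketch of the two tables above, here is one way they might look in SQLite, driven from Python's built-in `sqlite3` module. The composite primary key on `Following`, the column types, and the sample user names are assumptions added for illustration; they are not part of the original answer.

```python
import sqlite3

# In-memory database purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default

conn.executescript("""
CREATE TABLE User (
    UserId   INTEGER PRIMARY KEY,
    UserName TEXT NOT NULL UNIQUE
);

-- Join table: one row per "A follows B" relationship.
-- Assumption: the pair itself is the primary key, so duplicates are rejected.
CREATE TABLE Following (
    FollowingUserId INTEGER NOT NULL REFERENCES User(UserId),
    FollowedUserId  INTEGER NOT NULL REFERENCES User(UserId),
    PRIMARY KEY (FollowingUserId, FollowedUserId)
);
""")

# Hypothetical sample data.
conn.executemany(
    "INSERT INTO User (UserId, UserName) VALUES (?, ?)",
    [(1, "alice"), (2, "bob"), (3, "carol")],
)
conn.executemany(
    "INSERT INTO Following (FollowingUserId, FollowedUserId) VALUES (?, ?)",
    [(1, 2), (1, 3), (2, 1)],
)


def followed_by(conn, user_id):
    """Return the names of the users that `user_id` follows, sorted by name."""
    rows = conn.execute(
        """
        SELECT u.UserName
        FROM Following AS f
        JOIN User AS u ON u.UserId = f.FollowedUserId
        WHERE f.FollowingUserId = ?
        ORDER BY u.UserName
        """,
        (user_id,),
    )
    return [name for (name,) in rows]


print(followed_by(conn, 1))
```

Each row holds only two integers, which is why 100 follows per user stays cheap in practice.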

answered Jul 11, 2015 at 18:34
4
  • Thanks Thomas! Can you elaborate on the optimization? Actually, I am thinking of caching the data so as to avoid DB communication all the time. Please let me know if there is anything else I can do to improve the performance. Commented Jul 11, 2015 at 20:50
  • 4
    @karanratnaparkhi: Don't start trying to do optimizations until you can point at something being a problem. Design your schema to take advantage of the database's features and let it do its job. The people who wrote most of the big commercial and open source RDBMSes are very good at what they do. Commented Jul 11, 2015 at 21:29
  • Which family of NoSQL do you mean? Most of them don't support joins and tell you to denormalise instead, which you could also do in SQL while maintaining SW correctness. Commented Jul 11, 2015 at 22:36
  • The immediate optimizations that I'm talking about are things like indexing the foreign key columns: since you know you'll be joining on these columns, you'll benefit from having them indexed. Don't go crazy adding useless indexes without proof, but there are some that are good "by default". Commented Jul 12, 2015 at 12:46
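To illustrate the indexing point in the last comment, the sketch below uses SQLite through Python's `sqlite3` module. It assumes a composite primary key on `(FollowingUserId, FollowedUserId)`, which would already serve "who does X follow?" lookups, and adds a second, hypothetical index named `idx_following_followed` for the reverse "who follows X?" direction. The index name and the choice of SQLite are illustrative assumptions; other databases report their query plans differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Following (
    FollowingUserId INTEGER NOT NULL,
    FollowedUserId  INTEGER NOT NULL,
    PRIMARY KEY (FollowingUserId, FollowedUserId)
);

-- Hypothetical index for lookups by the followed user.
CREATE INDEX idx_following_followed
    ON Following (FollowedUserId, FollowingUserId);
""")


def query_plan(conn, sql, params=()):
    """Return the human-readable steps of SQLite's plan for `sql`."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The textual description is the last column of each plan row.
    return [row[-1] for row in rows]


followers_sql = "SELECT FollowingUserId FROM Following WHERE FollowedUserId = ?"
plan = query_plan(conn, followers_sql, (2,))
for step in plan:
    print(step)
```

If the plan mentions the index rather than a full table scan, the lookup cost no longer grows linearly with the total number of follow rows. Checking the plan like this is also a way to get the "proof" the comment asks for before adding further indexes.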
-2

With respect to good design, I would suggest using two subclasses of User: Follower and Followed. If you draw an ER diagram, this approach is semantically richer.

So the schemas would be:

  1. User (UserName, Email, etc)
  2. FollowerUser (FollowerName, FollowerEmail, etc)
  3. FollowedUser (FollowedName, FollowedEmail, etc)
  4. Following (FollowerEmail, FollowedEmail)

One advantage of such a design is that Users who are not Following or being Followed by anyone are more easily accessible.

Moreover, database normalization or optimization is required when you want to remove redundancy. In this case:

User1 follows User2 is different from User1 follows User3, since each row represents a new relationship. Hence the data is not redundant and normalization is not required.

answered Jul 12, 2015 at 3:58
1
  • 1
    This is a terribly denormalized setup. Why are you duplicating User twice (FollowerUser and FollowedUser)? Also, using a VARCHAR column as a foreign key (and therefore primary key) tanks performance. It's also a potentially volatile field, which you don't want for a PK. Since there is no difference between a user who follows and one who is followed, it is not "semantically richer" to duplicate that information. Also, users who are not following or being followed are an easy WHERE EXISTS or WHERE NOT EXISTS query away. Commented Jul 12, 2015 at 4:20
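The `WHERE NOT EXISTS` query mentioned in the comment above could look roughly like the sketch below, run against the two-table design (`User` plus `Following`) from the accepted answer. It uses SQLite through Python's `sqlite3` module; the column types and sample users are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE User (
    UserId   INTEGER PRIMARY KEY,
    UserName TEXT NOT NULL
);
CREATE TABLE Following (
    FollowingUserId INTEGER NOT NULL REFERENCES User(UserId),
    FollowedUserId  INTEGER NOT NULL REFERENCES User(UserId),
    PRIMARY KEY (FollowingUserId, FollowedUserId)
);
""")

# Hypothetical sample data: dave neither follows anyone nor is followed.
conn.executemany(
    "INSERT INTO User (UserId, UserName) VALUES (?, ?)",
    [(1, "alice"), (2, "bob"), (3, "carol"), (4, "dave")],
)
conn.executemany(
    "INSERT INTO Following (FollowingUserId, FollowedUserId) VALUES (?, ?)",
    [(1, 2), (2, 3)],
)

# Users with no row in Following on either side of the relationship.
isolated_sql = """
SELECT u.UserName
FROM User AS u
WHERE NOT EXISTS (
        SELECT 1 FROM Following AS f WHERE f.FollowingUserId = u.UserId)
  AND NOT EXISTS (
        SELECT 1 FROM Following AS f WHERE f.FollowedUserId = u.UserId)
ORDER BY u.UserName
"""
isolated = [name for (name,) in conn.execute(isolated_sql)]
print(isolated)
```

No separate FollowerUser or FollowedUser table is needed for this; the single `User` table and the join table carry all the information.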
