What would be a better DB design for a social network website: a single table with more columns and fewer rows, or multiple tables with fewer columns but more rows?
For example, a user can post an update on their wall or in a group.
Two DB designs that I could think of are:
1) UserPosts: id, userId, post, datetime
UserGroupPost: id, groupId, userId, post, datetime
Potential problem: might require joins, which could (in future) make queries slow.
2) Posts: id, userId, groupId, post, datetime (where groupId would be NULL if the user posts on their wall)
Potential problem: scanning a large dataset could take a long time.
Which design will perform better as the data grows? Is there another (better) way?
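A minimal sketch of the two candidate schemas, using Python's built-in sqlite3 module with an in-memory database (the engine and the sample rows are assumptions for illustration, not part of the question). It shows that in design 1, collecting all of one user's posts takes a UNION ALL across both tables rather than a join, while design 2 answers the same question with one filtered query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Design 1: separate tables for wall posts and group posts
CREATE TABLE UserPosts (id INTEGER PRIMARY KEY, userId INTEGER, post TEXT, datetime TEXT);
CREATE TABLE UserGroupPost (id INTEGER PRIMARY KEY, groupId INTEGER, userId INTEGER,
                            post TEXT, datetime TEXT);
-- Design 2: one table; groupId IS NULL means "posted on own wall"
CREATE TABLE Posts (id INTEGER PRIMARY KEY, userId INTEGER, groupId INTEGER,
                    post TEXT, datetime TEXT);
""")

# The same two posts (user 1, group 7) stored under each design.
conn.execute("INSERT INTO UserPosts (userId, post, datetime) VALUES (?, ?, ?)",
             (1, "hello", "2015-06-23"))
conn.execute("INSERT INTO UserGroupPost (groupId, userId, post, datetime) VALUES (?, ?, ?, ?)",
             (7, 1, "hi group", "2015-06-23"))
conn.executemany("INSERT INTO Posts (userId, groupId, post, datetime) VALUES (?, ?, ?, ?)",
                 [(1, None, "hello", "2015-06-23"), (1, 7, "hi group", "2015-06-23")])

def all_posts_design1(user_id):
    # Design 1: one user's posts are spread over two tables, so combine them.
    rows = conn.execute(
        "SELECT post FROM UserPosts WHERE userId = ? "
        "UNION ALL SELECT post FROM UserGroupPost WHERE userId = ?",
        (user_id, user_id)).fetchall()
    return sorted(post for (post,) in rows)

def all_posts_design2(user_id):
    # Design 2: a single filtered query.
    rows = conn.execute("SELECT post FROM Posts WHERE userId = ?", (user_id,)).fetchall()
    return sorted(post for (post,) in rows)

def wall_posts_design2(user_id):
    # Design 2: wall posts are the rows with no group.
    rows = conn.execute("SELECT post FROM Posts WHERE userId = ? AND groupId IS NULL",
                        (user_id,)).fetchall()
    return [post for (post,) in rows]

print(all_posts_design1(1))   # ['hello', 'hi group']
print(all_posts_design2(1))   # ['hello', 'hi group']
print(wall_posts_design2(1))  # ['hello']
```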
-
First normal form and second normal form may be what you are looking for. – Arseni Mourzenko, Jun 23, 2015 at 12:10
-
Better to ask on the DBAs SE site. – gbjbaanb, Jun 23, 2015 at 12:30
-
Is starting a greenfield app with a non-normalized database a case of PrematureOptimization? – k3b, Jun 23, 2015 at 12:33
-
Every time you denormalize a schema, you should know that you are giving away some flexibility. Database design optimization requires complete analysis and a vision that includes future requirements. In many cases you can buy performance, but you can rarely buy flexibility without shattering a large part of the code. I suggest you begin with 3NF. – NoChance, Jun 23, 2015 at 21:41
3 Answers
Anything (an "entity") that can exist on its own, independently of anything else, should have its own table.
User: id, name, hashed_password, join_date, birth_date
Group: id, name
Relationships between things generally require "linking" tables.
Post: id, user_id, group_id, post_date, post_title, post_content
The key to success is proper indexing of any field where you join between tables or on which you filter results.
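The schema above might look like this in SQLite, via Python's sqlite3 module (an assumption; the answer names no engine). The column types, the NOT NULL constraints, the sample rows, and the index names idx_post_user and idx_post_group are illustrative choices. "User" and "Group" are quoted because GROUP is an SQL keyword.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- "User" and "Group" are quoted: GROUP is an SQL keyword (and USER is in some engines).
CREATE TABLE "User" (
    id              INTEGER PRIMARY KEY,
    name            TEXT NOT NULL,
    hashed_password TEXT NOT NULL,
    join_date       TEXT,
    birth_date      TEXT
);
CREATE TABLE "Group" (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE Post (
    id           INTEGER PRIMARY KEY,
    user_id      INTEGER NOT NULL REFERENCES "User"(id),
    group_id     INTEGER REFERENCES "Group"(id),  -- NULL = own wall in this sketch
    post_date    TEXT NOT NULL,
    post_title   TEXT,
    post_content TEXT
);
-- Index the columns used in joins and WHERE filters.
CREATE INDEX idx_post_user  ON Post(user_id);
CREATE INDEX idx_post_group ON Post(group_id, post_date);
""")

conn.executemany('INSERT INTO "User" (id, name, hashed_password) VALUES (?, ?, ?)',
                 [(1, "alice", "x"), (2, "bob", "x")])
conn.execute('INSERT INTO "Group" (id, name) VALUES (?, ?)', (10, "chess"))
conn.executemany("INSERT INTO Post (user_id, group_id, post_date, post_title) VALUES (?, ?, ?, ?)",
                 [(1, 10, "2015-06-23", "opening"),
                  (2, 10, "2015-06-24", "endgame"),
                  (1, None, "2015-06-22", "my wall")])

def group_feed(group_id):
    # Filter on the indexed group_id, then join to User on its primary key.
    return conn.execute(
        """SELECT u.name, p.post_title
           FROM Post p JOIN "User" u ON u.id = p.user_id
           WHERE p.group_id = ?
           ORDER BY p.post_date DESC""",
        (group_id,)).fetchall()

print(group_feed(10))  # [('bob', 'endgame'), ('alice', 'opening')]
```

The composite index on (group_id, post_date) is one possible choice: it serves the filter and, in engines that can read an index in reverse, the newest-first ordering of a group feed as well.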
Also, consider using a dummy (non-NULL) Group value for posts to a user's own "wall": in some databases (Oracle, for example) a B-tree index does not store rows whose indexed columns are all NULL, which can make your queries for these posts run [far] slower.
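A minimal sketch of the dummy-group idea, assuming a reserved group id of 0 (the constant WALL_GROUP_ID, the sample rows, and the index name idx_post_user_group are hypothetical). A real row with that id keeps the foreign key valid, so group_id can be NOT NULL. Note that SQLite itself does store NULLs in its indexes; this sketch only shows the shape of the schema, and the benefit applies to engines whose indexes skip NULL keys.

```python
import sqlite3

WALL_GROUP_ID = 0  # hypothetical sentinel: "posted on the author's own wall"

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE "Group" (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE Post (
    id           INTEGER PRIMARY KEY,
    user_id      INTEGER NOT NULL,
    group_id     INTEGER NOT NULL REFERENCES "Group"(id),  -- never NULL
    post_content TEXT
);
-- One composite index serves both "user's wall" and "user's posts in group X".
CREATE INDEX idx_post_user_group ON Post(user_id, group_id);
""")

# The dummy group row that wall posts point at.
conn.executemany('INSERT INTO "Group" (id, name) VALUES (?, ?)',
                 [(WALL_GROUP_ID, "(own wall)"), (10, "chess")])
conn.executemany("INSERT INTO Post (user_id, group_id, post_content) VALUES (?, ?, ?)",
                 [(1, WALL_GROUP_ID, "hello wall"),
                  (1, 10, "hello chess"),
                  (2, WALL_GROUP_ID, "bob's wall")])

def wall_posts(user_id):
    # Plain equality on both indexed columns; no IS NULL test needed.
    rows = conn.execute(
        "SELECT post_content FROM Post WHERE user_id = ? AND group_id = ? ORDER BY id",
        (user_id, WALL_GROUP_ID)).fetchall()
    return [content for (content,) in rows]

print(wall_posts(1))  # ['hello wall']
```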
-
In my opinion, the post should be independent of its scope, so the post's purpose (group vs. wall) should be in a separate table. Also, the dependency between user_id and group_id could be avoided, so that a user could switch groups or be part of several groups at the same time. – NoChance, Jun 23, 2015 at 12:38
-
Instead of NULLs, we could use some other value (0, maybe). How can we avoid a potentially slow query? – Siddharth Patel, Jun 24, 2015 at 5:50
Joins are not slow. They are incredibly fast if you join on a primary key or an indexed column. You should not make design decisions based on the assumption that joining is a problem.
Now, there may be particular cases, e.g. with very large datasets or distributed databases, where joining is a performance problem, and there are various ways to mitigate that (indexed views, denormalization, caching). But since you don't mention having these specific issues, I would guess that thinking about them now would be premature optimization.
Unless you have very specific issues, you should design a normalized schema and then use indexes etc. to avoid performance problems.
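One way to check that a join on a primary key or indexed column is cheap is to ask the database for its query plan. A sketch with SQLite's EXPLAIN QUERY PLAN via Python's sqlite3 (the engine, the table names and the index name idx_post_user are assumptions): the planner should report each table as a SEARCH step (an index or primary-key lookup) rather than a SCAN step (a full table scan). The exact wording of the plan text varies between SQLite versions, so the printed lines may differ from one installation to another.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE "User" (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE Post (id INTEGER PRIMARY KEY, user_id INTEGER NOT NULL, content TEXT);
-- Without this index, finding a user's posts would require scanning Post.
CREATE INDEX idx_post_user ON Post(user_id);
""")

QUERY = """
SELECT u.name, p.content
FROM "User" u JOIN Post p ON p.user_id = u.id
WHERE u.id = ?
"""

def query_plan(sql, params):
    # The last column of each EXPLAIN QUERY PLAN row is a human-readable description.
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]

for step in query_plan(QUERY, (1,)):
    print(step)
```

Dropping the CREATE INDEX line and rerunning is a quick way to see the Post step turn into a SCAN.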
I would go even further:
User: Id, Name, WallId
Post: Id, Content, Title, WallId etc...
Wall: WallId
Group: Id, Name, WallId...
Basically, a post belongs to a wall, and a wall can belong to a user or a group.
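A sketch of this wall-based design in SQLite via Python's sqlite3. The author_id column on Post, the UNIQUE constraints (one wall per owner), and the helper functions new_wall, add_owner, add_post and wall_posts are hypothetical additions for illustration. The point it shows is that one query reads a wall regardless of whether a user or a group owns it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Wall (wall_id INTEGER PRIMARY KEY);
CREATE TABLE "User" (id INTEGER PRIMARY KEY, name TEXT NOT NULL,
                     wall_id INTEGER NOT NULL UNIQUE REFERENCES Wall(wall_id));
CREATE TABLE "Group" (id INTEGER PRIMARY KEY, name TEXT NOT NULL,
                      wall_id INTEGER NOT NULL UNIQUE REFERENCES Wall(wall_id));
-- author_id is an assumed extra column ("etc..." in the answer).
CREATE TABLE Post (id INTEGER PRIMARY KEY,
                   author_id INTEGER NOT NULL REFERENCES "User"(id),
                   wall_id INTEGER NOT NULL REFERENCES Wall(wall_id),
                   title TEXT, content TEXT);
CREATE INDEX idx_post_wall ON Post(wall_id);
""")

def new_wall():
    return conn.execute("INSERT INTO Wall DEFAULT VALUES").lastrowid

def add_owner(table, name):
    # Users and groups each get their own wall when they are created.
    assert table in ("User", "Group")
    wall_id = new_wall()
    owner_id = conn.execute(f'INSERT INTO "{table}" (name, wall_id) VALUES (?, ?)',
                            (name, wall_id)).lastrowid
    return owner_id, wall_id

def add_post(author_id, wall_id, title):
    conn.execute("INSERT INTO Post (author_id, wall_id, title) VALUES (?, ?, ?)",
                 (author_id, wall_id, title))

def wall_posts(wall_id):
    # Same query whether the wall belongs to a user or a group.
    rows = conn.execute("SELECT title FROM Post WHERE wall_id = ? ORDER BY id", (wall_id,))
    return [title for (title,) in rows]

alice, alice_wall = add_owner("User", "alice")
chess, chess_wall = add_owner("Group", "chess")
add_post(alice, alice_wall, "Hello, wall")
add_post(alice, chess_wall, "Anyone up for a game?")
print(wall_posts(alice_wall))  # ['Hello, wall']
print(wall_posts(chess_wall))  # ['Anyone up for a game?']
```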