Generally speaking, the schema looks good and well organized, however a few columns could use some refactoring. Here is what I can point to.
Users table
'username' VARCHAR(25) NOT NULL,
This seems a bit small for a user name. While I'm pretty sure most users will choose names below 25 characters, I would expand it a bit just for the sake of safety. I think 50 or so will do.
'password' VARCHAR(255) NOT NULL,
This smells at first sight. Maybe I'm wrong, but this sounds as a plain-text password, which is utterly wrong. Column size should match the output size of the used hash function, plus the size of the salt and hash parameters (or split those into separate columns). And don't even think of SHA1 or MD5 :P Use PBKDF2, bcrypt or scrypt for this. Look here for a more detailed discussion of password hashing.
'first_name' VARCHAR(255) NOT NULL,
'last_name' VARCHAR(255) NOT NULL,
This design is very specific for most occidental names of people, but wrong in the general case. People's names in many part of the world don't fit into first/last name, and for the most part a single, longer column will do. Read this article for more details about this.
Additionally, do you really need the real name of the user? Doesn't the username suffices? Or the real name is to be displayed somewhere? All in all, I would put it as
'real_name' VARCHAR(1000) NOT NULL,
'gender' ENUM('M', 'F'),
In many case, and specially in a social network environment (and more if you will allow search by gender), that restriction is not enough, as human gender identity goes far beyond male/female. Just for the record, New York recognizes 31 and Facebook 71, so I think your DB should at least take that into consideration.
What I would do is separate the possible genders into its own table (that you can fill separately) and put a foreign key in users:
CREATE TABLE Genders (
'gender_id' INT(20) NOT NULL AUTO_INCREMENT,
'name' VARCHAR(100) NOT NULL,
PRIMARY KEY ('gender_id'),
UNIQUE KEY ('name')
);
And in users
:
'gender_id' INT(20),
FOREIGN KEY ('gender_id') REFERENCES Genders('gender_id')
'city' VARCHAR(255),
'state' VARCHAR(255),
'country' VARCHAR(255),
Generally, this is a bad practice, as it would lead to repeated and inconsistent data, and difficult searches by city/country (as anyone can write it the way he wants). A better way is to normalize this into separate tables for cities/states/countries, and leave a reference to the city in the users table (and in turn, a city implies a state and a country):
CREATE TABLE Countries (
'country_id' INT(20) NOT NULL AUTO_INCREMENT,
'name' VARCHAR(100) NOT NULL,
PRIMARY KEY ('country_id'),
UNIQUE KEY ('name')
);
CREATE TABLE States (
'state_id' INT(20) NOT NULL AUTO_INCREMENT,
'name' VARCHAR(100) NOT NULL,
'country_id' INT(20) NOT NULL,
PRIMARY KEY ('state_id'),
UNIQUE KEY ('name','country_id')
);
CREATE TABLE Cities (
'city_id' INT(20) NOT NULL AUTO_INCREMENT,
'name' VARCHAR(100) NOT NULL,
'state_id' INT(20) NOT NULL,
PRIMARY KEY ('city_id'),
UNIQUE KEY ('name','state_id')
);
Those should be populated by you, then the users just select one of the options, and that allows proper searches. In users
you only have a reference to city:
'city_id' INT(20) NOT NULL,
FOREIGN KEY ('city_id') REFERENCES Cities('city_id')
'birth_date' VARCHAR(255),
As stated in the comments, this should be of type DATE
.
'active' ENUM('yes','no') NOT NULL DEFAULT 'no',
An enum is not the most optimal data type here. It's better to use BIT
instead.
Table Posts
Generally, the layout looks good, I would not split it into images/videos, at least for now. This is due both storing basically the very same data, with only the file type differing. Only consider such a split if each type requires different columns. Other than that, I have this few tips about this table:
First of all, I would add a comment column so that users can write a bit about the photo/video they just posted (in addition to the simple caption).
'comment' TEXT NULL
'post_url' VARCHAR(255) NOT NULL,
Is this really necessary? Why store the full URL when you can deduce it from other columns? Specifically, the id, or better, the caption, can be used to build unique URLs for each post without needing to generate a new piece of data.
Table Followings
PRIMARY KEY ('user_id', 'following_id'),
UNIQUE INDEX ('following_id', 'user_id')
This is redundant. The primary key is, by definition, unique. The unique key is therefore unnecessary and can be removed.
General comments
One thing I would do is to be explicit about nullability of each column, even if the default favors your particular case. Being explicit doesn't hurts, and it better helps readability, so I would suggest always stating each column as NULL
/NOT NULL
.
You might consider the posibility of allowing users like comments too, in addition to the main posts (as the likes
table handles).
You can also consider the option of adding some history tables for registering each edition of posts (pretty much like StackOverflow does), in addition to the date_updated
column, if you may need additional audits or rollback capability.
- 496
- 4
- 15