Is this a good design? I would like to create a view on user table to get a list of users based on filtered conditions. Please suggest ways of improving this schema design.
A view example that I would like to add on this schema:
- Select all users who belong to group and all groups created after that group.
- Select all users with last message, status, location and media URLs included.
CREATE TABLE IF NOT EXISTS users
(
userid TEXT NOT NULL,
name TEXT NULL,
lmessage INTEGER NULL,
statusid INTEGER NULL, /* statusid should refer to last status of the user in status table*/
locationid INTEGER NULL, /* locationid should refer to last status of the user in locations table */
registered INTEGER NOT NULL,
tinypic INTEGER NULL /* this refers to media id in media table */,
largepic INTEGER NULL /* this also refers to media id in media table */,
groupid INTEGER NULL /* this refers to id in groups table */ ,
PRIMARY KEY (userid)
);
CREATE TABLE IF NOT EXISTS locations
(
serial INTEGER,
locationid TEXT NOT NULL,
userid TEXT NOT NULL,
time INTEGER NULL,
PRIMARY KEY (serial)
);
CREATE TABLE IF NOT EXISTS status
(
serial INTEGER,
userid TEXT NULL,
message TEXT NOT NULL,
time INTEGER NULL,
PRIMARY KEY (serial)
);
CREATE TABLE IF NOT EXISTS messages
(
sno INTEGER,
messageid INTEGER NOT NULL,
sender TEXT NOT NULL,
receiver TEXT NOT NULL,
time INTEGER NULL,
message TEXT NULL,
image INTEGER NULL,
video INTEGER NULL,
audio INTEGER NULL,
PRIMARY KEY (sno)
);
CREATE TABLE IF NOT EXISTS media
(
mediaid TEXT NOT NULL UNIQUE,
url TEXT NULL,
downloaded INTEGER NULL,
thumbnail TEXT NULL,
PRIMARY KEY (mediaid)
);
CREATE TABLE IF NOT EXISTS groups
(
serial INTEGER,
name TEXT NOT NULL,
id INTEGER NOT NULL
PRIMARY KEY(serial)
);
CREATE UNIQUE INDEX IF NOT EXISTS id_unique ON users (userid ASC);
CREATE UNIQUE INDEX IF NOT EXISTS serial_unique ON status (serial ASC);
CREATE UNIQUE INDEX IF NOT EXISTS id_unique ON messages (sno ASC);
CREATE UNIQUE INDEX IF NOT EXISTS serial_unique ON patterns (serial DESC);
CREATE UNIQUE INDEX IF NOT EXISTS mediaid_unique ON media (mediaid ASC);
3 Answers 3
Is this a good design?
No, not really.
Some of the tables are not well designed.
Take a look at the responsibilities of the users
table and the media
table.
The
media
table has one clear responsibility: store the attributes of media items. Every record is self-contained, complete. You will probably never update the records in this table: you will only insert or delete. That's nice and simpleThe
users
table has two responsibilities:- Store the attributes of users
- Store the relationship of users and other objects
Every record is only half-complete (user attributes), to get the full picture you need to follow the references to other tables. You will probably never update the records very frequently, for setting the last status and location ids
It would be better to make the users
table in charge of user attributes only.
You could add additional tables to track the relations with other objects.
The circular dependencies are especially ugly:
users
depends onstatus
for the last status, andstatus
depends onusers
to associate all statuses with a user- Same for
locations
I recommend this kind of structure:
user
- in charge of user attributes (name, registered date)status
- in charge of status attributes (message, date)user_status
- in charge of linking user and statususer_last_status
- in charge of linking user and last status, with unique key on user
And similarly for locations. It's more work, but the end result will be more robust and flexible.
A reasonable compromise can be to not create user_status
,
but include user_id
in the status
table (with a similar treatment for locations too).
Select all users who belong to group and all groups created after that group.
To find users by a given group id fast,
you need an index on the group id column in the users table.
Below I explain how to create foreign keys.
When you make user.group_id
a foreign key referencing group.group_id
,
creating such index is one of the steps,
and will ensure fast lookups.
In addition,
you might also find a group fast by name.
In that case you might want to create an index on group.name
column.
However, if you won't have many groups than that would be overkill,
and not recommended by the documentation.
Select all users with last message, status, location and media URLs included.
Generally speaking, For good performance of queries like this,
you want to make sure that all fields used in the WHERE
and JOIN
clauses are indexed.
If you follow my suggestions,
this will be automatically the case for most of these elements you mention,
and for the rest you can probably figure out yourself.
Another issue with the design is that some of the fields are not clear enough, for example:
users
userid
: it would be good to know if this will be a technical id or a something like a UNIX username. If the latter, I'd rename tousername
name
:NULL
is allowed, but why?lmessage
: an integer field, but not clear which other table it will referenceregistered
: I suppose the date when the user was created. I recommend using the typedatetime
, even if internally it might be just an integer
locations
serial
: technical id I suppose?locationid
: text? That's a bit surprising. How is it different fromserial
?time
: perhaps there can be a better name for this. And again, I recommend using the typedatetime
status
: similar objections as withlocations
messages
sno
: technical id I suppose? I recommend to standardize the name of technical ids in all tablessender
,receiver
: I'm wondering if these have any relation to other tables
media
: this is good and clear. TheUNIQUE
keyword onmediaid
is pointless when you add aPRIMARY KEY
constraint on the same column
The answers to the questions I raised would be good in comments. Some example values for the columns would be useful too, especially for the ones where the meaning and motivation is not obvious.
Declare PRIMARY KEY
on the field's line
Instead of declaring PRIMARY KEY
at the end,
it's more compact to declare it on the field itself, like this:
CREATE TABLE IF NOT EXISTS locations
(
serial INTEGER NOT NULL PRIMARY KEY,
-- ...
);
I also added NOT NULL
for the field, though it's not strictly necessary.
When a field is primary key in sqlite,
it's treated as auto-incrementing,
so even if you try to explicitly insert NULL into it,
it will actually use max(col) + 1
instead.
But I like to write it this way anyway to make it explicit:
looking at this schema there will be no doubt that the field will never contain NULL.
Pointless unique indexes
Many of the indexes you're adding seem completely pointless:
CREATE UNIQUE INDEX IF NOT EXISTS id_unique ON users (userid ASC); CREATE UNIQUE INDEX IF NOT EXISTS serial_unique ON status (serial ASC); CREATE UNIQUE INDEX IF NOT EXISTS id_unique ON messages (sno ASC); CREATE UNIQUE INDEX IF NOT EXISTS mediaid_unique ON media (mediaid ASC);
In all these commands, the target fields are already marked as PRIMARY KEY
,
so adding a unique index will do absolutely nothing.
From the docs:
In most cases, UNIQUE and PRIMARY KEY constraints are implemented by creating a unique index in the database. (The exceptions are INTEGER PRIMARY KEY and PRIMARY KEYs on WITHOUT ROWID tables.) Hence, the following schemas are logically equivalent:
CREATE TABLE t1(a, b UNIQUE); CREATE TABLE t1(a, b PRIMARY KEY); CREATE TABLE t1(a, b); CREATE UNIQUE INDEX t1b ON t1(b);
There is one more unique index in your post:
CREATE UNIQUE INDEX IF NOT EXISTS serial_unique ON patterns (serial DESC);
But the schema of the patterns
table is not in your post.
Curiously, this unique index uses the DESC
keyword,
which would set it apart from the other examples which all used the default ASC
.
In this case, I'm wondering what difference this will make on a table where the target column is already a primary key.
That might be an interesting question on stackoverflow.com
Although in this case it seems the unique indexes are redundant and can be removed,
if ever you need to make a column unique,
I recommend declaring UNIQUE
on the column's line when creating the table,
like this:
CREATE TABLE IF NOT EXISTS status
(
serial INTEGER NOT NULL PRIMARY KEY,
userid TEXT NULL UNIQUE,
-- ...
);
Or else,
if you really want to use a unique index with DESC
sorting order,
then create the index right after creating the table.
It's not a good idea to separate the declarations of tables and indexes like in your post. The indexes are closely tied to their tables,
and the code is most readable when dependent elements are closed to each other.
Also keep in mind that only versions of SQLite 3.3.0 understand indexes with descending sorting order.
Use foreign keys
It seems you wanted to use foreign keys just didn't know how :) Here's an example:
CREATE TABLE IF NOT EXISTS users
(
userid TEXT NOT NULL PRIMARY KEY,
locationid INTEGER NULL REFERENCES locations (locationid)
);
CREATE TABLE IF NOT EXISTS locations
(
serial INTEGER NOT NULL PRIMARY KEY,
locationid TEXT NOT NULL UNIQUE,
);
CREATE INDEX users_location_index ON users(locationid);
Notice the UNIQUE
constraint on the referenced column:
this is a requirement of foreign keys.
(For more details, see the docs.)
Lastly, notice the index on users.locationid
.
Although not required, this is recommended by the docs.
Indicate the column sizes
Although SQLite understands and ignores VARCHAR(N)
,
treating it as TEXT
,
I recommend to use VARCHAR(N)
anyway for non-text fields.
This is to indicate your design intentions.
It will also make it easier to port your schema to other databases in the future.
To be safe, add a comment to remind that SQLite doesn't enforce the specified sizes.
Naming and style
I recommend using id column names like user_id
, status_id
instead of userid
and statusid
because this seems a common practice.
I would borrow some ideas from Django:
- Use singular instead of plural for table names: for example
user
instead ofusers
- Use
id
as the auto-incrementing technical id field consistently in all tables - Very cosmetic thing but anyway: write SQL types like
INTEGER
andTEXT
using lowercase letters
Instead of these comments, you should be declaring foreign keys.
statusid INTEGER NULL, /* statusid should refer to last status of the user in status table*/ locationid INTEGER NULL, /* locationid should refer to last status of the user in locations table */
I don't know if sqlite supports sized data types, but if it does, you should determine what the right size is and declare it. I'm sure declaring a field as Text
dimensions it to a size that's bigger than you really need.
The formatting looks good. It's very readable and consistent. I see a lot of SQL that's difficult to read, so good job there.
-
\$\begingroup\$ Sqlite does support only INTEGER, TEXT and REAL datatypes. Could you please add some query how to use foreign keys in this case and writing views on it. \$\endgroup\$SoI– SoI2014年11月03日 08:34:56 +00:00Commented Nov 3, 2014 at 8:34
-
\$\begingroup\$ Ahh well... That would probably be why you didn't then, huh? =) There are examples in the documentation I linked to. I'm a SQL Server guy, so I'd rather not give an erroneous example. \$\endgroup\$RubberDuck– RubberDuck2014年11月03日 10:27:30 +00:00Commented Nov 3, 2014 at 10:27
It would be good to use consistent naming for technical columns, to make them easily recognizable throughout your project. Especially for your primary keys userid
, sno
, serial
, to avoid confusion. You could even give them all the name id
to make it easier to remember and use.
-
\$\begingroup\$ This statement is not necessarily true, especially when developing on something like mySQL workbench. While I personally also like naming the primary key the same thing consistently, your reasoning behind that... smells. \$\endgroup\$Vogel612– Vogel6122014年11月09日 18:00:07 +00:00Commented Nov 9, 2014 at 18:00
-
2\$\begingroup\$ @Vogel612 Django, the popular web framework does that: it adds an auto-incrementing technical column named
id
to all tables. Although this answer is a bit too short, if reworded better, it could make a good point: "It would be good to use consistent naming for technical columns, to make them easily recognizable throughout your project. It seems that your columnsuserid
,sno
,serial
would be good candidates to apply this logic." \$\endgroup\$janos– janos2014年11月09日 18:18:20 +00:00Commented Nov 9, 2014 at 18:18 -
1\$\begingroup\$ @janos that doesn't make the current answer better, though.. Always vote on what is. Not on what could be or was... \$\endgroup\$Vogel612– Vogel6122014年11月11日 12:13:24 +00:00Commented Nov 11, 2014 at 12:13
-
\$\begingroup\$ @Vogel612 I suppose you meant, in its original form it wouldn't deserve an upvote, just based on its potential. I totally agree. And I actually haven't voted on this yet. Now that Simon fixed it, I could. \$\endgroup\$janos– janos2014年11月11日 12:18:51 +00:00Commented Nov 11, 2014 at 12:18