Schema design for user profile management

Question 1

Is this a good design? I would like to create a view on user table to get a list of users based on filtered conditions. Please suggest ways of improving this schema design.

A view example that I would like to add on this schema:

Select all users who belong to group and all groups created after that group.
Select all users with last message, status, location and media URLs included.

CREATE TABLE IF NOT EXISTS users 
 ( 
 userid TEXT NOT NULL, 
 name TEXT NULL, 
 lmessage INTEGER NULL, 
 statusid INTEGER NULL, /* statusid should refer to last status of the user in status table*/
 locationid INTEGER NULL, /* locationid should refer to last status of the user in locations table */
 registered INTEGER NOT NULL, 
 tinypic INTEGER NULL /* this refers to media id in media table */, 
 largepic INTEGER NULL /* this also refers to media id in media table */,
 groupid INTEGER NULL /* this refers to id in groups table */ ,
 PRIMARY KEY (userid) 
 ); 
CREATE TABLE IF NOT EXISTS locations 
 ( 
 serial INTEGER,
 locationid TEXT NOT NULL, 
 userid TEXT NOT NULL,
 time INTEGER NULL,
 PRIMARY KEY (serial) 
 ); 
CREATE TABLE IF NOT EXISTS status 
 (
 serial INTEGER,
 userid TEXT NULL,
 message TEXT NOT NULL, 
 time INTEGER NULL,
 PRIMARY KEY (serial) 
 ); 
CREATE TABLE IF NOT EXISTS messages 
 ( 
 sno INTEGER, 
 messageid INTEGER NOT NULL, 
 sender TEXT NOT NULL, 
 receiver TEXT NOT NULL, 
 time INTEGER NULL, 
 message TEXT NULL, 
 image INTEGER NULL, 
 video INTEGER NULL, 
 audio INTEGER NULL, 
 PRIMARY KEY (sno) 
 ); 
CREATE TABLE IF NOT EXISTS media 
 ( 
 mediaid TEXT NOT NULL UNIQUE, 
 url TEXT NULL, 
 downloaded INTEGER NULL, 
 thumbnail TEXT NULL, 
 PRIMARY KEY (mediaid) 
 ); 
CREATE TABLE IF NOT EXISTS groups 
 (
 serial INTEGER,
 name TEXT NOT NULL,
 id INTEGER NOT NULL
 PRIMARY KEY(serial)
 );
CREATE UNIQUE INDEX IF NOT EXISTS id_unique ON users (userid ASC); 
CREATE UNIQUE INDEX IF NOT EXISTS serial_unique ON status (serial ASC); 
CREATE UNIQUE INDEX IF NOT EXISTS id_unique ON messages (sno ASC); 
CREATE UNIQUE INDEX IF NOT EXISTS serial_unique ON patterns (serial DESC); 
CREATE UNIQUE INDEX IF NOT EXISTS mediaid_unique ON media (mediaid ASC);

Question 2

Is this a good design?

No, not really. Some of the tables are not well designed. Take a look at the responsibilities of the users table and the media table.

The media table has one clear responsibility: store the attributes of media items. Every record is self-contained, complete. You will probably never update the records in this table: you will only insert or delete. That's nice and simple
The users table has two responsibilities:
1. Store the attributes of users
2. Store the relationship of users and other objects
Every record is only half-complete (user attributes), to get the full picture you need to follow the references to other tables. You will probably never update the records very frequently, for setting the last status and location ids

It would be better to make the users table in charge of user attributes only. You could add additional tables to track the relations with other objects.

The circular dependencies are especially ugly:

users depends on status for the last status, and status depends on users to associate all statuses with a user
Same for locations

I recommend this kind of structure:

user - in charge of user attributes (name, registered date)
status - in charge of status attributes (message, date)
user_status - in charge of linking user and status
user_last_status - in charge of linking user and last status, with unique key on user

And similarly for locations. It's more work, but the end result will be more robust and flexible.

A reasonable compromise can be to not create user_status, but include user_id in the status table (with a similar treatment for locations too).

Select all users who belong to group and all groups created after that group.

To find users by a given group id fast, you need an index on the group id column in the users table. Below I explain how to create foreign keys. When you make user.group_id a foreign key referencing group.group_id, creating such index is one of the steps, and will ensure fast lookups.

In addition, you might also find a group fast by name. In that case you might want to create an index on group.name column. However, if you won't have many groups than that would be overkill, and not recommended by the documentation.

Select all users with last message, status, location and media URLs included.

Generally speaking, For good performance of queries like this, you want to make sure that all fields used in the WHERE and JOIN clauses are indexed. If you follow my suggestions, this will be automatically the case for most of these elements you mention, and for the rest you can probably figure out yourself.

Another issue with the design is that some of the fields are not clear enough, for example:

users
- userid: it would be good to know if this will be a technical id or a something like a UNIX username. If the latter, I'd rename to username
- name: NULL is allowed, but why?
- lmessage: an integer field, but not clear which other table it will reference
- registered: I suppose the date when the user was created. I recommend using the type datetime, even if internally it might be just an integer
locations
- serial: technical id I suppose?
- locationid: text? That's a bit surprising. How is it different from serial?
- time: perhaps there can be a better name for this. And again, I recommend using the type datetime
status: similar objections as with locations
messages
- sno: technical id I suppose? I recommend to standardize the name of technical ids in all tables
- sender, receiver: I'm wondering if these have any relation to other tables
media: this is good and clear. The UNIQUE keyword on mediaid is pointless when you add a PRIMARY KEY constraint on the same column

The answers to the questions I raised would be good in comments. Some example values for the columns would be useful too, especially for the ones where the meaning and motivation is not obvious.

Declare `PRIMARY KEY` on the field's line

Instead of declaring PRIMARY KEY at the end, it's more compact to declare it on the field itself, like this:

CREATE TABLE IF NOT EXISTS locations 
 ( 
 serial INTEGER NOT NULL PRIMARY KEY,
 -- ... 
 );

I also added NOT NULL for the field, though it's not strictly necessary. When a field is primary key in sqlite, it's treated as auto-incrementing, so even if you try to explicitly insert NULL into it, it will actually use max(col) + 1 instead. But I like to write it this way anyway to make it explicit: looking at this schema there will be no doubt that the field will never contain NULL.

Pointless unique indexes

Many of the indexes you're adding seem completely pointless:

CREATE UNIQUE INDEX IF NOT EXISTS id_unique ON users (userid ASC); 
CREATE UNIQUE INDEX IF NOT EXISTS serial_unique ON status (serial ASC); 
CREATE UNIQUE INDEX IF NOT EXISTS id_unique ON messages (sno ASC); 
CREATE UNIQUE INDEX IF NOT EXISTS mediaid_unique ON media (mediaid ASC);

In all these commands, the target fields are already marked as PRIMARY KEY, so adding a unique index will do absolutely nothing.

From the docs:

In most cases, UNIQUE and PRIMARY KEY constraints are implemented by creating a unique index in the database. (The exceptions are INTEGER PRIMARY KEY and PRIMARY KEYs on WITHOUT ROWID tables.) Hence, the following schemas are logically equivalent:
CREATE TABLE t1(a, b UNIQUE);
CREATE TABLE t1(a, b PRIMARY KEY);
CREATE TABLE t1(a, b);
CREATE UNIQUE INDEX t1b ON t1(b);

There is one more unique index in your post:

CREATE UNIQUE INDEX IF NOT EXISTS serial_unique ON patterns (serial DESC);

But the schema of the patterns table is not in your post. Curiously, this unique index uses the DESC keyword, which would set it apart from the other examples which all used the default ASC. In this case, I'm wondering what difference this will make on a table where the target column is already a primary key. That might be an interesting question on stackoverflow.com

Although in this case it seems the unique indexes are redundant and can be removed, if ever you need to make a column unique, I recommend declaring UNIQUE on the column's line when creating the table, like this:

CREATE TABLE IF NOT EXISTS status 
 (
 serial INTEGER NOT NULL PRIMARY KEY,
 userid TEXT NULL UNIQUE,
 -- ...
 );

Or else, if you really want to use a unique index with DESC sorting order, then create the index right after creating the table. It's not a good idea to separate the declarations of tables and indexes like in your post. The indexes are closely tied to their tables, and the code is most readable when dependent elements are closed to each other.

Also keep in mind that only versions of SQLite 3.3.0 understand indexes with descending sorting order.

Use foreign keys

It seems you wanted to use foreign keys just didn't know how :) Here's an example:

CREATE TABLE IF NOT EXISTS users 
 ( 
 userid TEXT NOT NULL PRIMARY KEY, 
 locationid INTEGER NULL REFERENCES locations (locationid)
 ); 
CREATE TABLE IF NOT EXISTS locations 
 ( 
 serial INTEGER NOT NULL PRIMARY KEY,
 locationid TEXT NOT NULL UNIQUE,
 ); 
CREATE INDEX users_location_index ON users(locationid);

Notice the UNIQUE constraint on the referenced column: this is a requirement of foreign keys. (For more details, see the docs.)

Lastly, notice the index on users.locationid. Although not required, this is recommended by the docs.

Indicate the column sizes

Although SQLite understands and ignores VARCHAR(N), treating it as TEXT, I recommend to use VARCHAR(N) anyway for non-text fields. This is to indicate your design intentions. It will also make it easier to port your schema to other databases in the future. To be safe, add a comment to remind that SQLite doesn't enforce the specified sizes.

Naming and style

I recommend using id column names like user_id, status_id instead of userid and statusid because this seems a common practice.

I would borrow some ideas from Django:

Use singular instead of plural for table names: for example user instead of users
Use id as the auto-incrementing technical id field consistently in all tables
Very cosmetic thing but anyway: write SQL types like INTEGER and TEXT using lowercase letters

Question 3

Instead of these comments, you should be declaring foreign keys.

 statusid INTEGER NULL, /* statusid should refer to last status of the user in status table*/
 locationid INTEGER NULL, /* locationid should refer to last status of the user in locations table */

I don't know if sqlite supports sized data types, but if it does, you should determine what the right size is and declare it. I'm sure declaring a field as Text dimensions it to a size that's bigger than you really need.

The formatting looks good. It's very readable and consistent. I see a lot of SQL that's difficult to read, so good job there.

Question 4

Sqlite does support only INTEGER, TEXT and REAL datatypes. Could you please add some query how to use foreign keys in this case and writing views on it.

Question 5

Ahh well... That would probably be why you didn't then, huh? =) There are examples in the documentation I linked to. I'm a SQL Server guy, so I'd rather not give an erroneous example.

Question 6

It would be good to use consistent naming for technical columns, to make them easily recognizable throughout your project. Especially for your primary keys userid, sno, serial, to avoid confusion. You could even give them all the name id to make it easier to remember and use.

Question 7

This statement is not necessarily true, especially when developing on something like mySQL workbench. While I personally also like naming the primary key the same thing consistently, your reasoning behind that... smells.

Question 8

@Vogel612 Django, the popular web framework does that: it adds an auto-incrementing technical column named id to all tables. Although this answer is a bit too short, if reworded better, it could make a good point: "It would be good to use consistent naming for technical columns, to make them easily recognizable throughout your project. It seems that your columns userid, sno, serial would be good candidates to apply this logic."

Question 9

@janos that doesn't make the current answer better, though.. Always vote on what is. Not on what could be or was...

Question 10

@Vogel612 I suppose you meant, in its original form it wouldn't deserve an upvote, just based on its potential. I totally agree. And I actually haven't voted on this yet. Now that Simon fixed it, I could.

janos janos 113k15 gold badges154 silver badges396 bronze badges · Accepted Answer · 2014-11-09 16:35:12Z

Is this a good design?

No, not really. Some of the tables are not well designed. Take a look at the responsibilities of the users table and the media table.

The media table has one clear responsibility: store the attributes of media items. Every record is self-contained, complete. You will probably never update the records in this table: you will only insert or delete. That's nice and simple
The users table has two responsibilities:
1. Store the attributes of users
2. Store the relationship of users and other objects
Every record is only half-complete (user attributes), to get the full picture you need to follow the references to other tables. You will probably never update the records very frequently, for setting the last status and location ids

It would be better to make the users table in charge of user attributes only. You could add additional tables to track the relations with other objects.

The circular dependencies are especially ugly:

users depends on status for the last status, and status depends on users to associate all statuses with a user
Same for locations

I recommend this kind of structure:

user - in charge of user attributes (name, registered date)
status - in charge of status attributes (message, date)
user_status - in charge of linking user and status
user_last_status - in charge of linking user and last status, with unique key on user

And similarly for locations. It's more work, but the end result will be more robust and flexible.

A reasonable compromise can be to not create user_status, but include user_id in the status table (with a similar treatment for locations too).

Select all users who belong to group and all groups created after that group.

To find users by a given group id fast, you need an index on the group id column in the users table. Below I explain how to create foreign keys. When you make user.group_id a foreign key referencing group.group_id, creating such index is one of the steps, and will ensure fast lookups.

In addition, you might also find a group fast by name. In that case you might want to create an index on group.name column. However, if you won't have many groups than that would be overkill, and not recommended by the documentation.

Select all users with last message, status, location and media URLs included.

Generally speaking, For good performance of queries like this, you want to make sure that all fields used in the WHERE and JOIN clauses are indexed. If you follow my suggestions, this will be automatically the case for most of these elements you mention, and for the rest you can probably figure out yourself.

Another issue with the design is that some of the fields are not clear enough, for example:

users
- userid: it would be good to know if this will be a technical id or a something like a UNIX username. If the latter, I'd rename to username
- name: NULL is allowed, but why?
- lmessage: an integer field, but not clear which other table it will reference
- registered: I suppose the date when the user was created. I recommend using the type datetime, even if internally it might be just an integer
locations
- serial: technical id I suppose?
- locationid: text? That's a bit surprising. How is it different from serial?
- time: perhaps there can be a better name for this. And again, I recommend using the type datetime
status: similar objections as with locations
messages
- sno: technical id I suppose? I recommend to standardize the name of technical ids in all tables
- sender, receiver: I'm wondering if these have any relation to other tables
media: this is good and clear. The UNIQUE keyword on mediaid is pointless when you add a PRIMARY KEY constraint on the same column

The answers to the questions I raised would be good in comments. Some example values for the columns would be useful too, especially for the ones where the meaning and motivation is not obvious.

Declare `PRIMARY KEY` on the field's line

Instead of declaring PRIMARY KEY at the end, it's more compact to declare it on the field itself, like this:

CREATE TABLE IF NOT EXISTS locations 
 ( 
 serial INTEGER NOT NULL PRIMARY KEY,
 -- ... 
 );

I also added NOT NULL for the field, though it's not strictly necessary. When a field is primary key in sqlite, it's treated as auto-incrementing, so even if you try to explicitly insert NULL into it, it will actually use max(col) + 1 instead. But I like to write it this way anyway to make it explicit: looking at this schema there will be no doubt that the field will never contain NULL.

Pointless unique indexes

Many of the indexes you're adding seem completely pointless:

CREATE UNIQUE INDEX IF NOT EXISTS id_unique ON users (userid ASC); 
CREATE UNIQUE INDEX IF NOT EXISTS serial_unique ON status (serial ASC); 
CREATE UNIQUE INDEX IF NOT EXISTS id_unique ON messages (sno ASC); 
CREATE UNIQUE INDEX IF NOT EXISTS mediaid_unique ON media (mediaid ASC);

In all these commands, the target fields are already marked as PRIMARY KEY, so adding a unique index will do absolutely nothing.

From the docs:

In most cases, UNIQUE and PRIMARY KEY constraints are implemented by creating a unique index in the database. (The exceptions are INTEGER PRIMARY KEY and PRIMARY KEYs on WITHOUT ROWID tables.) Hence, the following schemas are logically equivalent:
CREATE TABLE t1(a, b UNIQUE);
CREATE TABLE t1(a, b PRIMARY KEY);
CREATE TABLE t1(a, b);
CREATE UNIQUE INDEX t1b ON t1(b);

There is one more unique index in your post:

CREATE UNIQUE INDEX IF NOT EXISTS serial_unique ON patterns (serial DESC);

But the schema of the patterns table is not in your post. Curiously, this unique index uses the DESC keyword, which would set it apart from the other examples which all used the default ASC. In this case, I'm wondering what difference this will make on a table where the target column is already a primary key. That might be an interesting question on stackoverflow.com

Although in this case it seems the unique indexes are redundant and can be removed, if ever you need to make a column unique, I recommend declaring UNIQUE on the column's line when creating the table, like this:

CREATE TABLE IF NOT EXISTS status 
 (
 serial INTEGER NOT NULL PRIMARY KEY,
 userid TEXT NULL UNIQUE,
 -- ...
 );

Or else, if you really want to use a unique index with DESC sorting order, then create the index right after creating the table. It's not a good idea to separate the declarations of tables and indexes like in your post. The indexes are closely tied to their tables, and the code is most readable when dependent elements are closed to each other.

Also keep in mind that only versions of SQLite 3.3.0 understand indexes with descending sorting order.

Use foreign keys

It seems you wanted to use foreign keys just didn't know how :) Here's an example:

CREATE TABLE IF NOT EXISTS users 
 ( 
 userid TEXT NOT NULL PRIMARY KEY, 
 locationid INTEGER NULL REFERENCES locations (locationid)
 ); 
CREATE TABLE IF NOT EXISTS locations 
 ( 
 serial INTEGER NOT NULL PRIMARY KEY,
 locationid TEXT NOT NULL UNIQUE,
 ); 
CREATE INDEX users_location_index ON users(locationid);

Notice the UNIQUE constraint on the referenced column: this is a requirement of foreign keys. (For more details, see the docs.)

Lastly, notice the index on users.locationid. Although not required, this is recommended by the docs.

Indicate the column sizes

Although SQLite understands and ignores VARCHAR(N), treating it as TEXT, I recommend to use VARCHAR(N) anyway for non-text fields. This is to indicate your design intentions. It will also make it easier to port your schema to other databases in the future. To be safe, add a comment to remind that SQLite doesn't enforce the specified sizes.

Naming and style

I recommend using id column names like user_id, status_id instead of userid and statusid because this seems a common practice.

I would borrow some ideas from Django:

Use singular instead of plural for table names: for example user instead of users
Use id as the auto-incrementing technical id field consistently in all tables
Very cosmetic thing but anyway: write SQL types like INTEGER and TEXT using lowercase letters

Stack Exchange Network

Schema design for user profile management

3 Answers 3

Declare `PRIMARY KEY` on the field's line

Pointless unique indexes

Use foreign keys

Indicate the column sizes

Naming and style

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Schema design for user profile management

3 Answers 3

Declare PRIMARY KEY on the field's line

Pointless unique indexes

Use foreign keys

Indicate the column sizes

Naming and style

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions

Declare `PRIMARY KEY` on the field's line