Database schema advice: multiple tables with shared columns in PostgreSQL

Question 1

Let's pretend I'm making a basic twitter clone. We could imagine our database is set up as the following:

1. Table 1 - contains user info {username, password}
2. Table 2 - contains session info {username, session_id, expire}
3. Table 3 - contains post info {username, post_content, number_of_likes}

These tables are basically grouped by function. However username is shared between all of them. If a user was to change their username, this would be very hard to maintain since it would have to update across 3 tables. Is there a suggested way to organize user data here? E.g. one giant table? (bleh) or perhaps a way to centrally reference username so that changing the value once would update all dependent cells? I am using postgresql and am new to the language and databases.

Comment

Read up on "database normalization". That's basically what you're asking about here.

Answer

The typical solution would be to introduce a surrogate key, i.e. some ID number that's just used within the database. Then we might have a schema:

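CREATE TABLE users (
 userid bigserial PRIMARY KEY, -- auto-incrementing surrogate key
 username text NOT NULL UNIQUE, -- still unique, but free to change
 password bytea NOT NULL -- PostgreSQL has no blob type; bytea holds binary data
);
CREATE TABLE sessions (
 session_id int NOT NULL,
 userid bigint REFERENCES users(userid), -- bigint matches the bigserial key
 expire timestamp NOT NULL
);
CREATE TABLE posts (
 postid int PRIMARY KEY,
 userid bigint REFERENCES users(userid), -- bigint matches the bigserial key
 content text NOT NULL,
 likes int NOT NULL DEFAULT 0
);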
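
As a rough illustration of why this layout makes renames cheap, here is a minimal sketch against the schema above. The names 'alice' and 'alice2' are made up for the example; the point is only that the other tables store userid, so a rename touches a single row.

```sql
-- Rename: only the users row changes, because sessions and posts store userid.
UPDATE users
SET username = 'alice2'       -- hypothetical new name
WHERE username = 'alice';     -- hypothetical old name

-- Posts are looked up by the current username through a join on the surrogate key.
SELECT u.username, p.content, p.likes
FROM posts AS p
JOIN users AS u ON u.userid = p.userid
WHERE u.username = 'alice2';
```

The cost of this design is the extra join whenever the username is needed alongside a post or session.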

But you don't need it. You can still use the username as a natural key. You are concerned that

"If a user was to change their username, this would be very hard to maintain since it would have to update across 3 tables."

But such an update would be fairly easy. You could explicitly create a TRANSACTION consisting of multiple queries, one updating each table.

BEGIN TRANSACTION;
 UPDATE users ...;
 UPDATE sessions ...;
 UPDATE posts ...;
COMMIT;
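
Filled in, such a transaction might look like the sketch below. It assumes the layout from the question, where each table has a plain username column and no foreign keys link them; 'alice' and 'alice2' are hypothetical values.

```sql
-- Sketch only: assumes users, sessions and posts each have a username column
-- and that no foreign-key constraints are declared between them.
BEGIN;
UPDATE users    SET username = 'alice2' WHERE username = 'alice';
UPDATE sessions SET username = 'alice2' WHERE username = 'alice';
UPDATE posts    SET username = 'alice2' WHERE username = 'alice';
COMMIT;  -- all three changes become visible together; ROLLBACK would undo all of them
```

If foreign keys were declared without ON UPDATE CASCADE, the first UPDATE would be rejected while dependent rows still point at the old name, unless the constraints are deferrable and deferred.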

Or, more realistically, we could use a REFERENCES clause in the table definition to track updates. You already came up with this solution yourself:

"or perhaps a way to centrally reference username so that changing the value once would update all dependent cells?"

In our table definition, we can CASCADE such changes with a column definition like the following (the referenced column, users.username, must be declared PRIMARY KEY or UNIQUE):

CREATE TABLE posts (
 username text REFERENCES users(username) ON DELETE CASCADE ON UPDATE CASCADE,
 ...
);

Then, you can DELETE all of one user's data just by deleting their entry in the users table, or update their username in all tables just by updating it in the users table.
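
The cascading behaviour can be tried end to end with a small sketch like the one below. The table layout and sample values are assumptions for illustration, and the table names clash with the surrogate-key schema above, so run it in a scratch database.

```sql
-- Minimal sketch; columns and sample values are illustrative assumptions.
CREATE TABLE users (
    username text PRIMARY KEY,   -- natural key; PRIMARY KEY (or UNIQUE) lets it be referenced
    password bytea NOT NULL
);
CREATE TABLE posts (
    postid   bigserial PRIMARY KEY,
    username text NOT NULL REFERENCES users(username)
             ON DELETE CASCADE ON UPDATE CASCADE,
    content  text NOT NULL,
    likes    int NOT NULL DEFAULT 0
);

INSERT INTO users (username, password) VALUES ('alice', '\x00');  -- placeholder bytes, not a real hash
INSERT INTO posts (username, content) VALUES ('alice', 'hello world');

-- Renaming the user in one place propagates to posts.username.
UPDATE users SET username = 'alice2' WHERE username = 'alice';
SELECT username, content FROM posts;   -- the post now shows 'alice2'

-- Deleting the user removes their posts as well.
DELETE FROM users WHERE username = 'alice2';
SELECT count(*) FROM posts;            -- 0
```

Note that a cascading rename still rewrites every dependent row; the database simply does it for you, atomically, within the same statement.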

The point here is that RDBMS like Postgres are really mature systems with lots of features. These features generally help with keeping the data consistent – you can add constraints like the REFERENCES clause to ensure that no entries in the other tables can exist without a matching row in the users table, but you can also use such features to automatically update all of the dependent data.

amon · Accepted Answer · 2022-11-17 18:49:31Z
