Our company is trying to find a good generic way to have Many-to-One data for an entity. For example, a user might have 1 primary email, but many other emails also attached to their account.
So we have a users table (1 row maps to 1 user):
| id | handle | primary_email | is_verified | first_name | last_name |
|--------|----------|---------------|-------------|------------|-----------|
| (int) | (string) | (string) | (boolean) | (string) | (string) |
but then we may want to store multiple emails for the same user, so we have another table, let's call it "users_map", where many rows map to 1 user:
| id | user_id | key | value |
|--------|---------|----------|--------|
| (int) | (uuid) | (string) | (json) |
so for example if there were multiple emails for the same user, we would do something like this:
| id | user_id | key | value |
|----|---------|-------|------------------|
| 1 | 1 | email | "[email protected]" |
| 2 | 1 | email | "[email protected]" |
| 3 | 1 | email | "[email protected]" |
| 4 | 2 | email | "[email protected]" |
| 5 | 2 | email | "[email protected]" |
so my question is - is there a better way to do this other than using JSON for the value column? If not - is there a way to enforce a schema on the JSON somehow? Last question - from my brief research the inverse table design is called an "unpivot" table - but if there is a better name for it please let me know.
The potential advantage of a generic table: if you shard by user, each shard has only 2 tables instead of 5 or 10.
2 Answers
Keep the email addresses in a separate address table.
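CREATE TABLE schemaname.address
(
    address_id bigserial NOT NULL PRIMARY KEY,
    user_id bigint NOT NULL,
    address_type varchar(30) NOT NULL,
    address varchar(250) NOT NULL,
    is_primary boolean NOT NULL
    /*
    optionally you could have valid_to/valid_from timestamp fields here to track address history
    */
);
-- PostgreSQL syntax (assumed from bigserial): there is no inline INDEX clause in CREATE TABLE,
-- so the partial unique index is created separately, and the boolean is tested directly.
CREATE UNIQUE INDEX UIX_only_one_primary_address_per_user_per_address_type
    ON schemaname.address (user_id, address_type, is_primary)
    WHERE is_primary;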
The downside is that a user can have email addresses without any of them being marked as primary. Requiring a primary email can be enforced with an insert/update trigger on the table.
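The partial unique index can be tried out without a PostgreSQL server. Below is a minimal sketch using Python's built-in `sqlite3` module as a stand-in (SQLite also supports partial unique indexes). The `add_address` helper, the `example.com` addresses, and the 0/1 encoding of `is_primary` are assumptions made for illustration, not part of the schema above. The last query shows the downside: nothing stops a user from having no primary email.

```python
import sqlite3

# SQLite is only a stand-in here; the answer's schema targets a server database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE address (
        address_id   INTEGER PRIMARY KEY,
        user_id      INTEGER NOT NULL,
        address_type TEXT    NOT NULL,
        address      TEXT    NOT NULL,
        is_primary   INTEGER NOT NULL CHECK (is_primary IN (0, 1))
    );
    -- At most one primary address per user and address type.
    CREATE UNIQUE INDEX uix_one_primary_per_user_per_type
        ON address (user_id, address_type)
        WHERE is_primary = 1;
""")


def add_address(user_id, address_type, address, is_primary):
    """Insert one address row; return False if a constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO address (user_id, address_type, address, is_primary) "
                "VALUES (?, ?, ?, ?)",
                (user_id, address_type, address, int(is_primary)),
            )
        return True
    except sqlite3.IntegrityError:
        return False


first_primary = add_address(1, "EMAIL", "a@example.com", True)
secondary = add_address(1, "EMAIL", "b@example.com", False)
another_secondary = add_address(1, "EMAIL", "c@example.com", False)
second_primary = add_address(1, "EMAIL", "d@example.com", True)  # rejected by the index

# User 2 gets an email but no primary one; the index does not prevent this.
add_address(2, "EMAIL", "e@example.com", False)
users_without_primary = [
    row[0]
    for row in conn.execute(
        """
        SELECT user_id FROM address
        WHERE address_type = 'EMAIL'
        GROUP BY user_id
        HAVING SUM(is_primary) = 0
        ORDER BY user_id
        """
    )
]
```

Any number of non-primary rows is accepted because the index only covers rows where `is_primary` is set; a second primary row for the same user and type is the only thing it rejects.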
Then:
CREATE VIEW schemaname.v_user_primary_email AS
SELECT user_id, address AS email_address
FROM schemaname.address
WHERE address_type = 'EMAIL' AND is_primary = TRUE;

CREATE VIEW schemaname.v_user_email AS
SELECT user_id, address AS email_address, is_primary
FROM schemaname.address
WHERE address_type = 'EMAIL';
So that means every many-to-one piece of data has its own table. In the OP I am flirting with the idea of one table that can handle all of the many-to-one data points. Humor me a bit idk :) – Alexander Mills, Feb 22, 2020 at 0:32
It should be noted that there is no such thing as a "flexible data structure". Your code is written against some actual data structure. If you used to allow only one email per user and then, in any way, added support for several, all your previous code is broken and has to be updated.
There can be, however, a significantly higher cost to updating the database structure than to deploying the next version of your code. For this reason, databases often have "extension points" that are meant to be used between bigger schema updates.
How the extension points are implemented is really only limited by your imagination. Sometimes there are reserved fields in tables. Sometimes there is a table user_data that contains an arbitrary string (which may be interpreted as JSON, or you just assume in code that the key "email" means the value is an email string, and the key "age", for example, means a stringified number). Or you don't even need a "user_map": it can be just some generic "data" table (id, string key, int parent_id, string value2), and you put (1, "user_email", 1, "[email protected]") there.
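To make the generic "data" table concrete, here is a rough sketch using Python's built-in `sqlite3` module. The column is called `value` here, and the index name, the `values_for` helper, and the sample rows (including the `example.com` addresses and the `user_age` key) are hypothetical, chosen only to illustrate the idea.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "key" is quoted because it is a keyword in some SQL dialects.
conn.execute("""
    CREATE TABLE data (
        id        INTEGER PRIMARY KEY,
        "key"     TEXT    NOT NULL,
        parent_id INTEGER NOT NULL,
        value     TEXT    NOT NULL
    )
""")
# A plain (non-unique) index to speed up lookups by owner and key.
conn.execute('CREATE INDEX ix_data_parent_key ON data (parent_id, "key")')

rows = [
    ("user_email", 1, "a@example.com"),
    ("user_email", 1, "b@example.com"),
    ("user_age", 1, "42"),  # stringified number, interpreted by the caller
    ("user_email", 2, "c@example.com"),
]
conn.executemany(
    'INSERT INTO data ("key", parent_id, value) VALUES (?, ?, ?)', rows
)


def values_for(parent_id, key):
    """Return all values stored for one parent row under one key, in insert order."""
    cur = conn.execute(
        'SELECT value FROM data WHERE parent_id = ? AND "key" = ? ORDER BY id',
        (parent_id, key),
    )
    return [row[0] for row in cur]


emails_user_1 = values_for(1, "user_email")
age_user_1 = int(values_for(1, "user_age")[0])  # the code decides that "user_age" is a number
```

The table itself knows nothing about emails or ages; the meaning of each key lives entirely in the code that reads it.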
I don't think it makes sense to have any validation of the data in the database itself, because the whole point of the table is that you don't know what you are going to put there. You will have to do the validation in code. Just add some indices to optimize search, not even necessarily unique ones.
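Validation in code could look something like the following sketch, which checks the JSON value of a key/value row against a per-key rule before it is written or after it is read. The set of keys, the deliberately loose email pattern, the age range, and the function names are all assumptions for illustration; a real application would define its own rules.

```python
import json
import re

# Intentionally loose pattern: something@something.something, no whitespace.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def validate_email(value):
    return isinstance(value, str) and EMAIL_RE.match(value) is not None


def validate_age(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool) and 0 <= value <= 150


# One validator per known key; unknown keys are rejected.
VALIDATORS = {"email": validate_email, "age": validate_age}


def parse_row(key, raw_json):
    """Decode and validate the JSON value of one key/value row.

    Raises ValueError for unknown keys, malformed JSON, or invalid values.
    """
    validator = VALIDATORS.get(key)
    if validator is None:
        raise ValueError(f"unknown key: {key!r}")
    try:
        value = json.loads(raw_json)
    except json.JSONDecodeError as exc:
        raise ValueError(f"value for {key!r} is not valid JSON") from exc
    if not validator(value):
        raise ValueError(f"invalid value for {key!r}: {value!r}")
    return value


def is_valid(key, raw_json):
    """Convenience wrapper that reports validity as a boolean."""
    try:
        parse_row(key, raw_json)
        return True
    except ValueError:
        return False
```

For example, `is_valid("email", '"a@example.com"')` passes, while `is_valid("age", '"42"')` fails because the JSON value is a string rather than a number.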
user_emails table instead of a user_map table and there would be no key column, just a value column that would probably be titled "email"