How to use strict schema with seemingly fluid data type

Question 1

Our company is trying to find a good generic way to have Many-to-One data for an entity. For example, a user might have 1 primary email, but many other emails also attached to their account.

So we have a users table (1 row maps to 1 user):

| id | handle | primary_email | is_verified | first_name | last_name |
|--------|----------|---------------|-------------|------------|-----------|
| (int) | (string) | (string) | (boolean) | (string) | (string) |

but then we may want to store multiple emails for the same user, so we have another table, let's call it "users_map", where many rows map to 1 user:

| id | user_id | key | value |
|--------|---------|----------|--------|
| (int) | (uuid) | (string) | (json) |

so for example if there were multiple emails for the same user, we would do something like this:

| id | user_id | key | value |
|----|---------|-------|------------------|
| 1 | 1 | email | "[email protected]" |
| 2 | 1 | email | "[email protected]" |
| 3 | 1 | email | "[email protected]" |
| 4 | 2 | email | "[email protected]" |
| 5 | 2 | email | "[email protected]" |

So my question is: is there a better way to do this than using JSON for the value column? If not, is there a way to enforce a schema on the JSON? Last question: from my brief research, this inverted table design is called an "unpivot" table, but if there is a better name for it, please let me know.

The potential advantage of a generic per-user table? If you shard by user, each shard has only 2 tables instead of 5 or 10.

Question 2

Note that the "non-generic" way to do this would be to have a user_emails table instead of a users_map table, with no key column, just a value column that would probably be named "email".

Question 3

You could have a table for each type, like "user_data_string", "user_data_int"

matiasf · 111 reputation · 1 bronze badge · Answer 1 · 2020-02-21 23:42:23Z

Keep the email addresses in a separate address table.

CREATE TABLE schemaname.address
(
 address_id bigserial not null primary key,
 user_id bigint not null,
 address_type varchar(30) not null,
 address varchar(250) not null,
 is_primary boolean not null
 /*
 optionally you could have valid_to/valid_from timestamp fields here to track address history
 */
);

-- PostgreSQL defines partial unique indexes outside CREATE TABLE.
-- Only primary rows are indexed: at most one primary address per user per address type.
CREATE UNIQUE INDEX UIX_only_one_primary_address_per_user_per_address_type
 ON schemaname.address (user_id, address_type)
 WHERE is_primary;
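To see how a partial unique index behaves, here is a minimal sketch. It assumes SQLite (through Python's built-in sqlite3 module) as a stand-in for PostgreSQL, since both support a WHERE clause on a unique index. The shortened index name, the `add_address` helper, and the example.com addresses are made up for illustration.

```python
import sqlite3

# Assumption: SQLite stands in for PostgreSQL in this sketch.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE address (
    address_id   INTEGER PRIMARY KEY,
    user_id      INTEGER NOT NULL,
    address_type TEXT    NOT NULL,
    address      TEXT    NOT NULL,
    is_primary   INTEGER NOT NULL CHECK (is_primary IN (0, 1))
);
-- Only rows with is_primary = 1 are indexed, so any number of
-- non-primary rows per (user_id, address_type) is still allowed.
CREATE UNIQUE INDEX uix_one_primary_per_user_per_type
    ON address (user_id, address_type)
    WHERE is_primary = 1;
""")


def add_address(user_id, address_type, address, is_primary):
    """Insert one row; return False if a constraint rejects it (hypothetical helper)."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO address (user_id, address_type, address, is_primary) "
                "VALUES (?, ?, ?, ?)",
                (user_id, address_type, address, int(is_primary)),
            )
        return True
    except sqlite3.IntegrityError:
        return False


first_primary = add_address(1, "EMAIL", "a@example.com", True)
secondary = add_address(1, "EMAIL", "b@example.com", False)
another_secondary = add_address(1, "EMAIL", "c@example.com", False)
second_primary = add_address(1, "EMAIL", "d@example.com", True)   # rejected
other_user_primary = add_address(2, "EMAIL", "e@example.com", True)
```

The second primary email for user 1 is refused, while extra non-primary emails and a primary email for a different user go through.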

The downside is that users can exist with email addresses but without a primary email. An insert/update trigger on the table can enforce that a primary address exists.
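One possible shape for such a trigger, as a rough sketch only: it again assumes SQLite through Python's sqlite3 rather than PostgreSQL (where the trigger would be written as a PL/pgSQL function), and it covers inserts only. The trigger name, the `try_insert` helper, and the sample values are hypothetical.

```python
import sqlite3

# Assumption: SQLite stands in for PostgreSQL in this sketch.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE address (
    address_id   INTEGER PRIMARY KEY,
    user_id      INTEGER NOT NULL,
    address_type TEXT    NOT NULL,
    address      TEXT    NOT NULL,
    is_primary   INTEGER NOT NULL CHECK (is_primary IN (0, 1))
);

-- Reject a non-primary address when the user has no primary address
-- of that type yet, so the first address of each type must be primary.
CREATE TRIGGER trg_address_requires_primary
BEFORE INSERT ON address
WHEN NEW.is_primary = 0
 AND NOT EXISTS (
     SELECT 1 FROM address
      WHERE user_id = NEW.user_id
        AND address_type = NEW.address_type
        AND is_primary = 1)
BEGIN
    SELECT RAISE(ABORT, 'user has no primary address of this type');
END;
""")


def try_insert(user_id, address_type, address, is_primary):
    """Return 'ok', or the database error message if the insert is rejected."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO address (user_id, address_type, address, is_primary) "
                "VALUES (?, ?, ?, ?)",
                (user_id, address_type, address, int(is_primary)),
            )
        return "ok"
    except sqlite3.DatabaseError as exc:
        return str(exc)


orphan = try_insert(1, "EMAIL", "b@example.com", False)      # rejected: no primary yet
primary = try_insert(1, "EMAIL", "a@example.com", True)      # accepted
secondary = try_insert(1, "EMAIL", "b@example.com", False)   # accepted now
other_type = try_insert(1, "PHONE", "555-0100", False)       # rejected: no primary phone
```

This sketch does not guard against a later UPDATE or DELETE that removes the only primary row; a complete version would need triggers for those cases as well.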

Then:

CREATE VIEW schemaname.v_user_primary_email AS
SELECT user_id, address AS email_address
 FROM schemaname.address
 WHERE address_type = 'EMAIL' AND is_primary = TRUE;

CREATE VIEW schemaname.v_user_email AS
SELECT user_id, address AS email_address, is_primary
 FROM schemaname.address
 WHERE address_type = 'EMAIL';

So that means every many-to-one piece of data has its own table. In the OP I am flirting with the idea of one table that can handle all of the many-to-one data points. Humor me a bit, idk :)

max630 · 2,605 reputation · 1 gold badge · 13 silver badges · 16 bronze badges · Answer 2 · 2020-02-22 03:08:24Z

Note that there is no such thing as a "flexible data structure". Your code is always written against some concrete data structure. If you used to store only one email per user and then, by whatever means, start allowing several, all the code written under the old assumption is broken and has to be updated.

Changing the database structure can, however, cost significantly more than deploying the next version of your code. For this reason, databases often include "extension points" that are used in between bigger schema updates.

How the extension points are implemented is limited only by your imagination. Sometimes there are reserved fields in tables. Sometimes there is a user_data table that contains an arbitrary string (which may be interpreted as JSON, or your code simply assumes that the key "email" means the value is an email string and that the key "age", for example, means a stringified number). Or you don't even need a "user_map" table; it can be a generic "data" table (id, string key, int parent_id, string value2) into which you put (1, "user_email", 1, "[email protected]").
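A generic "data" table of that kind could look roughly like the sketch below. It assumes SQLite through Python's sqlite3 module; the value column is named `value` here, and the index name, the `values_for` helper, and the sample rows are hypothetical. The index is deliberately non-unique and exists only to speed up lookups.

```python
import sqlite3

# Assumption: SQLite is used only to keep the sketch self-contained.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE data (
    id        INTEGER PRIMARY KEY,
    key       TEXT    NOT NULL,   -- what the value means, e.g. 'user_email'
    parent_id INTEGER NOT NULL,   -- id of the owning entity, e.g. a user
    value     TEXT    NOT NULL    -- always stored as a string
);
-- Non-unique index: it only speeds up lookups by key and parent.
CREATE INDEX ix_data_key_parent ON data (key, parent_id);
""")

rows = [
    ("user_email", 1, "first@example.com"),
    ("user_email", 1, "second@example.com"),
    ("user_age", 1, "42"),
]
with conn:
    conn.executemany(
        "INSERT INTO data (key, parent_id, value) VALUES (?, ?, ?)", rows
    )


def values_for(key, parent_id):
    """Return all stored values for one key of one parent, in insertion order."""
    cur = conn.execute(
        "SELECT value FROM data WHERE key = ? AND parent_id = ? ORDER BY id",
        (key, parent_id),
    )
    return [row[0] for row in cur]


emails = values_for("user_email", 1)
# The database sees only strings; the code decides that 'user_age' is a number.
age = int(values_for("user_age", 1)[0])
```

Nothing in the table itself says that "user_age" holds a number; that knowledge lives entirely in the calling code.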

I don't think it makes sense to validate this data in the database itself, because the whole point of the table is that you don't know in advance what you are going to put there. You will have to do the validation in code. On the database side, add just some indexes to optimize search, not even necessarily unique ones.
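Validation in code might be organized as one validator per key, as in this sketch. The key names, the deliberately loose email pattern, and the age range are illustrative assumptions, not rules taken from the question.

```python
import re

# Deliberately loose pattern: something@something.tld, no whitespace.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def _is_email(value):
    return isinstance(value, str) and EMAIL_RE.match(value) is not None


def _is_age(value):
    # Ages are stored as stringified numbers in the generic table.
    return (
        isinstance(value, str)
        and value.isascii()
        and value.isdigit()
        and 0 <= int(value) <= 150
    )


# Hypothetical registry: one validator per key the application knows about.
VALIDATORS = {
    "email": _is_email,
    "age": _is_age,
}


def validate(key, value):
    """Return value unchanged if it is valid for key, otherwise raise."""
    validator = VALIDATORS.get(key)
    if validator is None:
        raise KeyError(f"no validator registered for key {key!r}")
    if not validator(value):
        raise ValueError(f"invalid value for key {key!r}: {value!r}")
    return value


checked_email = validate("email", "someone@example.com")
checked_age = validate("age", "42")
```

Calling `validate` just before every write to the generic table keeps the rules in one place, and supporting a new key means adding one entry to the registry rather than changing the schema.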
