This question is about refactoring an existing database design.
My data flow is:
- User generates some data for product lines A, B, C
- Data is saved into the database once
- Data is later retrieved multiple times
The current design has 3 tables: data_a, data_b, and data_c. The tables share some identically named columns, and each also has columns that are unique to its product line.
For example, the shared columns in each table are weight, unit_system, and a few others. The remaining columns hold values that represent physical quantities of the particular product line. They are named with various alphanumeric identifiers, such as a, b5, and e2, and each product line has its own set of them. Those sets can overlap: b5 may appear in more than one table, while something like t1 appears in one table but not the others.
Problem
Currently, when there is a need to add some value, say x9, to product line A, I update the schema of data_a to add a column x9. Existing rows get x9 = 0, and new records are populated with actual x9 values. Then I update the code in the relevant places to insert x9 into the table or retrieve it from the table.
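For illustration, the schema change in that workflow is a single ALTER statement (the numeric type is an assumption; exact syntax varies by DBMS):

ALTER TABLE data_a ADD COLUMN x9 NUMERIC NOT NULL DEFAULT 0;

Existing rows pick up the default of 0, and new records start receiving real x9 values once the application code is updated.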
Existing design
data_a(id, item_id, shared, different_a)
data_b(id, item_id, shared, different_b)
data_c(id, item_id, shared, different_c)
where shared is a group of columns that is identical in each table, while the different_* columns are disjoint in theory, since they represent 3 different product lines, but in practice may share some identically named elements, because some variable names are the same across product lines.
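To make this concrete, the current layout looks roughly like the sketch below. Column types and the exact set of variable columns are assumptions for illustration; the variable names are the alphanumeric identifiers mentioned above.

CREATE TABLE data_a (
    id          INT PRIMARY KEY,
    item_id     INT NOT NULL,       -- FK to the owning item
    weight      NUMERIC,            -- shared columns, repeated in data_b and data_c
    unit_system VARCHAR(10),
    a           NUMERIC,            -- product-line-A variables
    b5          NUMERIC,
    e2          NUMERIC
);
-- data_b and data_c repeat the shared columns with their own variable sets.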
Proposed design
This is where I'm struggling, because I don't see a clean design that is also efficient. I want to get rid of the need to alter the database schema every time a new variable is added to a product line. I believe I can do that, but I also want the design to be efficient, and I don't see how.
This is my attempt:
Keep the primary key, the foreign key, and the shared columns in a single table:
data(id, item_id, shared)
Create a single table for the variables only (the variables are the ones found in the different_* sets):
data_variables(id, item_id, data_id, variable, value)
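As a sketch, the proposed pair of tables could be declared as below. The types, the UNIQUE constraint, and reaching item_id through data instead of duplicating it in data_variables are my assumptions, not part of the original proposal.

CREATE TABLE data (
    id          INT PRIMARY KEY,
    item_id     INT NOT NULL,       -- FK to the owning item
    weight      NUMERIC,            -- shared columns live here, once
    unit_system VARCHAR(10)
);

CREATE TABLE data_variables (
    id       INT PRIMARY KEY,
    data_id  INT NOT NULL REFERENCES data(id),
    variable VARCHAR(30) NOT NULL,  -- e.g. 'b5', 'x9'
    value    NUMERIC NOT NULL,
    UNIQUE (data_id, variable)      -- one value per variable per data row
);

Retrieval then pulls all variables for a data row in one query and pivots them back into fields in application code:

SELECT variable, value FROM data_variables WHERE data_id = ?;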
I am not sure this design will be worth the trouble, because ... I will actually be storing more data: all the extra data_id or item_id values for each variable name. There are 15 to 30 variable names for each product line, so I will be storing 15 to 30 item_id (or data_id) values per record in the new data_variables table, where in the old design there was only one item_id value per table row.
Question:
Is there a more efficient design that also does not require schema changes for every addition, deletion, or modification of a variable name in a product line? Or might it be best to stick with the existing design despite the trouble of altering the schema whenever new variables are needed?
Using JSON for variable "different" fields
one_data_table(id, item_id, product_line, shared, json_encoded_value_pairs);
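A sketch of that table and a lookup query, assuming PostgreSQL's jsonb type (MySQL 5.7+ offers a JSON type with different extraction syntax); the other column types are assumptions as well:

CREATE TABLE one_data_table (
    id                       INT PRIMARY KEY,
    item_id                  INT NOT NULL,
    product_line             CHAR(1) NOT NULL,  -- 'A', 'B' or 'C'
    weight                   NUMERIC,           -- shared columns stay real columns
    unit_system              VARCHAR(10),
    json_encoded_value_pairs JSONB NOT NULL     -- e.g. '{"b5": 1.2, "e2": 0.07}'
);

-- Pull one variable out of the JSON document:
SELECT weight, json_encoded_value_pairs->>'b5' AS b5
FROM one_data_table
WHERE item_id = ?;

Adding x9 to product line A then only means writing it into the JSON of new rows; no schema change is required.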
Decision to not use EAV (Entity–attribute–value) Model
In my case, entities change very rarely if at all (on the order of years), and attributes change rarely as well (on the order of months or more). As such, reworking the database design to use EAV is probably not a good fit for my case.
That aside, I am still debating my JSON design.
1 Answer
So I understand you don't want to have fields from data_* in item because they're not really the same thing. How about something like the schema below? It's similar to your original design, but it adds a new Common_Data table between the item table and the data_* tables.
Item
----
- item_id
- (other item-focused fields)

Common_Data
-----------
- common_data_id
- item_id - FK to item.item_id
- shared_field_1
- shared_field_2
- (many fields that are already shared in data_a, data_b, and data_c)
- data_type (can be "data a", "data b", "data c")

data_a
------
- data_a_id
- common_data_id - FK to common_data.common_data_id
- different_a

data_b
------
- data_b_id
- common_data_id - FK to common_data.common_data_id
- different_b

data_c
------
- data_c_id
- common_data_id - FK to common_data.common_data_id
- different_c
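In SQL that might look like the sketch below (column types are assumptions; the point is the foreign keys and where the shared columns now live):

CREATE TABLE item (
    item_id INT PRIMARY KEY
    -- plus other item-focused fields
);

CREATE TABLE common_data (
    common_data_id INT PRIMARY KEY,
    item_id        INT NOT NULL REFERENCES item(item_id),
    weight         NUMERIC,                -- formerly duplicated shared fields
    unit_system    VARCHAR(10),
    data_type      VARCHAR(10) NOT NULL    -- 'data a', 'data b' or 'data c'
);

CREATE TABLE data_a (
    data_a_id      INT PRIMARY KEY,
    common_data_id INT NOT NULL REFERENCES common_data(common_data_id),
    a              NUMERIC,                -- product-line-A-specific columns
    b5             NUMERIC
);
-- data_b and data_c follow the same pattern with their own specific columns.

-- Reading a full product-line-A record:
SELECT cd.weight, cd.unit_system, a.a, a.b5
FROM common_data cd
JOIN data_a a ON a.common_data_id = cd.common_data_id
WHERE cd.item_id = ?;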
Pros:
- simplifies your shared data, moving it all up to a common data table.
- similar to existing design - maybe some of your existing code can be salvaged.
- new shared data only needs to be added to one place.
- simpler to implement.
Cons:
- might not be flexible enough if you think you will soon have data_d, data_e, etc... and then remove older ones.
- still requires schema (and possibly code) changes when new data_*-specific fields are added.
I'd avoid going the EAV route unless you really need the flexibility of it.
- Thanks. I think the EAV route, as "fun" as it looked, is probably a no-go for the reasons you've mentioned; there is no need to change parameters on the fly. I think breaking up or merging the tables, though, will end up being inconvenient or confusing in the long run - code-wise I would have to write data that is conceptually in the same group to different tables. While more efficient, I am not convinced that I should change my existing design. – Dennis, Jan 4, 2017 at 17:00
- one_data_table(id, item_id, product_line, shared, json_value_pairs);
- That might actually be a good idea. Or could you move shared up into item? This might be easiest, unless you foresee the number of tables (table_a, table_b, table_c) being more variable than presented here.
- item stores data relating to items on an invoice, things like price, description, etc., while data is to store values for an engineering drawing, things like length, weight, etc. But even let's suppose I do that... what about the different elements? Will those have to use the EAV model?