This question is about refactoring an existing database design.
My data flow is:
- User generates some data for product lines A, B, C
- Data is saved into the database once
- Data is later retrieved multiple times
The current design has 3 tables: data_a, data_b, and data_c. The tables share some identically named columns, and each also has columns that are unique to its product line.
For example, the shared columns in each table are weight, unit_system, and a few others. The remaining columns hold values that represent physical quantities of the particular product line. They are named with various alphanumeric identifiers, such as a, b5, and e2, and each product line has its own set of them. Those sets can overlap: b5 may appear in more than one table, while something like t1 appears in one table but not the others.
Problem
Currently, when there is a need to add some value, say x9, to product line A, I update the schema of data_a to add a column x9. Existing rows get x9 = 0, and new records are populated with actual x9 values. Then I update the code in the relevant places to insert x9 into the table or retrieve it from the table.
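For illustration, the schema change in that workflow is a single ALTER statement (the numeric type is an assumption; exact syntax varies by DBMS):

ALTER TABLE data_a ADD COLUMN x9 NUMERIC NOT NULL DEFAULT 0;

Existing rows pick up the default of 0, and new records start receiving real x9 values once the application code is updated.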
Existing design
data_a(id, item_id, shared, different_a)
data_b(id, item_id, shared, different_b)
data_c(id, item_id, shared, different_c)
where shared is a group of columns that is identical in each table, while the different_* columns are disjoint in theory, since they represent 3 different product lines, but in practice may share some identically named elements, because some variable names are the same across product lines.
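To make this concrete, the current layout looks roughly like the sketch below. Column types and the exact set of variable columns are assumptions for illustration; the variable names are the alphanumeric identifiers mentioned above.

CREATE TABLE data_a (
    id          INT PRIMARY KEY,
    item_id     INT NOT NULL,       -- FK to the owning item
    weight      NUMERIC,            -- shared columns, repeated in data_b and data_c
    unit_system VARCHAR(10),
    a           NUMERIC,            -- product-line-A variables
    b5          NUMERIC,
    e2          NUMERIC
);
-- data_b and data_c repeat the shared columns with their own variable sets.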
Proposed design
This is where I'm struggling, because I don't see a clean design that is also efficient. I want to get rid of the need to alter the database schema every time a new variable is added to a product line. I believe I can do that, but I also want the design to be efficient, and I don't see how.
This is my attempt:
Keep the primary key, the foreign key, and the shared columns in a single table:
data(id, item_id, shared)
Create a single table for the variables only (the variables are the ones found in the different_* sets):
data_variables(id, item_id, data_id, variable, value)
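As a sketch, the proposed pair of tables could be declared as below. The types, the UNIQUE constraint, and reaching item_id through data instead of duplicating it in data_variables are my assumptions, not part of the original proposal.

CREATE TABLE data (
    id          INT PRIMARY KEY,
    item_id     INT NOT NULL,       -- FK to the owning item
    weight      NUMERIC,            -- shared columns live here, once
    unit_system VARCHAR(10)
);

CREATE TABLE data_variables (
    id       INT PRIMARY KEY,
    data_id  INT NOT NULL REFERENCES data(id),
    variable VARCHAR(30) NOT NULL,  -- e.g. 'b5', 'x9'
    value    NUMERIC NOT NULL,
    UNIQUE (data_id, variable)      -- one value per variable per data row
);

Retrieval then pulls all variables for a data row in one query and pivots them back into fields in application code:

SELECT variable, value FROM data_variables WHERE data_id = ?;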
I am not sure this design will be worth the trouble, because ... I will actually be storing more data: all the extra data_id or item_id values for each variable name. There are 15 to 30 variable names for each product line, so I will be storing 15 to 30 item_id (or data_id) values per record in the new data_variables table, where in the old design there was only one item_id value per table row.
Question:
Is there a more efficient design that also does not require schema changes for every addition, deletion, or modification of a variable name in a product line? Or might it be best to stick with the existing design despite the trouble of altering the schema whenever new variables are needed?
Using JSON for variable "different" fields
one_data_table(id, item_id, product_line, shared, json_encoded_value_pairs);
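A sketch of that table and a lookup query, assuming PostgreSQL's jsonb type (MySQL 5.7+ offers a JSON type with different extraction syntax); the other column types are assumptions as well:

CREATE TABLE one_data_table (
    id                       INT PRIMARY KEY,
    item_id                  INT NOT NULL,
    product_line             CHAR(1) NOT NULL,  -- 'A', 'B' or 'C'
    weight                   NUMERIC,           -- shared columns stay real columns
    unit_system              VARCHAR(10),
    json_encoded_value_pairs JSONB NOT NULL     -- e.g. '{"b5": 1.2, "e2": 0.07}'
);

-- Pull one variable out of the JSON document:
SELECT weight, json_encoded_value_pairs->>'b5' AS b5
FROM one_data_table
WHERE item_id = ?;

Adding x9 to product line A then only means writing it into the JSON of new rows; no schema change is required.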
Decision to not use EAV (Entity–attribute–value) Model
In my case, entities change very rarely if at all (on the order of years), and attributes change rarely as well (on the order of months or more). As such, reworking the database design to use EAV is probably not a good fit for my case.
That aside, I am still debating my JSON design.
1 Answer
So I understand you don't want to have fields from data_* in item because they're not really the same thing. How about something like the schema below? It's similar to your original design, but it adds a new Common_Data table between the item table and the data_* tables.
Item
----
- item_id
- (other item-focused fields)

Common_Data
-----------
- common_data_id
- item_id - FK to item.item_id
- shared_field_1
- shared_field_2
- (many fields that are already shared in data_a, data_b, and data_c)
- data_type (can be "data a", "data b", "data c")

data_a
------
- data_a_id
- common_data_id - FK to common_data.common_data_id
- different_a

data_b
------
- data_b_id
- common_data_id - FK to common_data.common_data_id
- different_b

data_c
------
- data_c_id
- common_data_id - FK to common_data.common_data_id
- different_c
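In SQL that might look like the sketch below (column types are assumptions; the point is the foreign keys and where the shared columns now live):

CREATE TABLE item (
    item_id INT PRIMARY KEY
    -- plus other item-focused fields
);

CREATE TABLE common_data (
    common_data_id INT PRIMARY KEY,
    item_id        INT NOT NULL REFERENCES item(item_id),
    weight         NUMERIC,                -- formerly duplicated shared fields
    unit_system    VARCHAR(10),
    data_type      VARCHAR(10) NOT NULL    -- 'data a', 'data b' or 'data c'
);

CREATE TABLE data_a (
    data_a_id      INT PRIMARY KEY,
    common_data_id INT NOT NULL REFERENCES common_data(common_data_id),
    a              NUMERIC,                -- product-line-A-specific columns
    b5             NUMERIC
);
-- data_b and data_c follow the same pattern with their own specific columns.

-- Reading a full product-line-A record:
SELECT cd.weight, cd.unit_system, a.a, a.b5
FROM common_data cd
JOIN data_a a ON a.common_data_id = cd.common_data_id
WHERE cd.item_id = ?;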
Pros:
- simplifies your shared data, moving it all up to a common data table.
- similar to existing design - maybe some of your existing code can be salvaged.
- new shared data only needs to be added to one place.
- simpler to implement.
Cons:
- might not be flexible enough if you think you will soon have data_d, data_e, etc... and then remove older ones.
- still requires schema (and possibly code) changes when new data_*-specific fields are added.
I'd avoid going the EAV route unless you really need the flexibility of it.
- Thanks. I think the EAV route, as "fun" as it looked, is probably a no-go for the reasons you've mentioned; there is no need to change parameters on the fly. I think breaking up or merging the tables, though, will end up being inconvenient or confusing in the long run - code-wise I would have to write data that is conceptually in the same group to different tables. While more efficient, I am not convinced that I should change my existing design. – Dennis, Jan 4, 2017 at 17:00
- one_data_table(id, item_id, product_line, shared, json_value_pairs);
- That might actually be a good idea. Or could you move shared up into item? This might be easiest, unless you foresee the number of tables (table_a, table_b, table_c) being more variable than presented here.
- item stores data relating to items on an invoice, things like price, description, etc., while data is to store values for an engineering drawing, things like length, weight, etc. But even let's suppose I do that... what about the different elements? Will those have to use the EAV model?