I'm developing a desktop application to help the technician to easily provide updates to an embedded system.
The background is very simple the database (SQLite
) have several tables, not identical but quite similar, where each row represent an item. For this question we can say every table has this structure: id
, value
.
My problem is to design the table packages
.
A "package" may contain 0, 1 or more items from 0, 1 or more tables.
Let's see a basic example:
Table Foo
id | value |
---|---|
1 | abc |
2 | def |
Table Bar
id | value |
---|---|
1 | ghi |
2 | jkl |
Here some examples of packages:
pkg1: from Foo (1)
pkg2: from Bar (1, 2)
pkg3: from Foo (1, 2) and from Bar (2)
in the real case the item's tables are 10. My first attempt was pretty naive:
Table packages, #1
id | name | idFoo | idBar |
---|---|---|---|
1 | pkg1 | 1 | NULL |
2 | pkg2 | NULL | 1 |
2 | pkg2 | NULL | 2 |
3 | pkg3 | 1 | NULL |
3 | pkg3 | 2 | NULL |
3 | pkg3 | NULL | 2 |
The id*
fields are foreign keys to the related tables.
I don't like it since it requires as many columns as there are tables, and as many rows as there are items to be inserted. Another downside is the primary key would be composed of all the fields to ensure there are no dupes.
A second attempt was to put all the id
s inside the same field:
Table packages, #2
id | name | idsFoo | idsBar |
---|---|---|---|
1 | pkg1 | 1 | NULL |
2 | pkg2 | NULL | 1,2 |
3 | pkg3 | 1,2 | 2 |
This is a bit better since I have one row per package and the primary key can be the usual id
. But the ids*
columns are not foreign keys anymore, so I cannot guarantee the integrity of the database.
What are the downsides of the last approach?
3 Answers 3
Neither of these options are good.
Not properly normalized.
- Normalization is violated because you would need to repeat the package data once for each joining row.
- You can never be sure which row joins to which table, and you would need multiple unique constraints, one for each table.
- You would also need a
CHECK
constraint to ensure that there was always exactly one foreign key, no more, no less.
This is completely broken:
- Normalization is violated because you have multiple pieces of data in a single column, and most DBMSs do not support foreign keys on these kinds of arrays. Some do not even support arrays at all, and require strings, which you cannot guarantee even contains a list of integers.
A properly normalized design would have two separate join tables: PackageFoo
and PackageBar
. These two last tables would be standard join tables, with a composite two-column primary key.
Package
id | name |
---|---|
1 | pkg1 |
2 | pkg2 |
2 | pkg2 |
3 | pkg3 |
3 | pkg3 |
3 | pkg3 |
PackageFoo
idPackage | idFoo |
---|---|
1 | 1 |
3 | 1 |
3 | 2 |
PackageBar
idPackage | idBar |
---|---|
2 | 1 |
2 | 2 |
3 | 2 |
Now you can separately identify exactly which Package and which Foo and which Bar are associated, there are no nulls, and no repetition of data.
Having said that, given there appear to be ten tables involved, perhaps either those tables could be combined, or a Polymorphic design could be used.
For example, you can have a Package
containing Furniture
, which can be either Chair
or Table
.
Furniture (Type
is an enum column, or has a CHECK
constraining to 1,2
or A,B
etc, the PK is on id, Type
)
id | Type |
---|---|
1 | C |
2 | C |
3 | T |
4 | T |
Chair (Type
is a computed column, the PK is on idChair, Type
and both columns also a composite FK to Furniture
)
idChair | Type | Name |
---|---|---|
1 | C | Wicker |
2 | C | Plastic |
Table (Type
is a computed column, the PK is on idTable, Type
and both columns also a composite FK to Furniture
)
idTable | Type | Name |
---|---|---|
3 | T | Wood |
4 | T | Metal |
PackageFurniture (idFurniture,FurnitureType
are a composite FK on Furniture
not the two child tables)
idPackage | idFurniture | FurnitureType |
---|---|---|
1 | 1 | C |
2 | 1 | C |
2 | 3 | T |
3 | 1 | C |
3 | 4 | T |
-
Why include the
type
in the PK of thechair
andtable
columns?Bergi– Bergi2024年11月29日 20:57:36 +00:00Commented Nov 29, 2024 at 20:57 -
If you don't then there is no way to enforce that a given
idFurniture
actually represents aType=C
from the baseFurniture
table. In other words, nothing would stop you inserting the ID of aTable
intoChair
.Charlieface– Charlieface2024年11月30日 17:35:26 +00:00Commented Nov 30, 2024 at 17:35 -
No, I mean I understand why you'd include the computed column (or something similar with the constant value) in the table, and why you'd have the foreign key from
(id, type)
tofurniture
. But why include thetype
column in the primary key?Bergi– Bergi2024年12月01日 07:01:10 +00:00Commented Dec 1, 2024 at 7:01 -
You're right, it's not technically necessary for the child tables Makes no difference as it's a computed column so is always the same value. For the parent table you need the PK on both columns to be able to put a FK on both together.Charlieface– Charlieface2024年12月01日 10:56:56 +00:00Commented Dec 1, 2024 at 10:56
A more traditional approach:
select *
from packages_tables ;
+------+------+----+
| pkg | Tipe | id |
+------+------+----+
| pkg1 | Foo | 1 |
| pkg2 | Bar | 1 |
| pkg2 | Bar | 2 |
| pkg3 | Foo | 1 |
| pkg3 | Foo | 2 |
| pkg3 | Bar | 1 |
+------+------+----+
The Primary Key is all three fields and not a NULL in sight.
Then you can do something like this:
select
pt.pkg
, pt.tipe
, foo.value foo_v
, bar.value bar_v
from package_tables pt
left join foo on pt.tipe = 'Foo' and pt.id = foo.id
left join bar on pt.tipe = 'Bar' and pt.id = bar.id
;
+------+------+-------+-------+
| pkg | Tipe | foo_v | bar_v |
+------+------+-------+-------+
| pkg1 | Foo | abc | |
| pkg2 | Bar | | ghi |
| pkg2 | Bar | | jkl |
| pkg3 | Foo | abc | |
| pkg3 | Foo | def | |
| pkg2 | Bar | | ghi |
+------+------+-------+-------+
What are the downsides of the last approach?
As you say, you cannot enforce the relationships with Foreign Keys.
Also, if you ever need to find which packages contain a related table (searching within those comma-separated lists), your query will be painful to read and its performance will be terrible (Table Scanning is all but guaranteed).
A little more traditional :)
Have a list of objects combining Foo and Bar:
create table Objects (
oid primary key,
type check('Foo', 'Bar'),
);
create table Foo ( id references Objects(oid), name);
create table Bar ( id references Objects(oid), name);
-- both Foo and Bar can live without specified primary key,
-- it would be good to have indexes on some attributes
-- or even PK based on these attributes but they are not important
-- for this structure
create table Packages (pid primary key, name);
-- Just a list of packages and descriptions of these packages
create table Deliveries (
pid references Packages(pid),
oid references Objects(oid)
);
-- and now you can insert your data:
insert into Objects values (1, 'Foo'), (2, 'Foo'), (3, 'Bar'), (4, 'Bar');
insert into Foo values (1, 'abc'), (2, 'def');
insert into Bar values (3, 'ghi'), (4, 'jik');
insert into Packages values (1, 'pkg1'), (2, 'pkg2'), (3, 'pkg3');
insert into Deliveries values
(1, 1),
(2, 1), (2, 2),
(3, 1), (3, 2), (3, 4),
);
select
p.name as Package_Name,
o.type as Object_Type,
coalsce(Foo.name, Bar.Name) as Object_Name
from Deliveries d
join Packages p on d.pid=p.pid
join Objects o on d.oid=p.pid
left join Foo on o.oid=Foo.id
left join Bar on o.oid=Bar.id
This will allow you to use FKs checks while inserting data.
The only disadvantage - it is possible to have a record in Objects, but not corresponding object description in Foo
or Bar
. If that happen, the shown select will print NULL in the Object_Name
column.
On the other hand, do not forget, that you will have a client application on top of the database schema. You can do a lot of checks and joins there without relying on database. And in case of SQLite that is very often justified.
id
,value
." - this appears to be the root of your problem. What exactly do these tables represent, and how are they "similar but not identical"?