I am working on an app that requires some basic data associations. I've picked SQLite as my database for its simplicity and the ability to use it in a mobile version of my app in the future. It comes with certain limitations, so it is possible that I am doing this completely wrong.
A simple scenario is as follows:
I need to assign cars to categories such that a car can be part of multiple categories. Categories are pre-defined.
My approach:
- Three tables: 'Cars', 'Categories', 'CategoriesAssigned'
- Each 'Car' can belong to multiple 'Category'. 'CategoriesAssigned' is used for mapping 'Categories' to the 'Cars'
On the SQLite end, I create three tables:
CREATE TABLE Cars(
Id INT PRIMARY KEY NOT NULL,
Name TEXT NOT NULL
);
CREATE TABLE Categories(
Id INT PRIMARY KEY NOT NULL,
Name TEXT NOT NULL
);
-- Mapping table: one row per (car, category) pair
CREATE TABLE CategoriesAssigned(
Id INT PRIMARY KEY NOT NULL,
CategoryId INTEGER NOT NULL,
CarId INTEGER NOT NULL,
FOREIGN KEY (CategoryId) REFERENCES Categories(Id),
FOREIGN KEY (CarId) REFERENCES Cars(Id)
);
This way, each 'Car' can have many 'Categories'. What I DO NOT like is that there will be a lot of duplicate data: the same 'Category' will repeat for many cars.
I am still very new to databases and wanted to get some advice and feedback on how to properly handle scenarios like this.
Update:
There is another way, which I personally hate:
CREATE TABLE Cars(
Id INT PRIMARY KEY NOT NULL,
Categories TEXT,
Name TEXT NOT NULL
);
CREATE TABLE Categories(
Id INT PRIMARY KEY NOT NULL,
Name TEXT NOT NULL
);
And then add 'Categories' as a comma-separated string, e.g. "Trucks,Luxury,Diesel", and parse the string afterwards. But that somehow feels even more wrong.
1 Answer
It looks like you have a many-to-many relationship here, so the intermediary table is the proper way to get a clean relational database design. You say you'll have duplicate data, but I think you mean you'll have a table that looks very repetitive when in fact each row is unique.
If you have cars:
1. Buick Whatever
2. Ford BigRig
3. Chevy Sprite
4. Toyota Thimble
5. Nissan Panther
Categories:
1. Fast
2. Economical
3. Used
Then the CategoriesAssigned may be:
1. 1 (Buick), 2 (Economical)
2. 3 (Chevy), 2 (Economical)
3. 4 (Toyota), 2 (Economical)
4. 2 (Ford), 3 (Used)
5. 4 (Toyota), 3 (Used)
6. 2 (Ford), 1 (Fast)
7. 3 (Chevy), 1 (Fast)
So as this continues, there are lots of repeats in both columns but no two rows are the same.
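To make the intermediary-table approach concrete, here is a minimal sketch using Python's built-in sqlite3 module with the sample cars and categories above. A few things in it are assumptions rather than part of the question's schema: the use of INTEGER PRIMARY KEY so SQLite can assign Ids automatically, the UNIQUE (CarId, CategoryId) constraint that enforces "no two rows are the same", the PRAGMA that turns on foreign key enforcement (SQLite normally leaves it off unless asked), and the helper names cars_in_category and categories_of, which are purely illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumption: we want SQLite to enforce the foreign keys; it normally does not by default.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Cars(
    Id INTEGER PRIMARY KEY,
    Name TEXT NOT NULL
);
CREATE TABLE Categories(
    Id INTEGER PRIMARY KEY,
    Name TEXT NOT NULL
);
CREATE TABLE CategoriesAssigned(
    Id INTEGER PRIMARY KEY,
    CarId INTEGER NOT NULL,
    CategoryId INTEGER NOT NULL,
    FOREIGN KEY (CarId) REFERENCES Cars(Id),
    FOREIGN KEY (CategoryId) REFERENCES Categories(Id),
    UNIQUE (CarId, CategoryId)  -- assumption: forbid assigning the same pair twice
);
""")

conn.executemany("INSERT INTO Cars(Id, Name) VALUES (?, ?)", [
    (1, "Buick Whatever"), (2, "Ford BigRig"), (3, "Chevy Sprite"),
    (4, "Toyota Thimble"), (5, "Nissan Panther"),
])
conn.executemany("INSERT INTO Categories(Id, Name) VALUES (?, ?)", [
    (1, "Fast"), (2, "Economical"), (3, "Used"),
])
# (CarId, CategoryId) pairs taken from the CategoriesAssigned list above
conn.executemany(
    "INSERT INTO CategoriesAssigned(CarId, CategoryId) VALUES (?, ?)",
    [(1, 2), (3, 2), (4, 2), (2, 3), (4, 3), (2, 1), (3, 1)],
)
conn.commit()


def cars_in_category(conn, category_name):
    """Return the names of all cars assigned to the given category."""
    rows = conn.execute(
        """
        SELECT Cars.Name
        FROM Cars
        JOIN CategoriesAssigned ON CategoriesAssigned.CarId = Cars.Id
        JOIN Categories ON Categories.Id = CategoriesAssigned.CategoryId
        WHERE Categories.Name = ?
        ORDER BY Cars.Id
        """,
        (category_name,),
    )
    return [name for (name,) in rows]


def categories_of(conn, car_name):
    """Return the names of all categories assigned to the given car."""
    rows = conn.execute(
        """
        SELECT Categories.Name
        FROM Categories
        JOIN CategoriesAssigned ON CategoriesAssigned.CategoryId = Categories.Id
        JOIN Cars ON Cars.Id = CategoriesAssigned.CarId
        WHERE Cars.Name = ?
        ORDER BY Categories.Id
        """,
        (car_name,),
    )
    return [name for (name,) in rows]


print(cars_in_category(conn, "Economical"))
print(categories_of(conn, "Ford BigRig"))
```

Note that the mapping table only stores pairs of integer Ids; each category name is stored exactly once in Categories, so the repetition is limited to small keys rather than duplicated text.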
Another way people can handle this without the intermediary table is to add a field to Cars called Categories which then is some sort of delimited list; I usually see the pipe character used for this. For example, the field for the Ford BigRig may be
Used|Fast
The upside is less SQL. The downside is that you have to do more of the filtering manually (for example, by splitting the string in application code) rather than directly in a query.
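For comparison, here is a rough sketch of what that manual filtering might look like with a pipe-delimited Categories field, again using Python's sqlite3 module. The helper names and the delimited values for cars other than the Ford BigRig are illustrative assumptions derived from the sample data above. The second helper shows one way to filter inside SQL by wrapping both the field and the search term in delimiters; it assumes category names never contain '|', '%' or '_', and SQLite's LIKE is case-insensitive for ASCII letters by default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Cars(Id INTEGER PRIMARY KEY, Name TEXT NOT NULL, Categories TEXT)"
)
conn.executemany(
    "INSERT INTO Cars(Id, Name, Categories) VALUES (?, ?, ?)",
    [
        (1, "Buick Whatever", "Economical"),
        (2, "Ford BigRig", "Used|Fast"),
        (3, "Chevy Sprite", "Economical|Fast"),
        (4, "Toyota Thimble", "Economical|Used"),
        (5, "Nissan Panther", None),  # no categories assigned
    ],
)
conn.commit()


def cars_with_category(conn, category):
    """Filter in application code: fetch every row, split the list, test membership."""
    rows = conn.execute("SELECT Name, Categories FROM Cars ORDER BY Id")
    return [name for name, cats in rows if cats and category in cats.split("|")]


def cars_with_category_sql(conn, category):
    """Filter in SQL: wrap the field in delimiters so 'Fast' cannot match 'Fastback'."""
    rows = conn.execute(
        """
        SELECT Name FROM Cars
        WHERE '|' || Categories || '|' LIKE '%|' || ? || '|%'
        ORDER BY Id
        """,
        (category,),
    )
    return [name for (name,) in rows]


print(cars_with_category(conn, "Fast"))
print(cars_with_category_sql(conn, "Used"))
```

Either way, the database cannot check that a name in the list is a real category, and a leading-wildcard LIKE generally cannot use an index, which is part of why the intermediary table tends to be the cleaner design.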