Structuring database using SQLite

Question 1

I am working on an app that requires some basic data associations. I've picked SQLite as my database for its simplicity and because I can use it in a mobile version of my app in the future. It comes with certain limitations, so it is possible that I am doing this completely wrong.

A simple scenario is as follows:

I need to assign cars to categories such that a car can be part of multiple categories. Categories are pre-defined.

My approach:

Three tables: 'Cars', 'Categories', 'CategoriesAssigned'
Each 'Car' can belong to multiple 'Categories'.
'CategoriesAssigned' is used for mapping 'Categories' to 'Cars'.

On the SQLite end I create three tables:

CREATE TABLE Cars(
 Id INT PRIMARY KEY NOT NULL,
 Name TEXT NOT NULL
);
CREATE TABLE Categories(
 Id INT PRIMARY KEY NOT NULL,
 Name TEXT NOT NULL
);
CREATE TABLE CategoriesAssigned(
 Id INT PRIMARY KEY NOT NULL,
 CategoryId INTEGER NOT NULL,
 CarId INTEGER NOT NULL,
 FOREIGN KEY (CategoryId) REFERENCES Categories(Id),
 FOREIGN KEY (CarId) REFERENCES Cars(Id)
);

This way, each 'Car' can have many 'Categories'. What I DO NOT like is that there will be a lot of duplicate data: the same 'Category' will repeat for many cars.

I am still very new to databases and wanted to get some advice and feedback on how to properly handle scenarios like this.

Update:

There is another way, which I personally hate:

CREATE TABLE Cars(
 Id INT PRIMARY KEY NOT NULL,
 Categories TEXT,
 Name TEXT NOT NULL
);
CREATE TABLE Categories(
 Id INT PRIMARY KEY NOT NULL,
 Name TEXT NOT NULL
);

And then store 'Categories' as a comma-separated string, e.g. "Trucks,Luxury,Diesel", and parse the string when reading it. But that somehow feels even more wrong.

Question 2

It looks like you have a many-to-many relationship here so the intermediary table is the proper way for clean relational database design. You say you'll have duplicate data, but I think you mean you'll have a table that looks very repetitive when in fact each row is unique.

If you have cars:

1. Buick Whatever
2. Ford BigRig
3. Chevy Sprite
4. Toyota Thimble
5. Nissan Panther

Categories:

1. Fast
2. Economical
3. Used

Then CategoriesAssigned may contain the following rows (car Id, then category Id):

1. 1 (Buick), 2 (Economical)
2. 3 (Chevy), 2 (Economical)
3. 4 (Toyota), 2 (Economical)
4. 2 (Ford), 3 (Used)
5. 4 (Toyota), 3 (Used)
6. 2 (Ford), 1 (Fast)
7. 3 (Chevy), 1 (Fast)

So as this continues, there are lots of repeats in both columns but no two rows are the same.
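As a rough illustration of the point above, here is a minimal sketch using Python's built-in sqlite3 module with the sample cars and categories from this answer. The composite primary key on (CarId, CategoryId), the in-memory database, and the helper names `assign` and `categories_for` are assumptions made for the example; they are not part of the original schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite does not enforce foreign keys unless this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Cars(Id INTEGER PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE Categories(Id INTEGER PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE CategoriesAssigned(
    CarId INTEGER NOT NULL REFERENCES Cars(Id),
    CategoryId INTEGER NOT NULL REFERENCES Categories(Id),
    PRIMARY KEY (CarId, CategoryId)  -- assumption: the pair itself is the key
);
""")

conn.executemany("INSERT INTO Cars VALUES (?, ?)", [
    (1, "Buick Whatever"), (2, "Ford BigRig"), (3, "Chevy Sprite"),
    (4, "Toyota Thimble"), (5, "Nissan Panther"),
])
conn.executemany("INSERT INTO Categories VALUES (?, ?)", [
    (1, "Fast"), (2, "Economical"), (3, "Used"),
])
# (CarId, CategoryId) pairs, in the same order as the list above.
conn.executemany("INSERT INTO CategoriesAssigned VALUES (?, ?)", [
    (1, 2), (3, 2), (4, 2), (2, 3), (4, 3), (2, 1), (3, 1),
])
conn.commit()


def assign(car_id, category_id):
    """Link a car to a category.

    Returns False if the pair already exists or either id is unknown.
    """
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO CategoriesAssigned(CarId, CategoryId) VALUES (?, ?)",
                (car_id, category_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def categories_for(car_name):
    """Return the category names of one car, sorted alphabetically."""
    rows = conn.execute(
        """
        SELECT c.Name
        FROM Categories AS c
        JOIN CategoriesAssigned AS ca ON ca.CategoryId = c.Id
        JOIN Cars ON Cars.Id = ca.CarId
        WHERE Cars.Name = ?
        ORDER BY c.Name
        """,
        (car_name,),
    ).fetchall()
    return [name for (name,) in rows]


print(categories_for("Ford BigRig"))  # ['Fast', 'Used']
```

With the pair as the primary key, the database itself rejects a second identical (car, category) row, so the mapping table holds repeated ids but never repeated rows.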

Another way people can handle this without the intermediary table is to add a field to Cars called Categories which then is some sort of delimited list; I usually see the pipe character used for this. For example, the field for the Ford BigRig may be

Used|Fast

The upside is less SQL. The downside is that you have to filter the data manually a little more rather than straight by query.
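To make that trade-off concrete, here is a small sketch in Python's sqlite3 module. The row "Example Car X" with the category "Unused" is hypothetical and exists only to show how a substring match can go wrong; the function names `like_match` and `cars_in` are likewise made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Cars(Id INTEGER PRIMARY KEY, Name TEXT NOT NULL, Categories TEXT)"
)
conn.executemany("INSERT INTO Cars VALUES (?, ?, ?)", [
    (1, "Buick Whatever", "Economical"),
    (2, "Ford BigRig", "Used|Fast"),
    (4, "Toyota Thimble", "Economical|Used"),
    (6, "Example Car X", "Unused"),  # hypothetical row, not from the answer
])
conn.commit()


def like_match(category):
    """Filter purely in SQL with a substring match (may over-match)."""
    rows = conn.execute(
        "SELECT Name FROM Cars WHERE Categories LIKE ? ORDER BY Id",
        (f"%{category}%",),
    ).fetchall()
    return [name for (name,) in rows]


def cars_in(category):
    """Filter manually: fetch every row, split on the pipe, compare exactly."""
    rows = conn.execute("SELECT Name, Categories FROM Cars ORDER BY Id").fetchall()
    return [name for name, cats in rows if category in (cats or "").split("|")]


print(like_match("Used"))  # also returns the car tagged "Unused"
print(cars_in("Used"))     # only exact category matches
```

The exact filter has to read every row into the application, which is the manual work the answer refers to.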

Question 3

From maintainability and performance standpoints, which way is preferred?

Question 4

My preference is to store any data in the database in as granular a form as possible, which would mean having the join table. However, it depends on how you intend to use this field. If you want to search and analyze data by one of the categories, then having it as a database field helps for running GROUP BY queries. On the other hand, if it's a less significant field, it may not warrant the hassle of the SQL joins. Think about tags on posts: there are an infinite number of tags we could associate, but I may not want a whole other database table. Maybe I just query for all tags that include SQLite.
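For the GROUP BY case, a minimal sketch with Python's sqlite3 module might look like the following. It assumes a join table keyed on the (CarId, CategoryId) pair and uses the sample ids from earlier in this answer; the function name `cars_per_category` is made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Categories(Id INTEGER PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE CategoriesAssigned(
    CarId INTEGER NOT NULL,
    CategoryId INTEGER NOT NULL REFERENCES Categories(Id),
    PRIMARY KEY (CarId, CategoryId)  -- assumption: one row per pair
);
""")
conn.executemany("INSERT INTO Categories VALUES (?, ?)", [
    (1, "Fast"), (2, "Economical"), (3, "Used"),
])
conn.executemany("INSERT INTO CategoriesAssigned VALUES (?, ?)", [
    (1, 2), (3, 2), (4, 2), (2, 3), (4, 3), (2, 1), (3, 1),
])
conn.commit()


def cars_per_category():
    """Count how many cars are assigned to each category."""
    return conn.execute(
        """
        SELECT c.Name, COUNT(ca.CarId)
        FROM Categories AS c
        LEFT JOIN CategoriesAssigned AS ca ON ca.CategoryId = c.Id
        GROUP BY c.Id, c.Name
        ORDER BY c.Name
        """
    ).fetchall()


print(cars_per_category())  # [('Economical', 3), ('Fast', 2), ('Used', 2)]
```

The LEFT JOIN keeps categories that have no cars yet, reporting them with a count of 0. With a delimited text column, the same count would need the strings to be split in application code first.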

OptimisticToaster 195 bronze badges · Accepted Answer · 2016-03-02 16:56:48Z

