I am reviewing a schema that includes a mapping table of student enrollments. I am wondering whether adding an extra column (class_category_id) to the mapping table would cause problems or bring a noticeable advantage. That additional column would be used for filtering very often.
Here's a simplified structure of the database:
Class categories
id name
1 Math
2 Science
Classes
id name class_category_id
1 M101 1
2 M102 1
3 B101 2
4 P101 2
Student enrollments
id student_id class_id *class_category_id*
1 1001 1 1
2 1002 1 1
3 1003 3 2
4 1004 4 2
Common queries would include filtering the enrollments by the class category, without the actual need to get the class information itself.
It is not clear to me what advantages or disadvantages adding class_category_id would have.
One important note: the category of a class will never change, so there would never be a need to update it across multiple tables.
EDIT: A small note: the real table structure is equivalent to this one, but with many more columns (in the non-mapping tables), and it is not actually related to classes or students.
1 Answer
Adding that column would be a violation of second normal form. Under 2NF, all attributes (non-key columns) in the table must be attributes of the entire primary key, not just part of the key. In your case, the category of the class is an attribute of `class`, not `enrollment`.
That being said, it is not completely uncommon to denormalize tables for performance reasons. If you think this change will be a huge benefit then it is not necessarily "evil" to do it.
For you to think about, here are some of the problems that could come up with this sort of change.
- The size of the database will be larger, since you are duplicating data in two places.
- The amount of I/O required to retrieve data may be higher on average, because your working set will be larger and because fewer rows will fit on a data page. This can affect performance.
- If you decide to index the `class` table by the category ID, any queries that use the `enrollment` table will not benefit from this index. You would need a separate index, which will consume more space and decrease the performance of any insert operation on `enrollment`.
- Someone could make a mistake and put a different category ID in `enrollment` than in `class`.
- Presumably, `class` may end up being static and getting entirely cached, so it may not be very expensive to join to it and retrieve the class category ID. On the other hand, `enrollment` will be much larger and in constant flux, so it will not benefit from any caching. Again, this could affect performance.
- If classes are re-assigned to different categories (e.g. if the Greek Language department closes and all of its classes are moved to the Ancient Languages department) you will have data cleanup to do.
- If category structure changes (e.g. some day they decide a class could belong to two or more categories) then your table structure will not be forward compatible with the 1:M relationship.
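If you do denormalize, the mismatch risk in the list above can be reduced with a composite foreign key. The following is only a minimal sketch, assuming SQLite (through Python's built-in `sqlite3` module) and the simplified tables from the question. The index name `enrollment_category_idx` is made up for illustration, and other database engines may need slightly different syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE class_category (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE class (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    class_category_id INTEGER NOT NULL REFERENCES class_category (id),
    UNIQUE (id, class_category_id)  -- target of the composite foreign key
);
CREATE TABLE enrollment (
    id INTEGER PRIMARY KEY,
    student_id INTEGER NOT NULL,
    class_id INTEGER NOT NULL,
    class_category_id INTEGER NOT NULL,  -- the denormalized copy
    FOREIGN KEY (class_id, class_category_id)
        REFERENCES class (id, class_category_id)
);
-- The separate index the denormalized column needs (hypothetical name).
CREATE INDEX enrollment_category_idx ON enrollment (class_category_id);

INSERT INTO class_category VALUES (1, 'Math'), (2, 'Science');
INSERT INTO class VALUES (1, 'M101', 1), (2, 'M102', 1), (3, 'B101', 2), (4, 'P101', 2);
INSERT INTO enrollment VALUES (1, 1001, 1, 1), (2, 1002, 1, 1), (3, 1003, 3, 2), (4, 1004, 4, 2);
""")

def try_insert_mismatch():
    """Try to enroll a student in class 1 (Math) while recording category 2 (Science)."""
    try:
        conn.execute("INSERT INTO enrollment VALUES (5, 1005, 1, 2)")
    except sqlite3.IntegrityError:
        return False  # rejected by the composite foreign key
    return True

mismatch_accepted = try_insert_mismatch()

# Filtering by category now touches only the enrollment table.
science_rows = conn.execute(
    "SELECT id, student_id FROM enrollment WHERE class_category_id = 2 ORDER BY id"
).fetchall()
```

The constraint stops inconsistent rows from being inserted, but it does nothing about the extra storage, the extra index, or the cleanup needed if a class ever moves to another category.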
If you are simply doing all this because you don't want to type `JOIN` all the time, consider creating a view instead. You can pre-join as many tables as you want in the view, then use the view in your `FROM` clause.
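As a rough illustration of the view approach, here is a minimal sketch, again assuming SQLite through Python's built-in `sqlite3` module and the simplified tables from the question. The view name `enrollment_with_category` and the index names are hypothetical. The normalized `enrollment` table keeps no category column, and the view supplies it through the join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE class (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    class_category_id INTEGER NOT NULL
);
CREATE TABLE enrollment (
    id INTEGER PRIMARY KEY,
    student_id INTEGER NOT NULL,
    class_id INTEGER NOT NULL REFERENCES class (id)
);
-- Indexes that support the join and the category filter (hypothetical names).
CREATE INDEX class_category_idx ON class (class_category_id);
CREATE INDEX enrollment_class_idx ON enrollment (class_id);

-- The view pre-joins enrollment to class, so callers never write the JOIN.
CREATE VIEW enrollment_with_category AS
SELECT e.id, e.student_id, e.class_id, c.class_category_id
FROM enrollment AS e
JOIN class AS c ON c.id = e.class_id;

INSERT INTO class VALUES (1, 'M101', 1), (2, 'M102', 1), (3, 'B101', 2), (4, 'P101', 2);
INSERT INTO enrollment VALUES (1, 1001, 1), (2, 1002, 1), (3, 1003, 3), (4, 1004, 4);
""")

# Filter through the view as if the category column lived on enrollment.
view_rows = conn.execute(
    "SELECT id, student_id FROM enrollment_with_category "
    "WHERE class_category_id = 2 ORDER BY id"
).fetchall()

# The same filter written as a plain subquery against the base tables.
subquery_rows = conn.execute(
    "SELECT id, student_id FROM enrollment "
    "WHERE class_id IN (SELECT id FROM class WHERE class_category_id = 2) "
    "ORDER BY id"
).fetchall()
```

Both queries return the same enrollments here. Whether either one is fast enough on a large `enrollment` table depends on your engine and data, so compare the query plans before deciding to denormalize.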
select id, student_id from studentenrollment where class_id in (select id from class where class_category_id = 2)
as a filtering method?