I am trying to build a publications database, and i want to develop a genre relation that has various (maybe look up tables) sub-genres attached. as an example, in the genre relation there would be a row for 'POETRY' but somehow i want to divide that into sub genres like 'PROSE', 'HAIKU' or 'FREE_VERSE'.
what is the best way to design such a thing? would i have a separate look-up table for each row in the genre relation for the various sub-genres? that doesn't seem very flexible. any input would be greatly appreciated.
-
Would you be collecting different data for each of the subtypes ('PROSE', 'HAIKU' or 'FREE_VERSE')? Or would the data all be the same for all subtypes of "POETRY"?Solomon Rutzky– Solomon Rutzky2015年05月28日 00:48:13 +00:00Commented May 28, 2015 at 0:48
-
mostly, i am just trying to create a searchable database of various published works. the data rows would be much the same, but the sub types would be used as a classification system... maybe thats not the right way to go about it?StillLearningToCode– StillLearningToCode2015年05月28日 04:29:14 +00:00Commented May 28, 2015 at 4:29
-
agreeed @TemporaryName. I'm trying to strike a balance between designing a system that is both flexible and complete, but that i can still write queries against (at my skill level)StillLearningToCode– StillLearningToCode2015年05月28日 20:02:43 +00:00Commented May 28, 2015 at 20:02
-
@StillLearningToCode Just to be clear, are there different properties for each genre? I can amend my answer to deal with a simpler case of merely needing to designate subtype but where all Publication types are really the same.Solomon Rutzky– Solomon Rutzky2015年05月28日 20:04:06 +00:00Commented May 28, 2015 at 20:04
-
1And I agree with @TemporaryName in that queries should not be the main concern at this point. Queries will be the least of your problems if you get the model wrong. If a simpler structure that requires a sub-genre is more appropriate, then I will amend my answer. If a simpler structure that requires a flexible hierarchy of genre is appropriate, then the accepted answer should be moved from mine to Walter's. Regardless, take some time to think about the nature of the data so that you can provide clearer direction: I would rather give up the 15 pts than contribute to yet more over-engineering.Solomon Rutzky– Solomon Rutzky2015年05月28日 20:37:56 +00:00Commented May 28, 2015 at 20:37
3 Answers 3
This seems nearly identical to the following question, and my answer there should be detailed enough to explain the concept (and is just too much to copy and paste here).
Don't know how to transform variable entity into relational table
For your case specifically I would think it should work itself out to be something along the lines of:
Publication
table is the main entity with the common properties, a PK onPublicationID
, and aPublicationTypeID
field that FKs toPublicationType
- Lookup table for
PublicationType
that has a PK ofPublicationTypeID
. Entries would be the various genres. - Sub-class tables for the genres, such as Poetry, Fiction, etc. Each sub-class table has a PK named
PublicationID
that also FKs back to thePublication
table, and fields for properties that are specific to the particular genre. - Where applicable, sub-class tables have sub-type field, such as
PoetryTypeID
and FKs to thePoetryType
table. - Lookup tables for sub-classes, one table per each sub-class type. The PK would be
PoetryTypeID
(e.g. Prose, Haiku, Free Verse, etc).
In light of additional details being made available that give context to the scope of the request (specifically the following comment on the question:
the data files ... are all in fact printed, no audio of image/video files. ... one sub-genre is sufficient, and ... it is possible to have just a genre without any sub sections..
), it seems clear now that the flexibility allowed for by my suggestion above is not needed as sub-types aren't really being stored. Instead, the request now seems to be for simply a means of categorization. For that, if you want a rigid structure of Genre / Sub-Genre such that:
- those are the only two levels, and
- you always know which level you are dealing with (in queries), as opposed to a dynamic structure where each Publication can be associated with a Genre / Sub-Genre at any level (1 through N) and you would need to query to get the heirarchy
then you can do something along the lines of:
Publication
-------------
PublicationID INT NOT NULL PRIMARY KEY
SubGenreID TINYINT NOT NULL FOREIGN KEY REFERENCES(SubGenre.SubGenreID)
Title VARCHAR(200) NOT NULL
...
SubGenre
-------------
SubGenreID TINYINT NOT NULL PRIMARY KEY
GenreID TINYINT NOT NULL FOREIGN KEY REFERENCES(Genre.GenreID)
Name VARCHAR(50) NOT NULL
Genre
-------------
GenreID TINYINT NOT NULL PRIMARY KEY
Name VARCHAR(50) NOT NULL
In this model you know that what is in the Publication
table is always the SubGenre
, and it always has a parent Genre
. You did say that it is possible to not have a SubGenre
(though is that really the case?), you can use the same Name
for both SubGenre
and Genre
(e.g. News -> News instead of Poetry -> Bad Teenage).
Or, if you want a dynamic structure of Genre / Sub-Genre such that:
- There can be just one level or any number of levels, and
You can only easily tell if you are either at:
- a top-level Genre (ParentID IS NULL), or
- one or more levels deep in SubGenre-land (ParentID IS NOT NULL).
Else you need to query to get the full hierarchy
then you can do something along the lines of:
Publication
-------------
PublicationID INT NOT NULL PRIMARY KEY
PublicationTypeID SMALLINT NOT NULL FOREIGN KEY
REFERENCES(PublicationType.PublicationTypeID)
Title VARCHAR(200) NOT NULL
...
PublicationType
-------------
PublicationTypeID SMALLINT NOT NULL PRIMARY KEY
ParentPublicationTypeID SMALLINT NULL FOREIGN KEY
REFERENCES(PublicationType.PublicationTypeID)
Name VARCHAR(50) NOT NULL
Please note that the ParentPublicationTypeID
field is NULLable, which indicates a top-level item when it is NULL
. This is the model that Walter Mitty suggested.
Perhaps you'll find the "Nested Set Model" for suitable for genres. Some advantages to this are:
- You can sub-genre things as deeply as your heart desires.
- You only need 1 table for defining Genres (not counting the associative table between Genre and Publication).
Here's an article that explains it: http://mikehillyer.com/articles/managing-hierarchical-data-in-mysql/
In your case, the tables would like something like so: enter image description here
Some sample data: enter image description here
For this sample data, "Green Eggs and Ham" are assigned the genres of "Children's", "Prose", and "Poetry". Even though the "Poetry" assignment isn't explicit in the PublicationGenre table, it can be inferred by the fact that "Prose" is a child of "Poetry". Similarly, "Joe's Haiku" is assigned the genres of "Haiku" and "Poetry".
What the "Nested Set Model" would look like for this sample data: enter image description here
-
2The value of this technique depends greatly on which RDBMS is being used. If the implementation will be on MS SQL Server then one can use a proper, self-referencing relationship and query that easily using recursive CTEs (Common Table Expressions) OR use the hierarchyid datatype. And I believe PostgreSQL and/or Oracle can also handle recursive CTEs or at least have similar features. With those options available, this Nested Set Model appears to be clunky (at best). However, it still seems that MySQL (and possibly others) does not have those features, in which case this is a good work-around.Solomon Rutzky– Solomon Rutzky2015年06月03日 05:27:40 +00:00Commented Jun 3, 2015 at 5:27
-
1@srutzky Cool information, I never heard of recursive CTEs. Definitely sounds cleaner that way if the RDBMS supports it.Joe Borysko– Joe Borysko2015年06月03日 14:37:12 +00:00Commented Jun 3, 2015 at 14:37
A solution like class-table-inheritance is probably overkill in this case. It sounds like you need the same data, in the same fields, for sub genres and genres. This can all be stored in one lookup table, the PublicationType table.
You need one more field in this table, to record the relationship between sub-genres and genres. I'll call this field ParentPublicationTypeId. The entry in the Haiku row would contain the PublicationTypeId for Poetry. This references a different row in the same table, but it's just another case of a foreign key. The buzzword is "reflexive relationships" if you want to find further reading.