I have a situation that after an initial thought ended in the following tables:
section: id, name
section_tag: id, name
section_tag_map: section_id, tag_id
item: id, name
item_tag: id, name
item_tag_map: item_id, tag_id
section_item_map: section_id, item_id
This describes well the data: items belong to sections. Sections and items have multiple tags for each of their "predicates", but they are different kinds of tags (section tags are specific to sections, item tags to items).
I will need to do this kind of query:
- Select sections with section_tag's (3, 4). Easy.
- Select items with item_tag's (1, 2) that belong to a section with section_tag's (3, 4). It didn't take long to realize what a mess of queries this requires.
I'm planning to simplify the initial idea, setting a single tag table and through app logic make items "inherit" tags that are meant to be set for sections. So it will be more like this:
section: id, name
item: id, name
tag: id, type, name
section_item: section_id, item_id
section_tag_map: item_id, tag_id
item_tag_map: item_id, tag_id
The upside is that queries will be much simpler. The downside is that I'll duplicate data and have to take care of syncing item tags when the tags from a section change.
What I'm weighting is if I should worry about having a monster query like this (first design):
SELECT
i.*
FROM
item AS i
INNER JOIN
section_item_map AS sim
ON
sim.item_id = i.id
INNER JOIN
(SELECT
s.*
FROM
section AS s
INNER JOIN
section_tag_map AS stm
ON
stm.section_id = s.id
WHERE
stm.tag_id IN (3, 4)
GROUP BY
s.id
HAVING
COUNT(s.id) = 2
) AS s
ON
sim.section_id = s.id
INNER JOIN
item_tag_map AS itm
ON
itm.item_id = i.id
WHERE
itm.tag_id IN (1, 2)
GROUP BY
i.id
HAVING
COUNT(i.id) = 2
...or a much simpler one like this (second design):
SELECT
i.*
FROM
item AS i
INNER JOIN
item_tag_map AS itm
ON
itm.item_id = i.id
WHERE
itm.tag_id IN (1, 2, 3, 4)
GROUP BY
i.id
HAVING
COUNT(i.id) = 4
Am I on the right path or maybe I am missing something crucial here? The system is a single-user Sqlite app with about 20k sections and 100k items.
2 Answers 2
I don't see much difference in the two designs. Whether you have 2 tables for the tags or 1 table with a type
attribute (that can take 2 values), you basically going to need the same (in matters or complexity and structure) queries.
What is different is your decision to materialize the relationship between items
and the section_tags
, the tags that are "inherited" to them through the relationship with the sections. But that can be done with either the first or the second design. And yes, it would simplify querying (on the expense of making harder to have the materialized tables/views in sync.)
So, about the choice of design, if as you say the item and the section tags are two different types of tags, either design would be ok and I don't think it would make much difference, except on the details in the interaction between applications and the database.
For the materialization part of question, I think the best would be to test both options. If you are to make a decision fast, you could take the first path. Keep the design simple and only if you have efficiency issues that can't be overcome, examine the option of materialization as it would be quite complicated (as you have already thought) to keep it synced.
-
Thank you. I'm pondering what to do. Tag changes should be rare after a section is set. That may be a good reason to avoid the monster query, but in any case I'm testing both. I posted an untested query above, will check if it works now.moraes– moraes2013年04月21日 01:06:32 +00:00Commented Apr 21, 2013 at 1:06
-
I tested both solutions and decided for a mix: I will have a single
tag
table but will keepsection_tag_map
anditem_tag_map
separated. A third map table will mix item tags and "inherited" item tags, just for the purpose of queries. It will be sync'ed periodically, as needed.moraes– moraes2013年04月21日 14:29:59 +00:00Commented Apr 21, 2013 at 14:29
In your scenario - is a tag literally just a tag with no other associated attributes specific to each type?
Unless there is something fundamentally different about a "section tag" and an "item tag" (e.g. if, as an object, a section tag has different characteristics than an item tag), then they shouldn't be split out (neither in separate tag tables, nor in separate materialized views).
If they are essentially the same e.g. a tag is a tag is a tag, then it makes more sense to unify - much like your 2nd option, but don't split out the tag maps into section_tag_map
and item_tag_map
- after all, these are just "[object]-tag maps"
You can just do:
tag_map (tag_id, obj_type, obj_id)
Where obj_type
would be either 'section' or 'item' (or a representative id), and obj_id
is the id from the appropriate entity.
Typically/generally I would then partition/cluster the index on obj_type to get performance.
Explore related questions
See similar questions with these tags.