1

I have a situation that after an initial thought ended in the following tables:

section: id, name
section_tag: id, name 
section_tag_map: section_id, tag_id
item: id, name
item_tag: id, name 
item_tag_map: item_id, tag_id
section_item_map: section_id, item_id

This describes well the data: items belong to sections. Sections and items have multiple tags for each of their "predicates", but they are different kinds of tags (section tags are specific to sections, item tags to items).

I will need to do this kind of query:

  • Select sections with section_tag's (3, 4). Easy.
  • Select items with item_tag's (1, 2) that belong to a section with section_tag's (3, 4). It didn't take long to realize what a mess of queries this requires.

I'm planning to simplify the initial idea, setting a single tag table and through app logic make items "inherit" tags that are meant to be set for sections. So it will be more like this:

section: id, name
item: id, name
tag: id, type, name
section_item: section_id, item_id
section_tag_map: item_id, tag_id
item_tag_map: item_id, tag_id

The upside is that queries will be much simpler. The downside is that I'll duplicate data and have to take care of syncing item tags when the tags from a section change.

What I'm weighting is if I should worry about having a monster query like this (first design):

SELECT
 i.*
FROM
 item AS i
INNER JOIN
 section_item_map AS sim
ON
 sim.item_id = i.id
INNER JOIN
 (SELECT
 s.*
 FROM
 section AS s
 INNER JOIN
 section_tag_map AS stm
 ON
 stm.section_id = s.id
 WHERE
 stm.tag_id IN (3, 4)
 GROUP BY
 s.id
 HAVING
 COUNT(s.id) = 2
 ) AS s
ON
 sim.section_id = s.id
INNER JOIN
 item_tag_map AS itm
ON
 itm.item_id = i.id
WHERE
 itm.tag_id IN (1, 2)
GROUP BY
 i.id
HAVING
 COUNT(i.id) = 2

...or a much simpler one like this (second design):

SELECT
 i.*
FROM
 item AS i
INNER JOIN
 item_tag_map AS itm
ON
 itm.item_id = i.id
WHERE
 itm.tag_id IN (1, 2, 3, 4)
GROUP BY
 i.id
HAVING
 COUNT(i.id) = 4

Am I on the right path or maybe I am missing something crucial here? The system is a single-user Sqlite app with about 20k sections and 100k items.

asked Apr 20, 2013 at 20:42

2 Answers 2

1

I don't see much difference in the two designs. Whether you have 2 tables for the tags or 1 table with a type attribute (that can take 2 values), you basically going to need the same (in matters or complexity and structure) queries.

What is different is your decision to materialize the relationship between items and the section_tags, the tags that are "inherited" to them through the relationship with the sections. But that can be done with either the first or the second design. And yes, it would simplify querying (on the expense of making harder to have the materialized tables/views in sync.)

So, about the choice of design, if as you say the item and the section tags are two different types of tags, either design would be ok and I don't think it would make much difference, except on the details in the interaction between applications and the database.

For the materialization part of question, I think the best would be to test both options. If you are to make a decision fast, you could take the first path. Keep the design simple and only if you have efficiency issues that can't be overcome, examine the option of materialization as it would be quite complicated (as you have already thought) to keep it synced.

answered Apr 20, 2013 at 21:49
2
  • Thank you. I'm pondering what to do. Tag changes should be rare after a section is set. That may be a good reason to avoid the monster query, but in any case I'm testing both. I posted an untested query above, will check if it works now. Commented Apr 21, 2013 at 1:06
  • I tested both solutions and decided for a mix: I will have a single tag table but will keep section_tag_map and item_tag_map separated. A third map table will mix item tags and "inherited" item tags, just for the purpose of queries. It will be sync'ed periodically, as needed. Commented Apr 21, 2013 at 14:29
0

In your scenario - is a tag literally just a tag with no other associated attributes specific to each type?

Unless there is something fundamentally different about a "section tag" and an "item tag" (e.g. if, as an object, a section tag has different characteristics than an item tag), then they shouldn't be split out (neither in separate tag tables, nor in separate materialized views).

If they are essentially the same e.g. a tag is a tag is a tag, then it makes more sense to unify - much like your 2nd option, but don't split out the tag maps into section_tag_map and item_tag_map - after all, these are just "[object]-tag maps"

You can just do:

tag_map (tag_id, obj_type, obj_id)

Where obj_type would be either 'section' or 'item' (or a representative id), and obj_id is the id from the appropriate entity.

Typically/generally I would then partition/cluster the index on obj_type to get performance.

answered Jun 5, 2018 at 14:52

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.