I have a hierarchical/tree data structure in Oracle/RDBMS where I need to traverse the tree frequently. The tree is huge and therefore parsing it every time will not be efficient. A better way could be to parse and store the data to give constant time turn out.
Consider the relationship below
Entity T1, T2, T3, T4
They have hierarchical structure i.e
T1 --one-to-many--> T2 --one-to-many--> T3 --one-to-many-->T4
All of them have many(optional)-to-many with XYZ.
I have to traverse/update this tree frequently where some nodes may even not have zero 'xyz'. This can be a traversal of 100 of thousands of nodes. How can I avoid such repeated expensive traversal? Will a materialized view help here?
We have to consider that T2, T3, T4 can be millions in number but the link T1<==>XYZ, T2<==>XYZ, T3<==>XYZ, T4<==>XYZ are Thousands in number.
Maybe the data structure below is one solution:
T1[id, name] T2[id, name, T1.id] T3[id, name, T2.id] T4[id, name, T3.id] XYZ[id, name] LINK_T1_XYZ[T1.id, XYZ.id] LINK_T1_T2_XYZ[T1.id, T2.id, XYZ.id] with a check T1 should parent of T2 LINK_T1_T3_XYZ[T1.id, T3.id, XYZ.id] " LINK_T1_T4_XYZ[T1.id, T4.id, XYZ.id] "
Consider the analogy of provider-receiver here now any of the entity in hierarchy[T1, T2, T3, T4] can provide any of it. Example:
t1_a1 --> xyz_a --> t2_a2 t2_a2 --> xyz_a --> t2_a3
A very simple design could be all permutation combination give a total of 4X4=16
Concept Link ==> Provider | Stuff | Receiver T1_XYZ_T1_LINK [T1.id, XYZ.id, T1.id] T1_XYZ_T2_LINK [T1.id, XYZ.id, T2.id] T1_XYZ_T3_LINK [T1.id, XYZ.id, T3.id] T1_XYZ_T4_LINK [T1.id, XYZ.id, T4.id] T2_XYZ_T1_LINK [T2.id, XYZ.id, T1.id] T2_XYZ_T2_LINK [T2.id, XYZ.id, T2.id] T2_XYZ_T3_LINK [T2.id, XYZ.id, T3.id] T2_XYZ_T4_LINK [T2.id, XYZ.id, T4.id] T3_XYZ_T1_LINK [T3.id, XYZ.id, T1.id] T3_XYZ_T2_LINK [T3.id, XYZ.id, T2.id] T3_XYZ_T3_LINK [T3.id, XYZ.id, T3.id] T3_XYZ_T4_LINK [T3.id, XYZ.id, T4.id] T4_XYZ_T1_LINK [T4.id, XYZ.id, T1.id] T4_XYZ_T2_LINK [T4.id, XYZ.id, T2.id] T4_XYZ_T3_LINK [T4.id, XYZ.id, T3.id] T4_XYZ_T4_LINK [T4.id, XYZ.id, T4.id]
That is so many join tables, there must be a better design to do this.
Business Context
Many of us want to know this but it is quite complex than it's abstract model, I would simplify and try to give an analogy.
Consider content distribution network like Netflix. Now Netflix (T1 entity e.g T1[N001, 'Netflix']) is the Organization with head office at New York City, USA. Under this, it has again regional offices (T2 entity, say east, west, north, south) and then under that sub-regional offices (T3 entity, state level, say New York, Virginia etc) and then branch offices - (T4 entity, say city level).
Consider the analogy at international level so that T2 can go up 100, 000.
XYZ - consider it content like TV series e.g: House of cards.
Every organization has contents which are either produced in-house or purchased from outside. Example: Netflix produces 'House of cards' so it can broadcast it in its own network and may sell it to some other network too.
Now the organization O1 can make a deal to sell it's stuff/show/movies (XYZ, say HOC) to another organization O2.
You may say a tuple T1|XYZ|T2 is enough to represent that i.e O1|HOC|O2 but we also need to store how we network/stream or distribute 'HOC' from O1 to O2 and there can be 16 ways (permutation of T1, T2, T3, T4) to do it,
Let's take an example:
o1 |---r1.1 | |---c1.1.1 | | |---h1.1.1.1 | | |---h1.1.1.2 | | |---h1.1.1.3 | |---c2 | | |---h1.1.2.1 | | |---h1.1.2.2 | |---C3 | |---h1.1.3.1 | |---h1.1.3.2 | |---r1.2 |---C1.2.3 |---h1.2.3.1 |---h1.2.3.2 o2 |---r2.1 | |---c2.1.1 | | |---h2.1.1.1 | | |---h2.1.1.2 | | |---h2.1.1.3 xyz = hoc
Now some example ways in which o1 can provide hoc o2:
o1 --> hoc --> o2 o1 --> hoc --> r2.1 o1 --> hoc --> c2.1.1.1 o1 --> hoc --> h2.1.1.1 r1.1 --> hoc --> o2 r1.1 --> hoc --> r2.1 r1.1 --> hoc --> c2.1.1.1 r1.1 --> hoc --> h2.1.1.1
and so on.
So actually we need one structure to entertain two models:
- Organization model
- Distribution network
To be noted, Content is very limited here i.e XYZ may have records less than 50 mostly.
3 Answers 3
So initially it's
T2[
id PK,
name,
T1_id NOT NULL, -- because it is hierarchy, T2 can not exist outside the parent (T1).
FK1 = T1_id ref T1(id)
]
...
and you also say
LINK_T1_T2_XYZ[T1.id, T2.id, XYZ.id] with a check T1 should parent of T2
You can migrate keys down the entities hierarchy
T2[
id,
name,
T1_id NOT NULL,
PK2 = (id, T1_id)
FK2 = T1_id ref T1(id)
]
T3[
id,
name,
T1_id NOT NULL,
T2_id NOT NULL,
PK3 = (id, T1_id, T2_id)
FK3 = (T2_id, T1_id) ref T2(id, T1_id)
]
...
The code for supporting T1, ..T4 entities and the entities which refer them will be a bit complicated but you'll get instant access to any of T1,..T4 from any referencing entity without trading off RI.
-
Thanks do I get this right?==> I need to see what T1 has I will have to join all 4 tables T1, T2, T3 and T4 and 4 link tables and find all X belonging to T1.id. I know about trigger but can we have check constraint to validate such complex validation? that will be quite interesting, I would like to know the SQL.old-monk– old-monk2016年05月25日 21:14:22 +00:00Commented May 25, 2016 at 21:14
-
At T4 you'll have keys for all upper tables T3, T2, T1. So if you need T1 info you do join T1 only. No need to join intermediate T2 and T3.Serg– Serg2016年05月26日 06:52:02 +00:00Commented May 26, 2016 at 6:52
First off, if your example is ACTUALLY a good representation of what you are trying to accomplish, then you are going about it all wrong.. You have three domains here - one is businesses, one is business offices, and one is products.
Table BUSINESS should only contain information about the actual business (Netflix, for example) ... no office information at all .. just very very high level "this is information about "NETFLIX").
Table BUSINESS_OFFICE should contain information about ALL offices .. the tier structure of the offices is completely irrelevant to this table. They are all offices of some sort of another, so they all belong in this table. Now, since there IS a hierarchy of offices, you might consider adding a "parent_id" to this table (which would point to an office's higher-up) .. obviously, if an office didn't have a parent_id, then it would be the top level office. IF the structure, however, is NOT hierarchical in nature (one office could potentially have two parents), then you would want to split that information out into a separate join table.
Next, you have the products (Movies, tv shows, etc, that each office might be able to offer).
Now things are much more simple when you need to get the information you want.
You can recursively query on id/parent_id in the offices table to build a list of ids that are all part of the tree you want.. Then you join to the products table with a where clause where the office id is in that list.
Will add example DDL statements shortly.
CREATE TABLE BUSINESS (
ID INTEGER PRIMARY KEY,
NAME VARCHAR2(100),
-- OTHER COLS THAT MIGHT APPLY
)
CREATE TABLE BUSINESS_OFFICE (
ID INTEGER PRIMARY KEY,
BUSINESS_ID INTEGER REFERENCES BUSINESS (ID),
NAME VARCHAR2(100),
-- OTHER COLS, LIKE ADDRESS, ETC
)
CREATE TABLE PRODUCT (
ID INTEGER PRIMARY KEY,
NAME VARCHAR2(100),
-- OTHER COLS THAT MIGHT APPLY
)
If the structure isn't quite containing the information you need, additional tables can be added quite painlessly.. For examples
CREATE TABLE BUSINESS_PRODUCT (
BUSINESS_ID INTEGER REFERENCES BUSINESS (ID),
PRODUCT_ID INTEGER REFERENCES PRODUCT (ID),
IS_OWNER NUMBER(1) NOT NULL DEFAULT 0
)
etc etc
Determining what a business CAN offer as a product should be a separate issue than determining WHERE it can distributed from.
-
After giving it a lot of thought, going this way is a trade off, each of level of T have a small set of different attributes than each other but a lot of relational difference like billing system tables talk to T3, hardware tables related to T4 and so on. I will strengthening network model but weaken the organization model in the end.old-monk– old-monk2016年05月27日 15:59:17 +00:00Commented May 27, 2016 at 15:59
-
@SaurabhTripathi Each type of office can have it's specific metadata stored in a different table .. you can query based off of those tables if you need a specific subset. Or have an "office_type" column.Joishi Bodio– Joishi Bodio2016年05月27日 16:12:49 +00:00Commented May 27, 2016 at 16:12
You can simply use IN for XYZ, so that you don't need to have one for each relation):
SELECT *
FROM T1
LEFT JOIN T2 ON (T2.T1_ID = T1.ID)
LEFT JOIN T3 ON (T3.T2_ID = T2.ID)
LEFT JOIN T4 ON (T4.T3_ID = T3.ID)
LEFT JOIN XYZ ON (XYZ.ID IN (T1.XYZ_ID, T2.XYZ_ID, T3.XYZ_ID, T4.XYZ_ID))
Explore related questions
See similar questions with these tags.