I'm migrating a relational DB over to Cosmos DB and am unsure about a tree structure. There are several thousand rows of data ("Offerings") where each offering can have child items arranged in a tree hierarchy.
In the source DB it uses the traditional ParentID approach (an adjacency list):
| ID | Content | ParentID |
|----|---------|----------|
| 0 | "Root" | -1 |
| 1 | "Child 1" | 0 |
| 2 | "Child 2" | 0 |
| 3 | "Child's child" | 2 |
In my actual DB, "Content" is not just one field but multiple columns, some of them being JSON blobs already.
When moving over to Cosmos DB, I see two options:
Option 1: Should I store the actual hierarchy?
```json
{
  "id": 0,
  "content": "Root",
  "children": [
    {
      "id": 1,
      "content": "Child 1",
      "children": []
    },
    {
      "id": 2,
      "content": "Child 2",
      "children": [
        {
          "id": 3,
          "content": "Child's child",
          "children": []
        }
      ]
    }
  ]
}
```
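As a rough sketch of how the Option 1 document could be produced during migration, the Python below nests the ParentID rows from the table into a single document. It assumes the rows fit in memory, that -1 marks a root, and it carries over only the ID and Content columns; `build_tree` and `ROWS` are hypothetical names.

```python
from collections import defaultdict
from typing import Any

# Rows as they come out of the relational source: (ID, Content, ParentID).
# A ParentID of -1 marks a root, as in the table above.
ROWS = [
    (0, "Root", -1),
    (1, "Child 1", 0),
    (2, "Child 2", 0),
    (3, "Child's child", 2),
]


def build_tree(rows: list[tuple[int, str, int]], root_id: int) -> dict[str, Any]:
    """Nest adjacency-list rows into one Option 1 style document."""
    content_by_id = {row_id: content for row_id, content, _ in rows}
    # Group child IDs under their parent, preserving row order.
    children_of: dict[int, list[int]] = defaultdict(list)
    for row_id, _, parent_id in rows:
        children_of[parent_id].append(row_id)

    def build(node_id: int) -> dict[str, Any]:
        # Recursion depth equals tree depth, which is fine for shallow trees.
        return {
            "id": node_id,
            "content": content_by_id[node_id],
            "children": [build(child) for child in children_of[node_id]],
        }

    return build(root_id)


if __name__ == "__main__":
    import json
    print(json.dumps(build_tree(ROWS, 0), indent=2))
```

Keeping a whole tree in one document makes reading it a single point read, but Cosmos DB limits item size (2 MB at the time of writing), so very large trees may not fit.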
This represents the actual structure and allows easy traversal of the hierarchy when reading it. However, what if I need, for example, "all children aged 10"? Is there a way to query for the property age, no matter how deep in the hierarchy it sits, and have the query return a list of the matching child objects?
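On the query question: to my knowledge, Cosmos DB's SQL API has no recursive operator. Each JOIN unnests one array level, so a query has to spell out one JOIN per level of depth. If the depth is unbounded, one fallback is to fetch the document and search it client-side. A minimal Python sketch of that search, using a hypothetical `age` property (it is not in the data above) and made-up helper names:

```python
from typing import Any, Callable, Iterator


def iter_nodes(node: dict[str, Any]) -> Iterator[dict[str, Any]]:
    """Yield a node and all of its descendants, depth-first, without recursion."""
    stack = [node]
    while stack:
        current = stack.pop()
        yield current
        # Push children in reverse so they are visited in document order.
        stack.extend(reversed(current.get("children", [])))


def find_nodes(
    root: dict[str, Any], predicate: Callable[[dict[str, Any]], bool]
) -> list[dict[str, Any]]:
    """Return every node in the tree, at any depth, that satisfies predicate."""
    return [n for n in iter_nodes(root) if predicate(n)]


# Hypothetical document: "age" is an invented property for illustration.
doc = {
    "id": 0, "content": "Root", "children": [
        {"id": 1, "content": "Child 1", "age": 10, "children": []},
        {"id": 2, "content": "Child 2", "age": 12, "children": [
            {"id": 3, "content": "Child's child", "age": 10, "children": []},
        ]},
    ],
}

aged_10 = find_nodes(doc, lambda n: n.get("age") == 10)
```

This finds nodes 1 and 3. The catch is that the filtering happens after the whole document has been read, so Cosmos DB indexing does not help with it.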
Option 2: Should I store a list and an extra property representing the hierarchy?
```json
{
  "id": 0,
  "content": "Root",
  "descendants": [
    {
      "id": 1,
      "content": "Child 1"
    },
    {
      "id": 2,
      "content": "Child 2"
    },
    {
      "id": 3,
      "content": "Child's child"
    }
  ],
  "hierarchy": {
    "id": 0,
    "children": [
      {
        "id": 1,
        "children": []
      },
      {
        "id": 2,
        "children": [
          {
            "id": 3,
            "children": []
          }
        ]
      }
    ]
  }
}
```
This allows me to easily get all descendants without traversing the tree. Is there a major pro or con for either option? Is either one bad design, or does it depend entirely on which queries I expect to run most?
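For comparison, here is a sketch of producing the Option 2 shape from the same rows: a flat `descendants` list holding the content, plus an ID-only `hierarchy` skeleton. As before, `build_option2` and `ROWS` are hypothetical names, and only the ID and Content columns are carried over.

```python
from collections import defaultdict
from typing import Any

# (ID, Content, ParentID) rows from the relational source; -1 marks a root.
ROWS = [
    (0, "Root", -1),
    (1, "Child 1", 0),
    (2, "Child 2", 0),
    (3, "Child's child", 2),
]


def build_option2(rows: list[tuple[int, str, int]], root_id: int) -> dict[str, Any]:
    """Build a document with a flat descendants list and a separate skeleton."""
    content_by_id = {row_id: content for row_id, content, _ in rows}
    children_of: dict[int, list[int]] = defaultdict(list)
    for row_id, _, parent_id in rows:
        children_of[parent_id].append(row_id)

    def skeleton(node_id: int) -> dict[str, Any]:
        # Structure only: IDs and nesting, no content.
        return {"id": node_id, "children": [skeleton(c) for c in children_of[node_id]]}

    # Collect every descendant of the root (root excluded), depth-first.
    descendants = []
    stack = list(reversed(children_of[root_id]))
    while stack:
        node_id = stack.pop()
        descendants.append({"id": node_id, "content": content_by_id[node_id]})
        stack.extend(reversed(children_of[node_id]))

    return {
        "id": root_id,
        "content": content_by_id[root_id],
        "descendants": descendants,
        "hierarchy": skeleton(root_id),
    }
```

The trade-off in this shape is that every node ID appears twice, so adding, removing, or moving a node means updating both `descendants` and `hierarchy` consistently.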
Answer
I tried implementing a hierarchyId approach in SQLite and really like it; it could be used in Cosmos DB as well. Basically, a hierarchyId is a string field that stores the node's full path from the root in a simple format:
"/<root>/<parent>/.../<child>"
For example, if node 6 is a child of node 5, and node 5 is a child of the root node 1, then node 6's hierarchyId is:
"/1/5/6"
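If the data starts out in ParentID form, the hierarchyId paths can be derived once during migration. A sketch, assuming (ID, ParentID) pairs with -1 marking roots and no cycles in the parent links; `hierarchy_ids` is a made-up helper name:

```python
def hierarchy_ids(pairs: list[tuple[int, int]]) -> dict[int, str]:
    """Map each node ID to its "/<root>/.../<node>" path.

    pairs holds (ID, ParentID) tuples, with ParentID -1 marking a root.
    Assumes the parent links contain no cycles and every parent is present.
    """
    parent_of = dict(pairs)
    paths: dict[int, str] = {}

    for node_id in parent_of:
        # Climb towards the root until reaching a node whose path is known.
        chain = []
        current = node_id
        while current != -1 and current not in paths:
            chain.append(current)
            current = parent_of[current]
        prefix = "" if current == -1 else paths[current]
        # Walk back down, building and caching each path on the way.
        for nid in reversed(chain):
            prefix = f"{prefix}/{nid}"
            paths[nid] = prefix
    return paths


if __name__ == "__main__":
    print(hierarchy_ids([(1, -1), (5, 1), (6, 5)]))
```

Caching the paths means each node's ancestors are walked only once, even for deep trees.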
Searching is simple. For example, to get all descendants of node 5, at any depth:
STARTSWITH(c.hierarchyId, "/1/5/", false)
Or all the nodes under the root 1:
STARTSWITH(c.hierarchyId, "/1/", false)
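The trailing slash in those prefixes matters. The Python sketch below mirrors the STARTSWITH filter in memory (it does not call Cosmos DB) and shows why: without the slash, "/1/5" would also match node 50's subtree. The document shape and `descendants_of` are assumptions for illustration.

```python
from typing import Any

# Cosmos DB query this mirrors (parameterised):
#   SELECT * FROM c WHERE STARTSWITH(c.hierarchyId, @prefix, false)


def descendants_of(docs: list[dict[str, Any]], ancestor_path: str) -> list[dict[str, Any]]:
    """Return all documents below ancestor_path, at any depth.

    Appending "/" stops "/1/5" from matching "/1/50/..." and leaves the
    ancestor itself out of the result. str.startswith is case-sensitive,
    like passing false as STARTSWITH's third argument.
    """
    prefix = ancestor_path.rstrip("/") + "/"
    return [d for d in docs if d["hierarchyId"].startswith(prefix)]


docs = [
    {"id": "1", "hierarchyId": "/1"},
    {"id": "5", "hierarchyId": "/1/5"},
    {"id": "6", "hierarchyId": "/1/5/6"},
    {"id": "50", "hierarchyId": "/1/50"},
    {"id": "7", "hierarchyId": "/1/50/7"},
]

under_5 = descendants_of(docs, "/1/5")
```

Here `under_5` contains only node 6; nodes 50 and 7 are correctly excluded.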
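The main drawback is that, as with hierarchyid in SQL Server, you have to maintain the tree's integrity yourself, probably on the application side. For example, when a node is moved, the hierarchyId of every one of its descendants must be rewritten. So it is not *the* solution, but it is *a* solution.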
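Moving a node is where that maintenance work shows up. Here is a sketch of the in-memory rewrite, with hypothetical names. Persisting the changed documents is left to the application, and those writes are only atomic if they fit in one transactional batch, which in Cosmos DB requires them to share a partition key.

```python
from typing import Any


def move_subtree(
    docs: list[dict[str, Any]], node_path: str, new_parent_path: str
) -> list[dict[str, Any]]:
    """Re-parent the node at node_path under new_parent_path.

    Rewrites hierarchyId in place for the node and all of its descendants
    and returns the documents that changed; the application would then have
    to write each of them back to the container.
    """
    node_path = node_path.rstrip("/")
    new_parent_path = new_parent_path.rstrip("/")
    # Refuse moves that would create a cycle.
    if new_parent_path == node_path or new_parent_path.startswith(node_path + "/"):
        raise ValueError("cannot move a node under itself or one of its descendants")

    node_id = node_path.rsplit("/", 1)[1]
    new_node_path = f"{new_parent_path}/{node_id}"
    changed = []
    for doc in docs:
        path = doc["hierarchyId"]
        if path == node_path or path.startswith(node_path + "/"):
            # Swap the old prefix for the new one, keeping the rest of the path.
            doc["hierarchyId"] = new_node_path + path[len(node_path):]
            changed.append(doc)
    return changed


docs = [
    {"id": "1", "hierarchyId": "/1"},
    {"id": "2", "hierarchyId": "/1/2"},
    {"id": "5", "hierarchyId": "/1/5"},
    {"id": "6", "hierarchyId": "/1/5/6"},
]
changed = move_subtree(docs, "/1/5", "/1/2")
```

After the move, node 5 lives at "/1/2/5" and node 6 at "/1/2/5/6". The number of documents to rewrite grows with the size of the moved subtree.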