This is my incoming data:
id text parent
-- ---- ------
0 0 NULL
1 1 0
11 11 1
2 2 0
21 21 2
211 211 21
22 22 2
3 3 0
Which I receive in a List<TreeNode>
, where TreeNode
is:
public class TreeNode
{
public int id { get; set; }
public string text { get; set; }
public int? parent { get; set; }
}
To represent my tree,
0
|--1
| |--11
|
|--2
| |--21
| | |--211
| |
| |--22
|
|--3
I have this class:
public class Tree
{
public int? id { get; set; }
public string text { get; set; }
public List<Tree> children { get; set; }
}
I build the tree from the flat list of nodes with this:
Tree tree = treeNodes.BuildTree();
This is my implementation of BuildTree
:
public static Tree BuildTree(this List<TreeNode> nodes)
{
// Create a NULL-root tree
Tree root = new Tree();
// Add nodes (sub-trees) to the tree
foreach (TreeNode node in nodes)
{
// traverse tree, find the parent, add child
root.TraverseAndAddNode(node);
}
// Return the NULL-root
// If the tree has an actual root, it will be the first child of this root
return root;
}
public static bool TraverseAndAddNode(this Tree root, TreeNode node)
{
bool nodeAdded = false;
// Check if the current root is the parent of the node to be added
if (root.id == node.parent)
{
if (root.children == null)
root.children = new List<Tree>();
root.children.Add(new Tree()
{
id = node.id,
text = node.text,
children = null
});
nodeAdded = true;
}
// Check if the current root has children and recurse the search
else if (root.children != null)
{
foreach (Tree tree in root.children)
{
nodeAdded = tree.TraverseAndAddNode(node);
if (nodeAdded)
break;
}
}
// Node's parent is not in the current tree
else
{
nodeAdded = false;
}
return nodeAdded;
}
I'm able to see a few problems here, such as the traversal code here is not efficient as it uses linear search, and the implementation would be better off done with a Dictionary, and also that I'm unnecessarily creating a NULL root for the lack of being able to dynamically identify the true root from the flat data.
How can I do better? Is there a way I can dynamically construct partial sub-trees as I parse the flat-data, and keep linking these partially constructed trees to the main tree and keep building on?
EDIT:
* Removed nullable id
from TreeNode
-- changed int? id
to int id
* Updated incorrect original example showing 1.1
instead of 11
Added the following examples:
Case 1: Multiple NULL parents (NULL root is acceptable):
id text parent NULL
-- ---- ------ |
1 C: NULL |--C:
2 drivers 1 | |--drivers
3 oem 2 ---> | |--oem
4 D: NULL |
5 data 2 |--D:
6 E: NULL | |--data
|
|--E:
Case 2: Single NULL parent (NULL root is avoidable):
id text parent NULL
-- ---- ------ |
0 My PC NULL |--My PC
1 C: 0 |
2 drivers 1 |--C:
3 oem 2 ---> | |--drivers
4 D: 0 | |--oem
5 data 2 |
6 E: 0 |--D:
| |--data
|
|--E:
4 Answers 4
Well your provided example data can't be real, because I never ever saw an int?
which could take 1.1
as value.
I will apply to the posted code the standard C# naming convention, meaning that properties are named using PascalCase
casing.
That being said, lets take a look at the Tree
class
public class Tree { public int? id { get; set; } public string text { get; set; } public List<Tree> children { get; set; } }
which seems to be just a TreeNode
with children
and without a parent
. IMO this is too much. Just add your TreeNode
to a List<TreeNode> Children
property and you won't need the Tree
class anymore.
By initializing the
Children
property at creation of the object, you will waste some space (memory) but you won't need to check ifChildren == null
anymore.If the root node (parent == null) won't be the first node in the collection then your method will fail.
Omitting braces although they might be optional is dangerous because this can lead to hidden and therefore hard to find bugs.
I would like to encourage you to always use them.
Using recursion to tackle this problem is a good idea.
Another approach
which assumes the TreeNode
looks like so
public class TreeNode
{
public int? Id { get; set; }
public string Text { get; set; }
public int? Parent { get; set; }
public List<TreeNode> Children { get; set; }
public TreeNode()
{
Children = new List<TreeNode>();
}
}
where I would usually make the setters private
and fill the properties at constructor level, but I will leave this for you to do.
(削除)
By using FirstOrDefault()
we can get the root
node very easily and should remove it from the nodes collection. Let us introduce a method to do this
private static TreeNode RemoveRootNode(this List<TreeNode> nodes)
{
if (nodes == null)
{
throw new NullReferenceException("nodes");
}
var root = nodes.FirstOrDefault(n => !n.Parent.HasValue);
if (root != null)
{
nodes.Remove(root);
}
return root;
}
Now we don't need the root node to be the first item in the collection anymore.
Next we need the "main" method which takes a List<TreeNode>
as a parameter and returns a TreeNode
which is the root node like so
public static TreeNode BuildTree(this List<TreeNode> nodes)
{
var root = nodes.RemoveRootNode();
if (root == null) { throw new ArgumentOutOfRangeException("nodes"); }
return root.BuildTree(nodes);
}
(削除ここまで)
Edit
Based on the changed example data, we now can have multiple TreeNode
with Parent == null
which makes the RemoveRootNode()
method superfluous and will result in TreeNode buildTree(this List<TreeNode>)
like so
public static TreeNode BuildTree(this List<TreeNode> nodes)
{
if (nodes == null)
{
throw new ArgumentNullException("nodes");
}
return new TreeNode().BuildTree(nodes);
}
As we see there should be a BuildTree(this TreeNode, List<TreeNode>)
method which will look like:
private static TreeNode BuildTree(this TreeNode root, List<TreeNode> nodes)
{
if (nodes.Count == 0) { return root; }
var children = root.FetchChildren(nodes).ToList();
root.Children.AddRange(children);
root.RemoveChildren(nodes);
for (int i = 0; i < children.Count; i++)
{
children[i] = children[i].BuildTree(nodes);
if (nodes.Count == 0) { break; }
}
return root;
}
which is self-explanatory. It fetches the children by using
public static IEnumerable<TreeNode> FetchChildren(this TreeNode root, List<TreeNode> nodes)
{
return nodes.Where(n => n.Parent == root.Id);
}
and add them to the Children
property of the root node and removes them from the nodes
by
public static void RemoveChildren(this TreeNode root, List<TreeNode> nodes)
{
foreach (var node in root.Children)
{
nodes.Remove(node);
}
}
Then it iterates over the the children and calls itself recursively.
(Bug)
Edit: It seems that your incoming data are sorted. If so, this 'bug' is not valid of course
Your traversing algorithm works fine as long as the TreeNode
s have the correct order - parents must be positioned before children. If you add a node that's parent is not already part of the tree, it will never be added.
To fix that bug, you need at least another reprocessing step that sorts your items...
Naming
- Properties should start with a capital letter.
TreeNode
- As mentioned in a comment, the ID property should not be nullable.
- If possible, I would also make the Id property read-only and pass them in the constructor because it should not change after initialization.
Tree
- Same as in TreeNode - Id-property should be read-only.
- The children property should also be read-only and initialized in constructor. That avoids the error-prone initialization logic inside your
TraverseAndAddNode
method.
TraverseAndAddNode
- The nodeAdded variable is actually not required:
Simplified version of the TraverseAndAddNode
method containing the suggestions above:
public static bool TraverseAndAddNode(this Tree root, TreeNode node)
{
// Check if the current root is the parent of the node to be added
if (root.id == node.parent)
{
root.children.Add(new Tree()
{
id = node.id,
text = node.text,
children = null
});
return true;
}
// Note: You could use a one-liner here
//return root.Children.Any(child => child.TraverseAndAddNode(node))
foreach (Tree tree in root.children)
{
if (tree.TraverseAndAddNode(node))
{
return true;
}
}
return false;
}
Performance
The performance intensive step is the missing one that sorts the nodes. Therefore, I would think about an alternative approach instead of sorting + recursive creating the tree.
One alternative approach may be:
- create non-connected tree items of all
TreeNode
objects - Indexing them by id using a dictionary
- Iterate over the items, get the parent from the dictionary and put the current item in its child collection.
Review
Keep in mind I am using an older version of .NET so my semantics can be somehow verbose.
Navigation
You are lacking a reference back to the parent, which makes bottom-up navigation much harder. A tree that knows its parent allows for much more flexibility in navigation.
root.children.Add(new Tree() { id = node.id, text = node.text, children = null // <- no parent reference ;-( });
public class Tree
{
public int? Id { get; set; }
public string Text { get; set; }
protected List<Tree> _children;
protected Tree _parent; // <- added reference to parent :-)
}
Now, we can use TreeExtensions
to nagivate a tree. The most basic tree traversal walkers are available. More could be added if required.
public static class TreeExtensions
{
public static IEnumerable<Tree> Descendants(this Tree value) {
// a descendant is the node self and any descendant of the children
if (value == null) yield break;
yield return value;
// depth-first pre-order tree walker
foreach (var child in value.Children)
foreach (var descendantOfChild in child.Descendants()) {
yield return descendantOfChild;
}
}
public static IEnumerable<Tree> Ancestors(this Tree value) {
// an ancestor is the node self and any ancestor of the parent
var ancestor = value;
// post-order tree walker
while (ancestor != null) {
yield return ancestor;
ancestor = ancestor.Parent;
}
}
}
Lexicon
In computer science, many API's use the terminology below.
descendant
: any child, grandchild, ..descendantOrSelf
: any descendant or the specified node itselfancestor
: any parent, grandparent, ..ancestorOrSelf
: any ancestor or the specified node itself
However, I tend to use family tree terminology.
successor
: any child, grandchild, ..descendant
: any successor or the specified node itselfpredecessor
: any parent, grandparent, ..ancestor
: any predecessor or the specified node itself
Data Integrity
Encapsulating the family members _parent
, _children
provides us with the capability of preserving the tree's integrity.
public class Tree { public int? id { get; set; } public string text { get; set; } public List<Tree> children { get; set; } }
public class Tree
{
public int? Id { get; set; }
public string Text { get; set; }
protected List<Tree> _children;
protected Tree _parent;
public Tree Parent { get { return _parent; } }
public Tree Root { get { return _parent == null ? this : _parent.Root; } }
public IEnumerable<Tree> Children {
get { return _children == null
? Enumerable.Empty<Tree>() : _children.ToArray();
}
}
}
We could then have Add
and Remove
support with sufficient validation support to maintain integrity.
public void Add(Tree child) {
if (child == null)
throw new ArgumentNullException();
if (child._parent != null)
throw new InvalidOperationException(
"A tree node must be removed from its parent before adding as child.");
if (this.Ancestors().Contains(child))
throw new InvalidOperationException("A tree cannot be a cyclic graph.");
if (_children == null) {
_children = new List<Tree>();
}
Debug.Assert(!_children.Contains(child), "At this point, the node is definately not a child");
child._parent = this;
_children.Add(child);
}
public bool Remove(Tree child) {
if (child == null)
throw new ArgumentNullException();
if (child._parent != this)
return false;
Debug.Assert(_children.Contains(child), "At this point, the node is definately a child");
child._parent = null;
_children.Remove(child);
if (!_children.Any())
_children = null;
return true;
}
Builder
I am not convinced an extension method is the way to go for this method. I would rather make a Builder
class for it, but that's probably a matter of taste.
public static Tree BuildTree(this List<TreeNode> nodes) { // method impl .. }
Determining the root node is not optimized. In one of your examples, there is a root in the input nodes. In another example, there is no root. If the former case, we could use that root as tree. In the latter, we could create the null-root.
// Create a NULL-root tree Tree root = new Tree();
private static Tree FindTreeRoot(IList<TreeNode> nodes) {
var rootNodes = nodes.Where(node => node.parent == null);
if (rootNodes.Count() != 1) return new Tree();
var rootNode = rootNodes.Single();
nodes.Remove(rootNode);
return Map(rootNode);
}
private static Tree Map(TreeNode node) {
return new Tree {
Id = node.id,
Text = node.text,
};
}
Building the tree could then become a recursive action. I am using a list, in order to remove the visited nodes from the ones that still need to be visited.
public static Tree BuildTree(IEnumerable<TreeNode> nodes) {
if (nodes == null) return new Tree();
var nodeList = nodes.ToList();
var tree = FindTreeRoot(nodeList);
BuildTree(tree, nodeList);
return tree;
}
private static void BuildTree(Tree tree, IList<TreeNode> descendants) {
// Note: could be optimized further for performance
var children = descendants.Where(node => node.parent == tree.Id).ToArray();
foreach (var child in children) {
var branch = Map(child);
tree.Add(branch);
descendants.Remove(child);
}
foreach (var branch in tree.Children) {
BuildTree(branch, descendants);
}
}
Proposed Solution
This is an appendix of the classes used as an alternative/edited flow, with an example included.
public class Program
{
public static void Main()
{
//0 0 NULL
//1 1 0
//11 11 1
//2 2 0
//21 21 2
//211 211 21
//22 22 2
//3 3 0
var nodes = new List<TreeNode> {
new TreeNode { id = 0, text = "0", parent = null },
new TreeNode { id = 1, text = "1", parent = 0 },
new TreeNode { id = 11, text = "11", parent = 1 },
new TreeNode { id = 2, text = "2", parent = 0 },
new TreeNode { id = 21, text = "21", parent = 2 },
new TreeNode { id = 211, text = "211", parent = 21 },
new TreeNode { id = 22, text = "22", parent = 2 },
new TreeNode { id = 3, text = "3", parent = 0 }
};
var tree = TreeBuilder.BuildTree(nodes);
Console.ReadKey();
}
}
public class TreeNode
{
public int id { get; set; }
public string text { get; set; }
public int? parent { get; set; }
}
public static class TreeBuilder
{
public static Tree BuildTree(IEnumerable<TreeNode> nodes) {
if (nodes == null) return new Tree();
var nodeList = nodes.ToList();
var tree = FindTreeRoot(nodeList);
BuildTree(tree, nodeList);
return tree;
}
private static void BuildTree(Tree tree, IList<TreeNode> descendants) {
var children = descendants.Where(node => node.parent == tree.Id).ToArray();
foreach (var child in children) {
var branch = Map(child);
tree.Add(branch);
descendants.Remove(child);
}
foreach (var branch in tree.Children) {
BuildTree(branch, descendants);
}
}
private static Tree FindTreeRoot(IList<TreeNode> nodes) {
var rootNodes = nodes.Where(node => node.parent == null);
if (rootNodes.Count() != 1) return new Tree();
var rootNode = rootNodes.Single();
nodes.Remove(rootNode);
return Map(rootNode);
}
private static Tree Map(TreeNode node) {
return new Tree {
Id = node.id,
Text = node.text,
};
}
}
public static class TreeExtensions
{
public static IEnumerable<Tree> Descendants(this Tree value) {
// a descendant is the node self and any descendant of the children
if (value == null) yield break;
yield return value;
// depth-first pre-order tree walker
foreach (var child in value.Children)
foreach (var descendantOfChild in child.Descendants()) {
yield return descendantOfChild;
}
}
public static IEnumerable<Tree> Ancestors(this Tree value) {
// an ancestor is the node self and any ancestor of the parent
var ancestor = value;
// post-order tree walker
while (ancestor != null) {
yield return ancestor;
ancestor = ancestor.Parent;
}
}
}
public class Tree
{
public int? Id { get; set; }
public string Text { get; set; }
protected List<Tree> _children;
protected Tree _parent;
public Tree() {
Text = string.Empty;
}
public Tree Parent { get { return _parent; } }
public Tree Root { get { return _parent == null ? this : _parent.Root; } }
public int Depth { get { return this.Ancestors().Count() - 1; } }
public IEnumerable<Tree> Children {
get { return _children == null ? Enumerable.Empty<Tree>() : _children.ToArray(); }
}
public override string ToString() {
return Text;
}
public void Add(Tree child) {
if (child == null)
throw new ArgumentNullException();
if (child._parent != null)
throw new InvalidOperationException("A tree node must be removed from its parent before adding as child.");
if (this.Ancestors().Contains(child))
throw new InvalidOperationException("A tree cannot be a cyclic graph.");
if (_children == null) {
_children = new List<Tree>();
}
Debug.Assert(!_children.Contains(child), "At this point, the node is definately not a child");
child._parent = this;
_children.Add(child);
}
public bool Remove(Tree child) {
if (child == null)
throw new ArgumentNullException();
if (child._parent != this)
return false;
Debug.Assert(_children.Contains(child), "At this point, the node is definately a child");
child._parent = null;
_children.Remove(child);
if (!_children.Any())
_children = null;
return true;
}
-
\$\begingroup\$ I was build something very similar some time ago where I used attributes and a dictionary as a lookup for creating the tree. I'm addicted to reusability ;-] \$\endgroup\$t3chb0t– t3chb0t2019年05月25日 15:46:15 +00:00Commented May 25, 2019 at 15:46
-
1\$\begingroup\$ @t3chb0t I will review it. Don't you worry :-p \$\endgroup\$dfhwze– dfhwze2019年05月25日 16:04:33 +00:00Commented May 25, 2019 at 16:04
-
\$\begingroup\$ haha, awsome! ;-) \$\endgroup\$t3chb0t– t3chb0t2019年05月25日 16:09:49 +00:00Commented May 25, 2019 at 16:09
-
1\$\begingroup\$ Thank you so much for this, especially the point about the parent reference. It helps to see things from a new perspective, from another developer's point of view. \$\endgroup\$SNag– SNag2019年05月27日 05:41:55 +00:00Commented May 27, 2019 at 5:41
This method is okay, other than the fact you're null checking an int
and it should just be 0 for a root node. Not all versions of C# support nullable types and int
nullable types are usually frowned upon.
id = null
? ID's are used for identification between objects, but you can have multiple null ID's how are you going to differentiate those? \$\endgroup\$id = null
, but may have multipleparent = null
indicating multiple root-level elements. I've updatedTreeNode
and changed the type ofid
fromint?
toint
, and added examples. \$\endgroup\$id
here is just a unique identifier for a node in the forest for id-based lookup in the tree when required; it can be any unique series really.. \$\endgroup\$