Building a Tree from a flat List<Nodes>

Question 1

This is my incoming data:

id text parent
-- ---- ------
0 0 NULL
1 1 0
11 11 1
2 2 0
21 21 2
211 211 21
22 22 2
3 3 0

Which I receive in a List<TreeNode>, where TreeNode is:

public class TreeNode
{
 public int id { get; set; }
 public string text { get; set; }
 public int? parent { get; set; }
}

To represent my tree,

0
|--1
| |--11
|
|--2
| |--21
| | |--211
| |
| |--22
|
|--3

I have this class:

public class Tree
{
 public int? id { get; set; }
 public string text { get; set; }
 public List<Tree> children { get; set; }
}

I build the tree from the flat list of nodes with this:

Tree tree = treeNodes.BuildTree();

This is my implementation of BuildTree:

public static Tree BuildTree(this List<TreeNode> nodes)
{
 // Create a NULL-root tree
 Tree root = new Tree();
 // Add nodes (sub-trees) to the tree
 foreach (TreeNode node in nodes)
 {
 // traverse tree, find the parent, add child
 root.TraverseAndAddNode(node);
 }
 // Return the NULL-root
 // If the tree has an actual root, it will be the first child of this root
 return root;
}
public static bool TraverseAndAddNode(this Tree root, TreeNode node)
{
 bool nodeAdded = false;
 // Check if the current root is the parent of the node to be added
 if (root.id == node.parent)
 {
 if (root.children == null)
 root.children = new List<Tree>();
 root.children.Add(new Tree()
 {
 id = node.id,
 text = node.text,
 children = null
 });
 nodeAdded = true;
 }
 // Check if the current root has children and recurse the search
 else if (root.children != null)
 {
 foreach (Tree tree in root.children)
 {
 nodeAdded = tree.TraverseAndAddNode(node);
 if (nodeAdded)
 break;
 }
 }
 // Node's parent is not in the current tree
 else
 {
 nodeAdded = false;
 }
 return nodeAdded;
}

I'm able to see a few problems here, such as the traversal code here is not efficient as it uses linear search, and the implementation would be better off done with a Dictionary, and also that I'm unnecessarily creating a NULL root for the lack of being able to dynamically identify the true root from the flat data.

How can I do better? Is there a way I can dynamically construct partial sub-trees as I parse the flat-data, and keep linking these partially constructed trees to the main tree and keep building on?

EDIT:
* Removed nullable id from TreeNode -- changed int? id to int id
* Updated incorrect original example showing 1.1 instead of 11

Added the following examples:

Case 1: Multiple NULL parents (NULL root is acceptable):

id text parent NULL 
-- ---- ------ | 
1 C: NULL |--C: 
2 drivers 1 | |--drivers 
3 oem 2 ---> | |--oem 
4 D: NULL | 
5 data 2 |--D: 
6 E: NULL | |--data 
 |
 |--E:

Case 2: Single NULL parent (NULL root is avoidable):

id text parent NULL
-- ---- ------ |
0 My PC NULL |--My PC
1 C: 0 | 
2 drivers 1 |--C: 
3 oem 2 ---> | |--drivers
4 D: 0 | |--oem 
5 data 2 | 
6 E: 0 |--D: 
 | |--data 
 |
 |--E:

Question 2

When would it be appropriate to have id = null? ID's are used for identification between objects, but you can have multiple null ID's how are you going to differentiate those?

Question 3

@Denis: In all, the Tree in my example can have only one null ID, and that would be the (imaginary) root. The incoming data should never have any id = null, but may have multiple parent = null indicating multiple root-level elements. I've updated TreeNode and changed the type of id from int? to int, and added examples.

Question 4

@SNag Does the node id have to be unique in the forest? For instance, when id is 211, it can be both the 11th child of node 2 and the 1st child of node 21. In other words, does the id matter to specify the unique location of a node in the forest?

Question 5

@SNag id sometimes seems to represent the location of the node in the forest, and othertimes a unique incremental flattened sequence. What is the purpose of id in TreeNode?

Question 6

@dfhwze: You're right with what you thought; id here is just a unique identifier for a node in the forest for id-based lookup in the tree when required; it can be any unique series really..

Question 7

Well your provided example data can't be real, because I never ever saw an int? which could take 1.1 as value.

I will apply to the posted code the standard C# naming convention, meaning that properties are named using PascalCase casing.

That being said, lets take a look at the Tree class

public class Tree
{
 public int? id { get; set; }
 public string text { get; set; }
 public List<Tree> children { get; set; }
}

which seems to be just a TreeNode with children and without a parent. IMO this is too much. Just add your TreeNode to a List<TreeNode> Children property and you won't need the Tree class anymore.

By initializing the Children property at creation of the object, you will waste some space (memory) but you won't need to check if Children == null anymore.
If the root node (parent == null) won't be the first node in the collection then your method will fail.
Omitting braces although they might be optional is dangerous because this can lead to hidden and therefore hard to find bugs.

I would like to encourage you to always use them.
Using recursion to tackle this problem is a good idea.

Another approach

which assumes the TreeNode looks like so

public class TreeNode
{
 public int? Id { get; set; }
 public string Text { get; set; }
 public int? Parent { get; set; }
 public List<TreeNode> Children { get; set; }
 public TreeNode()
 {
 Children = new List<TreeNode>();
 }
}

where I would usually make the setters private and fill the properties at constructor level, but I will leave this for you to do.

~~(削除) By using FirstOrDefault() we can get the root node very easily and should remove it from the nodes collection. Let us introduce a method to do this~~

private static TreeNode RemoveRootNode(this List<TreeNode> nodes)
{
 if (nodes == null)
 {
 throw new NullReferenceException("nodes");
 }
 var root = nodes.FirstOrDefault(n => !n.Parent.HasValue);
 if (root != null)
 {
 nodes.Remove(root);
 }
 return root;
}

Now we don't need the root node to be the first item in the collection anymore.

Next we need the "main" method which takes a List<TreeNode> as a parameter and returns a TreeNode which is the root node like so

public static TreeNode BuildTree(this List<TreeNode> nodes)
{
 var root = nodes.RemoveRootNode();
 if (root == null) { throw new ArgumentOutOfRangeException("nodes"); }
 return root.BuildTree(nodes);
}

(削除ここまで)

Edit

Based on the changed example data, we now can have multiple TreeNode with Parent == null which makes the RemoveRootNode() method superfluous and will result in TreeNode buildTree(this List<TreeNode>) like so

public static TreeNode BuildTree(this List<TreeNode> nodes)
{
 if (nodes == null)
 {
 throw new ArgumentNullException("nodes");
 }
 return new TreeNode().BuildTree(nodes);
}

As we see there should be a BuildTree(this TreeNode, List<TreeNode>) method which will look like:

private static TreeNode BuildTree(this TreeNode root, List<TreeNode> nodes)
{
 if (nodes.Count == 0) { return root; }
 var children = root.FetchChildren(nodes).ToList();
 root.Children.AddRange(children);
 root.RemoveChildren(nodes);
 for (int i = 0; i < children.Count; i++)
 {
 children[i] = children[i].BuildTree(nodes);
 if (nodes.Count == 0) { break; }
 }
 return root;
}

which is self-explanatory. It fetches the children by using

public static IEnumerable<TreeNode> FetchChildren(this TreeNode root, List<TreeNode> nodes)
{
 return nodes.Where(n => n.Parent == root.Id);
}

and add them to the Children property of the root node and removes them from the nodes by

public static void RemoveChildren(this TreeNode root, List<TreeNode> nodes)
{
 foreach (var node in root.Children)
 {
 nodes.Remove(node);
 }
}

Then it iterates over the the children and calls itself recursively.

Question 8

(Bug)

Edit: It seems that your incoming data are sorted. If so, this 'bug' is not valid of course

Your traversing algorithm works fine as long as the TreeNodes have the correct order - parents must be positioned before children. If you add a node that's parent is not already part of the tree, it will never be added.

To fix that bug, you need at least another reprocessing step that sorts your items...

Naming

Properties should start with a capital letter.

TreeNode

As mentioned in a comment, the ID property should not be nullable.
If possible, I would also make the Id property read-only and pass them in the constructor because it should not change after initialization.

Tree

Same as in TreeNode - Id-property should be read-only.
The children property should also be read-only and initialized in constructor. That avoids the error-prone initialization logic inside your TraverseAndAddNode method.

TraverseAndAddNode

The nodeAdded variable is actually not required:

Simplified version of the TraverseAndAddNode method containing the suggestions above:

public static bool TraverseAndAddNode(this Tree root, TreeNode node)
{
 // Check if the current root is the parent of the node to be added
 if (root.id == node.parent)
 {
 root.children.Add(new Tree()
 {
 id = node.id,
 text = node.text,
 children = null
 });
 return true;
 }
 // Note: You could use a one-liner here 
 //return root.Children.Any(child => child.TraverseAndAddNode(node))
 foreach (Tree tree in root.children)
 {
 if (tree.TraverseAndAddNode(node))
 {
 return true;
 }
 }
 return false;
}

Performance

The performance intensive step is the missing one that sorts the nodes. Therefore, I would think about an alternative approach instead of sorting + recursive creating the tree.

One alternative approach may be:

create non-connected tree items of all TreeNode objects
Indexing them by id using a dictionary
Iterate over the items, get the parent from the dictionary and put the current item in its child collection.

Question 9

Review

Keep in mind I am using an older version of .NET so my semantics can be somehow verbose.

Navigation

You are lacking a reference back to the parent, which makes bottom-up navigation much harder. A tree that knows its parent allows for much more flexibility in navigation.

root.children.Add(new Tree()
{
 id = node.id,
 text = node.text,
 children = null // <- no parent reference ;-(
});

public class Tree
{
 public int? Id { get; set; }
 public string Text { get; set; }
 protected List<Tree> _children;
 protected Tree _parent; // <- added reference to parent :-)
}

Now, we can use TreeExtensions to nagivate a tree. The most basic tree traversal walkers are available. More could be added if required.

public static class TreeExtensions
{
 public static IEnumerable<Tree> Descendants(this Tree value) {
 // a descendant is the node self and any descendant of the children
 if (value == null) yield break;
 yield return value;
 // depth-first pre-order tree walker
 foreach (var child in value.Children)
 foreach (var descendantOfChild in child.Descendants()) {
 yield return descendantOfChild;
 }
 }
 public static IEnumerable<Tree> Ancestors(this Tree value) {
 // an ancestor is the node self and any ancestor of the parent
 var ancestor = value;
 // post-order tree walker
 while (ancestor != null) {
 yield return ancestor;
 ancestor = ancestor.Parent;
 }
 }
}

Lexicon

In computer science, many API's use the terminology below.

descendant: any child, grandchild, ..
descendantOrSelf: any descendant or the specified node itself
ancestor: any parent, grandparent, ..
ancestorOrSelf: any ancestor or the specified node itself

However, I tend to use family tree terminology.

successor: any child, grandchild, ..
descendant: any successor or the specified node itself
predecessor: any parent, grandparent, ..
ancestor: any predecessor or the specified node itself

Data Integrity

Encapsulating the family members _parent, _children provides us with the capability of preserving the tree's integrity.

public class Tree
{
 public int? id { get; set; }
 public string text { get; set; }
 public List<Tree> children { get; set; }
}

public class Tree
{
 public int? Id { get; set; }
 public string Text { get; set; }
 protected List<Tree> _children;
 protected Tree _parent;
 public Tree Parent { get { return _parent; } }
 public Tree Root { get { return _parent == null ? this : _parent.Root; } }
 public IEnumerable<Tree> Children {
 get { return _children == null 
 ? Enumerable.Empty<Tree>() : _children.ToArray(); 
 }
 }
}

We could then have Add and Remove support with sufficient validation support to maintain integrity.

public void Add(Tree child) {
 if (child == null)
 throw new ArgumentNullException();
 if (child._parent != null)
 throw new InvalidOperationException(
 "A tree node must be removed from its parent before adding as child.");
 if (this.Ancestors().Contains(child))
 throw new InvalidOperationException("A tree cannot be a cyclic graph.");
 if (_children == null) {
 _children = new List<Tree>();
 }
 Debug.Assert(!_children.Contains(child), "At this point, the node is definately not a child");
 child._parent = this;
 _children.Add(child);
}
public bool Remove(Tree child) {
 if (child == null)
 throw new ArgumentNullException();
 if (child._parent != this)
 return false;
 Debug.Assert(_children.Contains(child), "At this point, the node is definately a child");
 child._parent = null;
 _children.Remove(child);
 if (!_children.Any())
 _children = null;
 return true;
}

Builder

I am not convinced an extension method is the way to go for this method. I would rather make a Builder class for it, but that's probably a matter of taste.

public static Tree BuildTree(this List<TreeNode> nodes)
{
 // method impl ..
}

Determining the root node is not optimized. In one of your examples, there is a root in the input nodes. In another example, there is no root. If the former case, we could use that root as tree. In the latter, we could create the null-root.

// Create a NULL-root tree
Tree root = new Tree();

private static Tree FindTreeRoot(IList<TreeNode> nodes) {
 var rootNodes = nodes.Where(node => node.parent == null);
 if (rootNodes.Count() != 1) return new Tree();
 var rootNode = rootNodes.Single();
 nodes.Remove(rootNode);
 return Map(rootNode);
}
private static Tree Map(TreeNode node) {
 return new Tree {
 Id = node.id,
 Text = node.text,
 };
}

Building the tree could then become a recursive action. I am using a list, in order to remove the visited nodes from the ones that still need to be visited.

public static Tree BuildTree(IEnumerable<TreeNode> nodes) {
 if (nodes == null) return new Tree();
 var nodeList = nodes.ToList();
 var tree = FindTreeRoot(nodeList);
 BuildTree(tree, nodeList);
 return tree;
}
private static void BuildTree(Tree tree, IList<TreeNode> descendants) {
 // Note: could be optimized further for performance
 var children = descendants.Where(node => node.parent == tree.Id).ToArray();
 foreach (var child in children) {
 var branch = Map(child);
 tree.Add(branch);
 descendants.Remove(child);
 }
 foreach (var branch in tree.Children) {
 BuildTree(branch, descendants);
 }
}

Proposed Solution

This is an appendix of the classes used as an alternative/edited flow, with an example included.

public class Program
{
 public static void Main()
 {
 //0 0 NULL
 //1 1 0
 //11 11 1
 //2 2 0
 //21 21 2
 //211 211 21
 //22 22 2
 //3 3 0
 var nodes = new List<TreeNode> { 
 new TreeNode { id = 0, text = "0", parent = null },
 new TreeNode { id = 1, text = "1", parent = 0 },
 new TreeNode { id = 11, text = "11", parent = 1 },
 new TreeNode { id = 2, text = "2", parent = 0 },
 new TreeNode { id = 21, text = "21", parent = 2 },
 new TreeNode { id = 211, text = "211", parent = 21 },
 new TreeNode { id = 22, text = "22", parent = 2 },
 new TreeNode { id = 3, text = "3", parent = 0 }
 };
 var tree = TreeBuilder.BuildTree(nodes);
 Console.ReadKey();
 }
}
public class TreeNode
{
 public int id { get; set; }
 public string text { get; set; }
 public int? parent { get; set; }
}
public static class TreeBuilder
{
 public static Tree BuildTree(IEnumerable<TreeNode> nodes) {
 if (nodes == null) return new Tree();
 var nodeList = nodes.ToList();
 var tree = FindTreeRoot(nodeList);
 BuildTree(tree, nodeList);
 return tree;
 }
 private static void BuildTree(Tree tree, IList<TreeNode> descendants) {
 var children = descendants.Where(node => node.parent == tree.Id).ToArray();
 foreach (var child in children) {
 var branch = Map(child);
 tree.Add(branch);
 descendants.Remove(child);
 }
 foreach (var branch in tree.Children) {
 BuildTree(branch, descendants);
 }
 }
 private static Tree FindTreeRoot(IList<TreeNode> nodes) {
 var rootNodes = nodes.Where(node => node.parent == null);
 if (rootNodes.Count() != 1) return new Tree();
 var rootNode = rootNodes.Single();
 nodes.Remove(rootNode);
 return Map(rootNode);
 }
 private static Tree Map(TreeNode node) {
 return new Tree {
 Id = node.id,
 Text = node.text,
 };
 }
}
public static class TreeExtensions
{
 public static IEnumerable<Tree> Descendants(this Tree value) {
 // a descendant is the node self and any descendant of the children
 if (value == null) yield break;
 yield return value;
 // depth-first pre-order tree walker
 foreach (var child in value.Children)
 foreach (var descendantOfChild in child.Descendants()) {
 yield return descendantOfChild;
 }
 }
 public static IEnumerable<Tree> Ancestors(this Tree value) {
 // an ancestor is the node self and any ancestor of the parent
 var ancestor = value;
 // post-order tree walker
 while (ancestor != null) {
 yield return ancestor;
 ancestor = ancestor.Parent;
 }
 }
}
public class Tree
{
 public int? Id { get; set; }
 public string Text { get; set; }
 protected List<Tree> _children;
 protected Tree _parent;
 public Tree() {
 Text = string.Empty;
 }
 public Tree Parent { get { return _parent; } }
 public Tree Root { get { return _parent == null ? this : _parent.Root; } }
 public int Depth { get { return this.Ancestors().Count() - 1; } }
 public IEnumerable<Tree> Children {
 get { return _children == null ? Enumerable.Empty<Tree>() : _children.ToArray(); }
 }
 public override string ToString() {
 return Text;
 }
 public void Add(Tree child) {
 if (child == null)
 throw new ArgumentNullException();
 if (child._parent != null)
 throw new InvalidOperationException("A tree node must be removed from its parent before adding as child.");
 if (this.Ancestors().Contains(child))
 throw new InvalidOperationException("A tree cannot be a cyclic graph.");
 if (_children == null) {
 _children = new List<Tree>();
 }
 Debug.Assert(!_children.Contains(child), "At this point, the node is definately not a child");
 child._parent = this;
 _children.Add(child);
 }
 public bool Remove(Tree child) {
 if (child == null)
 throw new ArgumentNullException();
 if (child._parent != this)
 return false;
 Debug.Assert(_children.Contains(child), "At this point, the node is definately a child");
 child._parent = null;
 _children.Remove(child);
 if (!_children.Any())
 _children = null;
 return true;
 }

Question 10

I was build something very similar some time ago where I used attributes and a dictionary as a lookup for creating the tree. I'm addicted to reusability ;-]

Question 11

@t3chb0t I will review it. Don't you worry :-p

Question 12

haha, awsome! ;-)

Question 13

Thank you so much for this, especially the point about the parent reference. It helps to see things from a new perspective, from another developer's point of view.

Question 14

This method is okay, other than the fact you're null checking an int and it should just be 0 for a root node. Not all versions of C# support nullable types and int nullable types are usually frowned upon.

Heslacher Heslacher 50.9k5 gold badges83 silver badges177 bronze badges · Answer 1 · 2017-05-23 06:41:19Z

Well your provided example data can't be real, because I never ever saw an int? which could take 1.1 as value.

I will apply to the posted code the standard C# naming convention, meaning that properties are named using PascalCase casing.

That being said, lets take a look at the Tree class

public class Tree
{
 public int? id { get; set; }
 public string text { get; set; }
 public List<Tree> children { get; set; }
}

which seems to be just a TreeNode with children and without a parent. IMO this is too much. Just add your TreeNode to a List<TreeNode> Children property and you won't need the Tree class anymore.

By initializing the Children property at creation of the object, you will waste some space (memory) but you won't need to check if Children == null anymore.
If the root node (parent == null) won't be the first node in the collection then your method will fail.
Omitting braces although they might be optional is dangerous because this can lead to hidden and therefore hard to find bugs.

I would like to encourage you to always use them.
Using recursion to tackle this problem is a good idea.

Another approach

which assumes the TreeNode looks like so

public class TreeNode
{
 public int? Id { get; set; }
 public string Text { get; set; }
 public int? Parent { get; set; }
 public List<TreeNode> Children { get; set; }
 public TreeNode()
 {
 Children = new List<TreeNode>();
 }
}

where I would usually make the setters private and fill the properties at constructor level, but I will leave this for you to do.

~~(削除) By using FirstOrDefault() we can get the root node very easily and should remove it from the nodes collection. Let us introduce a method to do this~~

private static TreeNode RemoveRootNode(this List<TreeNode> nodes)
{
 if (nodes == null)
 {
 throw new NullReferenceException("nodes");
 }
 var root = nodes.FirstOrDefault(n => !n.Parent.HasValue);
 if (root != null)
 {
 nodes.Remove(root);
 }
 return root;
}

Now we don't need the root node to be the first item in the collection anymore.

Next we need the "main" method which takes a List<TreeNode> as a parameter and returns a TreeNode which is the root node like so

public static TreeNode BuildTree(this List<TreeNode> nodes)
{
 var root = nodes.RemoveRootNode();
 if (root == null) { throw new ArgumentOutOfRangeException("nodes"); }
 return root.BuildTree(nodes);
}

(削除ここまで)

Edit

Based on the changed example data, we now can have multiple TreeNode with Parent == null which makes the RemoveRootNode() method superfluous and will result in TreeNode buildTree(this List<TreeNode>) like so

public static TreeNode BuildTree(this List<TreeNode> nodes)
{
 if (nodes == null)
 {
 throw new ArgumentNullException("nodes");
 }
 return new TreeNode().BuildTree(nodes);
}

As we see there should be a BuildTree(this TreeNode, List<TreeNode>) method which will look like:

private static TreeNode BuildTree(this TreeNode root, List<TreeNode> nodes)
{
 if (nodes.Count == 0) { return root; }
 var children = root.FetchChildren(nodes).ToList();
 root.Children.AddRange(children);
 root.RemoveChildren(nodes);
 for (int i = 0; i < children.Count; i++)
 {
 children[i] = children[i].BuildTree(nodes);
 if (nodes.Count == 0) { break; }
 }
 return root;
}

which is self-explanatory. It fetches the children by using

public static IEnumerable<TreeNode> FetchChildren(this TreeNode root, List<TreeNode> nodes)
{
 return nodes.Where(n => n.Parent == root.Id);
}

and add them to the Children property of the root node and removes them from the nodes by

public static void RemoveChildren(this TreeNode root, List<TreeNode> nodes)
{
 foreach (var node in root.Children)
 {
 nodes.Remove(node);
 }
}

Then it iterates over the the children and calls itself recursively.

JanDotNet JanDotNet 8,5682 gold badges21 silver badges48 bronze badges · Answer 2 · 2017-05-23 06:35:08Z

(Bug)

Edit: It seems that your incoming data are sorted. If so, this 'bug' is not valid of course

Your traversing algorithm works fine as long as the TreeNodes have the correct order - parents must be positioned before children. If you add a node that's parent is not already part of the tree, it will never be added.

To fix that bug, you need at least another reprocessing step that sorts your items...

Naming

Properties should start with a capital letter.

TreeNode

As mentioned in a comment, the ID property should not be nullable.
If possible, I would also make the Id property read-only and pass them in the constructor because it should not change after initialization.

Tree

Same as in TreeNode - Id-property should be read-only.
The children property should also be read-only and initialized in constructor. That avoids the error-prone initialization logic inside your TraverseAndAddNode method.

TraverseAndAddNode

The nodeAdded variable is actually not required:

Simplified version of the TraverseAndAddNode method containing the suggestions above:

public static bool TraverseAndAddNode(this Tree root, TreeNode node)
{
 // Check if the current root is the parent of the node to be added
 if (root.id == node.parent)
 {
 root.children.Add(new Tree()
 {
 id = node.id,
 text = node.text,
 children = null
 });
 return true;
 }
 // Note: You could use a one-liner here 
 //return root.Children.Any(child => child.TraverseAndAddNode(node))
 foreach (Tree tree in root.children)
 {
 if (tree.TraverseAndAddNode(node))
 {
 return true;
 }
 }
 return false;
}

Performance

The performance intensive step is the missing one that sorts the nodes. Therefore, I would think about an alternative approach instead of sorting + recursive creating the tree.

One alternative approach may be:

create non-connected tree items of all TreeNode objects
Indexing them by id using a dictionary
Iterate over the items, get the parent from the dictionary and put the current item in its child collection.

dfhwze dfhwze 14.1k3 gold badges40 silver badges101 bronze badges · Answer 3 · 2019-05-25 11:44:44Z

Review

Keep in mind I am using an older version of .NET so my semantics can be somehow verbose.

Navigation

You are lacking a reference back to the parent, which makes bottom-up navigation much harder. A tree that knows its parent allows for much more flexibility in navigation.

root.children.Add(new Tree()
{
 id = node.id,
 text = node.text,
 children = null // <- no parent reference ;-(
});

public class Tree
{
 public int? Id { get; set; }
 public string Text { get; set; }
 protected List<Tree> _children;
 protected Tree _parent; // <- added reference to parent :-)
}

Now, we can use TreeExtensions to nagivate a tree. The most basic tree traversal walkers are available. More could be added if required.

public static class TreeExtensions
{
 public static IEnumerable<Tree> Descendants(this Tree value) {
 // a descendant is the node self and any descendant of the children
 if (value == null) yield break;
 yield return value;
 // depth-first pre-order tree walker
 foreach (var child in value.Children)
 foreach (var descendantOfChild in child.Descendants()) {
 yield return descendantOfChild;
 }
 }
 public static IEnumerable<Tree> Ancestors(this Tree value) {
 // an ancestor is the node self and any ancestor of the parent
 var ancestor = value;
 // post-order tree walker
 while (ancestor != null) {
 yield return ancestor;
 ancestor = ancestor.Parent;
 }
 }
}

Lexicon

In computer science, many API's use the terminology below.

descendant: any child, grandchild, ..
descendantOrSelf: any descendant or the specified node itself
ancestor: any parent, grandparent, ..
ancestorOrSelf: any ancestor or the specified node itself

However, I tend to use family tree terminology.

successor: any child, grandchild, ..
descendant: any successor or the specified node itself
predecessor: any parent, grandparent, ..
ancestor: any predecessor or the specified node itself

Data Integrity

Encapsulating the family members _parent, _children provides us with the capability of preserving the tree's integrity.

public class Tree
{
 public int? id { get; set; }
 public string text { get; set; }
 public List<Tree> children { get; set; }
}

public class Tree
{
 public int? Id { get; set; }
 public string Text { get; set; }
 protected List<Tree> _children;
 protected Tree _parent;
 public Tree Parent { get { return _parent; } }
 public Tree Root { get { return _parent == null ? this : _parent.Root; } }
 public IEnumerable<Tree> Children {
 get { return _children == null 
 ? Enumerable.Empty<Tree>() : _children.ToArray(); 
 }
 }
}

We could then have Add and Remove support with sufficient validation support to maintain integrity.

public void Add(Tree child) {
 if (child == null)
 throw new ArgumentNullException();
 if (child._parent != null)
 throw new InvalidOperationException(
 "A tree node must be removed from its parent before adding as child.");
 if (this.Ancestors().Contains(child))
 throw new InvalidOperationException("A tree cannot be a cyclic graph.");
 if (_children == null) {
 _children = new List<Tree>();
 }
 Debug.Assert(!_children.Contains(child), "At this point, the node is definately not a child");
 child._parent = this;
 _children.Add(child);
}
public bool Remove(Tree child) {
 if (child == null)
 throw new ArgumentNullException();
 if (child._parent != this)
 return false;
 Debug.Assert(_children.Contains(child), "At this point, the node is definately a child");
 child._parent = null;
 _children.Remove(child);
 if (!_children.Any())
 _children = null;
 return true;
}

Builder

I am not convinced an extension method is the way to go for this method. I would rather make a Builder class for it, but that's probably a matter of taste.

public static Tree BuildTree(this List<TreeNode> nodes)
{
 // method impl ..
}

Determining the root node is not optimized. In one of your examples, there is a root in the input nodes. In another example, there is no root. If the former case, we could use that root as tree. In the latter, we could create the null-root.

// Create a NULL-root tree
Tree root = new Tree();

private static Tree FindTreeRoot(IList<TreeNode> nodes) {
 var rootNodes = nodes.Where(node => node.parent == null);
 if (rootNodes.Count() != 1) return new Tree();
 var rootNode = rootNodes.Single();
 nodes.Remove(rootNode);
 return Map(rootNode);
}
private static Tree Map(TreeNode node) {
 return new Tree {
 Id = node.id,
 Text = node.text,
 };
}

Building the tree could then become a recursive action. I am using a list, in order to remove the visited nodes from the ones that still need to be visited.

public static Tree BuildTree(IEnumerable<TreeNode> nodes) {
 if (nodes == null) return new Tree();
 var nodeList = nodes.ToList();
 var tree = FindTreeRoot(nodeList);
 BuildTree(tree, nodeList);
 return tree;
}
private static void BuildTree(Tree tree, IList<TreeNode> descendants) {
 // Note: could be optimized further for performance
 var children = descendants.Where(node => node.parent == tree.Id).ToArray();
 foreach (var child in children) {
 var branch = Map(child);
 tree.Add(branch);
 descendants.Remove(child);
 }
 foreach (var branch in tree.Children) {
 BuildTree(branch, descendants);
 }
}

Proposed Solution

This is an appendix of the classes used as an alternative/edited flow, with an example included.

public class Program
{
 public static void Main()
 {
 //0 0 NULL
 //1 1 0
 //11 11 1
 //2 2 0
 //21 21 2
 //211 211 21
 //22 22 2
 //3 3 0
 var nodes = new List<TreeNode> { 
 new TreeNode { id = 0, text = "0", parent = null },
 new TreeNode { id = 1, text = "1", parent = 0 },
 new TreeNode { id = 11, text = "11", parent = 1 },
 new TreeNode { id = 2, text = "2", parent = 0 },
 new TreeNode { id = 21, text = "21", parent = 2 },
 new TreeNode { id = 211, text = "211", parent = 21 },
 new TreeNode { id = 22, text = "22", parent = 2 },
 new TreeNode { id = 3, text = "3", parent = 0 }
 };
 var tree = TreeBuilder.BuildTree(nodes);
 Console.ReadKey();
 }
}
public class TreeNode
{
 public int id { get; set; }
 public string text { get; set; }
 public int? parent { get; set; }
}
public static class TreeBuilder
{
 public static Tree BuildTree(IEnumerable<TreeNode> nodes) {
 if (nodes == null) return new Tree();
 var nodeList = nodes.ToList();
 var tree = FindTreeRoot(nodeList);
 BuildTree(tree, nodeList);
 return tree;
 }
 private static void BuildTree(Tree tree, IList<TreeNode> descendants) {
 var children = descendants.Where(node => node.parent == tree.Id).ToArray();
 foreach (var child in children) {
 var branch = Map(child);
 tree.Add(branch);
 descendants.Remove(child);
 }
 foreach (var branch in tree.Children) {
 BuildTree(branch, descendants);
 }
 }
 private static Tree FindTreeRoot(IList<TreeNode> nodes) {
 var rootNodes = nodes.Where(node => node.parent == null);
 if (rootNodes.Count() != 1) return new Tree();
 var rootNode = rootNodes.Single();
 nodes.Remove(rootNode);
 return Map(rootNode);
 }
 private static Tree Map(TreeNode node) {
 return new Tree {
 Id = node.id,
 Text = node.text,
 };
 }
}
public static class TreeExtensions
{
 public static IEnumerable<Tree> Descendants(this Tree value) {
 // a descendant is the node self and any descendant of the children
 if (value == null) yield break;
 yield return value;
 // depth-first pre-order tree walker
 foreach (var child in value.Children)
 foreach (var descendantOfChild in child.Descendants()) {
 yield return descendantOfChild;
 }
 }
 public static IEnumerable<Tree> Ancestors(this Tree value) {
 // an ancestor is the node self and any ancestor of the parent
 var ancestor = value;
 // post-order tree walker
 while (ancestor != null) {
 yield return ancestor;
 ancestor = ancestor.Parent;
 }
 }
}
public class Tree
{
 public int? Id { get; set; }
 public string Text { get; set; }
 protected List<Tree> _children;
 protected Tree _parent;
 public Tree() {
 Text = string.Empty;
 }
 public Tree Parent { get { return _parent; } }
 public Tree Root { get { return _parent == null ? this : _parent.Root; } }
 public int Depth { get { return this.Ancestors().Count() - 1; } }
 public IEnumerable<Tree> Children {
 get { return _children == null ? Enumerable.Empty<Tree>() : _children.ToArray(); }
 }
 public override string ToString() {
 return Text;
 }
 public void Add(Tree child) {
 if (child == null)
 throw new ArgumentNullException();
 if (child._parent != null)
 throw new InvalidOperationException("A tree node must be removed from its parent before adding as child.");
 if (this.Ancestors().Contains(child))
 throw new InvalidOperationException("A tree cannot be a cyclic graph.");
 if (_children == null) {
 _children = new List<Tree>();
 }
 Debug.Assert(!_children.Contains(child), "At this point, the node is definately not a child");
 child._parent = this;
 _children.Add(child);
 }
 public bool Remove(Tree child) {
 if (child == null)
 throw new ArgumentNullException();
 if (child._parent != this)
 return false;
 Debug.Assert(_children.Contains(child), "At this point, the node is definately a child");
 child._parent = null;
 _children.Remove(child);
 if (!_children.Any())
 _children = null;
 return true;
 }

I was build something very similar some time ago where I used attributes and a dictionary as a lookup for creating the tree. I'm addicted to reusability ;-]
Thank you so much for this, especially the point about the parent reference. It helps to see things from a new perspective, from another developer's point of view.

Charlie E Charlie E 111 bronze badge · Answer 4 · 2020-12-22 17:49:17Z

1

\$\begingroup\$

This method is okay, other than the fact you're null checking an int and it should just be 0 for a root node. Not all versions of C# support nullable types and int nullable types are usually frowned upon.

Share

edited Dec 23, 2020 at 8:00

Toby Speight's user avatar

Toby Speight

87.9k14 gold badges104 silver badges325 bronze badges

answered Dec 22, 2020 at 17:49

Charlie E's user avatar

Charlie E Charlie E

111 bronze badge

\$\endgroup\$

Add a comment |

Stack Exchange Network

Building a Tree from a flat List<Nodes>

4 Answers 4

Another approach

Edit

Review

Navigation

Lexicon

Data Integrity

Builder

Proposed Solution

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Hot Network Questions

Building a Tree from a flat List<Nodes>

4 Answers 4

Another approach

Edit

Review

Navigation

Lexicon

Data Integrity

Builder

Proposed Solution

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Related

Hot Network Questions