Generic tree data structure for flat data from a database

Question 1

Could someone please review my code for performance(any other suggestions are welcome) which converts flat data list coming from database to a tree?

Interface for db entity class

public interface IDbEntityNode
{
 int Id { get; set; }
 int ParentId { get; set; }
 string Data { get; set; } 
}

Example of db Entity class

public class ExceptionCategory :IDbEntityNode
{
 public int Id { get; set; }
 public int ParentId { get; set; }
 public string Data { get; set; } 
 public ExceptionCategory(string data, int id, int parentId)
 {
 Id = id;
 ParentId = parentId;
 Data = data;
 }
}

Generic class which holds the structure of tree node

public class GenericNode<T> 
{
 public T NodeInformation { get; set; }
 public GenericNode<T> Parent { get; set; }
 public List<GenericNode<T>> Children { get; set; } = new List<GenericNode<T>>();
}

Method which coverts flat list to tree

public static List<GenericNode<T>> CreateGenericTree<T>(List<T> flatDataObject,Func<T,bool> IsRootNode) where T : IDbEntityNode 
{
 var lookup = new Dictionary<int, GenericNode<T>>();
 var rootNodes = new List<GenericNode<T>>();
 var noOfElements = flatDataObject.Count;
 for (int element = 0; element < noOfElements; element++)
 {
 GenericNode<T> currentNode;
 if (lookup.TryGetValue(flatDataObject[element].Id, out currentNode))
 {
 currentNode.NodeInformation = flatDataObject[element];
 }
 else
 {
 currentNode = new GenericNode<T>() { NodeInformation = flatDataObject[element] };
 lookup.Add(flatDataObject[element].Id, currentNode);
 }
 if (IsRootNode(flatDataObject[element])) 
 {
 rootNodes.Add(currentNode);
 }
 else
 {
 GenericNode<T> parentNode;
 if (!lookup.TryGetValue(flatDataObject[element].ParentId, out parentNode))
 { 
 parentNode = new GenericNode<T>();
 lookup.Add(flatDataObject[element].ParentId, parentNode);
 }
 parentNode.Children.Add(currentNode);
 currentNode.Parent = parentNode;
 }
 }
 return rootNodes;
}

Execution:

private static void Main(string[] args)
{
 List<IDbEntityNode> flatDataStructure = new List<IDbEntityNode>
 {
 new ExceptionCategory("System Exception",1,0),
 new ExceptionCategory("Index out of range",2,1),
 new ExceptionCategory("Null Reference",3,1),
 new ExceptionCategory("Invalid Cast",4,1),
 new ExceptionCategory("OOM",5,1),
 new ExceptionCategory("Argument Exception",6,1),
 new ExceptionCategory("Argument Out Of Range",7,6),
 new ExceptionCategory("Argument Null",8,6),
 new ExceptionCategory("External Exception",9,1),
 new ExceptionCategory("Com",10,9),
 new ExceptionCategory("SEH",11,9),
 new ExceptionCategory("Arithmatic Exception",12,1),
 new ExceptionCategory("DivideBy0",13,12),
 new ExceptionCategory("Overflow",14,12),
 };
 var tree = CreateGenericTree(flatDataStructure, IsRootNode);
}

Root node has ParentId set to 0

private static bool IsRootNode(IDbEntityNode dbEntity)
{
 bool isRootNode = false;
 if (dbEntity.ParentId == 0 )
 isRootNode = true;
 return isRootNode; 
}

Question 2

You're saying that you're Trying to implement generic tree data structure - does that mean it doesn't work yet? Could you clarify this please?

Question 3

It works...But is there any scope for improvement in my code...

Question 4

ok, great. One more thing. IsRootNode is missing. I'd be great if you could add the missing parts.

Question 5

Added IsRootNode method...I made it func because i am not sure for other dbEntity the criteria would be the same to detect IsRootNode

Question 6

@A.Learn: Aren't you missing an ExceptionCategory with Id == 0 in your example data or is your IsRootNode() wrong?

Question 7

`GenericNode<T>`

I'll adopt the name Node<T>, as Henrik Hansen has done.

This class is completely publically mutable, which is probably not ideal. Does it make any sense to change the NodeInformation once you've created the node? Maybe it does, but if not, then you should work to enforce the idea by making it immutable. I'd also wager it makes no sense to have a node without this information, so I'd add a constructor to this effect:

 public T NodeInformation { get; }
 public Node(T nodeInformation)
 {
 NodeInformation = nodeInformation;
 }

The children and parent properties are more tricky, as (without some effort), you can't know who who the children are upfront, and it's even more effort to make both the Children list and Parent immutable. Instead, let's make them privatelly mutable, and add an AddChild method, so that it's more difficult for an external entity to produce an invalid tree:
```
public Node<T> Parent { get; private set; }
private readonly List<Node<T>> _children = new List<Node<T>>();
public IReadOnlyList<Node<T>> Children => _children;
public void AddChild(Node<T> childNode)
{
 if (childNode.Parent != null)
 throw new ArgumentException(nameof(childNode), "Child node already has a parent");
 _children.Add(childNode);
 childNode.Parent = this;
}
```
Note that we have a List<Node<T>> so we can add children, but we only expose an IReadOnlyList<Node<T>>, so they can only be added (without dodgy casts) by means of the AddChild method.

`IDbEntityNode`

Again, everything here is publically mutable: I do not think it makes much sense to change the Id of something! As a general rule, immutable is good if you can have it. It's very easy to add a public setter back in, but it's potentially nightmarish to take it out.
You mentioned wanted more generic IDs: I'd be inclined to make this type generic, with both generic Ids and generic Data. You can always provide a 'convenience' concrete version which has int and string baked in. If you have generic Ids, you either need to make the type comparable, or you'll want to pass CreateGenericTree an IEqualityComparer<TId> to handle it (or you could assume the default comparer, but that's much less fun, and you can again provide a 'convenience' overload which uses it by default).

`CreateGenericTree`

Conceptually, this appears to create multiple trees, as oppose to one tree (as there can be many roots); should it be called CreateTrees?
You are repeatedly indexing into flatDataObject: much better to loop over something like i, which everyone knows is an index, and set element = flatDataObject[i] (or use a foreach as Henrik Hansen has said).
I think I'm in a minority, but I'd be strongly inclined to use a dedicated delegate type rather than Func<T, bool>. What question does Func<T, bool> answer? It is a horse? Is it lazily loaded? I don't know. You might consider then something like, but a lot of people dislike nominal delegates, for various reasons.
```
public delegate bool RootDetector<in T>(T node) where T : IDbEntityNode
```
You are returning a List<T>, and there is a good chance that you could just as well return a more abstract type, such as IList<T> or IReadOnlyList<T>. This give you freedom to change the actual returned type later (e.g. if you change the implementation) without having to change the return type, and if it turns out later that you need some members of a less abstract type, then you can more easily change an abstract return type to less abstract return type than you can the other way round.
You are also taking List<T> as a parameter. As someone consuming your API, this would concern me, as List<T> is openly mutable, and I would unsure whether your code is going to modify what I pass it. Much better to use an abstract type like IReadOnlyList<T>, which gives the caller more freedom, and communicates immediately that you are not going to modify the parameter.
Your code will currently heave oddly if the flatDataObject contains nodes with the same Id. Unfortunately, checking for this would be ugly.

Below is my take (you'll note it's pretty similar to Henrik Hansen's code).

public static IReadOnlyList<Node<T>> CreateTrees<T>(IEnumerable<T> flatDataObjects, RootDetector<T> isRoot) where T : IDbEntityNode 
{
 if (flatDataObjects == null)
 throw new ArgumentNullException(nameof(flatDataObjects));
 if (isRoot == null)
 throw new ArgumentNullException(nameof(isRoot));
 var nodes = flatDataObjects.ToDictionary(fdo => fdo.Id, fdo => new Node<T>(fdo));
 List<Node<T>> roots = new List<Node<T>>();
 foreach (var node in nodes.Values)
 {
 if (isRoot(node.NodeInformation))
 {
 roots.Add(node);
 }
 else
 {
 if (nodes.TryGetValue(node.NodeInformation.ParentId, out var parentNode))
 {
 parentNode.AddChild(node);
 }
 else
 {
 throw new InvalidOperationException($"Non-root node {node.NodeInformation.Id} has ParentId {node.NodeInformation.ParentId}, but no node exists with that id");
 }
 }
 }
 return roots;
}

Note that I've added dedicated checks for null parameters, and that upon detecting an invalid input (a node with no parent) I throw a helpful error message. Note also that ToDictionary will throw if two elements have the same key (Which is good), though the message will be less helpful: you could actively trap this exception and throw your own explanation, or otherwise you could manually detected duplicates and throw first/write your own version of ToDictionary to achieve the same.

I would argue that these LINQy versions are much more readable, because each bit of logic is mostly self contained. Even clearer would be to select the roots only at the very end, but there is a good case to be made that calling isRoot as few times as possible is desirable, as we don't know what it does, and partitioning the dictionary would introduce complexity.

Boring things

IsRootNode is a method parameter, and usually these are in lowerCamelCase (e.g. isRootNode). Note that it has the same name currently as the method you are passing to it, which means if you change it to isRootNode you might forget to change the usage if the method is in scope, and then you have a problem (C# warns about unused variables, but not unused parameters).
Pay attention to your white-space: inconsistent white-space can make perfectly good code terribly untidy-looking, and a few more line-breaks here-and-there really help to break up the logic and make the code easier to scan.

Question 8

I took all your comments and implemented them, code is much better than before and also it did not compromise with speed when i did profiling between earlier and later(after implementing comments) code

Question 9

All in all it looks pretty good.

I have the following remarks:

The name GenericNode<T> is somewhat redundant or "Pleonastic". I would simply call it Node<T> because the type argument indicates the genericness.

GenericNode<T>.NodeInformation would I simply call Value

The IsRootNode() can be reduced to:

private static bool IsRootNode(IDbEntityNode dbEntity)
{
 return dbEntity.ParentId == 0;
}

CreateTree<T> can be changed to:

public static List<Node<T>> CreateTree<T>(List<T> flatDataObject, Func<T, bool> IsRootNode) where T : IDbEntityNode
{
 var lookup = new Dictionary<int, Node<T>>();
 var rootNodes = new List<Node<T>>();
 foreach (T element in flatDataObject)
 {
 if (lookup.TryGetValue(element.Id, out Node<T> currentNode))
 {
 currentNode.Value = element;
 }
 else
 {
 currentNode = new Node<T>() { Value = element };
 lookup.Add(element.Id, currentNode);
 }
 if (IsRootNode(element))
 {
 rootNodes.Add(currentNode);
 }
 else
 { 
 if (!lookup.TryGetValue(element.ParentId, out Node<T> parentNode))
 {
 parentNode = new Node<T>();
 lookup.Add(element.ParentId, parentNode);
 }
 parentNode.Children.Add(currentNode);
 currentNode.Parent = parentNode;
 }
 }
 return rootNodes;
}

Here the for-loop is changed to a foreach-loop which makes it a little more readable.

The same thing could be done using LINQ, but be aware that LINQ is not necessarily especially performant:

public static List<Node<T>> CreateTree<T>(List<T> flatDataObject, Func<T, bool> IsRootNode) where T : IDbEntityNode
{
 var roots = flatDataObject.Where(o => IsRootNode(o)).Select(o => new Node<T> { Value = o, Parent = null }).ToList();
 var currentParents = roots;
 while (currentParents.Any())
 {
 currentParents = currentParents.SelectMany(p =>
 {
 var children = flatDataObject.Where(o => o.ParentId == p.Value.Id).Select(o => new Node<T> { Value = o, Parent = p }).ToArray();
 p.Children.AddRange(children);
 return children;
 }).ToList();
 }
 return roots;
}

and here's another LINQ approach:

public static List<Node<T>> CreateTree<T>(List<T> flatDataObject, Func<T, bool> IsRootNode) where T : IDbEntityNode
{
 var nodes = flatDataObject.Select(o => new Node<T> { Value = o }).ToArray();
 List<Node<T>> roots = null;
 foreach (var group in nodes.GroupBy(n => n.Value.ParentId))
 {
 var parent = nodes.FirstOrDefault(n => n.Value.Id == group.Key);
 if (parent != null)
 {
 parent.Children.AddRange(group);
 foreach (var node in group)
 {
 node.Parent = parent;
 }
 }
 else
 {
 roots = group.ToList();
 }
 }
 return roots; //nodes.Where(n => IsRootNode(n.Value)).ToList();
}

Question 10

Couple of questions/comments about the (much more readable) LINQ versions: is there a reason you use an array and linear search rather than a dictionary for nodes? I'd prefer root was called parent, hopefully for obvious reasons ;) The (new) method you have for finding roots assumes that all roots have the same parent, but I think the OP wants it to be agnostic to the behaviour IsRootNode (the old Where at the end looked fine to me, and was very easy to understand)

Question 11

@VisualMelon: parent it should be. The roots variable was added as a suggestion for optimization, but I kept the Where in the comment because I had the same discussion as you. Isn't there is a problem with IsRootNode: If IsRootNode(node) == false and the node doesn't have any parent among the other nodes - what is it then? (In the example, if there was a node with Node.ParentId == 42)

Question 12

@VisualMelon: No reason for Array - a HashSet or Dictionary may be a better choice.

Question 13

@HenrikHansen Thanks for taking your time for review, but i can't move to Linq as it will go over my SLA limit...I profiled the Linq version you gave and it was 350% slower than the code pasted in question but I agree with your suggestion with naming conventions.

Question 14

@HenrikHansen my concern with the root logic (in the last example) is that it assumes that root nodes are indicated by a single ParentId, rather than using the IsRootNode parameter (which unfortunately has the same name as the static method implementation). It's not in the question body, but the OP comments "I made it Func because I am not sure for other dbEntity the criteria would be the same to detect IsRootNode".

VisualMelon VisualMelon 7,57123 silver badges46 bronze badges · Accepted Answer · 2018-09-19 22:43:17Z

`GenericNode<T>`

I'll adopt the name Node<T>, as Henrik Hansen has done.

This class is completely publically mutable, which is probably not ideal. Does it make any sense to change the NodeInformation once you've created the node? Maybe it does, but if not, then you should work to enforce the idea by making it immutable. I'd also wager it makes no sense to have a node without this information, so I'd add a constructor to this effect:

 public T NodeInformation { get; }
 public Node(T nodeInformation)
 {
 NodeInformation = nodeInformation;
 }

The children and parent properties are more tricky, as (without some effort), you can't know who who the children are upfront, and it's even more effort to make both the Children list and Parent immutable. Instead, let's make them privatelly mutable, and add an AddChild method, so that it's more difficult for an external entity to produce an invalid tree:
```
public Node<T> Parent { get; private set; }
private readonly List<Node<T>> _children = new List<Node<T>>();
public IReadOnlyList<Node<T>> Children => _children;
public void AddChild(Node<T> childNode)
{
 if (childNode.Parent != null)
 throw new ArgumentException(nameof(childNode), "Child node already has a parent");
 _children.Add(childNode);
 childNode.Parent = this;
}
```
Note that we have a List<Node<T>> so we can add children, but we only expose an IReadOnlyList<Node<T>>, so they can only be added (without dodgy casts) by means of the AddChild method.

`IDbEntityNode`

Again, everything here is publically mutable: I do not think it makes much sense to change the Id of something! As a general rule, immutable is good if you can have it. It's very easy to add a public setter back in, but it's potentially nightmarish to take it out.
You mentioned wanted more generic IDs: I'd be inclined to make this type generic, with both generic Ids and generic Data. You can always provide a 'convenience' concrete version which has int and string baked in. If you have generic Ids, you either need to make the type comparable, or you'll want to pass CreateGenericTree an IEqualityComparer<TId> to handle it (or you could assume the default comparer, but that's much less fun, and you can again provide a 'convenience' overload which uses it by default).

`CreateGenericTree`

Conceptually, this appears to create multiple trees, as oppose to one tree (as there can be many roots); should it be called CreateTrees?
You are repeatedly indexing into flatDataObject: much better to loop over something like i, which everyone knows is an index, and set element = flatDataObject[i] (or use a foreach as Henrik Hansen has said).
I think I'm in a minority, but I'd be strongly inclined to use a dedicated delegate type rather than Func<T, bool>. What question does Func<T, bool> answer? It is a horse? Is it lazily loaded? I don't know. You might consider then something like, but a lot of people dislike nominal delegates, for various reasons.
```
public delegate bool RootDetector<in T>(T node) where T : IDbEntityNode
```
You are returning a List<T>, and there is a good chance that you could just as well return a more abstract type, such as IList<T> or IReadOnlyList<T>. This give you freedom to change the actual returned type later (e.g. if you change the implementation) without having to change the return type, and if it turns out later that you need some members of a less abstract type, then you can more easily change an abstract return type to less abstract return type than you can the other way round.
You are also taking List<T> as a parameter. As someone consuming your API, this would concern me, as List<T> is openly mutable, and I would unsure whether your code is going to modify what I pass it. Much better to use an abstract type like IReadOnlyList<T>, which gives the caller more freedom, and communicates immediately that you are not going to modify the parameter.
Your code will currently heave oddly if the flatDataObject contains nodes with the same Id. Unfortunately, checking for this would be ugly.

Below is my take (you'll note it's pretty similar to Henrik Hansen's code).

public static IReadOnlyList<Node<T>> CreateTrees<T>(IEnumerable<T> flatDataObjects, RootDetector<T> isRoot) where T : IDbEntityNode 
{
 if (flatDataObjects == null)
 throw new ArgumentNullException(nameof(flatDataObjects));
 if (isRoot == null)
 throw new ArgumentNullException(nameof(isRoot));
 var nodes = flatDataObjects.ToDictionary(fdo => fdo.Id, fdo => new Node<T>(fdo));
 List<Node<T>> roots = new List<Node<T>>();
 foreach (var node in nodes.Values)
 {
 if (isRoot(node.NodeInformation))
 {
 roots.Add(node);
 }
 else
 {
 if (nodes.TryGetValue(node.NodeInformation.ParentId, out var parentNode))
 {
 parentNode.AddChild(node);
 }
 else
 {
 throw new InvalidOperationException($"Non-root node {node.NodeInformation.Id} has ParentId {node.NodeInformation.ParentId}, but no node exists with that id");
 }
 }
 }
 return roots;
}

Note that I've added dedicated checks for null parameters, and that upon detecting an invalid input (a node with no parent) I throw a helpful error message. Note also that ToDictionary will throw if two elements have the same key (Which is good), though the message will be less helpful: you could actively trap this exception and throw your own explanation, or otherwise you could manually detected duplicates and throw first/write your own version of ToDictionary to achieve the same.

I would argue that these LINQy versions are much more readable, because each bit of logic is mostly self contained. Even clearer would be to select the roots only at the very end, but there is a good case to be made that calling isRoot as few times as possible is desirable, as we don't know what it does, and partitioning the dictionary would introduce complexity.

Boring things

IsRootNode is a method parameter, and usually these are in lowerCamelCase (e.g. isRootNode). Note that it has the same name currently as the method you are passing to it, which means if you change it to isRootNode you might forget to change the usage if the method is in scope, and then you have a problem (C# warns about unused variables, but not unused parameters).
Pay attention to your white-space: inconsistent white-space can make perfectly good code terribly untidy-looking, and a few more line-breaks here-and-there really help to break up the logic and make the code easier to scan.

I took all your comments and implemented them, code is much better than before and also it did not compromise with speed when i did profiling between earlier and later(after implementing comments) code

Stack Exchange Network

Generic tree data structure for flat data from a database

2 Answers 2

`GenericNode<T>`

`IDbEntityNode`

`CreateGenericTree`

Boring things

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Generic tree data structure for flat data from a database

2 Answers 2

GenericNode<T>

IDbEntityNode

CreateGenericTree

Boring things

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions

`GenericNode<T>`

`IDbEntityNode`

`CreateGenericTree`