Could someone please review my code for performance(any other suggestions are welcome) which converts flat data list coming from database to a tree?
Interface for db entity class
public interface IDbEntityNode
{
int Id { get; set; }
int ParentId { get; set; }
string Data { get; set; }
}
Example of db Entity class
public class ExceptionCategory :IDbEntityNode
{
public int Id { get; set; }
public int ParentId { get; set; }
public string Data { get; set; }
public ExceptionCategory(string data, int id, int parentId)
{
Id = id;
ParentId = parentId;
Data = data;
}
}
Generic class which holds the structure of tree node
public class GenericNode<T>
{
public T NodeInformation { get; set; }
public GenericNode<T> Parent { get; set; }
public List<GenericNode<T>> Children { get; set; } = new List<GenericNode<T>>();
}
Method which coverts flat list to tree
public static List<GenericNode<T>> CreateGenericTree<T>(List<T> flatDataObject,Func<T,bool> IsRootNode) where T : IDbEntityNode
{
var lookup = new Dictionary<int, GenericNode<T>>();
var rootNodes = new List<GenericNode<T>>();
var noOfElements = flatDataObject.Count;
for (int element = 0; element < noOfElements; element++)
{
GenericNode<T> currentNode;
if (lookup.TryGetValue(flatDataObject[element].Id, out currentNode))
{
currentNode.NodeInformation = flatDataObject[element];
}
else
{
currentNode = new GenericNode<T>() { NodeInformation = flatDataObject[element] };
lookup.Add(flatDataObject[element].Id, currentNode);
}
if (IsRootNode(flatDataObject[element]))
{
rootNodes.Add(currentNode);
}
else
{
GenericNode<T> parentNode;
if (!lookup.TryGetValue(flatDataObject[element].ParentId, out parentNode))
{
parentNode = new GenericNode<T>();
lookup.Add(flatDataObject[element].ParentId, parentNode);
}
parentNode.Children.Add(currentNode);
currentNode.Parent = parentNode;
}
}
return rootNodes;
}
Execution:
private static void Main(string[] args)
{
List<IDbEntityNode> flatDataStructure = new List<IDbEntityNode>
{
new ExceptionCategory("System Exception",1,0),
new ExceptionCategory("Index out of range",2,1),
new ExceptionCategory("Null Reference",3,1),
new ExceptionCategory("Invalid Cast",4,1),
new ExceptionCategory("OOM",5,1),
new ExceptionCategory("Argument Exception",6,1),
new ExceptionCategory("Argument Out Of Range",7,6),
new ExceptionCategory("Argument Null",8,6),
new ExceptionCategory("External Exception",9,1),
new ExceptionCategory("Com",10,9),
new ExceptionCategory("SEH",11,9),
new ExceptionCategory("Arithmatic Exception",12,1),
new ExceptionCategory("DivideBy0",13,12),
new ExceptionCategory("Overflow",14,12),
};
var tree = CreateGenericTree(flatDataStructure, IsRootNode);
}
Root node has ParentId set to 0
private static bool IsRootNode(IDbEntityNode dbEntity)
{
bool isRootNode = false;
if (dbEntity.ParentId == 0 )
isRootNode = true;
return isRootNode;
}
2 Answers 2
GenericNode<T>
I'll adopt the name Node<T>
, as Henrik Hansen has done.
- This class is completely publically mutable, which is probably not ideal. Does it make any sense to change the
NodeInformation
once you've created the node? Maybe it does, but if not, then you should work to enforce the idea by making it immutable. I'd also wager it makes no sense to have a node without this information, so I'd add a constructor to this effect:
public T NodeInformation { get; }
public Node(T nodeInformation)
{
NodeInformation = nodeInformation;
}
The children and parent properties are more tricky, as (without some effort), you can't know who who the children are upfront, and it's even more effort to make both the
Children
list andParent
immutable. Instead, let's make them privatelly mutable, and add anAddChild
method, so that it's more difficult for an external entity to produce an invalid tree:public Node<T> Parent { get; private set; } private readonly List<Node<T>> _children = new List<Node<T>>(); public IReadOnlyList<Node<T>> Children => _children; public void AddChild(Node<T> childNode) { if (childNode.Parent != null) throw new ArgumentException(nameof(childNode), "Child node already has a parent"); _children.Add(childNode); childNode.Parent = this; }
Note that we have a
List<Node<T>>
so we can add children, but we only expose anIReadOnlyList<Node<T>>
, so they can only be added (without dodgy casts) by means of theAddChild
method.
IDbEntityNode
Again, everything here is publically mutable: I do not think it makes much sense to change the Id of something! As a general rule, immutable is good if you can have it. It's very easy to add a public setter back in, but it's potentially nightmarish to take it out.
You mentioned wanted more generic IDs: I'd be inclined to make this type generic, with both generic Ids and generic
Data
. You can always provide a 'convenience' concrete version which hasint
andstring
baked in. If you have generic Ids, you either need to make the type comparable, or you'll want to passCreateGenericTree
anIEqualityComparer<TId>
to handle it (or you could assume the default comparer, but that's much less fun, and you can again provide a 'convenience' overload which uses it by default).
CreateGenericTree
Conceptually, this appears to create multiple trees, as oppose to one tree (as there can be many roots); should it be called
CreateTrees
?You are repeatedly indexing into
flatDataObject
: much better to loop over something likei
, which everyone knows is an index, and setelement = flatDataObject[i]
(or use aforeach
as Henrik Hansen has said).I think I'm in a minority, but I'd be strongly inclined to use a dedicated delegate type rather than
Func<T, bool>
. What question doesFunc<T, bool>
answer? It is a horse? Is it lazily loaded? I don't know. You might consider then something like, but a lot of people dislike nominal delegates, for various reasons.public delegate bool RootDetector<in T>(T node) where T : IDbEntityNode
You are returning a
List<T>
, and there is a good chance that you could just as well return a more abstract type, such asIList<T>
orIReadOnlyList<T>
. This give you freedom to change the actual returned type later (e.g. if you change the implementation) without having to change the return type, and if it turns out later that you need some members of a less abstract type, then you can more easily change an abstract return type to less abstract return type than you can the other way round.You are also taking
List<T>
as a parameter. As someone consuming your API, this would concern me, asList<T>
is openly mutable, and I would unsure whether your code is going to modify what I pass it. Much better to use an abstract type likeIReadOnlyList<T>
, which gives the caller more freedom, and communicates immediately that you are not going to modify the parameter.Your code will currently heave oddly if the
flatDataObject
contains nodes with the sameId
. Unfortunately, checking for this would be ugly.
Below is my take (you'll note it's pretty similar to Henrik Hansen's code).
public static IReadOnlyList<Node<T>> CreateTrees<T>(IEnumerable<T> flatDataObjects, RootDetector<T> isRoot) where T : IDbEntityNode
{
if (flatDataObjects == null)
throw new ArgumentNullException(nameof(flatDataObjects));
if (isRoot == null)
throw new ArgumentNullException(nameof(isRoot));
var nodes = flatDataObjects.ToDictionary(fdo => fdo.Id, fdo => new Node<T>(fdo));
List<Node<T>> roots = new List<Node<T>>();
foreach (var node in nodes.Values)
{
if (isRoot(node.NodeInformation))
{
roots.Add(node);
}
else
{
if (nodes.TryGetValue(node.NodeInformation.ParentId, out var parentNode))
{
parentNode.AddChild(node);
}
else
{
throw new InvalidOperationException($"Non-root node {node.NodeInformation.Id} has ParentId {node.NodeInformation.ParentId}, but no node exists with that id");
}
}
}
return roots;
}
Note that I've added dedicated checks for null
parameters, and that upon detecting an invalid input (a node with no parent) I throw a helpful error message. Note also that ToDictionary
will throw if two elements have the same key (Which is good), though the message will be less helpful: you could actively trap this exception and throw your own explanation, or otherwise you could manually detected duplicates and throw first/write your own version of ToDictionary
to achieve the same.
I would argue that these LINQy versions are much more readable, because each bit of logic is mostly self contained. Even clearer would be to select the roots only at the very end, but there is a good case to be made that calling isRoot
as few times as possible is desirable, as we don't know what it does, and partitioning the dictionary would introduce complexity.
Boring things
IsRootNode
is a method parameter, and usually these are inlowerCamelCase
(e.g.isRootNode
). Note that it has the same name currently as the method you are passing to it, which means if you change it toisRootNode
you might forget to change the usage if the method is in scope, and then you have a problem (C# warns about unused variables, but not unused parameters).Pay attention to your white-space: inconsistent white-space can make perfectly good code terribly untidy-looking, and a few more line-breaks here-and-there really help to break up the logic and make the code easier to scan.
-
\$\begingroup\$ I took all your comments and implemented them, code is much better than before and also it did not compromise with speed when i did profiling between earlier and later(after implementing comments) code \$\endgroup\$A.Learn– A.Learn2018年09月20日 06:21:36 +00:00Commented Sep 20, 2018 at 6:21
All in all it looks pretty good.
I have the following remarks:
The name GenericNode<T>
is somewhat redundant or "Pleonastic". I would simply call it Node<T>
because the type argument indicates the genericness.
GenericNode<T>.NodeInformation
would I simply call Value
The IsRootNode()
can be reduced to:
private static bool IsRootNode(IDbEntityNode dbEntity)
{
return dbEntity.ParentId == 0;
}
CreateTree<T>
can be changed to:
public static List<Node<T>> CreateTree<T>(List<T> flatDataObject, Func<T, bool> IsRootNode) where T : IDbEntityNode
{
var lookup = new Dictionary<int, Node<T>>();
var rootNodes = new List<Node<T>>();
foreach (T element in flatDataObject)
{
if (lookup.TryGetValue(element.Id, out Node<T> currentNode))
{
currentNode.Value = element;
}
else
{
currentNode = new Node<T>() { Value = element };
lookup.Add(element.Id, currentNode);
}
if (IsRootNode(element))
{
rootNodes.Add(currentNode);
}
else
{
if (!lookup.TryGetValue(element.ParentId, out Node<T> parentNode))
{
parentNode = new Node<T>();
lookup.Add(element.ParentId, parentNode);
}
parentNode.Children.Add(currentNode);
currentNode.Parent = parentNode;
}
}
return rootNodes;
}
Here the for
-loop is changed to a foreach
-loop which makes it a little more readable.
The same thing could be done using LINQ, but be aware that LINQ is not necessarily especially performant:
public static List<Node<T>> CreateTree<T>(List<T> flatDataObject, Func<T, bool> IsRootNode) where T : IDbEntityNode
{
var roots = flatDataObject.Where(o => IsRootNode(o)).Select(o => new Node<T> { Value = o, Parent = null }).ToList();
var currentParents = roots;
while (currentParents.Any())
{
currentParents = currentParents.SelectMany(p =>
{
var children = flatDataObject.Where(o => o.ParentId == p.Value.Id).Select(o => new Node<T> { Value = o, Parent = p }).ToArray();
p.Children.AddRange(children);
return children;
}).ToList();
}
return roots;
}
and here's another LINQ approach:
public static List<Node<T>> CreateTree<T>(List<T> flatDataObject, Func<T, bool> IsRootNode) where T : IDbEntityNode
{
var nodes = flatDataObject.Select(o => new Node<T> { Value = o }).ToArray();
List<Node<T>> roots = null;
foreach (var group in nodes.GroupBy(n => n.Value.ParentId))
{
var parent = nodes.FirstOrDefault(n => n.Value.Id == group.Key);
if (parent != null)
{
parent.Children.AddRange(group);
foreach (var node in group)
{
node.Parent = parent;
}
}
else
{
roots = group.ToList();
}
}
return roots; //nodes.Where(n => IsRootNode(n.Value)).ToList();
}
-
\$\begingroup\$ Couple of questions/comments about the (much more readable) LINQ versions: is there a reason you use an array and linear search rather than a dictionary for
nodes
? I'd preferroot
was calledparent
, hopefully for obvious reasons ;) The (new) method you have for finding roots assumes that all roots have the same parent, but I think the OP wants it to be agnostic to the behaviourIsRootNode
(the oldWhere
at the end looked fine to me, and was very easy to understand) \$\endgroup\$VisualMelon– VisualMelon2018年09月19日 21:34:03 +00:00Commented Sep 19, 2018 at 21:34 -
\$\begingroup\$ @VisualMelon:
parent
it should be. Theroots
variable was added as a suggestion for optimization, but I kept theWhere
in the comment because I had the same discussion as you. Isn't there is a problem withIsRootNode
: IfIsRootNode(node) == false
and thenode
doesn't have any parent among the other nodes - what is it then? (In the example, if there was a node withNode.ParentId == 42
) \$\endgroup\$user73941– user739412018年09月20日 04:58:38 +00:00Commented Sep 20, 2018 at 4:58 -
\$\begingroup\$ @VisualMelon: No reason for Array - a HashSet or Dictionary may be a better choice. \$\endgroup\$user73941– user739412018年09月20日 05:13:41 +00:00Commented Sep 20, 2018 at 5:13
-
1\$\begingroup\$ @HenrikHansen Thanks for taking your time for review, but i can't move to Linq as it will go over my SLA limit...I profiled the Linq version you gave and it was 350% slower than the code pasted in question but I agree with your suggestion with naming conventions. \$\endgroup\$A.Learn– A.Learn2018年09月20日 06:25:12 +00:00Commented Sep 20, 2018 at 6:25
-
\$\begingroup\$ @HenrikHansen my concern with the root logic (in the last example) is that it assumes that root nodes are indicated by a single
ParentId
, rather than using theIsRootNode
parameter (which unfortunately has the same name as the static method implementation). It's not in the question body, but the OP comments "I made itFunc
because I am not sure for other dbEntity the criteria would be the same to detectIsRootNode
". \$\endgroup\$VisualMelon– VisualMelon2018年09月20日 11:01:02 +00:00Commented Sep 20, 2018 at 11:01
IsRootNode
is missing. I'd be great if you could add the missing parts. \$\endgroup\$ExceptionCategory
withId == 0
in your example data or is yourIsRootNode()
wrong? \$\endgroup\$