Building XML elements from a list using LINQ and foreach

Question 1

I wrote the following code:

var xmlArray = from m in (from row in data select row.Mitarbeiter).Distinct()
 select "<Value Type='Text'>" + m + "</Value>";
var xml = string.Join("",xmlArray);

Then I noticed there are two iteration (two from) and also the Distinct() and I rewrote it using a foreach:

var mitarbeiter = new List<string>();
var xmlArray = new List<string>();
foreach(var row in data)
{
 if (!mitarbeiter.Contains(row.Mitarbeiter)){
 mitarbeiter.Add(row.Mitarbeiter);
 xmlArray.Add("<Value Type='Text'>" + row.Mitarbeiter + "</Value>");
 }
}
var xml = string.Join("",xmlArray);

Ma opinion is that the first example is cleaner...but what about efficiency/performance?

Question 2

You will never notice the performance difference of that 2 solutions. Therefore use the one that is more readable.

Question 3

LINQ is more readable however falls behind foreach when talking about performance because it uses foreach behind the scenes anyway. If performance bothers you, you should stick with the second variant, else go with LINQ. But sometimes it can loss even in readability..

Question 4

@200_success: I don't think the edited title is appropriate to the discussion. The topic is not regarding the Xml elements, but how to iterate the collections.

Question 5

Change it again if you'd like, but keep in mind that the original title isn't quite in line with the site guidelines — see How to Ask and the rationale behind the rule.

Question 6

The best way to discuss performance is by measuring it. I suggest you use BenchmarkDotNet and never roll your own solution.

Onto your question: it is safe to assume that LINQ will in most cases be slower than the iterative approach. Not only does it allocate more (for example through lambdas) but it's also more expensive in runtime because of interface dispatching and other aspects.

There is obviously a balance to be struck between readability and performance -- if you have no performance issues then LINQ might very well be the best option although I would argue that in this case the iterative approach is very readable as well (if not more).

Something you can consider: use a HashSet<T> instead of a List<T> if your usecase allows it. Performing Contains() is O(n) for a list but O(1) for a Set.

Question 7

Thanks for your hints. What about using a Dictionary instead two lists?

Question 8

Why store the xml version in the first place if it's a constant based on mitarbeiter?

Question 9

Because in a second moment I need them for a query and I can do string.Join("", mitarbeiter.Values)

Question 10

That might be less lines of code but you're also creating more string objects your way (and thus a higher memory impact). If you omit that, you can just combine them with a stringbuilder and avoid creating the intermediate concatenation.

Question 11

My opinion is that the first example is cleaner

I agree, but I find it hard to read a mixture of fluent syntax and query comprehension syntax.

I would be inclined to write:

var xmlArray = data
 .Select(row => row.Mitarbeiter)
 .Distinct()
 .Select(m => "<Value Type='Text'>" + m + "</Value>");
var xml = string.Join("",xmlArray);

Fun bonus: this trick does not work in your case because of the "Distinct". But in cases where you do not have that, you can rewrite any query of the form

from x in (from y in ...y-query...) 
...x-query...

as

from y in 
...y-query... 
into x 
...x-query...

This can be easier to read.

Question 12

Indeed, forgot to mention that. As it think. it's better to use fluent syntax, it's much more readable and explicit. Only case where I use LINQ syntax, is joins, otherwise Chained methods are better. Sometimes it could be simple lst.Where(x=>x>10) instead of creepy from x in lst where x>10 select x

Question 13

@BogdanMart: I personally prefer the comprehension syntax, but I dearly wish a mechanism had been invented to move fluent syntax into comprehension syntax. Imagine if we could have from row in data select row.Whatever with Distinct() into m select ... It would be so nice. I regret that it didn't make it into the first version of LINQ.

Question 14

Linq is always slower than loops, but it's less error prone to copy-paste linq code with slight modifications. And it's in most times easier to write LINQ code (if that's not complex iterational algorithm). You need to ballance development time and maintainability vs performance, depending on your needs.

Main pros of LINQ is ease of write, and ease to reuse such code blocks, without possibility to forget renaming variables, for example xmlArray is referenced two times in second code block, so if you copy-paste this code, you should change both, that can produce bugs, if variable scopes intersect.

I've noticed one problem in your code -- "<Value Type='Text'>" + m + "</Value>" you should xml-escape that string variable(m).

Resume If you already wrote iterative solution, keep it, as it will perform faster, and, perhaps use StringBuilder instead of List<string> xmlArray.

P.S. @jeroen-vannevel thank's for pointing to BenchmarkDotNet, I was using LambdaMicrobenchmarking for such goals, but BenchmarkDotNet seems more decent, and feature full. I was searching for something similar to JMH in .Net land, and you found one! Will use it in my upcoming benchmarks ;)

But perhaps, for such an easy scenario LambdaMicrobenchmarking will suit better, less boilerplate code is required.

Question 15

What do you mean with xml-escape that string?

Question 16

@Emaborsa strign can contain characters like > < and other invalid for XML so you can use something like this msdn.microsoft.com/en-us/library/… or equivalent

Question 17

Alternatively, he could use System.Xml.Linq.XElement : new XElement("Value", new XAttribute("Type", "Text"), m);. This would auto-encode m. XElement overrides .ToString(), so the string.Join call wouldn't have to change.

Question 18

Indeed, but that will create performance overhead, i guess.

score 10 · Accepted Answer · 2016-05-31 15:08:45Z

The best way to discuss performance is by measuring it. I suggest you use BenchmarkDotNet and never roll your own solution.

Onto your question: it is safe to assume that LINQ will in most cases be slower than the iterative approach. Not only does it allocate more (for example through lambdas) but it's also more expensive in runtime because of interface dispatching and other aspects.

There is obviously a balance to be struck between readability and performance -- if you have no performance issues then LINQ might very well be the best option although I would argue that in this case the iterative approach is very readable as well (if not more).

Something you can consider: use a HashSet<T> instead of a List<T> if your usecase allows it. Performing Contains() is O(n) for a list but O(1) for a Set.

Thanks for your hints. What about using a Dictionary instead two lists?
Why store the xml version in the first place if it's a constant based on mitarbeiter?
Because in a second moment I need them for a query and I can do string.Join("", mitarbeiter.Values)
That might be less lines of code but you're also creating more string objects your way (and thus a higher memory impact). If you omit that, you can just combine them with a stringbuilder and avoid creating the intermediate concatenation.

Stack Exchange Network

Building XML elements from a list using LINQ and foreach

3 Answers 3

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Building XML elements from a list using LINQ and foreach

3 Answers 3

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions