I wrote the following code:
var xmlArray = from m in (from row in data select row.Mitarbeiter).Distinct()
select "<Value Type='Text'>" + m + "</Value>";
var xml = string.Join("",xmlArray);
Then I noticed there are two iteration (two from) and also the Distinct()
and I rewrote it using a foreach
:
var mitarbeiter = new List<string>();
var xmlArray = new List<string>();
foreach(var row in data)
{
if (!mitarbeiter.Contains(row.Mitarbeiter)){
mitarbeiter.Add(row.Mitarbeiter);
xmlArray.Add("<Value Type='Text'>" + row.Mitarbeiter + "</Value>");
}
}
var xml = string.Join("",xmlArray);
Ma opinion is that the first example is cleaner...but what about efficiency/performance?
3 Answers 3
The best way to discuss performance is by measuring it. I suggest you use BenchmarkDotNet and never roll your own solution.
Onto your question: it is safe to assume that LINQ will in most cases be slower than the iterative approach. Not only does it allocate more (for example through lambdas) but it's also more expensive in runtime because of interface dispatching and other aspects.
There is obviously a balance to be struck between readability and performance -- if you have no performance issues then LINQ might very well be the best option although I would argue that in this case the iterative approach is very readable as well (if not more).
Something you can consider: use a HashSet<T>
instead of a List<T>
if your usecase allows it. Performing Contains()
is O(n) for a list but O(1) for a Set.
-
\$\begingroup\$ Thanks for your hints. What about using a
Dictionary
instead two lists? \$\endgroup\$Emaborsa– Emaborsa2016年06月01日 06:13:14 +00:00Commented Jun 1, 2016 at 6:13 -
\$\begingroup\$ Why store the xml version in the first place if it's a constant based on mitarbeiter? \$\endgroup\$Jeroen Vannevel– Jeroen Vannevel2016年06月01日 06:26:12 +00:00Commented Jun 1, 2016 at 6:26
-
\$\begingroup\$ Because in a second moment I need them for a query and I can do
string.Join("", mitarbeiter.Values)
\$\endgroup\$Emaborsa– Emaborsa2016年06月01日 06:31:38 +00:00Commented Jun 1, 2016 at 6:31 -
\$\begingroup\$ That might be less lines of code but you're also creating more string objects your way (and thus a higher memory impact). If you omit that, you can just combine them with a stringbuilder and avoid creating the intermediate concatenation. \$\endgroup\$Jeroen Vannevel– Jeroen Vannevel2016年06月01日 18:21:39 +00:00Commented Jun 1, 2016 at 18:21
My opinion is that the first example is cleaner
I agree, but I find it hard to read a mixture of fluent syntax and query comprehension syntax.
I would be inclined to write:
var xmlArray = data
.Select(row => row.Mitarbeiter)
.Distinct()
.Select(m => "<Value Type='Text'>" + m + "</Value>");
var xml = string.Join("",xmlArray);
Fun bonus: this trick does not work in your case because of the "Distinct". But in cases where you do not have that, you can rewrite any query of the form
from x in (from y in ...y-query...)
...x-query...
as
from y in
...y-query...
into x
...x-query...
This can be easier to read.
-
\$\begingroup\$ Indeed, forgot to mention that. As it think. it's better to use fluent syntax, it's much more readable and explicit. Only case where I use LINQ syntax, is joins, otherwise Chained methods are better. Sometimes it could be simple
lst.Where(x=>x>10)
instead of creepyfrom x in lst where x>10 select x
\$\endgroup\$Bohdan Mart– Bohdan Mart2016年05月31日 21:38:55 +00:00Commented May 31, 2016 at 21:38 -
2\$\begingroup\$ @BogdanMart: I personally prefer the comprehension syntax, but I dearly wish a mechanism had been invented to move fluent syntax into comprehension syntax. Imagine if we could have
from row in data select row.Whatever with Distinct() into m select ...
It would be so nice. I regret that it didn't make it into the first version of LINQ. \$\endgroup\$Eric Lippert– Eric Lippert2016年05月31日 21:48:21 +00:00Commented May 31, 2016 at 21:48
Linq is always slower than loops, but it's less error prone to copy-paste linq code with slight modifications. And it's in most times easier to write LINQ code (if that's not complex iterational algorithm). You need to ballance development time and maintainability vs performance, depending on your needs.
Main pros of LINQ is ease of write, and ease to reuse such code blocks, without possibility to forget renaming variables, for example xmlArray is referenced two times in second code block, so if you copy-paste this code, you should change both, that can produce bugs, if variable scopes intersect.
I've noticed one problem in your code -- "<Value Type='Text'>" + m + "</Value>"
you should xml-escape that string variable(m).
Resume If you already wrote iterative solution, keep it, as it will perform faster, and, perhaps use StringBuilder
instead of List<string> xmlArray
.
P.S. @jeroen-vannevel thank's for pointing to BenchmarkDotNet, I was using LambdaMicrobenchmarking for such goals, but BenchmarkDotNet seems more decent, and feature full. I was searching for something similar to JMH in .Net land, and you found one! Will use it in my upcoming benchmarks ;)
But perhaps, for such an easy scenario LambdaMicrobenchmarking will suit better, less boilerplate code is required.
-
\$\begingroup\$ What do you mean with
xml-escape that string
? \$\endgroup\$Emaborsa– Emaborsa2016年06月01日 06:11:13 +00:00Commented Jun 1, 2016 at 6:11 -
\$\begingroup\$ @Emaborsa strign can contain characters like > < and other invalid for XML so you can use something like this msdn.microsoft.com/en-us/library/… or equivalent \$\endgroup\$Bohdan Mart– Bohdan Mart2016年06月01日 15:40:21 +00:00Commented Jun 1, 2016 at 15:40
-
\$\begingroup\$ Alternatively, he could use
System.Xml.Linq.XElement
:new XElement("Value", new XAttribute("Type", "Text"), m);
. This would auto-encodem
.XElement
overrides.ToString()
, so thestring.Join
call wouldn't have to change. \$\endgroup\$Dan Lyons– Dan Lyons2016年06月01日 17:53:34 +00:00Commented Jun 1, 2016 at 17:53 -
\$\begingroup\$ Indeed, but that will create performance overhead, i guess. \$\endgroup\$Bohdan Mart– Bohdan Mart2016年06月01日 21:00:26 +00:00Commented Jun 1, 2016 at 21:00
Explore related questions
See similar questions with these tags.
foreach
when talking about performance because it usesforeach
behind the scenes anyway. If performance bothers you, you should stick with the second variant, else go with LINQ. But sometimes it can loss even in readability.. \$\endgroup\$