I need to optimize this snippet to be faster:
Dim lst = (From t In DocElet.ChildNodes Select ID = t.item("ID").outerxml).Distinct().ToList()
Parallel.For(0, lst.Count, Sub(i)
Dim P As XmlElement = GetElement(lst(i))
Dim ls = (From t In DocElet.ChildNodes Where t.item("ID").innerText = P.InnerText Select t)
Parallel.ForEach(ls, Sub(D)
Dim verif_date As String = D.Item("DAD").InnerText
Sej.ID = D.Item("ID").InnerText
End Sub)
End Sub)
This is the XML structure:
<?xml version="1.0" encoding="utf-8"?>
<PatientData>
<Sejour><ID></ID><DAD></DAD></Sejour>
</PatientData>
I'm asking how I can optimize my code because it takes a lot of time (50 sec) in the case where the list contains 20000 elements.
1 Answer 1
The indentation makes your snippet quite hard to read - the VB.NET syntax for inline delegates is fairly bulky (at least compared to C#), I'd give it more air; being able to read the code is step one.
Dim lst = (From t In DocElet.ChildNodes
Select ID = t.item("ID").outerxml).Distinct()
.ToList()
Parallel.For(0, lst.Count,
Sub(i)
Dim P As XmlElement = GetElement(lst(i))
Dim ls = (From t In DocElet.ChildNodes
Where t.item("ID").innerText = P.InnerText
Select t)
Parallel.ForEach(ls,
Sub(D)
Dim verif_date As String = D.Item("DAD").InnerText
Sej.ID = D.Item("ID").InnerText
End Sub)
End Sub)
Ok. So you're running a Parallel.For
, and inside that, you're running a Parallel.ForEach
. That's quite a lot of overhead just there, it's likely that you're shooting yourself in the foot here, I'd first remove the inner Parallel.ForEach
:
Parallel.For(0, lst.Count,
Sub(i)
Dim P As XmlElement = GetElement(lst(i))
Dim ls = (From t In DocElet.ChildNodes
Where t.item("ID").innerText = P.InnerText
Select t)
For Each foo In ls
Dim verif_date As String = D.Item("DAD").InnerText
Sej.ID = D.Item("ID").InnerText
Next
End Sub)
What's the purpose of verif_date
? If it's not used, remove it:
For Each foo In ls
Sej.ID = D.Item("ID").InnerText
Next
Now the outer loop is making yet another selection:
Dim ls = (From t In DocElet.ChildNodes
Where t.item("ID").innerText = P.InnerText
Select t)
I'd try to flatten that. I'm not very familiar with the LINQ syntax in VB.NET, but what you want is a SelectMany
, where you'd select the child nodes where "ID"'s inner text matches the element's inner text for the node you're iterating - in other words the ls
query gets outside the outer loop and into the lst
query, so you only need a single loop (which can probably run in parallel).
If the XML is massive, it could be worth trying some PLINQ:
Dim lst = (From t In DocElet.ChildNodes
Select ID = t.item("ID").outerxml)
.Distinct()
.AsParallel()
.SelectMany(...)
.ToList()
Now you can iterate lst
with Parallel.ForEach
(time it with a normal For Each
first) and do your thing.
As an aside, I have to say your naming is inconsistent and hard to follow: Sub(i)
looks good (Sub(index)
would be better), Sub(D)
doesn't. I realize VB.NET isn't case-sensitive, but if you're going to have a convention to make function parameters camelCase
, by all means stick to it. lst
would probably be better off as elements
or whatever makes sense - use real, readable words for your identifiers, not just arbitrary abbreviations.
-
1\$\begingroup\$ is it true :
accessing XML from multiple threads with Parallel.For/Parallel.ForEach is not guaranteed to work correctly as these classes are not thread safe.
\$\endgroup\$Lamloumi Afif– Lamloumi Afif2014年06月27日 15:36:38 +00:00Commented Jun 27, 2014 at 15:36 -
1\$\begingroup\$ Awesome, thanks for that comment, that falls under "good to know"! ;) \$\endgroup\$Mathieu Guindon– Mathieu Guindon2014年06月27日 16:36:34 +00:00Commented Jun 27, 2014 at 16:36