Below is a file parser for old style nuget package configuration files. The Parallel.ForEach
is probably overkill; in most cases its fast enough without. However, when using the parallel loop, can I avoid the need for the lock / shared list?
class Program
{
private static readonly object _lock = new Object();
private static readonly List<Package> _packages = new List<Package>();
static void Main(string[] args)
{
string[] files = Directory.GetFiles(Directory.GetCurrentDirectory(), "packages.config", SearchOption.AllDirectories);
Parallel.ForEach(files, GetPackagesForFile);
IEnumerable<IGrouping<string, Package>> idGroups =
_packages.GroupBy(p => p.Id).OrderBy(g => g.Key);
foreach (IGrouping<string, Package> idGroup in idGroups)
{
IEnumerable<IGrouping<string, Package>> versionGroups = idGroup.GroupBy(p => p.Version);
Console.WriteLine($"{idGroup.Key} ({versionGroups.Count()})");
foreach (IGrouping<string, Package> versionGroup in versionGroups)
Console.WriteLine($"\t{versionGroup.Key}");
}
}
private static void GetPackagesForFile(string filepath)
{
var filePackages = XDocument.Load(filepath).Root.Elements("package").Select(GetPackageFromElement);
lock(_lock) _packages.AddRange(filePackages);
}
private static Package GetPackageFromElement(XElement element) =>
new Package(
element.Attribute("id").Value,
element.Attribute("version").Value,
element.Attribute("targetFramework").Value);
}
2 Answers 2
If you want to do it this way then you need a lock but since the rest of your code is using Linq I would suggest you convert it over to PLINQ instead of using Parallel.ForEach
,
Also you are iterating over the enumerable versionGroups
twice. In your specific case it shouldn't cause an issue but that's not best practices with an IEnumerable
.
First I'm going to change the GetPackagesForFile
to return an array of Packages instead of using the static _packages
field.
private static Package[] GetPackagesForFile(string filepath)
{
return XDocument.Load(filepath).Root?.Elements("package").Select(GetPackageFromElement).ToArray() ??
new Package[0];
}
I also changed it so to not throw a NullException
if root not defined. That's up to you if you want the exception there to tell you something was wrong or not. So you might want to remove the ?
and the ??
part if you want the exception to be thrown.
Now if you want to use PLINQ we just need to use the AsParallel()
extension method. I'm going to use an anonymous class but you could create one if that's what you prefer. Since I'm using an anonymous class I can't specify the type so going to use the var keyword, which doesn't match your coding style.
static void Main(string[] args)
{
string[] files = Directory.GetFiles(Directory.GetCurrentDirectory(), "packages.config",
SearchOption.AllDirectories);
var idGroups = files.AsParallel()
.SelectMany(GetPackagesForFile)
.GroupBy(p => p.Id)
.Select(g => new
{
Id = g.Key,
Version = new HashSet<string>(g.Select(p => p.Version))
}).OrderBy(p => p.Id);
foreach (var idGroup in idGroups)
{
Console.WriteLine($"{idGroup.Id} ({idGroup.Version.Count})");
foreach (string version in idGroup.Version)
Console.WriteLine($"\t{version}");
}
}
Since in your code all you care about is the distinct list of versions for each ID
and the count I put them in a hashset instead of doing the grouping. I could have used the g.Select(p => p.Version).Distinct().ToList()
but just adding them to a hashset seems simpler code.
And in closing like you already stating making this parallel might be overkill. If you want to test it compared to not being paralleled then you just need to remove the AsParallel()
and the rest of the code can stay the same.
I also prefer the PLINQ solution showed by @CharlesNRice but as far as clean-code is concerned you could change a couple things.
GetPackagesForFile
method should not have side-effects. This means, as far as possible, it's always better to use pure methods.
Applied to your code GetPackagesForFile
should return some result, e.g IEnumerable<Package>
private static IEnumerable<Package> GetPackages(string filepath)
{
return XDocument.Load(filepath).Root.Elements("package").Select(GetPackageFromElement);
}
that you add to _packages
inside Parallel.ForEach
Parallel.ForEach(
source: files,
body: file =>
{
lock(_lock)
{
_packages.AddRange(GetPackages(file));
}
}
);
Another good habit is to always use {}
. It makes your code less error-prone.
If you want to make your code fully lazy, then use EnumerateFile
instead of the eager GetFiles
Using var
s instead of explicit types would make your code less verbose.
Explore related questions
See similar questions with these tags.
ConcurrentBag
. \$\endgroup\$ConcurrentBag
has problems with accessing items from different threads. I'd use aConcurrentDictionary<Package, byte>
and just have thebyte
be0
. \$\endgroup\$