This code written by @Rahul Singh in this post:
var result = _dbContext.ExtensionsCategories.ToList().GroupBy(x => x.Category)
.Select(x =>
{
var files = _dbContext.FileLists.Count(f => x.Select(z => z.Extension).Contains(f.Extension));
return new
{
Category = x.Key,
TotalFileCount = files
};
});
This code has a problem when used inside database context and we should use ToList()
like this to fix "Only primitive types or enumeration types are supported in this context" error:
var result = _dbContext.ExtensionsCategories.ToList().GroupBy(x => x.Category)
The problem of this is that ToList()
fetches all records and reduces performance. Now I wrote my own code:
var categoriesByExtensionFileCount =
_dbContext.ExtensionsCategories.Select(
ec =>
new
{
Category = ec.Category,
TotalSize = _dbContext.FileLists.Count(w => w.Extension == ec.Extension)
});
var categoriesTotalFileCount =
categoriesByExtensionFileCount.Select(
se =>
new
{
se.Category,
TotalCount =
categoriesByExtensionFileCount.Where(w => w.Category == se.Category).Sum(su => su.TotalSize)
}).GroupBy(x => x.Category).Select(y => y.FirstOrDefault());
The performance of this code is better but it has many lines of code. Any idea about improving the performance of the latter, or reducing the length of the former?
1 Answer 1
You can indeed improve the query by changing the strategy.
If you first GroupBy
file extension and Count
them inside the GroupBy
element selector you can remove ToList
and the query will be executed on the server and it won't be necessary to download all the data to group it later on your machine. You will then only get a few groups and their number of elements.
Now for the few file extensions it's very easy and quick to get the category names:
var result =
Files
.GroupBy(x => x.Extension, (ext, fs) => new
{
Extension = ext,
Category = Categories.Single(c => c.Extension == ext).Name,
FileCount = fs.Count()
})
// the second GroupBy sums categories with multiple file extensions like pdf, doc etc.
.GroupBy(x => x.Category, (cat, counts) => new
{
Category = cat,
FileCount = counts.Sum(y => y.FileCount)
});
This will produce the following query:
SELECT SUM([t3].[value2]) AS [FileCount], [t3].[value] AS [Category]
FROM (
SELECT (
SELECT [t2].[Name]
FROM [Category] AS [t2]
WHERE [t2].[Extension] = [t1].[Extension]
) AS [value], [t1].[value] AS [value2]
FROM (
SELECT COUNT(*) AS [value], [t0].[Extension]
FROM [File] AS [t0]
GROUP BY [t0].[Extension]
) AS [t1]
) AS [t3]
GROUP BY [t3].[value]
I've tested it with 1.000.000
rows:
and compared with the ToList
version:
In the last rows of the statistics you can see that short version is about 3 times faster then the original one.
-
\$\begingroup\$ Unfortunatelly I don't know how to measure data transfered to the client when testing with
LINQPad
:( \$\endgroup\$t3chb0t– t3chb0t2015年10月21日 18:27:03 +00:00Commented Oct 21, 2015 at 18:27