7
\$\begingroup\$

I have an in memory list that I am joining to results from a query using entity framework. My list will most likely never be greater than 2500 records. The results from the database can fluctuate, depending on the filters used and it will grow in size. I've been able to join the results successfully but it feels a bit sluggish and I worry as the database grows that it could get worse. Is there anything I can do to make this more efficient? Please let me know if you need any additional information. Thanks!

var query = from e in _context.Employees
 where (...filters...)
 select e;
var employees = query.AsEnumerable();
var offices = _officeService.GetAllOffices();
var employeeData = from e in employees
 join o in offices on e.Office equals o.Code
 select new EmployeeData
 {
 EmployeeId = e.EmployeeId,
 FullName = e.FullName,
 Office = e.Office,
 Area = o.Area,
 Region = o.Region,
 OfficeName = o.Name,
 Position = e.Position,
 Languages = e.Languages
 };
return employeeData;
t3chb0t
44.6k9 gold badges84 silver badges190 bronze badges
asked Apr 25, 2017 at 12:42
\$\endgroup\$
2
  • 2
    \$\begingroup\$ Why not let the database do the JOIN? \$\endgroup\$ Commented Apr 25, 2017 at 14:22
  • 2
    \$\begingroup\$ The data in the list is pulled from a web api and does not reside in my database. \$\endgroup\$ Commented Apr 25, 2017 at 14:49

3 Answers 3

10
\$\begingroup\$

The dilemmas here:

  1. query is an IQueryable. If you join it with offices, i.e. without AsEnumerable(), Entity Framework will throw an exception about primitive values, which is an obscure way of telling you that it can't translate offices into SQL.

  2. So join in memory, i.e. with query.AsEnumerable(). But now all data from query will be pulled into memory, which has two adverse effects: neither the reduction in numbers of records by joining with offices nor the reduction in width of the result set by selecting only a restricted number of properties can be translated back to the SQL query.

You obviously want to benefit from both strands of data reduction.

As for the reduction in number of rows, there's no way to make Entity Framework join with local data other than lists of primitive values. Even then, joining is rather inefficient because EF has to convert the local list into a temporary SQL table (sort of), which requires a considerable amount of code. It's more efficient to use Contains, which translates into an IN statement:

var officesCodes = offices.Select(o => o.Code).ToList();
var employeeInfo = from e in employees
 where officesCodes.Contains(e.Office)
 select ...

Now employeeInfo is an IQueryable, so it's possible to reduce the width of the result set by projection:

var employeeInfo = from e in employees
 where officesCodes.Contains(e.Office)
 select new
 {
 EmployeeId = e.EmployeeId,
 FullName = e.FullName,
 Office = e.Office,
 Position = e.Position,
 Languages = e.Languages
 };

This achieves the desired data reduction. But now you haven't got EmployeeData objects yet. Can't be done by this query, because they also contain data from offices. This final step can only be achieved by joining the result in memory with offices:

var employeeData = from e in employeeInfo.AsEnumerable()
 join o in offices on e.Office equals o.Code
 select new EmployeeData
 {
 EmployeeId = e.EmployeeId,
 FullName = e.FullName,
 Office = e.Office,
 Area = o.Area,
 Region = o.Region,
 OfficeName = o.Name,
 Position = e.Position,
 Languages = e.Languages
 };
answered Apr 25, 2017 at 15:53
\$\endgroup\$
6
  • \$\begingroup\$ query is an IQueryable. If you join it with offices, Entity Framework will throw not quite. OP uses AsEnumerable so it won't throw. \$\endgroup\$ Commented Apr 25, 2017 at 15:56
  • \$\begingroup\$ @t3chb0t I mean, if you join it as such, without AsEnumerable. \$\endgroup\$ Commented Apr 25, 2017 at 15:58
  • \$\begingroup\$ It's hard to optimize this query. I was also thinking of officesCodes.Contains so that SQL can use IN but now we actually might hit the limit of it Limit on the WHERE col IN (...) condition if there are too many offices and 2500 is not that far from it. \$\endgroup\$ Commented Apr 26, 2017 at 4:22
  • \$\begingroup\$ @t3chb0t There is a way out, but I'd love to hear some feedback from OP on this. If a number of 2500 is hardly ever reached and it's usually far less, it may be OK. \$\endgroup\$ Commented Apr 26, 2017 at 7:12
  • \$\begingroup\$ So first of all, I tried your suggestion and it does seem to run faster so it has been optimized. Thank you for that. Secondly, while the total number of offices may be around 2500, I may be able to filter that list to be less since a user pulling this information will almost never need all of the results. Should I filter my office list and update my query? \$\endgroup\$ Commented Apr 26, 2017 at 11:55
5
\$\begingroup\$

Gert Arnold gave a great answer but let me suggest one more soution to try. Yes, data which is got from another source (rather than DB) can be processed in 2 ways:

  1. Download as small data part from DB to local as possible and join locally (usually using AsEnumerable() or basically ToList()). You got many good thoughts on this in other answers.
  2. Another one is different - upload your local data to server somehow and perform query on DB side. Uploading can be done differently: using some temp tables OR using VALUES. Fortunately there is a small extension for EF now (for both EF6 and EF Core) which you could try (It is written by me). It is EntityFrameworkCore.MemoryJoin (name might be confusing, but it supports both EF6 and EF Core). As stated in author's article (me) it modifies SQL query passed to server and injects VALUES construction with data from your local list. And query is executed on DB server.

So in your case you may try the following:

var query= from e in _context.Employees
 where (...filters...)
 select e;
// change 1: no need to use AsEnumerable now
var employees = query; 
// change 2: get IQueryable representation using EntityFrameworkCore.MemoryJoin
var offices = context.FromLocalList(_officeService.GetAllOffices());
var employeeData = from e in employees
 join o in offices on e.Office equals o.Code
 select new EmployeeData
 {
 EmployeeId = e.EmployeeId,
 FullName = e.FullName,
 Office = e.Office,
 Area = o.Area,
 Region = o.Region,
 OfficeName = o.Name,
 Position = e.Position,
 Languages = e.Languages
 };
// change 3 (suggested), let's return result list instead of IQueryable
return employeeData.ToList();

Using code above, query will be done on DB side. 2500 records should be ok to process (I used with 20k), but of course need to ensure this works fine for you.

answered Mar 24, 2018 at 13:27
\$\endgroup\$
9
  • \$\begingroup\$ join locally - this isn't necessarily a good advice :-\ is there any reason you won't let the SQL Server do the job? \$\endgroup\$ Commented Mar 24, 2018 at 13:31
  • \$\begingroup\$ @t3chb0t There are different views about that going around. One says the DB should deliver as little data as possible and clean it's own data, the other says the DB should do as little as possible and if it's computationally cheaper to off-load more data, do just that. It's CPU vs bandwith basically. \$\endgroup\$ Commented Mar 24, 2018 at 14:34
  • \$\begingroup\$ It's not hard to figure out that you're the author of this library. That doesn't matter, but you should add a disclaimer, for example looking like the one this user consistently adds to his posts. For the record, out of curiosity I tried your NuGet package, but doing something similar to the example in your blog, I couldn't get it working with EF-core (2.02) and EF 6.2.0 due to various exceptions. I realize that doesn't give you much to look into, but a comment is too short to give any more detail. \$\endgroup\$ Commented Mar 24, 2018 at 22:11
  • \$\begingroup\$ OK, got it working in EF6. In EF-core it doesn't seem to recognize the EF-core DbContext type. Nevertheless, the first results look promising. The only thing that worries me a bit is the possible performance hit of generating value tuples when there are "many" of them. Did you do any benchmarking on that? Also, maybe you could add an answer to this question and have my vote - again. \$\endgroup\$ Commented Mar 24, 2018 at 22:48
  • \$\begingroup\$ @t3chb0t Sorry, your comment confused me a lot. 1) What do you mean saying let the SQL Server do the job? How will SQL Server join data from DB to data from third party API? 2) Why do you think join locally IS my advice? I just suggested some solutions and both may work in some scenarios. (personally I prefer #2 btw) 3) Yes local join is not always a good idea, BUT sometimes EF basically CAN'T generate efficient query, so doing some job locally may be good. And btw EF Core does some job locally automatically.... etc \$\endgroup\$ Commented Mar 26, 2018 at 13:38
0
\$\begingroup\$

I have recently built a simple method to do this entirely in-database, by converting the static data into a query that returns a result set of the data. This produces a standard IQueryable which can be fed into the rest of the query. My usecase was to perform in-database joins, but I'm sure there are many other uses.

The general strategy was to construct a query of SELECTs with provided values, that are concatenated together with UNION ALL. I avoided the VALUES syntax since the syntax is inexplicably different across Postgres, MySql, and MS SQL.

SELECT "Data1A" AS columnA, "Data1B" AS columnB
UNION ALL "Data2A", "Data2B"
UNION ALL "Data3A", "Data3B"
UNION ALL "Data4A", "Data4B"
-- and so on, and so forth...
;

This can then be used as a subquery inside the main query, and joined on just like any other result set.

To use this with EF, the raw SQL is constructed with a StringBuilder and then executed using FromSqlRaw().

In EF 7 and below, the only nice way to execute this SQL is to first create a DbSet in the DbContext that acts as a virtual table representing the data being returned:

public DbSet<StaticDataEntity> StubDbSet { get; set; }
// ...
// Exclude the DbSet from all migrations, since we don't actually want it to be a real table in the database.
builder.Entity<StaticDataEntity>().HasNoKey().ToTable("tmp_stub_table", t => t.ExcludeFromMigrations());

EF 8 will enable returning unmapped entities from raw SQL, removing the above ceremony of creating a stub DB set entirely.

Then, you can use this extension method over that DbSet that generates a raw SQL query, giving you an IQueryable to join against.

Example entity class:

public class StaticDataEntity {
 public string A { get;set; }
 public string B { get;set; }
}
public static IQueryable<StaticDataEntity> CreateResultSetFromData(this DbSet<StaticDataEntity> dbSet, List<StaticDataEntity> values)
{
 // StaticDataEntity is the entity POCO class. This one has two properties, A and B, corresponding to two columns of data.
 var columnNames = (A: nameof(StaticDataEntity.A), B: nameof(StaticDataEntity.B));
 if (values.Count > 0)
 {
 // Create a UNION of SELECT "PredefinedDataA" AS `A`, SELECT "PredefinedDataB" AS `B`
 // This generates a temporary result set from the input values.
 string[] staticDataArray = new string[values.Count * 2];
 var queryStringBuilder = new StringBuilder();
 for (int i = 0; i < values.Count; i++)
 {
 int dataIndex = i * 2;
 if (i == 0)
 {
 queryStringBuilder.Append($"SELECT {{{dataIndex}}} AS {columnNames.A}, {{{dataIndex + 1}}} AS {columnNames.B} ");
 }
 else
 {
 queryStringBuilder.Append($"UNION SELECT {{{dataIndex}}}, {{{dataIndex + 1}}} ");
 }
 // Build one-dimensional array of the values to pass into FromSqlRaw as parameters
 staticDataArray[dataIndex] = values[i].A;
 staticDataArray[dataIndex + 1] = values[i].B;
 }
 return dbSet.FromSqlRaw(queryStringBuilder.ToString(), staticData).AsNoTracking();
 }
 else
 {
 // No data, return empty result set
 return dbSet.FromSqlRaw(
 $"SELECT NULL AS {columnNames.A}, NULL AS {columnNames.B} WHERE FALSE",
 null
 ).AsNoTracking();
 }
}

Now, you can do:

// TODO: Populate data list with entries
var data = new List<StaticDataEntity>();
// Create static data query
IQueryable<StaticDataEntity> dataQuery = dbContext.StubDbSet.CreateResultSetFromData(data);
// Join static data column A to in-database "SomeEntities" table column C.
// Return static A and B, and database column C and D.
var resultQuery = dbContext.SomeEntities.Join(
 dataQuery,
 se => se.C,
 sde => sde.A,
 (se, sde) => new {
 sde.A, sde.B, se.C, se.D
 });
// Execute the query
var result = await resultQuery.ToListAsync();
Toby Speight
87.8k14 gold badges104 silver badges325 bronze badges
answered Mar 23, 2023 at 6:00
\$\endgroup\$
1
  • \$\begingroup\$ Welcome to Code Review! This is all good work, but it's lacking a review of the code in the question. Can you update your description to explain what's better about your approach, and what trade-offs are being made? \$\endgroup\$ Commented Mar 23, 2023 at 7:24

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.