Application Logic
I wrote a web scraper that store some football data inside my database. Those data need to be update if 7 days are passed since the last update. I have three types of record:
- Team
- Season
- Player
The table structure of those records is essentially this:
id | season_id | update_at
only the second column name change for the specific type, eg for Team
:
id | team_id | | update_at
Structure
The web scraper run 24h on a server, and check steadily if the records need to be updated. For doing that I wrote this method:
public List<Season> GetSeasonsToAddUpdate(List<Season> seasons)
{
//Store the seasons that need to update
List<Season> newSeasons = new List<Season>();
using (MySqlConnection connection = new DBConnection().Connect)
{
using (MySqlCommand command = new MySqlCommand())
{
connection.Open();
command.Connection = connection;
//Get all the ids of the seasons to add or update
command.CommandText = "SELECT cs.season_id " +
"FROM (SELECT {textToReplace}) cs " +
"LEFT JOIN competition_seasons s on s.id = cs.season_id " +
"WHERE s.id IS NULL OR " +
"s.update_at < DATE_SUB(CURRENT_DATE, INTERVAL 7 DAY) OR s.update_at IS NULL";
//Join all the available parameters.
StringBuilder unions = new StringBuilder();
//List of available parameters.
List<MySqlParameter> prms = new List<MySqlParameter>();
//Add the first parameter.
MySqlParameter pr = new MySqlParameter("@first", MySqlDbType.UInt32) { Value = seasons.First().Id };
prms.Add(pr);
unions.Append($" @first as season_id ");
//Start from one 'cause first param already defined in query.
int prmCounter = 1;
//Create the parameter for the derived table based on the available seasons
foreach (Season t in seasons.Skip(1))
{
string placeholder = "@p" + prmCounter;
unions.Append($" UNION ALL SELECT {placeholder}");
pr = new MySqlParameter(placeholder, MySqlDbType.Int32) { Value = t.Id };
prms.Add(pr);
prmCounter++;
}
command.Parameters.AddRange(prms.ToArray());
command.CommandText = command.CommandText.Replace("{textToReplace}", unions.ToString());
using (MySqlDataReader reader = command.ExecuteReader())
{
//Remove all the seasons that doesn't need update.
while (reader.Read())
{
newSeasons.Add(seasons.FirstOrDefault(x => x.Id == Convert.ToInt32(reader["season_id"])));
}
}
return newSeasons;
}
}
}
Basically, the method above only works for the Season
object, and perform the following query:
SELECT cs.season_id
FROM (SELECT 67 as season_id UNION ALL SELECT 68 UNION ALL
SELECT 69 UNION ALL SELECT 70
) cs LEFT JOIN
competition_season s
on s.id = cs.season_id AND
update_at >= DATE_SUB(CURRENT_DATE, INTERVAL 7 DAY)
WHERE s.id IS NULL;
The query structure is essentially the same for the other two type: Team
and Players
(only the table name change).
The goal
My goal is remove the code redundancy, 'cause I have three method of the code above which are essentially the same but change only for the table name and the type of object, eg:
public List<Team> GetTeamsToAddUpdate(List<Team> teams)
{
//Store the teams that need to update
List<Team> newTeams = new List<Team>();
using (MySqlConnection connection = new DBConnection().Connect)
{
using (MySqlCommand command = new MySqlCommand())
{
connection.Open();
command.Connection = connection;
//Get all the ids of the teams to add or update
command.CommandText = "SELECT tt.team_id " +
"FROM (SELECT {textToReplace}) tt " +
"LEFT JOIN team t on t.id = tt.team_id " +
"WHERE t.id IS NULL OR " +
"t.update_at < DATE_SUB(CURRENT_DATE, INTERVAL 7 DAY) OR t.update_at IS NULL";
I though to implement something like a switch statement for create 1 method that handle multiple types, but I don't know if there are better ways to handle this.
Could someone tell me if is possible optimize that and remove this redundancy?
The other two methods:
Team:
public List<Team> GetTeamsToAddUpdate(List<Team> teams)
{
List<Team> newTeams = new List<Team>();
using (MySqlConnection connection = new DBConnection().Connect)
{
using (MySqlCommand command = new MySqlCommand())
{
connection.Open();
command.Connection = connection;
command.CommandText = "SELECT tt.team_id " +
"FROM (SELECT {textToReplace}) tt " +
"LEFT JOIN team t on t.id = tt.team_id " +
"WHERE t.id IS NULL OR " +
"t.update_at < DATE_SUB(CURRENT_DATE, INTERVAL 7 DAY) OR t.update_at IS NULL";
StringBuilder unions = new StringBuilder();
List<MySqlParameter> prms = new List<MySqlParameter>();
MySqlParameter pr = new MySqlParameter("@first", MySqlDbType.UInt32) { Value = teams.First().Id };
prms.Add(pr);
unions.Append($" @first as team_id ");
int prmCounter = 1;
foreach (Team t in teams.Skip(1))
{
string placeholder = "@p" + prmCounter;
unions.Append($" UNION ALL SELECT {placeholder}");
pr = new MySqlParameter(placeholder, MySqlDbType.Int32) { Value = t.Id };
prms.Add(pr);
prmCounter++;
}
command.Parameters.AddRange(prms.ToArray());
command.CommandText = command.CommandText.Replace("{textToReplace}", unions.ToString());
using (MySqlDataReader reader = command.ExecuteReader())
{
while (reader.Read())
{
newTeams.Add(teams.FirstOrDefault(x => x.Id == Convert.ToInt32(reader["team_id"])));
}
}
return newTeams;
}
}
}
Player:
public List<Player> GetPlayersToAddUpdate(List<Player> players)
{
List<Player> newPlayers = new List<Player>();
using (MySqlConnection connection = new DBConnection().Connect)
{
using (MySqlCommand command = new MySqlCommand())
{
connection.Open();
command.Connection = connection;
command.CommandText = "SELECT pp.player_id " +
"FROM (SELECT {textToReplace}) pp " +
"LEFT JOIN player p on p.id = pp.player_id " +
"WHERE p.id IS NULL OR " +
"p.update_at < DATE_SUB(CURRENT_DATE, INTERVAL 7 DAY) OR p.update_at IS NULL";
StringBuilder unions = new StringBuilder();
List<MySqlParameter> prms = new List<MySqlParameter>();
MySqlParameter pr = new MySqlParameter("@first", MySqlDbType.UInt32) { Value = players.First().Id };
prms.Add(pr);
unions.Append($" @first as player_id ");
int prmCounter = 1;
foreach(Player p in players.Skip(1))
{
string placeholder = "@p" + prmCounter;
unions.Append($" UNION ALL SELECT {placeholder}");
pr = new MySqlParameter(placeholder, MySqlDbType.Int32) { Value = p.Id };
prms.Add(pr);
prmCounter++;
}
command.Parameters.AddRange(prms.ToArray());
command.CommandText = command.CommandText.Replace("{textToReplace}", unions.ToString());
using (MySqlDataReader reader = command.ExecuteReader())
{
while (reader.Read())
{
newPlayers.Add(players.FirstOrDefault(x => x.Id == Convert.ToInt32(reader["player_id"])));
}
}
return newPlayers;
}
}
}
2 Answers 2
new DBConnection().Connect
Just curious: from where have you this construct?
You should check the input:
if (seasons == null || seasons.Count == 0) return seasons; // or throw?
using (MySqlConnection connection = new MySqlConnection()) { using (MySqlCommand command = new MySqlCommand()) { connection.Open(); command.Connection = connection; //Get all the ids of the seasons to add or update command.CommandText = "SELECT cs.season_id " + "FROM (SELECT {textToReplace}) cs " + "LEFT JOIN competition_seasons s on s.id = cs.season_id " + "WHERE s.id IS NULL OR " + "s.update_at < DATE_SUB(CURRENT_DATE, INTERVAL 7 DAY) OR s.update_at IS NULL";
You should open the connection for a short a period as possible and build the command text using a StringBuilder
instead.
StringBuilder commandBuilder = new StringBuilder();
commandBuilder.AppendFormat(...);
...
// and the other initialization stuff
...
using (MySqlConnection connection = new MySqlConnection())
{
using (MySqlCommand command = new MySqlCommand())
{
connection.Open();
command.Connection = connection;
command.CommandText = commandBuilder.ToString();
using (MySqlDataReader reader = command.ExecuteReader())
{
...
And besides that, concatenating strings in the above manner is considered bad practice and is expensive, because you instantiate a lot more strings than you may think (9 in the example above, I think).
In order to generalize the method, you have to consider only three variables: the items in the list, the table name and the name of the id column. The table name and the name of the id column are just strings, so it is easy.
For the items in the list, you can let them derive from the same base class that holds the Id as property, or you could let them implement an interface like:
public interface IdHolder // Maybe a descriptive name, but some kind of ugly?
{
int Id { get; }
}
public class Team : IdHolder
{
public int Id { get; internal set; }
}
public class Season : IdHolder
{
public int Id { get; internal set; }
}
public class Player : IdHolder
{
public int Id { get; internal set; }
}
Doing so you can make a generic method with a signature like:
public IEnumerable<TItem> GetItemsToAddUpdate<TItem>(IEnumerable<TItem> items, string tableName, string idColumn) where TItem : IdHolder
and then the rest is just a question of readjusting the existing string creation in one of the methods, so it all in all could end up like something like this:
public IEnumerable<TItem> GetItemsToAddUpdate<TItem>(IEnumerable<TItem> items, string tableName, string idColumn) where TItem : IdHolder
{
if (items == null || !items.Any())
{
yield break;
}
StringBuilder commandBuilder = new StringBuilder();
commandBuilder.AppendFormat("SELECT idLst.{0} ", idColumn);
commandBuilder.Append("FROM (SELECT {textToReplace}) idLst ");
commandBuilder.AppendFormat("LEFT JOIN {0} x on x.id = idLst.{1} ", tableName, idColumn);
commandBuilder.Append("WHERE x.id IS NULL OR ");
commandBuilder.Append("x.update_at < DATE_SUB(CURRENT_DATE, INTERVAL 7 DAY) OR x.update_at IS NULL");
//Join all the available parameters.
StringBuilder unions = new StringBuilder();
//List of available parameters.
List<MySqlParameter> parameters = new List<MySqlParameter>();
//Add the first parameter.
MySqlParameter pr = new MySqlParameter("@first", MySqlDbType.UInt32) { Value = items.First().Id };
parameters.Add(pr);
unions.AppendFormat(" @first as {0} ", idColumn);
//Start from one 'cause first param already defined in query.
int parameterIndex = 1;
//Create the parameter for the derived table based on the available seasons
foreach (TItem item in items.Skip(1))
{
string placeholder = "@p" + parameterIndex;
unions.Append($" UNION ALL SELECT {placeholder}");
pr = new MySqlParameter(placeholder, MySqlDbType.Int32) { Value = item.Id };
parameters.Add(pr);
parameterIndex++;
}
using (MySqlConnection connection = new MySqlConnection("<ConnectionString?>"))
{
using (MySqlCommand command = new MySqlCommand())
{
connection.Open();
command.Connection = connection;
command.Parameters.AddRange(parameters.ToArray());
command.CommandText = commandBuilder.ToString().Replace("{textToReplace}", unions.ToString());
using (MySqlDataReader reader = command.ExecuteReader())
{
//Remove all the seasons that doesn't need update.
while (reader.Read())
{
TItem item = items.FirstOrDefault(x => x.Id == Convert.ToInt32(reader[idColumn]));
if (item != null)
yield return item;
}
}
}
}
}
You can then make dedicated methods for each object type as:
public IEnumerable<Team> GetTeamsToAddUpdate(IEnumerable<Team> teams)
{
return GetItemsToAddUpdate(teams, "team", "team_id");
}
If making the types implement a common interface or derive from the same base class is not an option, then you can provide a function as argument for extracting the id from each item:
public IEnumerable<TItem> GetItemsToAddUpdate<TItem>(IEnumerable<TItem> items, Func<TItem, int> idFetcher, string tableName, string idColumn)
where idFetcher
is called like:
MySqlParameter pr = new MySqlParameter("@first", MySqlDbType.UInt32) { Value = idFetcher(items.First()) };
and the method can be called as:
public IEnumerable<Team> GetTeamsToAddUpdate(IEnumerable<Team> teams)
{
return GetItemsToAddUpdate(teams, (team) => team.Id, "team", "team_id");
}
Disclaimer: I may have overlooked some minor details in the sql strings, that differs from one type to the other.
-
\$\begingroup\$ First of all, thanks for the help, just a question: I did:
teams = GetItemsToAddUpdate(teams, "team", "team_id");
whereteams
is:List<Team> teams = new List<Teams>();
. I got this error:You can't use the type 'SWP.Models.Team' as a parameter of type 'TItem' in the method or in the generic type 'GetItemsToAddUpdate <TItem> (IEnumerable <TItem>, string, string)'.
, I did something wrong? \$\endgroup\$sfarzoso– sfarzoso2019年03月20日 18:58:26 +00:00Commented Mar 20, 2019 at 18:58 -
1\$\begingroup\$ @sfarzoso: The
Team
class you use, does it implementIdHolder
? \$\endgroup\$user73941– user739412019年03月20日 19:45:21 +00:00Commented Mar 20, 2019 at 19:45 -
1\$\begingroup\$ @sfarzoso: Sorry, not at all, you can freely change the access specifier to support your needs (for instance
public
) \$\endgroup\$user73941– user739412019年03月20日 20:26:03 +00:00Commented Mar 20, 2019 at 20:26 -
1\$\begingroup\$ @sfarzoso: There is a quite good explanation with examples in the reference here... \$\endgroup\$user73941– user739412019年03月20日 20:51:06 +00:00Commented Mar 20, 2019 at 20:51
-
1\$\begingroup\$ Thanks for the help, I really appreciate that, have a good day :) \$\endgroup\$sfarzoso– sfarzoso2019年03月20日 20:53:35 +00:00Commented Mar 20, 2019 at 20:53
My first recommendation would be to refactor this query so that it's friendlier to dynamic construction. Perhaps something like
CREATE TEMPORARY TABLE RequestedIds (id INT);
INSERT INTO RequestedIds (id)
VALUES (@p0), (@p1), (@p2);
SELECT requested.id
FROM RequestedIds AS requested
LEFT JOIN player AS item ON item.id = requested.id
WHERE item.id IS NULL
OR item.update_at IS NULL
OR item.update_at < DATE_SUB(CURRENT_DATE, INTERVAL 7 DAY);
Now the only indicator here that we're looking at players (and not seasons or teams) is the table name. Also, the use of a temp table means the generated list of parameters is isolated. (@p0), (@p1), (@p2)
is more straightforward to build than SELECT @first as id UNION ALL SELECT @p1 UNION ALL SELECT @p2
.
Now the command text is buildable with something like this:
private string BuildCommandText(IEnumerable<MySqlParameter> parameters, string tableName)
{
var paramNames = parameters.Select(param => $"({param.ParameterName})");
return string.Join(Environment.NewLine,
"CREATE TEMPORARY TABLE RequestedIds (id INT);",
"INSERT INTO RequestedIds (id)",
$"VALUES {string.Join(", ", paramNames)};",
"",
"SELECT requested.id",
"FROM RequestedIds AS requested",
$"LEFT JOIN {tableName} AS item ON item.id = requested.id",
"WHERE item.id IS NULL",
"OR item.update_at IS NULL",
"OR item.update_at < DATE_SUB(CURRENT_DATE, INTERVAL 7 DAY);");
}
As a side note, I usually prefer string.Join
over StringBuilder
. The performance is comparable, and it feels more declarative: "Join these string with newlines" vs "Start with this string. Now add this string. Now add this string..."
As for the parameters themselves, you can map a collection of IDs to a collection of parameters with a helper function like this:
var parameters = players.Select(player => player.Id).Select(BuildIdParameter);
// ...
private MySqlParameter BuildIdParameter(int value, int index)
{
return new MySqlParameter($"p{index}", MySqlDbType.Int32) { Value = value };
}
This technique takes advantage of the handy automatic index parameter provided by this overload of Select. I recommend doing this in conjunction with the generalization advice from Henrik's answer.
My last piece of advice is for after you've done the query. If you hide the actual execution of the query in a function, like this (braces ommitted for brevity)
private IEnumerable<int> GetIdsToAddOrUpdate(...)
{
using (var connection...)
using (var command...)
using (var reader...)
while (reader.read())
yield return Convert.ToInt32(reader["id"]);
}
Then you can skip the .FirstOrDefault
and the null check, and write
public List<TItem> GetItemsToAddOrUpdate<TItem>(List<TItem> items) where TItem : IDHolder
{
var ids = GetIdsToAddOrUpdate(...).ToHashSet();
return items.Where(item => ids.Contains(item.Id)).ToList();
}
Now, this is more efficient for large numbers of players/teams/seasons: it is O(n), while calling .FirstOrDefault
in a loop gives you O(n^2). However, you're unlikely to notice the difference unless n
is large indeed.
The real reason I like it isn't the efficiency, but because it's more declarative: "The players that need to be updated are all the players with IDs that need to be updated".
-
\$\begingroup\$ Isn't a temp table creation slower than a direct query execution? \$\endgroup\$sfarzoso– sfarzoso2019年03月22日 17:53:07 +00:00Commented Mar 22, 2019 at 17:53
-
1\$\begingroup\$ It may be. Honestly, I have no idea. You could try it out and see if you notice performance issues. You could adapt this command-building approach to work with an inline UNION query. Or you could ignore that advice entirely - this is just my opinion on how to make the code prettier :) \$\endgroup\$benj2240– benj22402019年03月22日 17:57:45 +00:00Commented Mar 22, 2019 at 17:57
-
\$\begingroup\$ I appreciate your help, thanks, maybe you could look at that question: stackoverflow.com/questions/55247116/… I started a bounty \$\endgroup\$sfarzoso– sfarzoso2019年03月22日 18:18:01 +00:00Commented Mar 22, 2019 at 18:18