I have a condition check logic, which seems to be taking more than \$\frac{1}{10}\$ of CPU time (millions and millions are such checks are performed). I don't think I can improve any other aspects of my solution, but at least this one may be something to be improved still.
Basically, I have a loop inside a loop inside a loop, .. and then 3 such checks inside:
if (!Combination.CheckCondition(dateCondition1, profit1))
continue;
Implementation:
public static bool CheckCondition(Condition condition, decimal profit)
{
var conditionType = condition.ConditionType;
if (conditionType == ConditionType.Disabled)
{
return true;
}
else if (conditionType == ConditionType.Between && profit >= condition.From && profit <= condition.To)
{
return true;
}
else if (conditionType == ConditionType.Bigger && profit >= condition.Exact)
{
return true;
}
else if (conditionType == ConditionType.Smaller && profit <= condition.Exact)
{
return true;
}
return false;
}
Condition
class:
public class Condition
{
public static int CurrentIndex = 0;
public int Id { get; set; }
public ConditionType ConditionType { get; set; }
public int From { get; set; }
public int To { get; set; }
public int Exact { get; set; }
public Condition()
{
this.Id = CurrentIndex;
CurrentIndex++;
}
}
public enum ConditionType
{
None,
Disabled,
Smaller,
Bigger,
Between
}
Overall loop picture:
Dictionary<(int,int), decimal> profits = new Dictionary<(int,int), decimal>();
// all profits are calculated beforehand
var dateConditions = new List<Condition>() // this is just a small part of what I really have
{
new Condition() { ConditionType = ConditionType.Disabled },
new Condition() { ConditionType = ConditionType.Smaller, Exact = -70 },
new Condition() { ConditionType = ConditionType.Bigger, Exact = 0 },
new Condition() { ConditionType = ConditionType.Between, From = -30, To = 30 }
};
Dictionary<(int, int, int, int, int, int, int, int, int), decimal> results = new Dictionary<(int, int, int, int, int, int, int, int, int), decimal>();
for (int date1End = 0 + 1; date1End < 10; date1End++) // Start with 0 (now), end with -24 hours, Step = 30 minutes
{
var tuple1 = (0, date1End);
decimal profit1 = profits[tuple1];
foreach (var dateCondition1 in dateConditions)
{
if (!Combination.CheckCondition(dateCondition1, profit1))
continue;
for (int date2End = 2 + 1; date2End < 9; date2End++) // start from -1 hour, end with -10 hours, step 1 hour
{
var tuple2 = (2, date2End);
decimal profit2 = profits[tuple2];
foreach (var dateCondition2 in dateConditions)
{
if (!Combination.CheckCondition(dateCondition2, profit2))
continue;
for (int date3End = 3 + 1; date3End < 8; date3End++) // start from -8 hours, end with -20 hours, step 1 hour
{
var tuple3 = (3, date3End);
decimal profit3 = profits[tuple3];
foreach (var dateCondition3 in dateConditions)
{
if (!Combination.CheckCondition(dateCondition3, profit3))
continue;
var calculatedAmount = 1234m;
var key = new (0, date1End, 2, date2End, 3, date3End, dateCondition1.Id, dateCondition2.Id, dateCondition3.Id);
if (!results.ContainsKey(key))
results.Add(key, calculatedAmount);
}
}
}
}
}
}
The aim of the program is to generate all possible code combinations for matching conditions for a given data. As I mentioned - this code is working perfectly fine, but there are so many checks performed in this one method, that I am starting to think of alternative solutions. I would appreciate advice on how to improve the performance of the mentioned logic.
Currently, it is taking more than 2 hours to run the code from start to finish. Even 20 minutes of improvement would be great.
Edit: Improvement 1. As was mentioned by Pieter Witvoet the first improvement for the code is within the check method itself:
public static bool CheckCondition(Condition condition, decimal profit)
{
var conditionType = condition.ConditionType;
if (conditionType == ConditionType.Disabled)
{
return true;
}
else if (conditionType == ConditionType.Between)
{
if (profit >= condition.From && profit <= condition.To)
{
return true;
}
}
else if (conditionType == ConditionType.Bigger)
{
if (profit >= condition.Exact)
{
return true;
}
}
else if (conditionType == ConditionType.Smaller)
{
if (profit <= condition.Exact)
{
return true;
}
}
return false;
}
Sub-conditions are taken out from outer check into a nested inner check. This should save some time.
Edit: Improvement 2. As was suggested by Pieter Witvoet - there is no need to recalculate data over and over again. Instead - it should be possible to generate all combinations before hand. If I understood that suggestion correctly - here is an improved version of logic! I can not test it unfortunately right now as database remote host is not responding, but in theory this should be a huge improvement to performance!
List<(int, int, int)> validCombinations1 = new List<(int, int, int)>();
for (int date1End = 0 + 1; date1End < 10; date1End++)
{
var tuple1 = (0, date1End);
decimal profit1 = profits[tuple1];
foreach (var dateCondition1 in dateConditions)
{
if (!Combination.CheckCondition(dateCondition1, profit1))
continue;
validCombinations1.Add((0, date1End, dateCondition1.Id));
}
}
List<(int, int, int)> validCombinations2 = new List<(int, int, int)>();
for (int date2End = 2 + 1; date2End < 9; date2End++)
{
var tuple2 = (2, date2End);
decimal profit2 = profits[tuple2];
foreach (var dateCondition2 in dateConditions)
{
if (!Combination.CheckCondition(dateCondition2, profit2))
continue;
validCombinations2.Add((2, date2End, dateCondition2.Id));
}
}
List<(int, int, int)> validCombinations3 = new List<(int, int, int)>();
for (int date3End = 3 + 1; date3End < 8; date3End++)
{
var tuple3 = (3, date3End);
decimal profit3 = profits[tuple3];
foreach (var dateCondition3 in dateConditions)
{
if (!Combination.CheckCondition(dateCondition3, profit3))
continue;
validCombinations3.Add((3, date3End, dateCondition3.Id));
}
}
foreach(var combination1 in validCombinations1)
{
foreach (var combination2 in validCombinations2)
{
foreach (var combination3 in validCombinations3)
{
var key = (combination1.Item1, combination1.Item2, combination2.Item1, combination2.Item2, combination3.Item1, combination3.Item2, combination1.Item3, combination2.Item3, combination3.Item3);
var calculatedAmount = 1234m;
if (!results.ContainsKey(key))
results.Add(key, calculatedAmount);
}
}
}
Edit: this will be my last update in this post, probably. I have managed to almost halve the execution time by getting rid of 3 constant values in ValueTuple
. Start dates were always constant. As the result - key lookup performance in dictionary skyrocketed.
After the key value update - then next worst performant place was in fact another part of the code, which is not even included in this question. Basically I used to update dictionary values for existing keys (running totals). I have used decimal
as a number type, however, as it turned out, double
is almost 15 times faster for all operations. I was able to switch to double
. In my case the precision is not as important so a perfectly nice decision.
As the result of 2 fixes provided by Pieter Witvoet and my own findings - the code is now performing lightning fast. I have to thank all participants for the input! At this point I am stuck with dictionary lookup being the slowest part, but it is not something that can be improved.
2 Answers 2
In CheckCondition
, you continue to check other condition types if the matching type fails. For example, if the condition type is Between
, but the given value does not fall within the condition's range, the code then checks if the type is maybe Bigger
or Smaller
, which it obviously won't be. Switching on condition type and then returning the result of the appropriate check should be slightly more efficient. On my system this ran about 5% to 10% faster.
However, the bigger problem is those deeply nested loops. In your example, profit1
is calculated 9 times. But due to nesting, profit2
is calculated 162 times, profit3
1944 times and the innermost code is executed 5832 times. Those numbers will get exponentially worse depending on how many conditions you have: doubling the number conditions will increase the amount of work eightfold. The catch is that none of those profits depend on each other, so there's no need to nest those loops at all! Just create 3 lists of matching condition IDs and associated dateEnd
values separately, and then store all combinations in the results
dictionary. This approach runs roughly twice as fast for me.
Another possible improvement is to make the method 'lazy', but whether that's a good idea depends on how this method is actually used. Instead of creating a dictionary, you can yield
results one-by-one, so they're only generated when they're actually needed.
-
1\$\begingroup\$ +1 for actually looking at these insanly nested loops and calculating their execution times :-] \$\endgroup\$t3chb0t– t3chb0t2017年12月24日 06:26:44 +00:00Commented Dec 24, 2017 at 6:26
-
\$\begingroup\$ Well, actually, all profits are being calculated beforehand.
profits
are represented byDictionary<(int, int), decimal>
. Calculating the profit is a matter of taking a value from this dictionary using theValueTuple
key. (which is a start and end dates). I will update my question in a second. Maybe I shouldn't have taken that part out.. I am now thinking about what you said about "generating 3 lists". I have to run some tests to understand if I understood your suggestion correctly. \$\endgroup\$Alex– Alex2017年12月24日 11:43:32 +00:00Commented Dec 24, 2017 at 11:43 -
\$\begingroup\$ @Pieter Witvoet, I would appreciate if you have taken a look at my 2 edits and said if I interpreted your suggestions correctly and anything is left still :) Thank you sir for taking your time to look into my issue! \$\endgroup\$Alex– Alex2017年12月24日 12:10:15 +00:00Commented Dec 24, 2017 at 12:10
-
1\$\begingroup\$ @Alex: that looks correct, yes. A small note: instead of writing
if (cond) return true; else return false;
, you can simply writereturn cond;
. \$\endgroup\$Pieter Witvoet– Pieter Witvoet2017年12月24日 13:11:35 +00:00Commented Dec 24, 2017 at 13:11 -
\$\begingroup\$ @PieterWitvoet note taken! I have done one more update to the post with 2 more improvements. I think that instead of 2 hours the code is now running 30 minutes or so. Unfortunately since the time this post was created some more data was added to DB so the amount of data that is processed is in fact larger now, so I can not measure exact improvement over initial set of data.. In any case - it is at least 2 times faster now. I even bet to say close to 4 times faster. \$\endgroup\$Alex– Alex2017年12月24日 13:43:33 +00:00Commented Dec 24, 2017 at 13:43
while looking at the implementation I couldn't help but think that this could be made more readable and efficient.
instead of this:
public static bool CheckCondition(Condition condition, decimal profit) { var conditionType = condition.ConditionType; if (conditionType == ConditionType.Disabled) { return true; } else if (conditionType == ConditionType.Between && profit >= condition.From && profit <= condition.To) { return true; } else if (conditionType == ConditionType.Bigger && profit >= condition.Exact) { return true; } else if (conditionType == ConditionType.Smaller && profit <= condition.Exact) { return true; } return false; }
I would want to return something from each if statement.
What I mean, is that if I know the condition type there is only one way to check the check. if the condition type is smaller, then it is not going to match any other condition type, and if the "smaller" boolean statement is false, then I should return false.
so the code should look like this.
public static bool CheckCondition(Condition condition, decimal profit)
{
var conditionType = condition.ConditionType;
if (conditionType == ConditionType.Disabled) {
return true;
}
else if (conditionType == ConditionType.Between)
{
return profit >= condition.From && profit <= condition.To;
}
else if (conditionType == ConditionType.Bigger)
{
return profit >= condition.Exact;
}
else if (conditionType == ConditionType.Smaller)
{
return profit <= condition.Exact;
}
return false;
}
I was looking at the for loops and the set up for the loops confused me, so I looked at the Comments for the setup and that confused me even more
for (int date1End = 0 + 1; date1End < 10; date1End++) // Start with 0 (now), end with -24 hours, Step = 30 minutes
if you are going to set date1End
to 1, set it to 1. but your comment says that you are starting at 0 and ending at -24 stepping by 30 minutes... ... ...
This:
int date1End = 0 + 1; date1End < 10; date1End++)
will never do that
date1End
will never be 0date1End
will never be negativedate1End
is not stepping by 30 minutes in any direction and cannot be a decimal in order to step by half an hour.- when I see date in the name of a variable, I do not think hours, let alone integers
O(n^2)
. You have 6 layers of loops in there. My tiny brain can’t even think about the exponential runtime of it. \$\endgroup\$