I have a data structure that uses a Dictionary<ID,TimeSeries>
to hold time series for different products. TimeSeries
is implemented as something holding an array of Point
, which contain a DateTime
and a double
.
Its interface is:
TimeSeries{
int Length{ get; }
Point ElementAt(int index);
DateTime StartTime { get; }
DateTime EndTime { get; }
}
Given a pair of elements of such data structure I need to find the first time instant where they diverge. For my definition of divergence, if a dictionary contains an element and the other doesn't I need to return the time of the first point of the TimeSeries
I have.
This is the code I'm using to compute the difference and that I'd like you to review and comment:
bool TryFindFirstDivergenceTime(Dictionary<ID, TimeSeries> baseline, Dictionary<ID, TimeSeries> other, out DateTime firstDivergenceTime)
{
firstDivergenceTime = default(DateTime);
foreach(var baselineIDSeriesPair in baseline)
{
if(other.ContainsKey(baselineIDSeriesPair.Key))
{
firstDivergenceTime = Min(FindFirstDivergenceTime(baselineIDSeriesPair.Value, other[baselineIDSeriesPair.Key]), firstDivergenceTime);
}
else
{
firstDivergenceTime = Min(baselineIDSeriesPair.Value.Start, firstDivergenceTime);
}
}
foreach(var otherIDSeriesPair in other)
{
if(!baseline.ContainsKey(otherIDSeriesPair.Key))
{
firstDivergenceTime = Min(otherIDSeriesPair.Value.Start, firstDivergenceTime);
}
}
return firstDivergenceTime != default(DateTime);
}
public DateTime Min(DateTime first, DateTime second)
{
if(first == default(DateTime))
return second;
if(second == default(DateTime))
return first;
return new DateTime(Math.Min(first.Ticks, second.Ticks));
}
DateTime FindFirstDivergenceTime(TimeSeries first, TimeSeries second)
{
if(first.Length != second.Length)
{
throw new ArgumentException();
}
for(var i=0; i < first.Length; i++)
{
if(! first.ElementAt(i).Equals(second.ElementAt(i)))
{
return Min(first.ElementAt(i).Time, second.ElementAt(i).Time);
}
}
return default(DateTime);
}
I have the feeling that I am not approaching the problem in the right way as the code contains the repetition of the logic to find the time of the first difference when I don't have the same product in both the dictionaries.
The other think that I don't particularly like is that I find myself breaking th symmetry and using for on dictionary Key
and Value
from its IEnumerable
, while from the other I need to perform a lookup.
How would you address those issues?
1 Answer 1
Before getting to your question about reducing the duplication, there's another issue I'd like to point out first. The use of default(DateTime)
as a check for the absence of a value is incorrect and misleading. default(DateTime)
is a valid date. However unlikely it is that you will have a date that equals that value, it is misleading to use because anyone looking at your code that sees a DateTime
variable will assume that it holds a proper date value because it has to. As a struct, it can't be represented as null
, so it always holds a value. Enter nullable types. DateTime?
clearly expresses that the variable can be in a state where it doesn't hold a date value.
That said, it appears you're trying to join the two dictionary and compare in these ways:
- When there are matching keys.
- When only one of the dictionaries holds a key.
Sounds like a Full Outer Join, which someone has kindly shared a Linq-esque extension method for over on Stack Overflow. Using this, you can concisely and expressively rewrite your methods like so to avoid duplication (note the use of DateTime?
):
// Note: I think this is a bad method name (it's the same as the one below,
// which I think is better suited for the name). It should probably express
// the comparison between these Dictionary maps.
public static DateTime? GetEarliestDivergenceTime(Dictionary<int, TimeSeries> baseline, Dictionary<int, TimeSeries> other, DateTime? defaultEarliestDivergenceTime = null)
{
// Omitting 'FullOuterJoin' code in StackOverflow link:
var earliestDivergenceTimes = baseline.FullOuterJoin(
other,
b => b.Key,
o => o.Key,
(b,o) => GetEarliestDivergenceTimeOrDefault(b, o));
return earliestDivergenceTimes.Min() ?? defaultEarliestDivergenceTime;
}
public static DateTime? GetEarliestDivergenceTime(TimeSeries ts1, TimeSeries ts2, DateTime? defaultEarliestDivergenceTime)
{
DateTime? earliestDivergenceTime = null;
if(ts1 != null && ts2 != null)
{
// Compare.
}
else if(ts1 != null && ts2 == null)
{
// Use ts1.
}
else if(ts1 == null && ts2 != null)
{
// Use ts2.
}
// earliestDivergenceTime will be null if both t1 and t2 are null.
return earliestDivergenceTime ?? defaultEarliestDivergenceTime;
}
-
1\$\begingroup\$ I agree, although as you can see in my code comment I had a hard time with method/argument names since I'm still not 100% on how this would get used. Suggestions for improvement are welcome. \$\endgroup\$Ocelot20– Ocelot202014年04月25日 14:22:35 +00:00Commented Apr 25, 2014 at 14:22
Min
andFindFirstDivergenceTime
, so IMO they are quite relevant. Also, this doesn't look like it would compile (see duplicateStartTime
definitions inTimeSeries
and first use ofotherIDSeriesPair
in theTryFindFirstDivergenceTime
method. \$\endgroup\$Min
andFindFirstDivergenceTime
method implementations \$\endgroup\$first.Zip(second, (a, b) => { . . . })
of some kind instead of the foreaches. That does require the order to be the same, though. \$\endgroup\$