In one of my tools, the user can define the grouping for a DataTable. The criteria are known only at run-time. To achieve this, I use a Dictionary<string, object> with a custom dictionary comparer that the GroupBy extension consumes so that I can aggregate the groups. This works pretty well, but I was wondering whether there is still room for improvement.
The number and names of the columns can change. Nothing is known about the DataTable until a JSON configuration is loaded. This is an example of such a configuration from one of my unit tests:
"Columns": [
"_nvarchar | key",
"_datetime2",
"_int | sum",
"_float | avg",
"_bit | count",
"_money",
"_numeric"
],
(The configuration is already taken care of and is not part of the review; it is shown only for reference, to illustrate how the columns can be configured.)
To keep things simple, here's some example data:
var dataTable = new DataTable();
dataTable.Columns.Add("FirstName", typeof(string));
dataTable.Columns.Add("LastName", typeof(string));
dataTable.Columns.Add("ItemCount", typeof(int));
dataTable.Rows.Add("foo", "bar", 2);
dataTable.Rows.Add("foo", "bar", 5);
dataTable.Rows.Add("baz", "qux", 3);
dataTable.Rows.Add("foo", "bar", 2);
dataTable.Rows.Add("baz", "qux", 6);
The keyColumns are those marked with key. They are not the same as database keys; they are used for aggregation later.
var keyColumns = new[] { "FirstName", "LastName" };
The implementation is very simple: enumerate the rows and, for each key column, store the column name as the dictionary key and the row's value in that column as the dictionary value.
This is exactly the implementation I currently use:
var rowGroups = dataTable.AsEnumerable().GroupBy(x =>
    keyColumns.ToDictionary(
        column => column,
        column => x[column],
        StringComparer.OrdinalIgnoreCase
    ),
    new DictionaryComparer<string, object>()
).ToList();
To actually be able to group the dictionaries there is also this comparer:
public class DictionaryComparer<TKey, TValue> : IEqualityComparer<IDictionary<TKey, TValue>>
{
    public bool Equals(IDictionary<TKey, TValue> x, IDictionary<TKey, TValue> y)
    {
        return x.All(item => y.TryGetValue(item.Key, out TValue value) && item.Value.Equals(value));
    }

    // Ignore dictionary hash-code and just compare the keys and values.
    public int GetHashCode(IDictionary<TKey, TValue> obj) => 0;
}
And the result is:
2 Answers
Honestly, I am not sure a dictionary is the right data structure for composite keys. At least to me, it is not immediately obvious how the data is grouped. I think what confuses me most is that the grouping key contains not just the values to group by but also the column names, which are irrelevant for grouping. Additionally, I have to understand the comparer.
For your example, you could simply use a tuple, which provides built-in comparison of its items:
var rowGroups = dataTable.AsEnumerable()
    .GroupBy(x => Tuple.Create(x["FirstName"], x["LastName"]))
    .ToList();
Another, more flexible approach could be to build a string value to use as the grouping key:
var rowGroups = dataTable.AsEnumerable()
    .GroupBy(x => string.Join(":", keyColumns.Select(key => x[key])))
    .ToList();
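One caveat with a joined string key, offered as a hedged aside: if a key value can itself contain the separator, two different value combinations may produce the same string and therefore land in the same group. The minimal sketch below illustrates this with made-up values; JoinedKeyDemo and MakeKey are hypothetical names used only for illustration and are not part of the original code.

```csharp
using System;
using System.Linq;

public static class JoinedKeyDemo
{
    // Hypothetical helper that mirrors the string.Join key shown above.
    public static string MakeKey(params object[] values) =>
        string.Join(":", values.Select(v => v?.ToString()));

    public static void Main()
    {
        // Two different value combinations...
        var first = MakeKey("a:b", "c");
        var second = MakeKey("a", "b:c");

        // ...yield the same key because the separator occurs inside a value.
        Console.WriteLine(first == second); // True
    }
}
```

Picking a separator that cannot occur in the data, or comparing the values as a sequence instead of as one string, would avoid this kind of collision.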
The column name in the group key really does seem to be a stupid idea... I wonder why I thought I needed it. I'll throw it away, but I'll have to keep the comparer; instead it will use SequenceEqual on two collections to determine whether the keys are the same, because the columns can be of any type, so it needs to use Equals internally. – t3chb0t, Jun 19, 2017 at 18:13
Do you mean something like .GroupBy(x => keyColumns.Select(key => x[key]).ToArray()) plus a custom comparer using SequenceEqual? – JanDotNet, Jun 19, 2017 at 18:29
Exactly, I've just tested it and it works great. Now I'm also able to add a proper hash code for such a collection. – t3chb0t, Jun 19, 2017 at 18:31
Sounds good. Another option could be to create a class like ComposableKey that takes a list of objects and has a custom Equals/GetHashCode implementation. That is probably more descriptive in usage but less flexible than a comparer. – JanDotNet, Jun 19, 2017 at 18:41
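The array-plus-comparer idea from the comments could look roughly like the sketch below. It is only an illustration under stated assumptions: SequenceComparer and SequenceKeyDemo are hypothetical names, the hash constants 17 and 31 are an arbitrary but common choice, and the sample rows are the ones from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

// Hypothetical comparer: two object[] keys are equal when their elements
// are pairwise equal (via Equals) and appear in the same order.
public class SequenceComparer : IEqualityComparer<object[]>
{
    public bool Equals(object[] x, object[] y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x is null || y is null) return false;
        return x.SequenceEqual(y);
    }

    public int GetHashCode(object[] keys)
    {
        unchecked
        {
            // Combine the element hashes; a null element contributes 0.
            return keys.Aggregate(17, (hash, key) => (hash * 31) + (key?.GetHashCode() ?? 0));
        }
    }
}

public static class SequenceKeyDemo
{
    // Sums ItemCount for each distinct (FirstName, LastName) combination.
    public static Dictionary<string, int> SumByKey()
    {
        var dataTable = new DataTable();
        dataTable.Columns.Add("FirstName", typeof(string));
        dataTable.Columns.Add("LastName", typeof(string));
        dataTable.Columns.Add("ItemCount", typeof(int));
        dataTable.Rows.Add("foo", "bar", 2);
        dataTable.Rows.Add("foo", "bar", 5);
        dataTable.Rows.Add("baz", "qux", 3);
        dataTable.Rows.Add("foo", "bar", 2);
        dataTable.Rows.Add("baz", "qux", 6);

        var keyColumns = new[] { "FirstName", "LastName" };

        return dataTable.AsEnumerable()
            .GroupBy(row => keyColumns.Select(c => row[c]).ToArray(), new SequenceComparer())
            .ToDictionary(
                g => string.Join(" ", g.Key),
                g => g.Sum(row => row.Field<int>("ItemCount")));
    }

    public static void Main()
    {
        // Both groups happen to sum to 9 with the sample data (2 + 5 + 2 and 3 + 6).
        foreach (var pair in SumByKey())
            Console.WriteLine($"{pair.Key}: {pair.Value}");
    }
}
```

Because the hash code is derived from the values rather than being a constant, GroupBy can spread the keys over hash buckets and only falls back to Equals for keys whose hashes match.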
This is what I ended up with. I removed the meaningless dictionary and replaced it with a CompositeKey. I need additional processing for the values, hence the additional let, which keeps the lines more compact.
var rowGroups =
    from dataRow in dataRows
    let values = keyColumns.Select(column => column.Filter.Apply(dataRow[column.Name]))
    group dataRow by new CompositeKey<object>(values) into g
    select g;
I like the suggestion of a CompositeKey, so I moved the logic from the comparer into it. It now uses SequenceEqual to compare the keys and has a proper GetHashCode implementation (I hope so).
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

public class CompositeKey<T> : IEnumerable<T>, IEquatable<CompositeKey<T>>
{
    private readonly List<T> _keys;

    public CompositeKey(IEnumerable<T> keys) => _keys = new List<T>(keys);

    public IEnumerator<T> GetEnumerator() => _keys.GetEnumerator();

    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();

    public bool Equals(CompositeKey<T> other)
    {
        if (ReferenceEquals(other, null)) return false;
        return ReferenceEquals(this, other) || this.SequenceEqual(other);
    }

    public override bool Equals(object obj)
    {
        if (ReferenceEquals(null, obj)) return false;
        if (ReferenceEquals(this, obj)) return true;
        if (obj.GetType() != this.GetType()) return false;
        return Equals((CompositeKey<T>)obj);
    }

    public override int GetHashCode()
    {
        unchecked
        {
            // Parentheses matter here: ^ binds tighter than ??, so without them
            // a null element would reset the whole hash to 0.
            return this.Aggregate(0, (current, next) => (current * 397) ^ (next?.GetHashCode() ?? 0));
        }
    }
}