DataTable runtime grouping on user defined criteria

Question

In one of my tools, the user can define grouping for a DataTable. The criteria are known only at run time. To achieve this, I use a Dictionary<string, object> with a custom dictionary comparer that the GroupBy extension consumes, so that I can aggregate the groups. This works pretty well, but I was wondering if there is still room for improvement.

The number and names of columns can change. There is nothing known about the DataTable until a JSON configuration is loaded. This is an example of such a configuration from one of my unit tests.

"Columns": [
 "_nvarchar | key",
 "_datetime2",
 "_int | sum",
 "_float | avg",
 "_bit | count",
 "_money",
 "_numeric"
],

(The configuration is already being taken care of and is not a part of the review, it's just for reference to show how the columns can be configured.)

To keep things simple, here is some example data:

var dataTable = new DataTable();
dataTable.Columns.Add("FirstName", typeof(string));
dataTable.Columns.Add("LastName", typeof(string));
dataTable.Columns.Add("ItemCount", typeof(int));
dataTable.Rows.Add("foo", "bar", 2);
dataTable.Rows.Add("foo", "bar", 5);
dataTable.Rows.Add("baz", "qux", 3);
dataTable.Rows.Add("foo", "bar", 2);
dataTable.Rows.Add("baz", "qux", 6);

The keyColumns are the columns marked with key in the configuration. They are not database keys; they only define how rows are grouped for the later aggregation.

var keyColumns = new[] { "FirstName", "LastName" };
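Parsing the configuration is out of scope for the review, but to make the link between the `| key` marker and `keyColumns` concrete, here is a minimal sketch. It assumes every entry has the form `name` or `name | modifier` with at most one modifier; `ColumnSpec` and `GetKeyColumns` are hypothetical names, not part of the original tool.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper (assumption: entries look like "name | modifier").
// Returns the names of the columns whose modifier is "key".
public static class ColumnSpec
{
    public static string[] GetKeyColumns(IEnumerable<string> specs)
    {
        return specs
            // Split "name | modifier" into its trimmed parts.
            .Select(spec => spec.Split('|').Select(part => part.Trim()).ToArray())
            // Keep only entries whose modifier is "key" (case-insensitive).
            .Where(parts => parts.Length > 1
                && string.Equals(parts[1], "key", StringComparison.OrdinalIgnoreCase))
            .Select(parts => parts[0])
            .ToArray();
    }
}
```

For the configuration shown earlier, this would yield only `_nvarchar`.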

The implementation is very simple: enumerate the rows and, for each row, build a dictionary whose keys are the names of the key columns and whose values are that row's values in those columns.

This is exactly the implementation I currently use:

var rowGroups = dataTable.AsEnumerable().GroupBy(
    x => keyColumns.ToDictionary(
        column => column,
        column => x[column],
        StringComparer.OrdinalIgnoreCase
    ),
    new DictionaryComparer<string, object>()
).ToList();

To actually be able to group the dictionaries there is also this comparer:

public class DictionaryComparer<TKey, TValue> : IEqualityComparer<IDictionary<TKey, TValue>>
{
    public bool Equals(IDictionary<TKey, TValue> x, IDictionary<TKey, TValue> y)
    {
        return x.All(item => y.TryGetValue(item.Key, out TValue value) && item.Value.Equals(value));
    }

    // Ignore dictionary hash-code and just compare the keys and values.
    public int GetHashCode(IDictionary<TKey, TValue> obj) => 0;
}
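Two properties of the comparer above are worth keeping in mind. `Equals` only checks that every entry of `x` is found in `y`, so a dictionary compares equal to any superset of itself (harmless here, because all keys are built from the same key columns). And the constant hash code puts every key into the same bucket, so `GroupBy` has to call `Equals` against each existing group. A sketch of a stricter variant with a count check, null-safe value comparison and an order-independent hash might look like this; `StrictDictionaryComparer` is a hypothetical name, and it assumes both dictionaries use the same key comparer.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical stricter variant of DictionaryComparer.
// Assumption: both dictionaries use the same key comparer.
public class StrictDictionaryComparer<TKey, TValue> : IEqualityComparer<IDictionary<TKey, TValue>>
{
    private static readonly EqualityComparer<TValue> ValueComparer = EqualityComparer<TValue>.Default;

    public bool Equals(IDictionary<TKey, TValue> x, IDictionary<TKey, TValue> y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null) return false;
        // Without this check, x would also "equal" any superset of itself.
        if (x.Count != y.Count) return false;
        // EqualityComparer<TValue>.Default handles null values safely.
        return x.All(item => y.TryGetValue(item.Key, out TValue value)
                             && ValueComparer.Equals(item.Value, value));
    }

    public int GetHashCode(IDictionary<TKey, TValue> obj)
    {
        unchecked
        {
            // Summing the value hashes is order-independent, so equal
            // dictionaries get equal hash codes whatever their enumeration order.
            int hash = 0;
            foreach (var item in obj)
            {
                hash += item.Value == null ? 0 : ValueComparer.GetHashCode(item.Value);
            }
            return hash;
        }
    }
}
```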

And the result is:

[Screenshot: grouping result]

Accepted answer

Honestly, I am not sure a dictionary is the right data structure for composite keys. At least for me, it is not immediately obvious how the data are grouped. I think the main thing that confuses me is that the grouping key contains not just the values to group by, but also the column names, which are irrelevant for grouping. Additionally, I have to understand the comparer.

For your example, you could simply use a tuple, which provides built-in item-by-item comparison:

var rowGroups = dataTable.AsEnumerable()
 .GroupBy(x => Tuple.Create(x["FirstName"], x["LastName"]))
 .ToList();

A more flexible approach could be to build a string to use as the grouping key:

var rowGroups = dataTable.AsEnumerable()
 .GroupBy(x => string.Join(":", keyColumns.Select(key => x[key])))
 .ToList();
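One caveat with a joined string key: different value combinations can produce the same string. `("a:b", "c")` and `("a", "b:c")` both become `"a:b:c"`, and values of different types with the same text (such as `1` and `"1"`, or `DBNull.Value` and `""`) collide as well. A minimal sketch of the problem and of one possible mitigation, making each part self-delimiting with its type name and length, could look like this. `StringKey` is a hypothetical helper; it still compares values by their invariant-culture text rather than by `Equals`.

```csharp
using System;
using System.Globalization;
using System.Text;

// Hypothetical helper for building string grouping keys.
public static class StringKey
{
    // The approach from the answer: separator-joined values.
    public static string NaiveJoin(params object[] values) => string.Join(":", values);

    // Each part is written as "TypeName:Length:Text", so a separator inside
    // a value cannot make two different value lists produce the same key.
    public static string Build(params object[] values)
    {
        var sb = new StringBuilder();
        foreach (var value in values)
        {
            // DBNull and "" both have empty text; the type name keeps them apart.
            string type = value?.GetType().Name ?? "null";
            string text = Convert.ToString(value, CultureInfo.InvariantCulture);
            sb.Append(type).Append(':').Append(text.Length).Append(':').Append(text);
        }
        return sb.ToString();
    }
}
```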

Comments

Using the column name in the group key really does seem like a bad idea... I'm wondering why I thought I needed it. I'll throw it away, but I'll have to keep the comparer: it will use SequenceEqual on the two collections to decide whether the keys are the same, because the columns can be of any type, so it needs to use Equals internally.

Do you mean something like .GroupBy(x => keyColumns.Select(key => x[key]).ToArray()) + custom comparer using SequenceEqual?

Exactly, I've just tested it and it works great. Now I'm also able to add a proper hash code for such a collection.

Sounds good. Another option could be to create a class like ComposableKey that takes a list of objects and has custom Equals / GetHashCode implementations. That is probably more descriptive in usage, but less flexible than a comparer.
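The array-plus-comparer variant discussed in these comments could be sketched roughly as follows. `SequenceComparer<T>` is a hypothetical name; equality delegates to `SequenceEqual` with `EqualityComparer<T>.Default`, and the hash is order-sensitive to match it.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical comparer: two arrays are equal when their elements are
// equal pairwise and in the same order (SequenceEqual).
public class SequenceComparer<T> : IEqualityComparer<T[]>
{
    private static readonly EqualityComparer<T> ElementComparer = EqualityComparer<T>.Default;

    public bool Equals(T[] x, T[] y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null) return false;
        return x.SequenceEqual(y, ElementComparer);
    }

    public int GetHashCode(T[] obj)
    {
        unchecked
        {
            // Order-sensitive combination, consistent with SequenceEqual.
            int hash = 17;
            foreach (var item in obj)
            {
                hash = hash * 31 + (item == null ? 0 : ElementComparer.GetHashCode(item));
            }
            return hash;
        }
    }
}
```

With it, the grouping becomes `dataTable.AsEnumerable().GroupBy(x => keyColumns.Select(key => x[key]).ToArray(), new SequenceComparer<object>())`.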

Second answer

This is what I ended up with. I removed the meaningless dictionary and replaced it with a CompositeKey. I need additional processing for the values, hence the extra let clause, which keeps the lines compact.

var rowGroups =
 from dataRow in dataRows
 let values = keyColumns.Select(column => column.Filter.Apply(dataRow[column.Name]))
 group dataRow by new CompositeKey<object>(values) into g
 select g;

I liked the CompositeKey suggestion, so I moved the logic from the comparer into the key class itself. It now uses SequenceEqual to compare the keys and has a proper GetHashCode implementation (I hope so).

using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

public class CompositeKey<T> : IEnumerable<T>, IEquatable<CompositeKey<T>>
{
    private readonly List<T> _keys;

    public CompositeKey(IEnumerable<T> keys) => _keys = new List<T>(keys);

    public IEnumerator<T> GetEnumerator() => _keys.GetEnumerator();

    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();

    public bool Equals(CompositeKey<T> other)
    {
        if (ReferenceEquals(other, null)) return false;
        // Element-wise comparison using EqualityComparer<T>.Default.
        return ReferenceEquals(this, other) || this.SequenceEqual(other);
    }

    public override bool Equals(object obj)
    {
        if (ReferenceEquals(null, obj)) return false;
        if (ReferenceEquals(this, obj)) return true;
        if (obj.GetType() != this.GetType()) return false;
        return Equals((CompositeKey<T>)obj);
    }

    public override int GetHashCode()
    {
        unchecked
        {
            // The parentheses matter: ?? binds more loosely than ^, so without
            // them a null element would reset the accumulated hash to 0.
            return this.Aggregate(0, (current, next) => (current * 397) ^ (next?.GetHashCode() ?? 0));
        }
    }
}
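On .NET Core 2.1 and later, `System.HashCode` can do the hash combining and already treats `null` as 0. The sketch below shows a compact key along those lines and uses it to group and aggregate the example data from the question. `RowKey` and `GroupingDemo.SumBy` are hypothetical names, summing `ItemCount` is only an example aggregate, and `Rows.Cast<DataRow>()` is used instead of `AsEnumerable()` to avoid the DataSetExtensions dependency.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

// Hypothetical composite key: element-wise equality via SequenceEqual,
// hash combined with System.HashCode (.NET Core 2.1+).
public sealed class RowKey : IEquatable<RowKey>
{
    private readonly object[] _values;

    public RowKey(IEnumerable<object> values) => _values = values.ToArray();

    public IReadOnlyList<object> Values => _values;

    public bool Equals(RowKey other) =>
        other != null && (ReferenceEquals(this, other) || _values.SequenceEqual(other._values));

    public override bool Equals(object obj) => Equals(obj as RowKey);

    public override int GetHashCode()
    {
        var hash = new HashCode();
        // HashCode.Add uses 0 for null values.
        foreach (var value in _values) hash.Add(value);
        return hash.ToHashCode();
    }
}

public static class GroupingDemo
{
    // Groups rows by the given key columns and sums an int column per group.
    public static Dictionary<RowKey, int> SumBy(DataTable table, string[] keyColumns, string sumColumn) =>
        table.Rows.Cast<DataRow>()
            .GroupBy(row => new RowKey(keyColumns.Select(column => row[column])))
            .ToDictionary(g => g.Key, g => g.Sum(row => (int)row[sumColumn]));
}
```

Because `DataRow` returns `DBNull.Value` (not `null`) for empty cells, and `DBNull.Value` is a singleton that equals itself, rows with missing key values still fall into a single group.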

The accepted answer was posted by JanDotNet on 2017-06-19 18:00:42Z.
