I have some code to accomplish the objective of finding out which user has made the most progress. Progress is measured in integer values, the higher the better. SubmittedAnswers
consist of among others a int UserId
and a int Progress
.
The answers
variable consist of a list of SubmittedAnswer
s as described above. To find out which user has made the most progress of this list I have the following algorithm:
/// <summary>
/// Method used to find out which user had made the most progress in a given time range.
/// </summary>
/// <param name="timeRange"></param>
/// <returns></returns>
public async Task<User> MostProgressAsync(ITimeRange timeRange)
{
IList<SubmittedAnswer> answers = await GetAnswersInRange(timeRange);
IEnumerable<IGrouping<int, SubmittedAnswer>> answersOfUsers = answers.GroupBy(a => a.UserId);
int maxProgress = int.MinValue;
var maxProgressUserId = 0;
foreach (IGrouping<int, SubmittedAnswer> answersOfUser in answersOfUsers)
{
int progress = answersOfUser.Sum(a => a.Progress);
if (progress > maxProgress)
{
maxProgress = progress;
maxProgressUserId = answersOfUser.Key;
}
}
return new User { UserId = maxProgressUserId };
}
SubmittedAnswers
and User
looks as follows:
public class SubmittedAnswer
{
public ObjectId Id { get; set; }
public int SubmittedAnswerId { get; set; }
public BsonDateTime SubmitDateTime { get; set; }
public bool Correct { get; set; }
public int Progress { get; set; }
}
public class User
{
public int UserId { get; set; }
}
Is this a proper way to accomplish this objective? Can it be improved?
Personally I would like to solve the issue a bit more functional, without declaring the variables above a foreach loop.
1 Answer 1
The method's name can be improved. I like that you ended it with Async
, it makes it compliant with the convention for async Task<T>
methods - however there's another naming convention programmers use all the time: method names should start with a verb! FindUserWithMostProgressAsync
might be a better name, unless that method is located in the User
class itself (then it would be misplaced).
IList<SubmittedAnswer> answers = await GetAnswersInRange(timeRange);
This tells me GetAnswersInRange
is an async
function that isn't following the convention - should be GetAnswersInRangeAsync
, but then it might be a better idea to just call it GetAnswersAsync
and let overloading speak the "in range" part: parameterless returns all answers, overload taking an ITimeRange
returns all answers within the specified time range - and that can be nicely documented with XML comments.
Now, returning an IList<SubmittedAnswer>
when all you really need is to iterate the results, is bad communication: IEnumerable<SubmittedAnswer>
would have worked better.
IEnumerable<IGrouping<int, SubmittedAnswer>> answersOfUsers = answers.GroupBy(a => a.UserId);
LINQ came with C# 3.5, and that version introduced a little wonder to avoid declaring obscure types such as an IEnumerable<IGrouping<int, SubmittedAnswer>>
- implicit typing changes strictly nothing to how the code works, saves you a bunch of typing, and makes shorter, more readable code here:
var answersOfUsers = answers.GroupBy(a => a.UserId);
Explicit typing is actually quite annoying here:
foreach (IGrouping<int, SubmittedAnswer> answersOfUser in answersOfUsers)
Compare to:
foreach (var answersOfUser in answersOfUsers)
There's a typo there that becomes immediately apparent: the loop variable is named exactly the same thing at the collection that's being iterated - that code wouldn't compile... so I'll assume you meant to have this:
foreach (var answerOfUser in answersOfUsers)
Now now now... you have a grouping, and you're entering a foreach
loop, only to filter some more - that foreach
loop could be converted to a LINQ query.
So we have all the answers in the time range we're interested in:
var answers = await GetAnswersInRange(timeRange);
And what we're after, is the UserId
for the user that has accumulated the most progress with these answers
. Grouping by UserId
was a good first step.
var userAnswers = answers.GroupBy(answer => answer.UserId);
Next, we want to know how much progress each user has - we can do this by projecting a selection into an anonymous type, with the Select
method:
var userProgress = userAnswers.Select(userAnswer =>
new { UserId = userAnswer.Key, Progress = userAnswer.Sum(answer => answer.Progress) });
And this is where var
has a clear advantage over explicit typing - exactly what is the type of userProgress
here? It's an IQueryable<T>
, where T
is an anonymous type with some int UserId
and int Progress
members: we don't need to care what the actual type is.
What's next? Well, what we have now could be visualized as a table with a UserId
and a Progess
column. The next step is to find out which user has the most progess, and we can sort the results to get that, in descending order so we have the top user, well, at the top:
var sortedUserProgress = userProgress.OrderByDescending(progress => progress.Progress);
And then, all we need to do is grab the first "row":
var topUserId = sortedUserProgress.First().UserId;
And then we can create the User
object for the return value:
return new User { UserId = topUserId };
...That's all nice, but it's excessively verbose:
var answers = await GetAnswersInRange(timeRange);
var userAnswers = answers.GroupBy(answer => answer.UserId);
var userProgress = userAnswers.Select(userAnswer =>
new { UserId = userAnswer.Key, Progress = userAnswer.Sum(answer => answer.Progress) });
var sortedUserProgress = userProgress.OrderByDescending(progress => progress.Progress);
var topUserId = sortedUserProgress.First().UserId;
The good news is that LINQ is awesome, and lets you chain all these methods, so you can actually do this:
var answers = await GetAnswersInRange(timeRange);
var userId = answers.GroupBy(answer => answer.UserId)
.Select(userAnswer => new
{
UserId = userAnswer.Key,
Progress = userAnswer.Sum(answer => answer.Progress)
})
.OrderByDescending(progress => progress.Progress)
.First().UserId;
Note that .First()
will throw an exception if there is no element in the sorted results. You could use .FirstOrDefault()
instead, but then the .UserId
member call would have to go on a separate instruction, because FirstOrDefault()
will return null
when there's no "first element", which means you'd be calling .UserId
on a null reference, which would obviously throw a NullReferenceException
... and your current code returns a valid User
object with UserId
set to 0
, which leads me to believe the rest of your code has checks that verify whether a UserId
is zero or not... which is rather ugly. How about this instead?
var answers = await GetAnswersInRange(timeRange);
var result = answers.GroupBy(answer => answer.UserId)
.Select(userAnswer => new
{
UserId = userAnswer.Key,
Progress = userAnswer.Sum(answer => answer.Progress)
})
.OrderByDescending(progress => progress.Progress)
.FirstOrDefault();
return result == null
? null
: new User { UserId = result.UserId };
Returning null
when there's no data for the specified timeRange
makes more sense to me than returning a User
object with some meaningless default/hard-coded values.