Again, just for fun on a Saturday night, I decided to solve the following problem from CodeEval.
The challenge is finding the index of the lowest unique number.
There is a game where each player picks a number from 1 to 9, writes it on a piece of paper and gives it to a guide. A player wins if his number is the lowest unique one. We may have 10-20 players in our game.
Input sample:
Your program should accept as its first argument a path to a filename. You're a guide and you're given a set of numbers from players for a round of the game. E.g. 2 rounds of the game look this way:
3 3 9 1 6 5 8 1 5 3
9 2 9 9 1 8 8 8 2 1 1
Output sample:
Print the winner's position, or 0 in case there is no winner. In the first line of the input sample the lowest unique number is 6, so player 5 wins (it was player number 5 who threw number 6). In the second line there is no unique number, so the output is 0:
5
0
I solved this problem correctly using LINQ in the following code, but the performance is not that great: about 255 ms for about 100 data points. Any suggestions for improving performance, or perhaps another novel solution? The data have to be read from a text file on CodeEval, so my code is structured around that.
static void Main(string[] args)
{
    using (StreamReader reader = File.OpenText(args[0]))
    {
        while (!reader.EndOfStream)
        {
            var linevalue = reader.ReadLine();
            if (!string.IsNullOrEmpty(linevalue))
            {
                var numbers = linevalue.Split(' ');
                var groups = numbers.GroupBy(x => x).Select(g => new
                {
                    number = g.Key,
                    freq = g.Count()
                });
                var nonRepeated = groups.Where(x => x.freq == 1).ToList();
                if (nonRepeated.Count() != 0)
                {
                    var singles = nonRepeated.Select(x => x.number).ToList();
                    var smallest = singles.Min();
                    var playerNumber = Array.IndexOf(numbers, smallest);
                    Console.WriteLine(playerNumber + 1);
                }
                else
                {
                    Console.WriteLine("0");
                }
            }
        }
    }
}
- The question is written in a very confusing way. – ANeves, Oct 6, 2014 at 9:33
- Do you need to use LINQ? – IEatBagels, Oct 6, 2014 at 13:34
- @TopinFrassi You are allowed to use .Net 4.0, so LINQ is allowed. – Payam, Oct 7, 2014 at 0:27
2 Answers
I threw your code through a profiler and it told me that ToList was responsible for quite a bit of your execution time. So why is that? You have 2 ToList calls in your code:
var nonRepeated = groups.Where(x => x.freq == 1).ToList();
if (nonRepeated.Count() != 0)
{
    var singles = nonRepeated.Select(x => x.number).ToList();
    var smallest = singles.Min();
So what happens here? You start by iterating through groups to find those with freq equal to 1. Count is optimized for lists, so that one should be constant time. Then you iterate through nonRepeated to select number, and then finally you iterate through it to select the minimum value. That's a lot of iterations. Let's try to cut that down:
var nonRepeated = groups.Where(x => x.freq == 1)
                        .Min(x => x.number);
if (nonRepeated != null)
{
    var playerNumber = Array.IndexOf(numbers, nonRepeated);
By comparing the execution time of this and the other one (on my system), it seems we have achieved a small improvement. Not enough to brag about, though. Since we don't have any ToList calls anymore, I profiled the code once again. This time it is Min that takes up the time (not really surprising - it is forcing the GroupBy, Select and Where to be evaluated). So our bottleneck is (still) the LINQ queries. Can we speed it up?
Hint: LINQ usually has a slight overhead compared to doing things 'manually'.
Let's try to do these things ourselves, instead of relying on LINQ. Let's replace that GroupBy and Select stuff:
var values = new Dictionary<string, int>();
foreach (var number in numbers)
{
    if (values.ContainsKey(number)) // No curly brackets to keep it short.
        values[number]++;
    else
        values.Add(number, 1);
}
A benchmark of the code told me that this was a clear improvement. Nice! But the profiler still says that Where and Min are expensive. Let's do those manually too:
string minValue = null;
foreach (var value in values)
{
    if (value.Value == 1)
    {
        if (minValue == null || String.Compare(value.Key, minValue, StringComparison.InvariantCulture) == -1)
            minValue = value.Key;
    }
}
if (minValue != null)
...
Once again, it seems we got an overall improvement in performance. Interestingly, the bottlenecks now seem to be WriteLine and OpenText. We could try to experiment with other ways of reading from the file / writing to the console, but I don't really feel like doing that right now (and I don't think the 'easy' methods like File.ReadAllLines and File.ReadLines are much faster than your way, if at all). You could try to play with StringBuilder and only write to the console once, but in my experience it is rarely an improvement when the method is only executed once.
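For illustration, here is a minimal sketch of that single-write idea, assuming the per-round results have already been computed (the roundResults array is just a placeholder):
using System;
using System.Text;

class Program
{
    static void Main()
    {
        // Sketch only: buffer every round's result and hit the console once,
        // instead of calling Console.WriteLine once per round.
        var output = new StringBuilder();
        int[] roundResults = { 5, 0 };       // placeholder results, one per round
        foreach (var result in roundResults)
        {
            output.AppendLine(result.ToString());
        }
        Console.Write(output.ToString());    // single write to the console
    }
}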
But maybe you can do some other things to squeeze just a little bit more out of your method.
First of all, is the IsNullOrEmpty really necessary? It's a safety mechanism and in real life we would probably want to have it. But this is a coding exercise and we are all about performance right now.
Another thing you might want to consider is that strings are expensive. As it is, you are currently working with all your numbers as strings. Split creates a bunch of new strings, and every time we compare values we do a string comparison. It might be worth converting your values to int or char instead.
HINT: If you want to convert to int values, it can be done using char ASCII values. You know that the values will always be in the range [1-9]. The ASCII value for '1' is 49, '2' is 50, etc. So if you do value[0] - 48, you can convert your strings to ints with very little overhead.
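A minimal sketch of that conversion, using the first line from the problem statement as sample input:
using System;

class Program
{
    static void Main()
    {
        // Sketch only: each token is a single digit in [1-9], so its first
        // character minus '0' (ASCII 48) already is the int value.
        var tokens = "3 3 9 1 6 5 8 1 5 3".Split(' ');
        var numbers = new int[tokens.Length];
        for (int i = 0; i < tokens.Length; i++)
        {
            numbers[i] = tokens[i][0] - '0';          // e.g. '6' (54) - '0' (48) == 6
        }
        Console.WriteLine(string.Join(" ", numbers)); // 3 3 9 1 6 5 8 1 5 3
    }
}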
When you want to optimize something it is always a good idea to measure and profile it and see just where the time is spent. Sometimes you can get big improvements with very few changes. But you can also make things worse with very few changes.
After the changes mentioned in this post (except the string to int conversion) are applied, the code will look something like this:
using (StreamReader reader = File.OpenText(args[0]))
{
    while (!reader.EndOfStream)
    {
        var linevalue = reader.ReadLine();
        var numbers = linevalue.Split(' ');
        var values = new Dictionary<string, int>();
        string minValue = null;
        foreach (var number in numbers)
        {
            if (values.ContainsKey(number))
                values[number]++;
            else
                values.Add(number, 1);
        }
        foreach (var value in values)
        {
            if (value.Value == 1)
            {
                if (minValue == null || String.Compare(value.Key, minValue, StringComparison.InvariantCulture) == -1)
                    minValue = value.Key;
            }
        }
        if (minValue != null)
        {
            var playerNumber = Array.IndexOf(numbers, minValue);
            Console.WriteLine(playerNumber + 1);
        }
        else
        {
            Console.WriteLine("0");
        }
    }
}
As you can see, you have 2 loops within the while now. It might be possible to merge them into one by doing something clever.
I tried the code on CodeEval and got a 181 ms execution time. That is a 74 ms improvement. To motivate you a bit, I know it can be done (at least a little) faster. ;)
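As a hedged illustration of the 'merge the loops' idea above (this sketch is not the code that was timed), a single-pass version over one sample round could look like this:
using System;

class Program
{
    static void Main()
    {
        // Sketch only: count occurrences and remember first positions in one
        // pass over the round, then scan the digits 1-9 in ascending order.
        var numbers = "3 3 9 1 6 5 8 1 5 3".Split(' ');  // one sample round
        var counts = new int[10];
        var firstPosition = new int[10];
        for (int i = 0; i < numbers.Length; i++)
        {
            int n = numbers[i][0] - '0';       // tokens are single digits 1-9
            if (counts[n]++ == 0)
                firstPosition[n] = i + 1;      // remember 1-based player position
        }

        int winner = 0;
        for (int n = 1; n <= 9; n++)           // lowest value first
        {
            if (counts[n] == 1)
            {
                winner = firstPosition[n];
                break;
            }
        }
        Console.WriteLine(winner);             // prints 5 for this sample round
    }
}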
- Thank you for spending time and writing a good answer. I put the code on CodeEval as well; it ran at 213 ms, but of course there is a decent improvement. – Payam, Oct 7, 2014 at 1:34
- @user843681 There can be a little luck involved with a managed language like C#. The same solution can sometimes have different performance results, simply because we do not control the runtime (that is why, when benchmarking, we typically run the code multiple times). It does seem like a big difference though and I don't know what the reason is. – MAV, Oct 7, 2014 at 17:48
Using LINQ might not be the best solution for something relatively simple, like this. Your code, for instance, involves a lot of iterating through different collections. Using a SortedDictionary to keep track of the unique numbers, the lowest being the first one, and a list to keep track of which numbers have been considered, greatly cuts down the number of iterations:
static void Main(string[] args)
{
    using (StreamReader reader = File.OpenText(args[0]))
    {
        while (!reader.EndOfStream)
        {
            var numbers = reader.ReadLine().Split(' ').Select<string, int>(x => int.Parse(x)).ToArray();
            int numcount = numbers.Length;
            if (numcount > 0)
            {
                SortedDictionary<int, int> freqtable = new SortedDictionary<int, int>();
                var usedints = Enumerable.Range(0, 10).Select(x => x = 0).ToList();
                for (int i = 0; i < numcount; i++)
                {
                    if (freqtable.ContainsKey(numbers[i]))
                    {
                        freqtable.Remove(numbers[i]);
                        usedints[numbers[i]] = i;
                    }
                    else
                    {
                        if (usedints[numbers[i]] == 0)
                            freqtable.Add(numbers[i], i + 1);
                    }
                }
                if (freqtable.Count > 0)
                    Console.WriteLine(freqtable.First().Value);
                else
                    Console.WriteLine(0);
            }
        }
    }
}
EDIT: Re-worked the solution. Did some more thinking on it and got the time down to 151 ms. Changed the stream to binary, the loop to only read 2 bytes at a time and the collections to be arrays. This cut down on time and memory:
using System;
using System.Collections.Generic;
using System.Linq;
using System.IO;

namespace LowestUniqueNum_cs
{
    class Program
    {
        static void Main(string[] args)
        {
            using (BinaryReader reader = new BinaryReader(File.OpenRead(args[0])))
            {
                while (reader.PeekChar() > 0)
                {
                    KeyValuePair<int, int>[] minvalue = new KeyValuePair<int, int>[10];
                    int[] usedints = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 };
                    int number = 0;
                    for (int i = 1; ; i++)
                    {
                        byte[] temp = reader.ReadBytes(2);
                        number = temp[0] - 48;
                        if (usedints[number] == 0)
                        {
                            minvalue[number] = new KeyValuePair<int, int>(number, i);
                            usedints[number] = number;
                        }
                        else if (minvalue[number].Key != 0)
                        {
                            minvalue[number] = new KeyValuePair<int, int>();
                        }
                        if (temp.Length != 2 || temp[1] != 32)
                        {
                            break;
                        }
                    }
                    Console.WriteLine(minvalue.FirstOrDefault(x => x.Key > 0).Value);
                }
            }
        }
    }
}
- Your solution takes about 266 ms for the same amount of input data; it just has slightly better memory usage than my solution. There must be a better and faster solution. – Payam, Oct 5, 2014 at 16:07
- Select(x => x = 0) - what is this trying to do? Because it's equivalent to Select(x => 0) and I doubt that was the intention. – svick, Oct 5, 2014 at 17:38
- It sets the list to default values. – user33306, Oct 5, 2014 at 23:53
- There is no need for the last Clear and reassignment of usedints since you recreate both in the next iteration of the loop. – MAV, Oct 6, 2014 at 0:10
- @user843681 - Made some revisions that might interest you. – user33306, Oct 6, 2014 at 2:42