This is an algorithm to return a list of consecutive integer ranges from a list of integers.
The practical use of such an algorithm is to filter lists more efficiently, i.e. rather than check e.g. 1000 items (1..1000; x==1 || x==2 || ... || x==1000), it might be possible to check only 2 items (x>= 1 && x <= 1000).
Does this algorithm have any mistakes, can it be optimized, or are there any other improvements you can suggest?
(As a side note: I know this code does not follow the standard C# naming conventions. I do not like to follow that particular convention; I prefer "snake_case" as it is easier on my eyes.)
Sample output:
(x >= 0 && x <= 1690)
(x >= 13642 && x <= 15331)
(x >= 27283 && x <= 27296)
(x >= 27769 && x <= 27776)
(x >= 28249 && x <= 28256)
(x >= 28729 && x <= 28736)
(x >= 29209 && x <= 29222)
The algorithm (built Visual Studio 2017, C# 7.3, .NET 4.7.2):
public static List<(int from, int to)> get_consecutive_ranges(List<int> fids)
{
if (fids == null || fids.Count == 0) return null;
fids = fids.OrderBy(a => a).Distinct().ToList();
var fids_fast = new List<(int from, int to)>();
var is_conseq_with_last = true;
var start_index = 0;
var end_index = 0;
for (var fids_index = 0; fids_index < fids.Count; fids_index++)
{
var first = fids_index == 0;
var last = fids_index == fids.Count - 1;
if (!first && fids[fids_index - 1] == fids[fids_index] - 1)
{
is_conseq_with_last = true;
}
else if (!first)
{
is_conseq_with_last = false;
}
if (!is_conseq_with_last && !first && !last)
{
end_index = !first && !last ? fids_index - 1 : fids_index;
fids_fast.Add((fids[start_index], fids[end_index]));
start_index = fids_index;
}
else if (last)
{
if (!is_conseq_with_last)
{
if (!first)
{
end_index = fids_index - 1;
}
fids_fast.Add((fids[start_index], fids[end_index]));
start_index = fids_index;
}
end_index = fids_index;
fids_fast.Add((fids[start_index], fids[end_index]));
}
}
fids_fast.ForEach(a => Console.WriteLine($"(x >= {a.@from} && x <= {a.to})"));
return fids_fast;
}
Example use:
// slow:
body = body.Where(a => fids.Contains(a.fid)).ToList();
// fast:
body = body.Where(a => fids_fast.Any(x => a.fid >= x.from && a.fid <= x.to)).ToList();
3 Answers 3
Your code is really complicated. First, your code should be abstracted a little more. It is not specific to feature IDs, therefore the terminology should not use these words. The same algorithm can be used to select which pages to print from a document, therefore the variables should be just nums
and ranges
. To test your current code, I wrote:
[Test]
public void TestRanges()
{
Assert.AreEqual("", Str(Ranges(new List<int>())));
Assert.AreEqual("1", Str(Ranges(new List<int> { 1 })));
Assert.AreEqual("1-5", Str(Ranges(new List<int> { 1, 2, 3, 4, 5 })));
Assert.AreEqual("1-3, 5", Str(Ranges(new List<int> { 1, 2, 3, 5 })));
Assert.AreEqual("1, 3, 5-6", Str(Ranges(new List<int> { 1, 3, 5, 6 })));
}
I wrote a helper function Str
so that I don't have to construct a list of ranges for each test case:
public static string Str(List<(int from, int to)> ranges)
{
var parts = new List<string>();
foreach (var range in ranges) {
if (range.from == range.to) {
parts.Add(range.from.ToString());
} else {
parts.Add(range.@from + "-" + range.to);
}
}
return string.Join(", ", parts);
}
After renaming your function to Ranges
, these tests ran successfully. So I was ready to refactor your code. I did not really do this since your code looked too complicated to start with. Instead, I remembered that I had successfully used the following pattern quite often:
var start = ...;
while (start < nums.Count) {
var end = ...;
while (end < nums.Count) {
}
}
With this knowledge I wrote the following code:
public static List<(int from, int to)> Ranges(List<int> nums)
{
nums = nums.OrderBy(a => a).Distinct().ToList();
var ranges = new List<(int from, int to)>();
var start = 0;
while (start < nums.Count) {
var end = start + 1; // the range is from [start, end).
while (end < nums.Count && nums[end - 1] + 1 == nums[end]) {
end++;
}
ranges.Add((nums[start], nums[end - 1]));
start = end; // continue after the current range
}
return ranges;
}
This code doesn't need any special cases for the last range, or anything else. A range either stops when the end of the numbers is reached, or when the following number is not consecutive. This sounds sensible, and this is exactly what the code is doing.
I removed the check for nums == null
since it is not necessary. Collections should never be null, and if they are, the code immediately throws an exception, which is fine.
I also removed the special case for nums.Count == 0
since returning an empty list is much better than returning null. Again, expressions that have collection type should never be null. The test cases cover this case, so there's nothing to worry about.
Your code takes a list, ensures it's sorted and distinct, then assembles and returns another list. This approach is problematic when the input is in fact a generator, in which case your code forces the entire list to memory first. Also, the data might already be sorted or the caller might not be interested in the entire result.
public static List<(int from, int to)> get_consecutive_ranges(List<int> fids)
{
if (fids == null || fids.Count == 0) return null;
fids = fids.OrderBy(a => a).Distinct().ToList();
A much more idiomatic approach is to build an extension method that works on IEnumerable
and yield
s its results. The call to Distinct().OrderBy(a => a)
should be left to the caller. Also note that Distinct
doesn't guarantee any particular order, so the sorting must always happen afterwards. The actual logic is overly complicated in your code and be greatly simplified as well. Here is how I would do it:
public static IEnumerable<(int begin, int end)> Ranges(this IEnumerable<int> nums)
{
var e = nums.GetEnumerator();
if (e.MoveNext())
{
var begin = e.Current;
var end = begin + 1;
while (e.MoveNext())
{
if (e.Current != end)
{
yield return (begin, end);
begin = end = e.Current;
}
end++;
}
yield return (begin, end);
}
}
Add a simple helper function for pretty printing, which should also be an extension method:
public static string Show(this IEnumerable<(int begin, int end)> ranges)
{
return "[" + string.Join(",", ranges.Select(r => r.end - r.begin == 1 ? $"{r.begin}" : $"{r.begin}-{r.end-1}")) + "]";
}
Example usage:
Console.WriteLine(new int[] { }.Ranges().Show()); -> "[]"
Console.WriteLine(new int[] { 1 }.Ranges().Show()); -> "[1]"
Console.WriteLine(new int[] { 1, 2, 3, 4, 5 }.Ranges().Show()); -> "[1-5]"
Console.WriteLine(new int[] { 1, 2, 3, 5 }.Ranges().Show()); -> "[1-3,5]"
Console.WriteLine(new int[] { 1, 3, 5, 6 }.Ranges().Show()); -> "[1,3,5-6]"
Console.WriteLine(new int[] { 1, 3, 3, 3, 5, 6 }.Distinct().Ranges().Show()); -> "[1,3,5-6]"
Console.WriteLine(new int[] { 6, 3, 5, 1 }.OrderBy(i => i).Ranges().Show()); -> "[1,3,5-6]"
-
\$\begingroup\$ A much more idiomatic approach is to build an extension method the works on IEnumerable and yields its results. - This is exactly how it should be done ;-] \$\endgroup\$t3chb0t– t3chb0t2019年04月27日 08:40:24 +00:00Commented Apr 27, 2019 at 8:40
As a pre-note, I had to translate everything to camelCase and PascalCase because I was having a very difficult time reading your code. No judgments, just understand I was running short on time and didn't translate it back on the conclusion.
Readability
Use meaningful names.
fids
I changed tointegers
as from your description I gather it's the list of integers used to create the ranges.fids_index
seems very noisy. Standard convention isi
,j
, etc. When you reduce the iterator variable down to a single character it's much easier to focus on what actually matters.- Booleans should be prefixed with
Is
orHas
orWas
etc. So in this caseIsFirst
andIsLast
instead offirst
andlast
makes it easier to read as english. You could even consider usingvar isNotFirst = i != 0;
as you only use!first
in the algorithm. - Last vs Previous.
is_conseq_with_last
is refering to previous, so I switched it to previous to avoid confusion withlast
. start_index
andend_index
sound like they would be0
andintegers.Count
. They're the start and end of the range you're tracking, let's name them accordingly:rangeStart
andrangeEnd
.
If-Statement Ordering & Flow
We've all been there, written an algorithm, tested it, and it works, all done right? Well, sometimes that algorithm can be restructured to express what we're actually doing much clearer.
if (!first && fids[fids_index - 1] == fids[fids_index] - 1)
{
is_conseq_with_last = true;
}
else if (!first)
{
is_conseq_with_last = false;
}
There's a common denominator here, !first
let's work with that, clearly neither of these happen if it's true.
if (!first)
{
if (fids[fids_index - 1] == fids[fids_index] - 1)
{
is_conseq_with_last = true;
}
else
{
is_conseq_with_last = false;
}
}
Well, now it's obvious that inner if-statement can be reduced.
if (!first)
{
is_conseq_with_last = fids[fids_index - 1] == fids[fids_index] - 1;
}
And if you wanted, a ternary operator.
is_conseq_with_last = !first && fids[fids_index - 1] == fids[fids_index] - 1;
And it's now also obvious that you're setting the variable with each iteration, so there's no need to declare it outside the loop.
Similar can be done with the bottom if-statement. Here's what I ended up with altogether.
My Version
I am running short on time so there might be a mistake or two and it's still in my coding style so don't take it for face value and try to apply the concepts I'm showing above yourself.
public static List<(int from, int to)> GetConsecutiveRanges(List<int> integers)
{
if (integers == null || integers.Count == 0) { return null; }
integers = integers.OrderBy(a => a).Distinct().ToList();
var ranges = new List<(int from, int to)>();
var rangeStart = 0;
var rangeEnd = 0;
for (var i = 0; i < integers.Count; i++)
{
var isFirst = i == 0;
var isLast = i == integers.Count - 1;
var isConsecutiveFromPrevious = isFirst == false && integers[i-1] == integers[i] - 1;
if (last)
{
if (isConsecutiveFromPrevious == false)
{
rangeEnd = isFirst == false ? i - 1 : rangeEnd;
ranges.Add((integers[rangeStart], integers[rangeEnd]));
rangeStart = i;
}
rangeEnd = i;
ranges.Add((integers[rangeStart], integers[rangeEnd]));
}
else if (isConsecutiveFromPrevious == false && isFirst == false)
{
rangeEnd = isFirst == false && isLast == false ? i - 1 : i;
ranges.Add((integers[rangeStart], integers[rangeEnd]));
rangeStart = i;
}
}
ranges.ForEach(a => Console.WriteLine($"(x >= {a.@from} && x <= {a.to})"));
return ranges;
}
-
\$\begingroup\$ I wonder whether it would be more efficient to swap these two calls
.OrderBy(a => a).Distinct().
- just thinking aloud... \$\endgroup\$t3chb0t– t3chb0t2019年04月26日 21:28:19 +00:00Commented Apr 26, 2019 at 21:28 -
\$\begingroup\$ @t3chb0t I had exactly the same thought, but in the end, both methods must track values, so performance must surely be similar either way around? \$\endgroup\$Aalawlx– Aalawlx2019年04月26日 21:36:43 +00:00Commented Apr 26, 2019 at 21:36
-
\$\begingroup\$ @Shelby115 There are some very interesting points in your answer - thanks a lot. \$\endgroup\$Aalawlx– Aalawlx2019年04月26日 21:37:44 +00:00Commented Apr 26, 2019 at 21:37
-
\$\begingroup\$ I had to translate everything to camelCase and PascalCase because I was having a very difficult time reading your code - same here ;-) and I think we'll need this for other questions too:
Regex.Replace(File.ReadAllText(path), "_([a-zA-Z])", m => m.Groups[1].Value.ToUpper());
\$\endgroup\$t3chb0t– t3chb0t2019年04月27日 08:31:08 +00:00Commented Apr 27, 2019 at 8:31 -
1\$\begingroup\$ @t3chb0t Would be cool if the site implemented something like a toggle button to view code in a different convention. Obviously there are a lot of complications to being able to do that, so it wouldn't quite work in practice necessarily. \$\endgroup\$Shelby115– Shelby1152019年04月29日 13:08:22 +00:00Commented Apr 29, 2019 at 13:08
fids
stand for? Would help me understand what your algorithm is doing better ( at least I would hope ). \$\endgroup\$