I'm using C# in .NET 4. The class holds an array of double
that represents a 2-D rectangular matrix but (due to business requirements) must be stored flattened (i.e. a double[]
). The class also stores the dimensions of the array. I wish to sort the array by the (virtual) first column. Obviously I have to unflatten it, at least partially, in order to do this, right? I don't see anything in LINQ or System.Array
that works on a flattened array.
If there's 1 column, System.Array.Sort()
works fine and I don't think it can be improved.
If there's 2 columns, it's not hard to split it into two arrays and again use System.Array.Sort()
. If you quickly see a way to improve it great, but if not, I'm happy with it as it stands.
I'm not happy with the code for 3 or more columns (it stinks) and suspect there's some way to improve it.
N.B. In real life the class has get-only properties. It's guaranteed that rows
and cols
are positive, array != null
, and rows*cols==array.Length
.
public partial class FlattenedArray
{
public int rows;
public int cols;
public double[] array;
public void sort()
{
double[] keys;
switch(cols)
{
case 1:
System.Array.Sort<double>(array);
break;
case 2:
keys = new double[rows];
double[] dItems = new double[rows];
for (int i=0,k=0; i<rows; i++)
{
keys[i] = array[k++];
dItems[i] = array[k++];
}
System.Array.Sort<double,double>(keys,dItems);
for (int i=0,k=0; i<rows; i++)
{
array[k++] = keys[i];
array[k++] = dItems[i];
}
break;
default:
keys = new double[rows];
int[] items = new int[rows];
double[] temp = new double[rows*cols];
for (int i=0,k=0; i<rows; i++, k+=cols)
{
keys[i] = array[k];
items[i] = k;
}
System.Array.Sort<double,int>(keys,items);
for (int i=0,k=0; i<rows; i++)
{
int spt = items[i];
for (int j = 0; j < cols; j++)
{
temp[k++] = array[spt++];
}
}
array = temp;
break;
}
3 Answers 3
You can improve the code for 3 or more columns and at the same time use it for 1 or 2 columns, by using LINQ:
public partial class FlattenedArray
{
public int rows;
public int cols;
public double[] array;
public void sort()
{
array = (from i in Enumerable.Range(0, rows)
select new List<double>(array.ToList().GetRange(cols * i, cols)))
.OrderBy(x => x[0])
.SelectMany(x => x)
.ToArray();
}
}
This divides the array into dimensions and sorts it by the first column then flattens it all back to 1D. It will work for any combination of rows and columns as long the conditions you mentioned hold true.
If the data size gets very large you still might want to use Array.Sort for the 1 column data as this code wouldn't be very efficient for 1 column.
-
1\$\begingroup\$ any particular reason you mixed syntaxes (some keywords and some methods)? \$\endgroup\$Snowbody– Snowbody2014年05月21日 13:49:27 +00:00Commented May 21, 2014 at 13:49
-
2\$\begingroup\$ Why are you converting the array to list over and over? \$\endgroup\$svick– svick2014年05月21日 16:29:16 +00:00Commented May 21, 2014 at 16:29
-
\$\begingroup\$ I tried to fit it into the existing code, to concentrate on the solution. Lists have getrange method the arrays don't. I tried to keep it simple. If optimization is needed it's a very easy fix to declare it outside the LINQ statement. \$\endgroup\$user33306– user333062014年05月22日 02:32:34 +00:00Commented May 22, 2014 at 2:32
As for introducing special cases for efficiency purposes, you ain't gonna need it. Don't try to second guess reality. If you hit performance bottle neck, AND determine that a great majority of the cases can be handled specially, then you make the special hack. Then measure again keep the hack if it actually is useful.
More than a few lines in a case
statement reduces readability. So extract those statements to a method.
default:
array = Sort(array, rows, cols);
break;
After that let's look at he algorithm closely to how we can make it more readable
keys = new double[rows];
int[] items = new int[rows];
double[] temp = new double[rows*cols];
for (int i=0,k=0; i<rows; i++, k+=cols)
{
keys[i] = array[k];
items[i] = k;
}
A loop doing more than one thing is confusing so split it up, and rearrange declarations so that they are close to what they are used, also rename single letter variables (and generic variable names copied from the API) according to their purpose:
static double[] Sort(double[] array, int rows, int cols)
{
var result = new double[rows * cols];
var rowStartIndices = new int[rows];
for (int row = 0; row < rows; row++)
rowStartIndices[row] = row * cols;
var rowFirstValues = new double[rows];
for (int row = 0; row < rows; row++)
rowFirstValues[row] = array[rowStartIndices[row]];
Array.Sort(rowFirstValues, rowStartIndices);
for (int row = 0; row < rows; row++)
Array.Copy(array, rowStartIndices[row], result, row * cols, cols);
return result;
}
Use Array.Copy
to copy array ranges.
Also some unnecessary type declarations are removed to improve readability. And it reads quite well top to bottom. There is some more noise that can be removed, and some other noise that can be moved elsewhere:
static double[] Sort(double[] array, int rows, int cols)
{
double[] sorted = new double[array.Length];
Enumerable.Range(0, rows)
.Select(row => row * cols)
.OrderBy(rowStartIndex => array[rowStartIndex])
.ForEach((rowStartIndex, row) => Array.Copy(array, rowStartIndex, sorted, row * cols, cols));
return sorted;
}
Where the ForEach
method is a convenience method defined as such:
static class LocalExtensions
{
public static void ForEach<T>(this IEnumerable<T> source, Action<T, int> action)
{
int index = 0;
foreach(var current in source)
{
action(current, index);
index++;
}
}
}
-
\$\begingroup\$ Thanks, the special cases weren't for efficiency purposes, they were for readability, because the 3-or-more case was so hard to understand. \$\endgroup\$Snowbody– Snowbody2014年05月21日 13:36:11 +00:00Commented May 21, 2014 at 13:36
-
\$\begingroup\$ I know that the compiler/optimizer is easily capable of determining that the two loops can be done at once...and that the common subexpression can get put in a register...but old (bad) habits die hard. \$\endgroup\$Snowbody– Snowbody2014年05月21日 13:52:28 +00:00Commented May 21, 2014 at 13:52
-
\$\begingroup\$ I remember hearing that
Array.Copy()
was suboptimal since it wasn't a type-specialized generic. Did that get fixed? \$\endgroup\$Snowbody– Snowbody2014年05月21日 13:53:08 +00:00Commented May 21, 2014 at 13:53 -
\$\begingroup\$ As for
Array.Copy
performance, I didn't do any .NET programming where time performance was critical, so I just don't know. I just assume it is at least as fast as hand coded loops, giving the JIT the chance to issue optimized machine instructions, disable runtime checks, summon elder gods, etc CLR magic too arcane to mere mortals like me :) \$\endgroup\$abuzittin gillifirca– abuzittin gillifirca2014年05月21日 14:43:12 +00:00Commented May 21, 2014 at 14:43 -
\$\begingroup\$ As for two
for
loops performance, it should be negligible, even if not combined by the compiler. In general updating the loop variable on the processor takes very little time compared to moving chunks of data to and from off-processor memory. Most important performance decisions would be not doing the copying twice, and not callingMoveNext/Current
for each element of the array (instead of once for row), in addition to copying. e.g. by doingToList
andToArray
as in the other answer. \$\endgroup\$abuzittin gillifirca– abuzittin gillifirca2014年05月21日 15:12:05 +00:00Commented May 21, 2014 at 15:12
Maybe I'm entirely off-base, but the real smell to me here is this statement:
(due to business requirements) must be stored flattened
If it is truly a rectangular array, all your operations on the flat array have to be double checked to make sure they are valid for the real data structure that you need. It sounds like you may be bending over backwards thinking about the end and not the means. I can certainly understand reasons why it needs to ultimately be converted to a flattened array for storage, etc., but if for all intents and purposes it should behave like a rectangular array in code then it should be one.
If you just use a rectangular array, all the state, dimensions, etc. you have to monitor are already taken care of for you, and if you need it back in the flat format for the "business requirement", there are very easy ways to do the conversion in that direction:
int[,] whatTheCodeProbablyWants = new int[2,2] { {1, 2}, {3, 4} };
int[] whatTheBusinessSideNeeds = whatTheCodeProbablyWants.Cast<int>().ToArray();
...or jagged:
int[][] whatTheCodeProbablyWants = new int[][] { new int[] {1, 2}, new int[] {3} };
int[] whatTheBusinessSideNeeds = whatTheCodeProbablyWants.SelectMany (x => x).ToArray();
-
\$\begingroup\$ At first I was worried that this would cause too much copying, but I'm already doing more copying than that anyway. \$\endgroup\$Snowbody– Snowbody2014年05月21日 20:46:37 +00:00Commented May 21, 2014 at 20:46