Sorting a flattened array

Question 1

I'm using C# in .NET 4. The class holds an array of double that represents a 2-D rectangular matrix but (due to business requirements) must be stored flattened (i.e. a double[]). The class also stores the dimensions of the array. I wish to sort the array by the (virtual) first column. Obviously I have to unflatten it, at least partially, in order to do this, right? I don't see anything in LINQ or System.Array that works on a flattened array.

If there's 1 column, System.Array.Sort() works fine and I don't think it can be improved.

If there's 2 columns, it's not hard to split it into two arrays and again use System.Array.Sort(). If you quickly see a way to improve it great, but if not, I'm happy with it as it stands.

I'm not happy with the code for 3 or more columns (it stinks) and suspect there's some way to improve it.

N.B. In real life the class has get-only properties. It's guaranteed that rows and cols are positive, array != null, and rows*cols==array.Length.

public partial class FlattenedArray
{
 public int rows;
 public int cols;
 public double[] array;
 public void sort()
 {
 double[] keys;
 switch(cols)
 {
 case 1:
 System.Array.Sort<double>(array);
 break;
 case 2:
 keys = new double[rows];
 double[] dItems = new double[rows];
 for (int i=0,k=0; i<rows; i++)
 {
 keys[i] = array[k++];
 dItems[i] = array[k++];
 }
 System.Array.Sort<double,double>(keys,dItems);
 for (int i=0,k=0; i<rows; i++)
 {
 array[k++] = keys[i];
 array[k++] = dItems[i];
 }
 break;
 default:
 keys = new double[rows];
 int[] items = new int[rows];
 double[] temp = new double[rows*cols];
 for (int i=0,k=0; i<rows; i++, k+=cols)
 {
 keys[i] = array[k];
 items[i] = k;
 }
 System.Array.Sort<double,int>(keys,items);
 for (int i=0,k=0; i<rows; i++)
 {
 int spt = items[i];
 for (int j = 0; j < cols; j++)
 {
 temp[k++] = array[spt++];
 }
 }
 array = temp;
 break;
 }

Question 2

You can improve the code for 3 or more columns and at the same time use it for 1 or 2 columns, by using LINQ:

public partial class FlattenedArray
{
 public int rows;
 public int cols;
 public double[] array;
 public void sort()
 {
 array = (from i in Enumerable.Range(0, rows)
 select new List<double>(array.ToList().GetRange(cols * i, cols)))
 .OrderBy(x => x[0])
 .SelectMany(x => x)
 .ToArray(); 
 }
}

This divides the array into dimensions and sorts it by the first column then flattens it all back to 1D. It will work for any combination of rows and columns as long the conditions you mentioned hold true.

If the data size gets very large you still might want to use Array.Sort for the 1 column data as this code wouldn't be very efficient for 1 column.

Question 3

any particular reason you mixed syntaxes (some keywords and some methods)?

Question 4

Why are you converting the array to list over and over?

Question 5

I tried to fit it into the existing code, to concentrate on the solution. Lists have getrange method the arrays don't. I tried to keep it simple. If optimization is needed it's a very easy fix to declare it outside the LINQ statement.

Question 6

As for introducing special cases for efficiency purposes, you ain't gonna need it. Don't try to second guess reality. If you hit performance bottle neck, AND determine that a great majority of the cases can be handled specially, then you make the special hack. Then measure again keep the hack if it actually is useful.

More than a few lines in a case statement reduces readability. So extract those statements to a method.

default:
 array = Sort(array, rows, cols);
 break;

After that let's look at he algorithm closely to how we can make it more readable

keys = new double[rows];
int[] items = new int[rows];
double[] temp = new double[rows*cols];
for (int i=0,k=0; i<rows; i++, k+=cols)
{
 keys[i] = array[k];
 items[i] = k;
}

A loop doing more than one thing is confusing so split it up, and rearrange declarations so that they are close to what they are used, also rename single letter variables (and generic variable names copied from the API) according to their purpose:

static double[] Sort(double[] array, int rows, int cols)
{
 var result = new double[rows * cols];
 var rowStartIndices = new int[rows];
 for (int row = 0; row < rows; row++)
 rowStartIndices[row] = row * cols;
 var rowFirstValues = new double[rows];
 for (int row = 0; row < rows; row++)
 rowFirstValues[row] = array[rowStartIndices[row]];
 Array.Sort(rowFirstValues, rowStartIndices);
 for (int row = 0; row < rows; row++)
 Array.Copy(array, rowStartIndices[row], result, row * cols, cols);
 return result;
}

Use Array.Copy to copy array ranges.

Also some unnecessary type declarations are removed to improve readability. And it reads quite well top to bottom. There is some more noise that can be removed, and some other noise that can be moved elsewhere:

static double[] Sort(double[] array, int rows, int cols)
{
 double[] sorted = new double[array.Length];
 Enumerable.Range(0, rows)
 .Select(row => row * cols)
 .OrderBy(rowStartIndex => array[rowStartIndex])
 .ForEach((rowStartIndex, row) => Array.Copy(array, rowStartIndex, sorted, row * cols, cols));
 return sorted;
}

Where the ForEach method is a convenience method defined as such:

static class LocalExtensions
{
 public static void ForEach<T>(this IEnumerable<T> source, Action<T, int> action)
 {
 int index = 0;
 foreach(var current in source)
 {
 action(current, index);
 index++;
 }
 }
}

Question 7

Thanks, the special cases weren't for efficiency purposes, they were for readability, because the 3-or-more case was so hard to understand.

Question 8

I know that the compiler/optimizer is easily capable of determining that the two loops can be done at once...and that the common subexpression can get put in a register...but old (bad) habits die hard.

Question 9

I remember hearing that Array.Copy() was suboptimal since it wasn't a type-specialized generic. Did that get fixed?

Question 10

As for Array.Copy performance, I didn't do any .NET programming where time performance was critical, so I just don't know. I just assume it is at least as fast as hand coded loops, giving the JIT the chance to issue optimized machine instructions, disable runtime checks, summon elder gods, etc CLR magic too arcane to mere mortals like me :)

Question 11

As for two for loops performance, it should be negligible, even if not combined by the compiler. In general updating the loop variable on the processor takes very little time compared to moving chunks of data to and from off-processor memory. Most important performance decisions would be not doing the copying twice, and not calling MoveNext/Current for each element of the array (instead of once for row), in addition to copying. e.g. by doing ToList and ToArray as in the other answer.

Question 12

Maybe I'm entirely off-base, but the real smell to me here is this statement:

(due to business requirements) must be stored flattened

If it is truly a rectangular array, all your operations on the flat array have to be double checked to make sure they are valid for the real data structure that you need. It sounds like you may be bending over backwards thinking about the end and not the means. I can certainly understand reasons why it needs to ultimately be converted to a flattened array for storage, etc., but if for all intents and purposes it should behave like a rectangular array in code then it should be one.

If you just use a rectangular array, all the state, dimensions, etc. you have to monitor are already taken care of for you, and if you need it back in the flat format for the "business requirement", there are very easy ways to do the conversion in that direction:

int[,] whatTheCodeProbablyWants = new int[2,2] { {1, 2}, {3, 4} };
int[] whatTheBusinessSideNeeds = whatTheCodeProbablyWants.Cast<int>().ToArray();

...or jagged:

int[][] whatTheCodeProbablyWants = new int[][] { new int[] {1, 2}, new int[] {3} };
int[] whatTheBusinessSideNeeds = whatTheCodeProbablyWants.SelectMany (x => x).ToArray();

Question 13

At first I was worried that this would cause too much copying, but I'm already doing more copying than that anyway.

user33306 · Accepted Answer · 2014-05-21 06:05:39Z

You can improve the code for 3 or more columns and at the same time use it for 1 or 2 columns, by using LINQ:

public partial class FlattenedArray
{
 public int rows;
 public int cols;
 public double[] array;
 public void sort()
 {
 array = (from i in Enumerable.Range(0, rows)
 select new List<double>(array.ToList().GetRange(cols * i, cols)))
 .OrderBy(x => x[0])
 .SelectMany(x => x)
 .ToArray(); 
 }
}

This divides the array into dimensions and sorts it by the first column then flattens it all back to 1D. It will work for any combination of rows and columns as long the conditions you mentioned hold true.

If the data size gets very large you still might want to use Array.Sort for the 1 column data as this code wouldn't be very efficient for 1 column.

any particular reason you mixed syntaxes (some keywords and some methods)?
I tried to fit it into the existing code, to concentrate on the solution. Lists have getrange method the arrays don't. I tried to keep it simple. If optimization is needed it's a very easy fix to declare it outside the LINQ statement.

Stack Exchange Network

Sorting a flattened array

3 Answers 3

You must log in to answer this question.

Hot Network Questions

Sorting a flattened array

3 Answers 3

You must log in to answer this question.

Related

Hot Network Questions