I've got an algorithm that is "complex". The comments in the code give examples of how large the various data structures can be. CPU usage is below 10% while it runs, and RAM usage is fine; there are no leaks or anything like that.
I have a list of arrays, where each array holds x coordinates; essentially we are storing several groups of x coordinates. This is `xs` in the code. I have the same structure for y values: `ys`.
The arrays within `xs` (and within `ys`) have different sizes. HOWEVER, the lists `xs` and `ys` always pair up exactly: if an array in `xs` contains 321654 elements, then there is a corresponding array in `ys` with exactly 321654 elements. Paired arrays always sit at the same index in their lists, so the partner of `xs[4]` is `ys[4]`.
The following code computes, for each x position, the mean, the standard deviation, and the mean minus/plus one standard deviation of the y values across the collection of coordinate sets. It takes the smallest array in `xs` (the smallest set of x and y coordinates) as the reference. For each x value in that reference array, it searches every array in `xs` for the index of the closest x value, goes into the corresponding `ys` array, and reads the y value at that index. It collects all those y values, then calculates the mean, standard deviation, etc.
List<Double[]> xs; //each array may hold e.g. 40000 elements, and the list contains 50 - 100 of those
List<Double[]> ys; //a list containing arrays of exactly the same sizes as those in xs
//Mean() and PopulationStandardDeviation() are extension methods, presumably from MathNet.Numerics.Statistics

public void main_algorithm()
{
    int TheSmallestArray = GetSmallestArray(xs); //get the size of the smallest array in xs
    for (int i = 0; i < TheSmallestArray; i++)
    {
        double The_X_at_element = The_Smallest_X_in_xs[i]; //the value at index i
        //go through each array and find the element at which The_X_at_element is.
        //If it doesn't exist, find the closest element.
        List<Double> elements = new List<double>();
        for (int o = 0; o < xs.Count; o++) //go through each array in xs
        {
            //find the index in array o of the number closest to The_X_at_element
            int nearestIndex = Array.IndexOf(xs[o], xs[o].OrderBy(number => Math.Abs(number - The_X_at_element)).First());
            double The_Y_at_index = ys[o][nearestIndex]; //go through ys and get the value at this index
            elements.Add(The_Y_at_index); //store the value in elements
        }
        mean_values.Add(elements.Mean()); //the mean of all the values taken from ys
        standard_diviation_values.Add(elements.PopulationStandardDeviation()); //the population standard deviation of those values
        Std_MIN.Add(mean_values[i] - standard_diviation_values[i]); //store mean - std in min
        Std_MAX.Add(mean_values[i] + standard_diviation_values[i]); //store mean + std in max
    }
}

public int GetSmallestArray(List<double[]> arrays)
{
    int TheSmallestArray = int.MaxValue;
    foreach (double[] ds in arrays)
    {
        if (ds.Length < TheSmallestArray)
        {
            TheSmallestArray = ds.Length; //store the length as TheSmallestArray
            The_Smallest_X_in_xs = ds;    //and the array itself as The_Smallest_X_in_xs
        }
    }
    return TheSmallestArray;
}
1 Answer
The question does not contain an explicit specification of what the code is to achieve, and the code presented for review suffers from that very problem: how can anyone tell which deviations are allowable?
- variable names are an odd mix of PascalCase and snake_case
Tactical:
- if `GetSmallestArray()` returned an array (as the name suggests), the length could be (re-)established without an enclosing-scope variable
- use an ordered clone `shortestX` of the smallest array (from `xs`) and two arrays `noGreater` and `noSmaller` of the same size
- for each array of x values:
  - initialise `noGreater` and `noSmaller` suitably
  - find each value in `shortestX` (or the ones just below and/or above) and update `noGreater` and `noSmaller` accordingly
  - for each value in `shortestX`, take the closer one from `noGreater` and `noSmaller`, then find and use its index in the current array
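The merge approach above requires the x arrays to be sorted; under that same assumption, a simpler way to illustrate the speed-up is a binary-search nearest-index lookup replacing the per-element `OrderBy(...).First()` scan. This is a sketch, not the answer's exact scheme, and the helper name `NearestIndex` is mine, not from the question:

```csharp
using System;

static class NearestLookup
{
    // Assumes xs is sorted ascending (a precondition the question never states).
    // Returns the index of the element closest to target in O(log n),
    // instead of the O(n log n) OrderBy(...).First() scan per lookup.
    public static int NearestIndex(double[] xs, double target)
    {
        int pos = Array.BinarySearch(xs, target);
        if (pos >= 0) return pos;                     // exact match found
        int upper = ~pos;                             // index of first element greater than target
        if (upper == 0) return 0;                     // target is below all elements
        if (upper == xs.Length) return xs.Length - 1; // target is above all elements
        // otherwise pick the closer of the two neighbours
        return target - xs[upper - 1] <= xs[upper] - target ? upper - 1 : upper;
    }
}
```

Sorting each array once and then doing `m` lookups costs O((n + m) log n) per array, versus O(m · n log n) for the current code.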
Strategic:
"CPU usage is less than 10%" - guess: the computer showing this has many cores and the code uses only a few of them.
If the above change in algorithm does not improve the run time to good enough,
- look for further improvements in data presentation and processing
- only then give concurrency more than a side-glance
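If concurrency eventually does get more than that side-glance, the outer loop is a natural fit: each iteration can write only its own output index. A hedged sketch of the pattern (the `ColumnMeans` helper and the pre-sized output array are my assumptions, not the question's code):

```csharp
using System;
using System.Threading.Tasks;

static class ParallelSketch
{
    // Mean of column i across all rows, with one parallel outer loop.
    // Writing results into a pre-sized array (instead of List.Add) keeps
    // iterations independent, so no locking is required.
    public static double[] ColumnMeans(double[][] rows, int width)
    {
        var means = new double[width];
        Parallel.For(0, width, i =>
        {
            double sum = 0.0;
            foreach (var row in rows)
                sum += row[i];
            means[i] = sum / rows.Length; // slot i is touched by exactly one iteration
        });
        return means;
    }
}
```

The question's code appends to shared lists inside the loop, which would need restructuring along these lines before parallelising.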
From the comments: if `mean_values`, `Std_MIN`/`Std_MAX` and `standard_diviation_values` (sic) matter beyond values at the same index describing the same thing (quite redundantly), what are they used for?