I've got an algorithm that is "complex". The comments in the code give examples of how large the various data structures can be. CPU usage is below 10% while it runs, and RAM usage is fine; there are no leaks or anything like that.
I have a list of arrays, where each array holds x coordinates; essentially we are storing several groups of x coordinates. This is `xs` in the code. I have the same structure for y values: `ys`.
The arrays within `xs` (and within `ys`) have different sizes. HOWEVER, the lists `xs` and `ys` always pair up exactly: if an array in `xs` contains 321654 elements, then there is a corresponding array in `ys` with exactly 321654 elements. Paired arrays always sit at the same index in their lists, so the partner of `xs[4]` is `ys[4]`.
The following code computes, for each x position, the mean, the standard deviation, and the mean minus/plus one standard deviation of the y values across the collection of coordinate sets. It takes the smallest array in `xs` (the smallest set of x and y coordinates) as the reference. For each x value in that reference array, it searches every array in `xs` for the index of the closest x value, goes into the corresponding `ys` array, and reads the y value at that index. It collects all those y values, then calculates the mean, standard deviation, etc.
List<Double[]> xs; //each array may hold e.g. 40000 elements, and the list contains 50 - 100 of those
List<Double[]> ys; //a list containing arrays of exactly the same sizes as those in xs
//Mean() and PopulationStandardDeviation() are extension methods, presumably from MathNet.Numerics.Statistics

public void main_algorithm()
{
    int TheSmallestArray = GetSmallestArray(xs); //get the size of the smallest array in xs
    for (int i = 0; i < TheSmallestArray; i++)
    {
        double The_X_at_element = The_Smallest_X_in_xs[i]; //the value at index i
        //go through each array and find the element at which The_X_at_element is.
        //If it doesn't exist, find the closest element.
        List<Double> elements = new List<double>();
        for (int o = 0; o < xs.Count; o++) //go through each array in xs
        {
            //find the index in array o of the number closest to The_X_at_element
            int nearestIndex = Array.IndexOf(xs[o], xs[o].OrderBy(number => Math.Abs(number - The_X_at_element)).First());
            double The_Y_at_index = ys[o][nearestIndex]; //go through ys and get the value at this index
            elements.Add(The_Y_at_index); //store the value in elements
        }
        mean_values.Add(elements.Mean()); //the mean of all the values taken from ys
        standard_diviation_values.Add(elements.PopulationStandardDeviation()); //the population standard deviation of those values
        Std_MIN.Add(mean_values[i] - standard_diviation_values[i]); //store mean - std in min
        Std_MAX.Add(mean_values[i] + standard_diviation_values[i]); //store mean + std in max
    }
}

public int GetSmallestArray(List<double[]> arrays)
{
    int TheSmallestArray = int.MaxValue;
    foreach (double[] ds in arrays)
    {
        if (ds.Length < TheSmallestArray)
        {
            TheSmallestArray = ds.Length; //store the length as TheSmallestArray
            The_Smallest_X_in_xs = ds;    //and the array itself as The_Smallest_X_in_xs
        }
    }
    return TheSmallestArray;
}
1 Answer
The question does not contain an explicit specification of what the code is to achieve, and the code presented for review suffers from that very problem: how can anyone tell which deviations are allowable?
- variable names are an odd mix of PascalCase and snake_case
Tactical:
- if `GetSmallestArray()` returned an array (as the name suggests), the length could be (re-)established without an enclosing-scope variable
- use an ordered clone `shortestX` of the smallest array (from `xs`) and two arrays `noGreater` and `noSmaller` of the same size
- for each array of x values:
  - initialise `noGreater` and `noSmaller` suitably
  - find each value in `shortestX` (or the ones just below and/or above) and update `noGreater` and `noSmaller` accordingly
  - for each value in `shortestX`, take the closer one from `noGreater` and `noSmaller`, then find and use its index in the current array
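The merge approach above requires the x arrays to be sorted; under that same assumption, a simpler way to illustrate the speed-up is a binary-search nearest-index lookup replacing the per-element `OrderBy(...).First()` scan. This is a sketch, not the answer's exact scheme, and the helper name `NearestIndex` is mine, not from the question:

```csharp
using System;

static class NearestLookup
{
    // Assumes xs is sorted ascending (a precondition the question never states).
    // Returns the index of the element closest to target in O(log n),
    // instead of the O(n log n) OrderBy(...).First() scan per lookup.
    public static int NearestIndex(double[] xs, double target)
    {
        int pos = Array.BinarySearch(xs, target);
        if (pos >= 0) return pos;                     // exact match found
        int upper = ~pos;                             // index of first element greater than target
        if (upper == 0) return 0;                     // target is below all elements
        if (upper == xs.Length) return xs.Length - 1; // target is above all elements
        // otherwise pick the closer of the two neighbours
        return target - xs[upper - 1] <= xs[upper] - target ? upper - 1 : upper;
    }
}
```

Sorting each array once and then doing `m` lookups costs O((n + m) log n) per array, versus O(m · n log n) for the current code.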
Strategic:
"CPU usage is less than 10%" - guess: the computer showing this has many cores and the code uses only a few of them.
If the above change in algorithm does not improve the run time to good enough,
- look for further improvements in data presentation and processing
- only then give concurrency more than a side-glance
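If concurrency eventually does get more than that side-glance, the outer loop is a natural fit: each iteration can write only its own output index. A hedged sketch of the pattern (the `ColumnMeans` helper and the pre-sized output array are my assumptions, not the question's code):

```csharp
using System;
using System.Threading.Tasks;

static class ParallelSketch
{
    // Mean of column i across all rows, with one parallel outer loop.
    // Writing results into a pre-sized array (instead of List.Add) keeps
    // iterations independent, so no locking is required.
    public static double[] ColumnMeans(double[][] rows, int width)
    {
        var means = new double[width];
        Parallel.For(0, width, i =>
        {
            double sum = 0.0;
            foreach (var row in rows)
                sum += row[i];
            means[i] = sum / rows.Length; // slot i is touched by exactly one iteration
        });
        return means;
    }
}
```

The question's code appends to shared lists inside the loop, which would need restructuring along these lines before parallelising.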
From the comments: if `mean_values`, `Std_MIN`/`Std_MAX` and `standard_diviation_values` (sic) matter beyond values at the same index describing the same thing (quite redundantly), what are they used for?