calculation code to prevent FIFO overflow during data streaming

Question 1

I am using a dual channels DAQ card with data stream mode. I wrote some code for analysis/calculation and put them to the main code for operation. However, the FIFO overflow warning sign always occur once its total data reach around 4300 MS (the DAQ on board memory is 8GB). I am well-known that a complicated calculation might retard the system and cause the overflow but all of the works I wrote are necessary to my experiment which means cannot be replaced (or there is more effective code can let me get the same result). Could anyone point out if there are any ways that I can optimize my code? My computer has 64GB RAM and Intel Core i7 processor. I always turn off other unnecessary software when running the data stream code.

This is how I process the data:

Install the FFTW source code for the Hilbert transform.
Allocate buffer size for channel1 and channel2.
Use for loop to get ch1 and ch2 data from the main buffer.
Do the hilbert on ch2 and calculate its absolute number.
Find the max value of ch1 and abs(hilbert(ch2)).
Calculate max(abs(hilbert(ch2))) / max(ch1). Here is the detail of the my calculation code:

float SumBufferData(void* pBuffer, uInt32 u32Size, uInt32 u32SampleBits)
{
 // In this routine we sum up all the samples in the buffer. This function 
 // should be replaced with the user's analysys function
 uInt32 i,n;
 uInt8* pu8Buffer = NULL;
 int16* pi16Buffer = NULL;
 int64 i64Sum = 0;
 float max1 = 0.0;
 float max2 = 0.0;
 float min2 = 0.0;
 float Corrected =0.0;
 float AUC = 0.0;
 float* ch1Buffer = NULL;
 double* ch2Buffer = NULL;
 if ( 8 == u32SampleBits )
 {
 pu8Buffer = (uInt8 *)pBuffer;
 for (i = 0; i < u32Size; i++)
 {
 i64Sum += pu8Buffer[i];
 }
 }
 else
 {
 pi16Buffer = (int16 *)pBuffer;
 
 
 
 fftw_complex(hilbertedch2[N]);
 ch1Buffer = (float*)calloc(u32Size, sizeof(float));
 ch2Buffer = (double*)calloc(u32Size, sizeof(double));
 // Divide ch1 and ch2 data from pi16Buffer
 for (i = 0; i < u32Size/2; i++)
 {
 ch1Buffer[i] += pi16Buffer[i*2];
 ch2Buffer[i] += pi16Buffer[i * 2 + 1];
 }
 // Here hilbert on the whole ch2
 hilbert(ch2Buffer, hilbertedch2);
 
 //Find max value in each segs of ch1 and hilbertedch2
 for (i = 0; i < u32Size/2; i++)
 {
 if (ch1Buffer[i] > max1)
 max1 = ch1Buffer[i];
 if (abs(hilbertedch2[i][IMAG])> max2)
 max2 = abs(hilbertedch2[i][IMAG]);
 
 }
 
 Corrected = max2 / max1; // Calculate the signal correction
 if (Corrected > 0.1) // check the threshold and calculate area under curve
 AUC += Corrected;
 else
 AUC = 0;
 
 
 
 
 }
 free(ch1Buffer);
 free(ch2Buffer);
 return AUC;
}

And the source code from FFTW:

void hilbert(const double* in, fftw_complex* out)
{
 // copy the data to the complex array
 for (int i = 0; i < N; ++i) {
 out[i][REAL] = in[i];
 out[i][IMAG] = 0;
 }
 // creat a DFT plan and execute it
 fftw_plan plan = fftw_plan_dft_1d(N, out, out, FFTW_FORWARD, FFTW_ESTIMATE);
 fftw_execute(plan);
 // destroy a plan to prevent memory leak
 fftw_destroy_plan(plan);
 int hN = N >> 1; // half of the length (N/2)
 int numRem = hN; // the number of remaining elements
 // multiply the appropriate value by 2
 //(those should multiplied by 1 are left intact because they wouldn't change)
 for (int i = 1; i < hN; ++i) {
 out[i][REAL] *= 2;
 out[i][IMAG] *= 2;
 }
 // if the length is even, the number of the remaining elements decrease by 1
 if (N % 2 == 0)
 numRem--;
 else if (N > 1) {
 out[hN][REAL] *= 2;
 out[hN][IMAG] *= 2;
 }
 // set the remaining value to 0
 // (multiplying by 0 gives 0, so we don't care about the multiplicands)
 memset(&out[hN + 1][REAL], 0, numRem * sizeof(fftw_complex));
 // creat a IDFT plan and execute it
 plan = fftw_plan_dft_1d(N, out, out, FFTW_BACKWARD, FFTW_ESTIMATE);
 fftw_execute(plan);
 // do some cleaning
 fftw_destroy_plan(plan);
 fftw_cleanup();
 // scale the IDFT output
 for (int i = 0; i < N; ++i) {
 out[i][REAL] /= N;
 out[i][IMAG] /= N;
 }
}

Question 2

What type is hilbertedch2?

Question 3

Hi, it is an array with double

Question 4

Don't allocate memory while processing the data

Reserve space for ch1Buffer and ch2Buffer once while your program initializes, so that there is enough to handle any value of u32Size that you might pass to the function (since you are streaming data, I am guessing that this will be a constant anyway, and there's also the fact that the function hilbert() doesn't take a size parameter but uses N instead).

Calculate a plan once and reuse it

Similar to the memory allocation, you should create all the fftw_plans you need once, and then reuse them. You can then also use FFTW_MEASURE to create more optimal plans. If you also need your program to start up fast, consider also using FFTW's wisdom mechanism.

`double` or `float`?

As chux also mentioned, you are mixing float and double variables. Do you need both types? If float gives you enough precision for what you want to do, use the float variants of the FFTW functions to get a large speed boost.

Compile with `-Ofast -march=native`

Use the compiler flags -Ofast -march=native (or whatever the equivalent is for your compiler). This allows the compiler to vectorize your code using SSE instructions: -march=native tells it to use the best vector instructions that your CPU supports, and -Ofast tells it to cut a few corners when it comes to strict compliance to the standard, but typically it allows it to ignore things like denormal floating point numbers and other very unlikely things that might need extra instructions to handle, or that would even prevent vectorization. With these flags, GCC, Clang and ICC will all vectorize the loop that calculates the maximum absolute values.

Note that using fabs() (or fabsf() when using floats) is very important to keep the code efficient. Chux already mentioned that in his answer.

Try splitting the maximum calculation loop in two

You have one loop that scans through two arrays simultaneously. That may or may not be less efficient than splitting this into two loops, one for each array. It is hard to predict which will be more efficient, this depends on how fast the CPU can crunch numbers vs. how much memory bandwidth it can sustain, how good the prefetcher of the CPU is, and how the compilers actually compile the loops. So you need to try out both variants yourself and measure the difference in performance.

Avoid scaling the result of the inverse DFT

Instead of dividing all elements of the array out before returning from hilbert(), don't scale the output, but instead just divide max2 by N after calculating the maximum.

Question 5

Hi, thanks for the reply. In first and the second parts, do you mean I have to set memory allocation and fftw_plan (does the create of IDFT also?)out of the "void"?

Question 6

I'm not sure what you mean by "out of the void". The memory allocations and calls to fftw_plan() should be done outside the functions that are called to process the stream, so outside SumBufferData() and hilbert().

Question 7

I am not really get the last part. Could you specify with code or else, thanks!

Question 8

I tried to move fftw_plan plan = fftw_plan_dft_1d(N, out, out, FFTW_FORWARD, FFTW_ESTIMATE); outside to the hilbert() but buggy, such as plan and out need to redefine. Could you point out how to modify once the fftw_plan plan = fftw_plan_dft_1d(N, out, out, FFTW_FORWARD, FFTW_ESTIMATE); is outside the void?

Question 9

Hi, here are some of my thought after following your guide: 1.Put the memory allocation out of the function didn't make it faster. 2. But move out and reuse the plan did upgrade the speed. 3. Splitting to two for loops didn't show any difference 4. Scaling the IDFT just divide max2 by N also didn't show difference too. But, Thanks!

Question 10

Could anyone point out if there are any ways that I can optimize my code?

In general, there are no better-than-linear improvements here.

*alloc() can be expensive. Since code is calling it twice and then freeing the pointers at the same time, save one expensive *alloc() by doing 1 rather than 2.

With VLA support, it is easy

struct {
 float ch1Buffer[u32Size];
 double ch2Buffer[u32Size];
} *x;
x = calloc(1, sizeof *x);

... or allocate as an array of struct.

struct {
 float ch1Buffer;
 double ch2Buffer;
} *x;
x = calloc(u32Size, sizeof *x);

ch1Buffer{] is not even needed here. Code does not use the array in an array fashion.

Strange Code

Why += instead of =? Assignment just as well here.

 // AUC += Corrected;
 AUC = Corrected;

Integer function used for FP code

OP commented that hilbertedch2 is an array with double

abs() is for integers.

// if (abs(hilbertedch2[i][IMAG])> max2)
// max2 = abs(hilbertedch2[i][IMAG]);
if (fabs(hilbertedch2[i][IMAG])> max2)
 max2 = fabs(hilbertedch2[i][IMAG]);

I'd expect using double for max2.

// float max2 = 0.0;
double max2 = 0.0;

If float Corrected remains float, use float constants.

// if (Corrected > 0.1) 
if (Corrected > 0.1f)

Tip: enable all compiler warnings.

Improve review and maintainability

Cast not needed. Size of the referenced object, rather than trying to match the type. Easier to code right, review and maintain.

// ch1Buffer = (float*)calloc(u32Size, sizeof(float));
ch1Buffer = calloc(u32Size, sizeof *ch1Buffer);

Question 11

Thanks. In the allocation part, the x need to be declared to which kind of variable first? In the AUC part: += means calculate area under curve by summing the data points.

Question 12

Yes. some_type *ptr = malloc(sizeof *ptr * n); works fine.

Question 13

@Kevin AUC += Corrected is only used, at most, once. It is not in a loop. It does not sum points

Question 14

Then which is better to present to sum the data points?

Question 15

@Kevin You need a loop to sum the data points. for, while, do, ...

G. Sliepen G. Sliepen 68.7k3 gold badges74 silver badges179 bronze badges · Accepted Answer · 2020-12-17 21:36:29Z

Don't allocate memory while processing the data

Reserve space for ch1Buffer and ch2Buffer once while your program initializes, so that there is enough to handle any value of u32Size that you might pass to the function (since you are streaming data, I am guessing that this will be a constant anyway, and there's also the fact that the function hilbert() doesn't take a size parameter but uses N instead).

Calculate a plan once and reuse it

Similar to the memory allocation, you should create all the fftw_plans you need once, and then reuse them. You can then also use FFTW_MEASURE to create more optimal plans. If you also need your program to start up fast, consider also using FFTW's wisdom mechanism.

`double` or `float`?

As chux also mentioned, you are mixing float and double variables. Do you need both types? If float gives you enough precision for what you want to do, use the float variants of the FFTW functions to get a large speed boost.

Compile with `-Ofast -march=native`

Use the compiler flags -Ofast -march=native (or whatever the equivalent is for your compiler). This allows the compiler to vectorize your code using SSE instructions: -march=native tells it to use the best vector instructions that your CPU supports, and -Ofast tells it to cut a few corners when it comes to strict compliance to the standard, but typically it allows it to ignore things like denormal floating point numbers and other very unlikely things that might need extra instructions to handle, or that would even prevent vectorization. With these flags, GCC, Clang and ICC will all vectorize the loop that calculates the maximum absolute values.

Note that using fabs() (or fabsf() when using floats) is very important to keep the code efficient. Chux already mentioned that in his answer.

Try splitting the maximum calculation loop in two

You have one loop that scans through two arrays simultaneously. That may or may not be less efficient than splitting this into two loops, one for each array. It is hard to predict which will be more efficient, this depends on how fast the CPU can crunch numbers vs. how much memory bandwidth it can sustain, how good the prefetcher of the CPU is, and how the compilers actually compile the loops. So you need to try out both variants yourself and measure the difference in performance.

Avoid scaling the result of the inverse DFT

Instead of dividing all elements of the array out before returning from hilbert(), don't scale the output, but instead just divide max2 by N after calculating the maximum.

Hi, thanks for the reply. In first and the second parts, do you mean I have to set memory allocation and fftw_plan (does the create of IDFT also?)out of the "void"?
I'm not sure what you mean by "out of the void". The memory allocations and calls to fftw_plan() should be done outside the functions that are called to process the stream, so outside SumBufferData() and hilbert().
I am not really get the last part. Could you specify with code or else, thanks!
I tried to move fftw_plan plan = fftw_plan_dft_1d(N, out, out, FFTW_FORWARD, FFTW_ESTIMATE); outside to the hilbert() but buggy, such as plan and out need to redefine. Could you point out how to modify once the fftw_plan plan = fftw_plan_dft_1d(N, out, out, FFTW_FORWARD, FFTW_ESTIMATE); is outside the void?
Hi, here are some of my thought after following your guide: 1.Put the memory allocation out of the function didn't make it faster. 2. But move out and reuse the plan did upgrade the speed. 3. Splitting to two for loops didn't show any difference 4. Scaling the IDFT just divide max2 by N also didn't show difference too. But, Thanks!

Stack Exchange Network

calculation code to prevent FIFO overflow during data streaming

2 Answers 2

Don't allocate memory while processing the data

Calculate a plan once and reuse it

`double` or `float`?

Compile with `-Ofast -march=native`

Try splitting the maximum calculation loop in two

Avoid scaling the result of the inverse DFT

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

calculation code to prevent FIFO overflow during data streaming

2 Answers 2

Don't allocate memory while processing the data

Calculate a plan once and reuse it

double or float?

Compile with -Ofast -march=native

Try splitting the maximum calculation loop in two

Avoid scaling the result of the inverse DFT

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions

`double` or `float`?

Compile with `-Ofast -march=native`