I am practicing MATLAB with the code below. I randomly generated the variables x and y to implement the following formula:
$$\sum_{i=1}^{n-l_n} \bar{Y}_{i}\left(\bar{X}_{i} - \frac{1}{k_n} \sum_{j=0}^{k_n-1} \bar{X}_{i+j}\right)$$
where the quantities \$\bar{Y}\$ and \$\bar{X}\$ are obtained according to the following formulas (\$l_n\$ and \$k_n\$ are ln and kn in the code):
$$\bar{Y}_i = \frac{1}{l_n} \left( \sum_{j=l_n/2}^{l_n-1} Y_{i+j}-\sum_{j=0}^{l_n/2-1}Y_{i+j} \right) \quad \bar{X}_i = \frac{1}{l_n} \left( \sum_{j=l_n/2}^{l_n-1} X_{i+j}-\sum_{j=0}^{l_n/2-1}X_{i+j} \right) $$
Since the inner sum over \$\bar{X}_{i+j}\$ needs \$k_n\$ more elements, I dropped the last \$k_n\$ elements of delta_y to be sure that the index does not exceed the length of my variable.
The problem is that when I run the code it takes forever, because of the last line (I think). How can I overcome this issue?
x = randn(20000,1);
y = randn(20000,1);
delta_x = diff(x);
delta_y = diff(y);
time = length(x);
ln = floor(time^(.5));
kn = floor(time^(.5)/2);
dt_detrend = zeros(length(delta_x)-kn,1);
for i = 1:length(delta_x)-kn
dt_detrend(i,1) = delta_x(i+kn) - mean(delta_x(i:i+kn));
end
delta_y(length(dt_detrend)+1:end) = [];
n = length(delta_y);
% Preallocate delta2_bar as a matrix
Y_bar = zeros(n-ln,1);
X_bar = zeros(n-ln,1);
X__bar_detrend = zeros(n-ln,1)
for i = 1:n-ln
idx_start = floor(ln/2);
idx_end = ln - 1;
j_idx = floor(ln/2) - 1;
k_idx = kn-1;
Y_bar(i) = (1/ln) * (sum(delta_y(i+idx_start : i + idx_end)) - sum(delta_y(i: i+j_idx)));
X_bar(i) = (1/ln) * (sum(delta_x(i+idx_start: i+idx_end)) - sum(delta_x(i+0 : i+ j_idx)));
X2_bar = zeros(kn, 1);
for j = 0:k_idx
X2_bar(j+1) = (1/ln) * (sum(delta_x(i+idx_start +j : i + j + idx_end)) - sum(delta_x(i+j : i+j_idx+j)));
end
X__bar_detrend(i) = X_bar(i)-mean(X2_bar)
end
1 Answer
First, make sure every statement ends with a semicolon. Otherwise, MATLAB prints the result to the Command Window. Printing 20,000 zeros can take a long time, especially in a loop, where an indexed assignment without a semicolon displays the whole vector on every iteration. Specifically, these two lines:
X__bar_detrend = zeros(n-ln,1)
% ...
X__bar_detrend(i) = X_bar(i)-mean(X2_bar) % this one is in a loop, which would kill performance
Putting in semicolons allows the script to finish in 17 seconds. Before this, I stopped the script after 6 minutes.
Second, avoid for loops. MATLAB for loops are famous for being slow. However, any algorithm that needs a loop probably has a built-in function that does it for you, and more efficiently. Take these lines:
dt_detrend = zeros(length(delta_x)-kn,1);
for i = 1:length(delta_x)-kn
dt_detrend(i,1) = delta_x(i+kn) - mean(delta_x(i:i+kn));
end
The body of this loop is calculating a mean on a slice of the delta_x vector that moves over the whole vector. The correct way to write this section is with the movmean() function.
dt_detrend2 = delta_x(kn+1:end) - movmean(delta_x, kn + 1, "Endpoints", "discard");
No loops, no pre-allocation. Just this one change brings the running time down from 17 seconds to 16 seconds (a small change in absolute time, but it is a 6% speedup from one loop). The "Endpoints", "discard" arguments tell MATLAB to not include averages when the window slides off the end of the vector.
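A quick way to gain confidence in a rewrite like this is to run both versions on the same data and compare the results. The sketch below does that on a smaller input than the question uses; the fixed seed, the vector length, and the names dt_loop, dt_vec, and max_abs_diff are illustrative choices, not part of the original script.

```matlab
% Sketch: check that the movmean() one-liner reproduces the original loop.
rng(0);                          % fixed seed, assumed here only for repeatability
x = randn(2000, 1);              % smaller than the 20000 samples in the question
delta_x = diff(x);
kn = floor(sqrt(length(x)) / 2);

% Original loop: sample i+kn minus the mean of the kn+1 samples i..i+kn.
dt_loop = zeros(length(delta_x) - kn, 1);
for i = 1:length(delta_x) - kn
    dt_loop(i) = delta_x(i + kn) - mean(delta_x(i:i + kn));
end

% Vectorized version: "discard" keeps only the full windows of kn+1 samples,
% so the result has length(delta_x) - kn elements, like the loop.
dt_vec = delta_x(kn + 1:end) - movmean(delta_x, kn + 1, "Endpoints", "discard");

% The two can differ by floating-point rounding, so compare with a tolerance.
max_abs_diff = max(abs(dt_loop - dt_vec));
fprintf("max abs difference: %g\n", max_abs_diff);
```

Comparing with a tolerance rather than isequal() is deliberate: movmean() may accumulate the sums in a different order than mean() on each slice, so the last few bits can differ.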
What follows is a nested for-loop, which definitely has opportunities for speeding up, either by using built-in functions or by rewriting the code to not use loops. I won't try to rewrite it since I don't know what the code is doing, so I wouldn't know whether my rewrite changes the answer.
Always look for ways to do what you want without loops or indexing. Try to find functions and expressions that get the result you want rather than building up a result matrix element-by-element. Any time you start working with indexes, you should stop and think about how you might write the code differently. For example, this is the wrong way to add one matrix to the transpose of another:
A = rand(1000, 1000);
B = rand(1000, 1000);
C = zeros(1000, 1000);
for i = 1:1000
for j = 1:1000
C(i, j) = A(i, j) + B(j, i);
end
end
and this is the right way.
A = rand(1000, 1000);
B = rand(1000, 1000);
C = A + B'; % The apostrophe is the transpose operator.
The wrong way (with indexes) takes 4.4 seconds on my computer. The right way takes 0.01 seconds.
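Timings like these depend heavily on the machine and the MATLAB release, so it is worth measuring on your own setup. The following is a minimal sketch of how the two versions could be timed and checked against each other; the 500-by-500 size and the names C_loop, C_vec, t_loop, and t_vec are assumptions made for illustration.

```matlab
% Sketch: time the indexed and the vectorized versions on the same input.
A = rand(500, 500);
B = rand(500, 500);

% Indexed version, timed with tic/toc.
tic;
C_loop = zeros(500, 500);
for i = 1:500
    for j = 1:500
        C_loop(i, j) = A(i, j) + B(j, i);
    end
end
t_loop = toc;

% Vectorized version; timeit() runs the handle several times and
% returns a typical time for one call.
t_vec = timeit(@() A + B');
C_vec = A + B';   % for real matrices ' and .' give the same transpose

fprintf("loop: %.4f s, vectorized: %.4f s\n", t_loop, t_vec);
fprintf("results identical: %d\n", isequal(C_loop, C_vec));
```

Both versions perform the same single addition per element, so the results should match exactly; only the running time differs.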
"MATLAB for loops are famous for being slow." Indeed, they were slow 20 years ago. And the fame persists. But nowadays MATLAB has a JIT, making loops very fast. A vectorized operation can use parallelism where a loop won't, but often the loop is just fine. – Cris Luengo, Jan 14, 2024 at 5:40
[…] X2_bar when all you need is its mean. Instead, add up the values you compute and divide by the number of values computed, then you don't need to create that vector. I'm sure there are similar small gains to be had.

[…] X2_bar? The problem is that at each iteration i I want a different X2_bar vector of length kn.

[…] X2_bar I am having a hard time understanding how to get rid of it and still perform the task I need.

[…] x(i)=func(i) inside the loop and then computing mean(x), do x_sum=0, inside the loop do x_sum=x_sum+func(i), then the mean is x_sum/n, with n the number of loop iterations. This is not a huge difference, but you should certainly notice it. There are other improvements to be made, but I'd want to experiment before recommending anything. The moving mean for example can be computed much more efficiently as the difference between two shifted versions of cumsum.
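The two suggestions in the last comment can be sketched as follows. The reference loop keeps a running sum (acc) instead of filling X2_bar, and the vectorized part uses differences of shifted cumsum values. It relies on one reading of the question's code: X2_bar(j+1) is the same expression as X_bar evaluated at index i+j, so mean(X2_bar) is a moving mean of X_bar over kn values. The names h, acc, c, m, X_bar_all, ref, and vec are hypothetical, and the input is smaller than in the question.

```matlab
% Sketch: running sum in the loop, then a cumsum-based version without loops.
rng(1);                          % fixed seed, assumed here only for repeatability
N = 2000;
delta_x = diff(randn(N, 1));
ln = floor(sqrt(N));
kn = floor(sqrt(N) / 2);
h  = floor(ln / 2);              % idx_start in the question
n  = length(delta_x) - kn;       % length of delta_y after trimming

% Reference: the question's nested loop, with a running sum replacing X2_bar.
ref = zeros(n - ln, 1);
for i = 1:n - ln
    acc = 0;
    for j = 0:kn - 1
        a = i + j;
        acc = acc + (sum(delta_x(a + h:a + ln - 1)) - sum(delta_x(a:a + h - 1))) / ln;
    end
    x_bar_i = (sum(delta_x(i + h:i + ln - 1)) - sum(delta_x(i:i + h - 1))) / ln;
    ref(i) = x_bar_i - acc / kn;
end

% Vectorized: with c(k+1) = sum(delta_x(1:k)), sum(delta_x(p:q)) = c(q+1) - c(p).
c = [0; cumsum(delta_x)];
m = n - ln + kn - 1;             % how many X_bar values the moving mean needs
idx = (1:m)';
X_bar_all = (c(idx + ln) - 2 * c(idx + h) + c(idx)) / ln;

% Moving mean over kn consecutive X_bar values, again via shifted cumsum.
s = [0; cumsum(X_bar_all)];
moving_mean = (s(kn + 1:end) - s(1:end - kn)) / kn;
vec = X_bar_all(1:n - ln) - moving_mean;

max_abs_diff = max(abs(ref - vec));
fprintf("max abs difference: %g\n", max_abs_diff);
```

Cumulative sums accumulate rounding error over long vectors, so the two results agree only up to a small tolerance; if that matters, movmean() with "Endpoints", "discard" on X_bar_all is an alternative for the last step.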