I have a function that needs to compute a system response y
in response to an input time series defined by x
. In practice, x
will be determined by an external process. y
is then exposed to external processes as class member timeArray
. Computing y
entails two levels of recursion.
The function I need to optimize in the class below is update()
which has potential to become a process bottleneck at current runtime: ~72 ms on first cycle, ~21ms after a couple dozen iterations. An additional consideration is that in production, this function will be called as a small step in a much larger application, so I'm guessing the cache hits that benefit subsequent runs may have less success when competing with other threads and processes in the JVM.
Pulling the step response calculation requiring a call to exp()
out of the inner loop provided the greatest time savings so far, but I'm running out of ideas to further tune. Are there any optimizations I can make to improve execution speed of update()
?
import static java.lang.Math.exp;
import static java.lang.Math.sin;
public class Looper {
private static final int MAX_PROPAGATION_TIME = 7200;
private static final int SIM_TIME = 100;
public double[] timeArray = new double[MAX_PROPAGATION_TIME];
public static void main(String []args) {
Looper go = new Looper();
go.run();
}
public void run() {
double x[] = new double[MAX_PROPAGATION_TIME];
double tau = 3;
double scale = 1;
double start = 0;
double end = 0;
for (int i = 0; i < SIM_TIME - 1; i++){
for (int j = 0; j < MAX_PROPAGATION_TIME - 1; j++){
x[j] = sin(i) * (1 - exp(-j / 10));
}
start = System.currentTimeMillis();
// update() is the primary function of interest
update(x,tau,scale);
end = System.currentTimeMillis();
System.out.println(String.format("Run %3d: %.0f ms",i,end - start));
//System.out.println(String.format("timeArray[i + 10]: %.2f",this.timeArray[i + 10]));
}
}
public void update(
double[] x,
double tau,
double scale) {
if(tau < 1e-12 / 3) tau = 1e-12 / 3;
double dx;
double y[] = new double[MAX_PROPAGATION_TIME];
// Pre-compute a unit step response for efficiency
double stepResponse[] = new double[MAX_PROPAGATION_TIME];
for (int i = 0; i < MAX_PROPAGATION_TIME - 1; i++){
stepResponse[i] = scale * (1 - exp(-i / tau));
}
// Loop over the full time array and antipated change in x at each time
for (int i = 0; i < MAX_PROPAGATION_TIME - 1; i++){;
dx = x[i+1] - x[i];
// For each element subsequent to time i, evaluate change in y as an
// exponential decay type response and accumulate incremental changes over time
for (int j = i; j < MAX_PROPAGATION_TIME - 1; j++){
y[j] += dx * stepResponse[j - i + 1];
}
}
this.timeArray = y;
}
}
In response to @harold's question: step response of y in response to a a single unit step in x at t = 0, with tau = 10
would look as follows. Over the full simulation time, each change in x at a given time can be treated as an additional step change to be accumulated over time.
Update:
As suspected, I had been paying too much attention in all the wrong places. Eliminating the inner loop to turn this back to an O(n) problem saves 1-2 orders of magnitude on computation time, thanks Harold!
1 Answer 1
Speed
dx * scale * (1 - exp(-i / tau))
can be split up into two parts:
- A part that is accumulated without decay,
dx * scale
- And an exponentially decreasing part,
dx * scale * exp(-i / tau)
, which is subtracted from that.
Both of these can be written in an incremental form, keeping track of a couple of extra values across the loop to influence the elements at higher indices that way, instead of using an extra loop to "spread out" the influence explicitly.
double decayFactor = exp(-1.0 / tau);
double decaying = 0.0;
double accumulate = 0.0;
for (int i = 0; i < MAX_PROPAGATION_TIME - 1; i++){;
dx = x[i+1] - x[i];
accumulate += dx;
decaying = (decaying + dx) * decayFactor;
y[i] = (accumulate - decaying) * scale;
}
As far as I've tested it, the results are the same (close enough, apart from the usual floating point stuff), and it is very fast.
Miscellaneous
It is not necessary to "pre-declare" dx
, you can put double dx = x[i+1] - x[i]
right there in the loop.
Thanks to the way the faster implementation works, it also allows the computation to be done in-place (without allocating a new array y
), of course you may have other reasons not to do that.
-
\$\begingroup\$ Boom! Updated response with my new timings above, thanks for a fresh perspective! \$\endgroup\$Matt Summersgill– Matt Summersgill2022年11月18日 20:32:56 +00:00Commented Nov 18, 2022 at 20:32
stepResponse
outside of theupdate
function. it only depends on scale and tau which are constant across the SIM_TIME loop \$\endgroup\$SIM_TIME = 100
, what is the last run you want completed? \$\endgroup\$