Runge-Kutta 4th order using Python numexpr.evaluate()

Question 1

I am implementing an ODE solver, where the user provides rates and coefficients as a string. ODE solver has to work with vectors. The best implementation I got so far is the following:

# k_1 = dt*dcdt(C0, dt);
# k_2 = dt*dcdt(C0+0.5*k_1, dt);
# k_3 = dt*dcdt(C0+0.5*k_2, dt);
# k_4 = dt*dcdt(C0+k_3, dt);
# C_new = C0 + (k_1+2*k_2+2*k_3+k_4)/6;
import numpy as np
import numexpr as ne
dt = 0.1
coef = {'k': 1}
rates = {'R1': 'k*y1*y2'}
dcdt = {'y1': '-4 * R1', 'y2': '-R1'}
C0 = {'y2': np.random.rand(400), 'y1': np.random.rand(400)}
def k_loop(conc):
 rates_num = {}
 for k in rates:
 rates_num[k] = ne.evaluate(rates[k], {**coef, **conc})
 dcdt_num = {}
 for k in dcdt:
 dcdt_num[k] = ne.evaluate(dcdt[k], rates_num)
 Kn = {}
 for k in C0:
 Kn[k] = dt * dcdt_num[k]
 return Kn
def sum_k(A, B, b):
 C = {}
 for k in A:
 C[k] = A[k] + b * B[k] * dt
 return C
def result(C_0, k_1, k_2, k_3, k_4):
 C_new = {}
 for k in C_0:
 C_new[k] = C_0[k] + (k_1[k] + 2 * k_2[k] + 2 * k_3[k] + k_4[k]) / 6
 return C_new
k1 = k_loop(C0)
k2 = k_loop(sum_k(C0, k1, 0.5))
k3 = k_loop(sum_k(C0, k2, 0.5))
k4 = k_loop(sum_k(C0, k3, 1))
C_new = result(C0, k1, k2, k3, k4)

This ODE solver I am planning to use inside of the Object-oriented PDE solver. The PDE solver will provide C0 vectors. Therefore, the performance is crucial. I used numexpr.evaluate() for simplified user input of the rates. Can it be implemented in vectorized form? If you know the better implementation of the solver with such simplified input, please, let me know.

Metrics:

eval():

1000 loops, best of 3: 372 μs per loop

ne.evaluate():

1000 loops, best of 3: 377 μs per loop

@Graipher example:

1000 loops, best of 3: 452 μs per loop

Question 2

this is python 3.5 syntax.

Question 3

Does the use of np.array in C0 do anything for you? Why not just lists? I don't see any other use of numpy operations.

Question 4

Numpy arrays will be the inputs for this module from external PDE solver.

Question 5

As a general practice eval is considered to be dangerous and slow. ast.literal_eval is safer, though my impression is that it may be slower. sympy is another option if you math expressions. It might help if you gave examples of rates. Just how general do you need to go?

Question 6

Which are these 'vectors of size 300+ elements'? The match between that statement and your code is not obvious.

Question 7

I would make all objects numpy arrays and get the user to use index notation. They would have to write equations of the form R = np.array(['c[0]*y[0]*y[1]']) meaning that this will later evaluate just like your R = {'R1': 'c*y1*y2'}. Almost all functions can then take advantage of the fact that numpy does almost everything vectorized.

#!/usr/bin/env python3
# k_1 = dt*dcdt(C0, dt);
# k_2 = dt*dcdt(C0+0.5*k_1, dt);
# k_3 = dt*dcdt(C0+0.5*k_2, dt);
# k_4 = dt*dcdt(C0+k_3, dt);
# C_new = C0 + (k_1+2*k_2+2*k_3+k_4)/6;
import numpy as np
dt = 0.1
c = np.array([1])
y = np.array([[0, 0.1, 0.2, 0.3], [0, 0.1, 0.2, 0.3]])
R = np.array(['c[0]*y[0]*y[1]'])
dcdt = np.array(['-4 * R[0]', '-R[0]'])
def k_loop(y):
 rates_num = np.array([eval(k) for k in R])
 dcdt_num = np.array([eval(k, {'R': rates_num}) for k in dcdt])
 return dt * dcdt_num
def sum_k(A, B, b):
 return A + b * B * dt
def result(y_0, k_1, k_2, k_3, k_4):
 return y_0 + (k_1 + 2 * k_2 + 2 * k_3 + k_4) / 6
k1 = k_loop(y)
k2 = k_loop(sum_k(y, k1, 0.5))
k3 = k_loop(sum_k(y, k2, 0.5))
k4 = k_loop(sum_k(y, k3, 1))
y_new = result(y, k1, k2, k3, k4)
print(y_new)

Note that this is slightly slower for the small example given (0.09s vs 0.08s). Also the step rates_num = np.array([eval(k) for k in R]) looses some precision (up to 1E-4).

To see whether or not this performs better, a more realistic example would be needed.

The eval call can be back-substituted with ne.evaluate, I just didn't want to install it.

To calculate timings, I ran the result function 100000 times, like so:

for _ in range(100000):
 y = result(y, k1, k2, k3, k4)

When doing this with your original code, this takes 1.36s, while my code takes 0.99s.

However, when including updating the ks, this advantage goes away again:

for _ in range(10000):
 k1 = k_loop(y)
 k2 = k_loop(sum_k(y, k1, 0.5))
 k3 = k_loop(sum_k(y, k2, 0.5))
 k4 = k_loop(sum_k(y, k3, 1))
 y = result(y, k1, k2, k3, k4)

Note that this is ten times less often than before. Here your code takes 2.7s and my code takes 3.7s on my machine.

Question 8

I am getting an error NameError: name 'c' is not defined

Question 9

@IgorMarkelov Weird, it works for me. Are you copying the whole code or are you using it along with some other stuff? You might have to use rates_num = np.array([eval(k, {'c':c, 'y':y}) for k in R]) to make sure they are defined. Or are you not using the latest version of my answer? (I tried to do a vectorized eval, which is not a good idea)

Question 10

You are correct np.array([eval(k, {'c':c, 'y':y}) for k in R]) solved the problem...

Question 11

However, the performance is lower 1000 loops, best of 3: 449 µs per loop compared to my example 1000 loops, best of 3: 389 µs per loop for C0 = {'y1': np.random.rand(400,1), 'y2': np.random.rand(400,1)}.

Question 12

@IgorMarkelov Yes, it does seem to be slower on similar examples on my machine, as well.

Question 13

Not a thorough review but a Small contribution:

The dt term is multiplied by another constant input at each call of the sum_k function. (dt*0.5, etc).

You can get rid of the dt term in the sum_k function, and this would reduce the runtime performance a little bit.

Graipher Graipher 41.6k7 gold badges70 silver badges134 bronze badges · Answer 1 · 2017-01-20 15:49:08Z

I would make all objects numpy arrays and get the user to use index notation. They would have to write equations of the form R = np.array(['c[0]*y[0]*y[1]']) meaning that this will later evaluate just like your R = {'R1': 'c*y1*y2'}. Almost all functions can then take advantage of the fact that numpy does almost everything vectorized.

#!/usr/bin/env python3
# k_1 = dt*dcdt(C0, dt);
# k_2 = dt*dcdt(C0+0.5*k_1, dt);
# k_3 = dt*dcdt(C0+0.5*k_2, dt);
# k_4 = dt*dcdt(C0+k_3, dt);
# C_new = C0 + (k_1+2*k_2+2*k_3+k_4)/6;
import numpy as np
dt = 0.1
c = np.array([1])
y = np.array([[0, 0.1, 0.2, 0.3], [0, 0.1, 0.2, 0.3]])
R = np.array(['c[0]*y[0]*y[1]'])
dcdt = np.array(['-4 * R[0]', '-R[0]'])
def k_loop(y):
 rates_num = np.array([eval(k) for k in R])
 dcdt_num = np.array([eval(k, {'R': rates_num}) for k in dcdt])
 return dt * dcdt_num
def sum_k(A, B, b):
 return A + b * B * dt
def result(y_0, k_1, k_2, k_3, k_4):
 return y_0 + (k_1 + 2 * k_2 + 2 * k_3 + k_4) / 6
k1 = k_loop(y)
k2 = k_loop(sum_k(y, k1, 0.5))
k3 = k_loop(sum_k(y, k2, 0.5))
k4 = k_loop(sum_k(y, k3, 1))
y_new = result(y, k1, k2, k3, k4)
print(y_new)

Note that this is slightly slower for the small example given (0.09s vs 0.08s). Also the step rates_num = np.array([eval(k) for k in R]) looses some precision (up to 1E-4).

To see whether or not this performs better, a more realistic example would be needed.

The eval call can be back-substituted with ne.evaluate, I just didn't want to install it.

To calculate timings, I ran the result function 100000 times, like so:

for _ in range(100000):
 y = result(y, k1, k2, k3, k4)

When doing this with your original code, this takes 1.36s, while my code takes 0.99s.

However, when including updating the ks, this advantage goes away again:

for _ in range(10000):
 k1 = k_loop(y)
 k2 = k_loop(sum_k(y, k1, 0.5))
 k3 = k_loop(sum_k(y, k2, 0.5))
 k4 = k_loop(sum_k(y, k3, 1))
 y = result(y, k1, k2, k3, k4)

Note that this is ten times less often than before. Here your code takes 2.7s and my code takes 3.7s on my machine.

@IgorMarkelov Weird, it works for me. Are you copying the whole code or are you using it along with some other stuff? You might have to use rates_num = np.array([eval(k, {'c':c, 'y':y}) for k in R]) to make sure they are defined. Or are you not using the latest version of my answer? (I tried to do a vectorized eval, which is not a good idea)
You are correct np.array([eval(k, {'c':c, 'y':y}) for k in R]) solved the problem...
However, the performance is lower 1000 loops, best of 3: 449 µs per loop compared to my example 1000 loops, best of 3: 389 µs per loop for C0 = {'y1': np.random.rand(400,1), 'y2': np.random.rand(400,1)}.
@IgorMarkelov Yes, it does seem to be slower on similar examples on my machine, as well.

Gürkan Çetin Gürkan Çetin 2371 silver badge11 bronze badges · Answer 2 · 2017-05-26 22:45:10Z

Not a thorough review but a Small contribution:

The dt term is multiplied by another constant input at each call of the sum_k function. (dt*0.5, etc).

You can get rid of the dt term in the sum_k function, and this would reduce the runtime performance a little bit.

Stack Exchange Network

Runge-Kutta 4th order using Python numexpr.evaluate()

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Runge-Kutta 4th order using Python numexpr.evaluate()

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions