I am using the least_squares() function from the scipy.optimize module to calibrate a canopy structural dynamic model (CSDM). The calibrated model is then used to predict leaf area index (LAI) from thermal time (tt) data. I tried two variants: the first did not use the loss parameter of least_squares(), while the second set this parameter to produce a robust model. Both models give me runtime warnings even though the optimization completes successfully.
With the simple least-squares model I get these warnings:
__main__:24: RuntimeWarning: overflow encountered in exp
__main__:24: RuntimeWarning: divide by zero encountered in true_divide
With the robust least-squares model I get these warnings:
__main__:24: RuntimeWarning: overflow encountered in exp
__main__:24: RuntimeWarning: overflow encountered in power
# Canopy structural dynamic model (CSDM) implementation using scipy.optimize
import numpy as np
from scipy.optimize import least_squares
import matplotlib.pyplot as plt
#independent variable training data
tt_train = np.array([299.135, 408.143, 736.124, 1023.94, 1088.47, 1227.22,
1313.94, 1392.93, 1482.83, 1581.96, 2064.27, 2277.95, 2394.62, 2519.23])
#dependent variable training data
lai_train = np.array([0.0304313, 0.0833402, 0.682014, 0.973261, 2.54978,
4.93747, 5.31949, 6.25236, 6.64175, 7.3717, 3.61623, 2.96673, 1.72345, 0.803591])
# The CSDM formula (Duveiller et al. 2011. Retrieving wheat Green Area Index during the growing season...)
# LAI = k * (1 / ((1 + Exp(-a * (tt - T0 - Ta))) ^ c) - Exp(b * (tt - T0 - Tb)))
# initial estimates of parameters
To = 50 # plant emergence (x[0])
Ta = 1000 # midgrowth (x[1])
Tb = 2000 # end of senescence (x[2])
k = 6 # scaling factor (approx. max LAI) (x[3])
a = 0.01 # rate of growth (x[4])
b = 0.01 # rate of senescence (x[5])
c = 1 # parameter allowing some plasticity to the shape of the curve (x[6])
x0 = np.array([To, Ta, Tb, k, a, b, c])
def model(x, tt):
    return x[3] * (1 / ((1 + np.exp(-x[4] * (tt - x[0] - x[1]))) ** x[6]) - np.exp(x[5] * (tt - x[0] - x[2])))

# Define the function computing residuals for least-squares minimization
def fun(x, tt, lai):
    return model(x, tt) - lai
#simple model
res_lsq = least_squares(fun, x0, args=(tt_train, lai_train))
#robust model
res_robust = least_squares(fun, x0, loss='soft_l1', f_scale=1, args=(tt_train, lai_train))
# thermal time data for full season
tt_test = np.array([11.7584,22.1838,34.0008,47.7174,64.3092,81.1832,90.1728,101.494,116.125,127.732,140.229,
154.381,170.5,185.707,201.368,217.642,233.593,249.703,266.233,283.074,299.135,314.386,327.024,337.58,344.699,
354.328,367.247,379.627,391.51,400.93,408.143,414.941,423.678,433.2,442.072,448.923,454.699,462.479,471.187,
481.93,492.389,499.845,508.979,522.702,533.663,540.178,547.342,553.534,560.451,569.112,574.813,580.323,
589.95,597.542,601.937,606.161,609.48,613.321,615.876,619.44,623.754,630,636.784,640.978,643.625,646.384,
650.608,657.538,664.192,670.672,673.271,674.191,679.735,685.526,694.327,700.824,710.817,714.799,717.233,
718.539,718.669,718.669,718.669,719.985,726.038,736.124,740.441,745.865,751.463,757.85,761.474,763.216,
769.154,772.596,778.288,782.517,785.868,791.79,798.324,803.554,806.697,809.536,813.457,817.2,817.902,
817.902,817.902,817.902,817.902,820.271,824.126,826.609,826.668,827.619,827.619,827.629,827.629,827.629,
827.629,827.629,833.344,841.854,849.289,854.49,859.806,871.709,878.918,882.926,885.63,888.126,892.953,
898.661,899.547,900.031,903.327,906.253,909.183,912.358,917.222,921.757,925.36,927.341,927.819,929.745,
930.731,930.949,932.384,932.384,932.384,932.384,932.384,932.384,932.384,933.757,933.757,933.757,936.283,
940.396,945.01,952.758,961.418,973.865,986.804,999.508,1012.5,1023.94,1034.92,1048.68,1052.39,1052.39,
1052.8,1053.73,1053.73,1053.73,1054.09,1054.31,1056.48,1061.43,1068.88,1076.67,1088.47,1104.89,1119.38,
1130.99,1141.1,1155.06,1171.19,1185.48,1199.21,1213.17,1227.22,1242.87,1260.89,1277.97,1295.61,1313.94,
1331.04,1346.59,1359.13,1375.4,1392.93,1408.89,1424.56,1442.76,1461.92,1482.83,1502.78,1523.67,1544.39,
1563.29,1581.96,1599.23,1619.32,1637.81,1656.31,1678.33,1700.06,1721.59,1741.63,1761.09,1779.76,1799.04,
1818.54,1836.93,1855.25,1871.02,1890.62,1909.82,1928.01,1946.39,1966.27,1983.82,2003.26,2023.74,2043.92,
2064.27,2085.74,2107.14,2127.92,2148.44,2167.92,2188.01,2208.63,2231.33,2254.54,2277.95,2301.32,2323.56,
2347.52,2370.52,2394.62,2419.89,2442.6,2466.69,2492.51,2519.23,2540.78,2563.82,2585.14,2607.89,2628.95,
2652.57,2676.55,2700.73,2724,2742.09,2759.06,2778.77,2798.12,2815.01,2834.76,2855.37,2878.56])
# apply the two models to the full season data
lai_lsq = model(res_lsq.x, tt_test)
lai_robust = model(res_robust.x, tt_test)
# plot the two model fits
plt.plot(tt_train, lai_train, 'o', markersize=4, label='training data')
plt.plot(tt_test, lai_lsq, label='fitted lsq model')
plt.plot(tt_test, lai_robust, label='fitted robust model')
plt.xlabel("tt")
plt.ylabel("LAI")
plt.legend(loc='upper left')
plt.show()
Here is an image showing the fitted lines for the two models. The simple least-squares model seems OK, but these overflow warnings may indicate serious problems that compromise the algorithm (especially the divide by zero). The robust model, on the other hand, is completely wrong.
1 Answer
Convex optimiser
This class of problem is very sensitive to initial conditions: it has many degrees of freedom and multiple exponential terms. Your initial estimate is very poor, so you're setting the least-squares solver up for failure, and it will often converge to a poor local minimum.
You could cheat and give it a better estimate, or you could improve the problem construction so that it's better able to converge on a good minimum. Let's do a rough hyperparameter search for solver configurations that reach a cost below 5.0 while still starting from the poor estimate:
# Assumed context: residuals(x, tt, lai) returns the residual vector, jacobian is its
# analytic Jacobian in the same packed form (see the Jacobian section below),
# args is (tt_train, lai_train), and time is the standard-library module used for timing.
ress = []
for method in ('trf', 'dogbox', 'lm'):
    for loss in ('linear', 'soft_l1', 'huber', 'cauchy', 'arctan'):
        if method == 'lm' and loss != 'linear':
            continue
        for jac in ('2-point', '3-point', 'cs', jacobian):
            for xscale in (1, 1000, 'jac'):
                start = time.perf_counter_ns()
                try:
                    res = least_squares(
                        residuals, x0, method=method, loss=loss, args=args,
                        f_scale=1, x_scale=xscale, jac=jac,
                    )
                except ValueError as e:
                    print(f'method={method} loss={loss} jac={jac} xscale={xscale}: {e}')
                    continue
                dur = time.perf_counter_ns() - start
                if res.success:
                    ress.append((
                        method, loss, str(jac), xscale, res.nfev, res.cost, dur, res.x,
                    ))
Sorted by runtime (the dur column), the successful configurations are:
method loss jac xscale nfev cost dur
110 lm linear <function jacobian at 0x000001775B97E8E0> jac 68 1.421082 3.5439
23 trf soft_l1 <function jacobian at 0x000001775B97E8E0> jac 31 1.268718 7.1395
35 trf huber <function jacobian at 0x000001775B97E8E0> jac 36 1.421082 9.4367
47 trf cauchy <function jacobian at 0x000001775B97E8E0> jac 44 1.124089 9.8002
92 dogbox cauchy <function jacobian at 0x000001775B97E8E0> jac 39 1.124089 10.3736
20 trf soft_l1 cs jac 31 1.268718 11.1539
14 trf soft_l1 2-point jac 31 1.268718 11.3894
101 dogbox arctan <function jacobian at 0x000001775B97E8E0> jac 45 1.258756 11.9962
100 dogbox arctan <function jacobian at 0x000001775B97E8E0> 1000 49 1.258756 12.8161
75 dogbox soft_l1 <function jacobian at 0x000001775B97E8E0> jac 45 1.268718 12.8408
17 trf soft_l1 3-point jac 31 1.268718 12.9511
99 dogbox arctan <function jacobian at 0x000001775B97E8E0> 1 49 1.258756 12.9608
26 trf huber 2-point jac 39 1.421082 14.2338
38 trf cauchy 2-point jac 44 1.124089 15.1752
32 trf huber cs jac 36 1.421082 15.2485
91 dogbox cauchy cs jac 39 1.124089 15.6048
44 trf cauchy cs jac 44 1.124089 16.1419
29 trf huber 3-point jac 37 1.421082 17.4646
41 trf cauchy 3-point jac 44 1.124089 19.0798
98 dogbox arctan cs jac 45 1.258756 19.6480
97 dogbox arctan cs 1000 49 1.258756 20.0157
96 dogbox arctan cs 1 49 1.258756 20.3347
74 dogbox soft_l1 cs jac 45 1.268718 20.6668
93 dogbox arctan 3-point 1 51 1.258756 22.9243
94 dogbox arctan 3-point 1000 51 1.258756 23.1275
90 dogbox cauchy 3-point jac 57 1.124089 26.2926
87 dogbox huber <function jacobian at 0x000001775B97E8E0> jac 127 1.421082 40.8136
10 trf linear <function jacobian at 0x000001775B97E8E0> 1000 294 1.421082 47.3578
49 trf arctan 2-point 1000 140 2.811160 53.0532
46 trf cauchy <function jacobian at 0x000001775B97E8E0> 1000 267 1.124091 56.3212
45 trf cauchy <function jacobian at 0x000001775B97E8E0> 1 267 1.124091 56.3979
9 trf linear <function jacobian at 0x000001775B97E8E0> 1 339 1.421082 57.2912
48 trf arctan 2-point 1 156 2.811143 59.9609
50 trf arctan 2-point jac 158 3.509310 60.1457
84 dogbox huber cs jac 130 1.421082 63.6184
81 dogbox huber 3-point jac 112 1.429134 67.0705
95 dogbox arctan 3-point jac 144 1.259296 79.6546
58 trf arctan <function jacobian at 0x000001775B97E8E0> 1000 434 2.811056 87.8679
57 trf arctan <function jacobian at 0x000001775B97E8E0> 1 434 2.811056 88.9529
6 trf linear cs 1 298 1.421082 91.7341
59 trf arctan <function jacobian at 0x000001775B97E8E0> jac 452 3.509153 95.3699
0 trf linear 2-point 1 314 1.421082 95.7506
37 trf cauchy 2-point 1000 267 1.124091 96.3333
36 trf cauchy 2-point 1 267 1.124091 96.3350
42 trf cauchy cs 1 267 1.124091 100.9315
43 trf cauchy cs 1000 267 1.124091 102.7517
1 trf linear 2-point 1000 342 1.421083 106.0324
7 trf linear cs 1000 356 1.421082 115.9012
39 trf cauchy 3-point 1 267 1.124091 117.6201
40 trf cauchy 3-point 1000 267 1.124091 119.1155
53 trf arctan 3-point jac 276 3.509171 127.6806
52 trf arctan 3-point 1000 319 2.811063 148.6462
54 trf arctan cs 1 434 2.811056 160.5581
55 trf arctan cs 1000 434 2.811056 160.6625
56 trf arctan cs jac 452 3.509153 170.2620
51 trf arctan 3-point 1 370 2.811058 173.1088
3 trf linear 3-point 1 473 1.421082 176.2919
4 trf linear 3-point 1000 527 1.421082 201.4859
We can see that:
- All of the top-performing runs use an explicit analytic Jacobian (jac) and use that Jacobian to adapt the variable scaling (x_scale='jac').
- Levenberg-Marquardt outperforms all other approaches by a significant margin in runtime, but it supports neither nonlinear loss functions nor bounds (see the sketch after this list).
- If you keep using the Trust Region Reflective (TRF) method of Branch, Coleman, and Li (1999), "A Subspace, Interior, and Conjugate Gradient Method for Large-Scale Bound-Constrained Minimization Problems", SIAM Journal on Scientific Computing 21(1), then it converges regardless of the choice of loss function, so long as the changes above are included.
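For example, the fastest configuration above, Levenberg-Marquardt with the analytic Jacobian, looks roughly like the sketch below. This is only an illustration: model and jacobian are the unpacked functions defined later in this answer, and residuals/residuals_jac are small adapter functions (my naming, not from the original code) to the (x, *args) signature that least_squares expects.

from scipy.optimize import least_squares

def residuals(x, tt, lai):
    # least_squares minimises fun(x, *args), so unpack x into the model parameters
    return model(tt, *x) - lai

def residuals_jac(x, tt, lai):
    # analytic Jacobian, repacked to the same (x, *args) signature
    return jacobian(tt, *x)

res_lm = least_squares(
    residuals, x0, jac=residuals_jac, args=(tt_train, lai_train),
    method='lm', x_scale='jac',  # 'lm' requires loss='linear' and does not accept bounds
)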
Since it was the focus of your question, and because it supports bounds, let's stick with TRF.
Jacobian
The analytic Jacobian will look like
To, Ta, Tb, k, a, b, c = x
ea = np.exp(-a*(tt - To - Ta))
eb = np.exp(b*(tt - To - Tb))
jac = np.stack((
k*(b*eb - a*c*ea*(ea + 1)**(-1 - c)),
-k*a*c*ea*(ea + 1)**(-1 - c),
k*b*eb,
(ea + 1)**(-c) - eb,
k*c*ea*(tt - To - Ta)*(ea + 1)**(-1 - c),
k*eb*(To + Tb - tt),
-k*(ea + 1)**(-c)*np.log(ea + 1),
), axis=1)
It takes a few minutes to write out, and you should always check the results at least once via check_grad, but it's worth it for accuracy, stability and speed of convergence.
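For instance, a quick relative-error check, mirroring the validation performed inside solve() in the full listing below (model_packed and jacobian_packed are the packed wrappers defined there):

from scipy.optimize import check_grad

# Compare the analytic Jacobian at x0 against a finite-difference estimate;
# the full listing asserts that this relative error stays below 1e-5.
rel_err = check_grad(
    model_packed, jacobian_packed, x0, tt_train,
) / np.linalg.norm(jacobian(tt_train, *x0))
print(rel_err)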
Domain errors
Often this will spit out lines like
236103.py:18: RuntimeWarning: divide by zero encountered in divide
1/(1 + np.exp(-a*(tt - To - Ta)))**c
236103.py:19: RuntimeWarning: overflow encountered in exp
- np.exp(b*(tt - To - Tb))
This calls for bounds. First apply
import warnings
warnings.filterwarnings("error")
to trace to the domain failure; the first one I hit is for the divide-by-zero:
'Ta': np.float64(1474.02037868117),
'Tb': np.float64(2081.338628399358),
'To': np.float64(131.34409097929301),
'a': np.float64(0.11494872392419284),
'b': np.float64(0.008762243913225441),
'c': np.float64(-8.342277869893858),
'k': np.float64(3.1529765675397257),
This occurs because c is negative. Constrain it to be positive. The next hit is for the exp overflow:
'Ta': np.float64(999.7849029153789),
'Tb': np.float64(2493.576062291977),
'To': np.float64(161.02759530721556),
'a': np.float64(0.010046551405603055),
'b': np.float64(-0.43782467478251763),
'c': np.float64(0.9987984632695268),
'k': np.float64(5.874423585658046),
b*(tt - To - Tb) is very large because b is negative. Constrain it to be positive.
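With the 'error' filter above in place, a small wrapper around the residual function reports the offending parameters the first time a floating-point warning fires. This is a sketch of a debugging aid (traced_residuals is a hypothetical name, not part of the final solution); residuals is the residual function from the earlier sketches.

def traced_residuals(x, tt, lai):
    try:
        return residuals(x, tt, lai)
    except RuntimeWarning as e:
        # warnings.filterwarnings('error') turns numpy's RuntimeWarnings into
        # exceptions, so the offending parameter vector can be dumped here.
        print(dict(zip(('To', 'Ta', 'Tb', 'k', 'a', 'b', 'c'), x)), e)
        raise

Pass traced_residuals to least_squares in place of residuals to obtain dumps like the ones shown above.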
You can either repeat this process ad nauseam until everything is happy, or simply add sane bounds to all of your variables, something like
bounds = Bounds(
lb=( 10, 100, 100, 1, 1e-3, 1e-3, 1e-3),
ub=( 1e3, 1e4, 5e3, 100, 1, 1, 1),
)
At this point, every cost function should converge with no domain errors.
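For reference, the same recipe expressed directly through least_squares would look something like the sketch below, reusing the hypothetical residuals/residuals_jac adapters from the Levenberg-Marquardt sketch above; the Bounds object is unpacked into the (lb, ub) pair that least_squares takes.

res_trf = least_squares(
    residuals, x0, jac=residuals_jac, args=(tt_train, lai_train),
    method='trf', loss='soft_l1', x_scale='jac',
    bounds=(bounds.lb, bounds.ub),
)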
API
You should not be calling least_squares directly, nor calculating your own residuals. Instead, call the curve_fit wrapper: residuals() will go away, and you won't have to unpack x in your model and Jacobian.
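In outline, instead of least_squares(fun, x0, args=(tt_train, lai_train)) you hand curve_fit the model and the training data directly; additional keyword arguments such as loss and x_scale are forwarded to the underlying least_squares call. The sketch below omits the bounds and validation that the full demonstration adds.

popt, pcov = curve_fit(
    f=model, xdata=tt_train, ydata=lai_train, p0=x0,
    jac=jacobian, method='trf', loss='soft_l1', x_scale='jac',
)
lai_fit = model(tt_test, *popt)  # evaluate the fitted model without unpacking by hand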
Demonstration
All together,
#!/usr/bin/env python3
# Canopy structural dynamic model (CSDM) implementation using scipy.optimize
import warnings
import numpy as np
from scipy.optimize import least_squares, check_grad, Bounds, curve_fit
import matplotlib.pyplot as plt
def model(
    tt: np.ndarray,
    To: float, Ta: float, Tb: float, k: float,
    a: float, b: float, c: float,
) -> np.ndarray:
    """
    The CSDM formula (Duveiller et al. 2011. Retrieving wheat Green Area Index during the growing season...)
    LAI = k * (1 / ((1 + Exp(-a * (tt - T0 - Ta))) ^ c) - Exp(b * (tt - T0 - Tb)))
    """
    return k*(
        1/(1 + np.exp(-a*(tt - To - Ta)))**c
        - np.exp(b*(tt - To - Tb))
    )


def model_packed(x: np.ndarray, tt: np.ndarray) -> np.ndarray:
    return model(tt, *x)
def jacobian(
    tt: np.ndarray,
    To: float, Ta: float, Tb: float, k: float,
    a: float, b: float, c: float,
) -> np.ndarray:
    ea = np.exp(-a*(tt - To - Ta))
    eb = np.exp(b*(tt - To - Tb))
    jac = np.stack((
        k*(b*eb - a*c*ea*(ea + 1)**(-1 - c)),
        -k*a*c*ea*(ea + 1)**(-1 - c),
        k*b*eb,
        (ea + 1)**(-c) - eb,
        k*c*ea*(tt - To - Ta)*(ea + 1)**(-1 - c),
        k*eb*(To + Tb - tt),
        -k*(ea + 1)**(-c)*np.log(ea + 1),
    ), axis=1)
    # Compare to:
    # from scipy.optimize._optimize import _epsilon
    # appr = approx_fprime(x, model, _epsilon, tt)
    return jac


def jacobian_packed(x: np.ndarray, tt: np.ndarray) -> np.ndarray:
    return jacobian(tt, *x)
def solve(
    tt_train: np.ndarray,
    lai_train: np.ndarray,
    validate: bool = True,
    fatal_float: bool = False,
) -> tuple[np.ndarray, np.ndarray]:
    if fatal_float:
        warnings.filterwarnings('error')

    # initial estimates of parameters: good
    To = 300   # plant emergence
    Ta = 1000  # midgrowth
    Tb = 2000  # end of senescence
    k = 10     # scaling factor (approx. max LAI)
    a = 0.01   # rate of growth
    b = 0.001  # rate of senescence
    c = 0.2    # parameter allowing some plasticity to the shape of the curve

    # initial estimates of parameters: bad
    # To = 50
    # Ta = 1000
    # Tb = 2000
    # k = 6
    # a = 0.01
    # b = 0.01
    # c = 1

    x0 = np.array((To, Ta, Tb, k, a, b, c))

    if validate:
        error = check_grad(
            model_packed, jacobian_packed, x0, tt_train,
        )/np.linalg.norm(jacobian(tt_train, *x0))
        assert error < 1e-5

    popt, pcov, infodict, mesg, ier = curve_fit(
        f=model, jac=jacobian, p0=x0, xdata=tt_train, ydata=lai_train,
        bounds=Bounds(
            lb=( 10, 100, 100,   1, 1e-3, 1e-3, 1e-3),
            ub=(1e3, 1e4, 5e3, 100,    1,    1,    1),
        ),
        method='trf', loss='soft_l1', x_scale='jac', full_output=True,
    )
    if not (1 <= ier <= 4):
        raise ValueError(mesg)
    return x0, popt
def plot(
    tt_train: np.ndarray, tt_test: np.ndarray, lai_train: np.ndarray,
    x0: np.ndarray, x: np.ndarray,
) -> plt.Figure:
    fig, ax = plt.subplots()
    ax.set_xlabel('tt')
    ax.set_ylabel('LAI')
    ax.scatter(tt_train, lai_train, label='training data')
    ax.plot(tt_test, model(tt_test, *x0), label='estimate')
    ax.plot(tt_test, model(tt_test, *x), label='full season')
    ax.legend()
    return fig
def demo() -> None:
    # independent variable training data
    tt_train = np.array((
        299.135, 408.143, 736.124, 1023.94, 1088.47, 1227.22,
        1313.94, 1392.93, 1482.83, 1581.96, 2064.27, 2277.95, 2394.62, 2519.23,
    ))
    # dependent variable training data
    lai_train = np.array((
        0.0304313, 0.0833402, 0.682014, 0.973261, 2.54978,
        4.93747, 5.31949, 6.25236, 6.64175, 7.3717, 3.61623, 2.96673, 1.72345, 0.803591,
    ))
    # thermal time data for full season
    tt_test = np.array((
11.7584,22.1838,34.0008,47.7174,64.3092,81.1832,90.1728,101.494,116.125,127.732,140.229,
154.381,170.5,185.707,201.368,217.642,233.593,249.703,266.233,283.074,299.135,314.386,327.024,337.58,344.699,
354.328,367.247,379.627,391.51,400.93,408.143,414.941,423.678,433.2,442.072,448.923,454.699,462.479,471.187,
481.93,492.389,499.845,508.979,522.702,533.663,540.178,547.342,553.534,560.451,569.112,574.813,580.323,
589.95,597.542,601.937,606.161,609.48,613.321,615.876,619.44,623.754,630,636.784,640.978,643.625,646.384,
650.608,657.538,664.192,670.672,673.271,674.191,679.735,685.526,694.327,700.824,710.817,714.799,717.233,
718.539,718.669,718.669,718.669,719.985,726.038,736.124,740.441,745.865,751.463,757.85,761.474,763.216,
769.154,772.596,778.288,782.517,785.868,791.79,798.324,803.554,806.697,809.536,813.457,817.2,817.902,
817.902,817.902,817.902,817.902,820.271,824.126,826.609,826.668,827.619,827.619,827.629,827.629,827.629,
827.629,827.629,833.344,841.854,849.289,854.49,859.806,871.709,878.918,882.926,885.63,888.126,892.953,
898.661,899.547,900.031,903.327,906.253,909.183,912.358,917.222,921.757,925.36,927.341,927.819,929.745,
930.731,930.949,932.384,932.384,932.384,932.384,932.384,932.384,932.384,933.757,933.757,933.757,936.283,
940.396,945.01,952.758,961.418,973.865,986.804,999.508,1012.5,1023.94,1034.92,1048.68,1052.39,1052.39,
1052.8,1053.73,1053.73,1053.73,1054.09,1054.31,1056.48,1061.43,1068.88,1076.67,1088.47,1104.89,1119.38,
1130.99,1141.1,1155.06,1171.19,1185.48,1199.21,1213.17,1227.22,1242.87,1260.89,1277.97,1295.61,1313.94,
1331.04,1346.59,1359.13,1375.4,1392.93,1408.89,1424.56,1442.76,1461.92,1482.83,1502.78,1523.67,1544.39,
1563.29,1581.96,1599.23,1619.32,1637.81,1656.31,1678.33,1700.06,1721.59,1741.63,1761.09,1779.76,1799.04,
1818.54,1836.93,1855.25,1871.02,1890.62,1909.82,1928.01,1946.39,1966.27,1983.82,2003.26,2023.74,2043.92,
2064.27,2085.74,2107.14,2127.92,2148.44,2167.92,2188.01,2208.63,2231.33,2254.54,2277.95,2301.32,2323.56,
2347.52,2370.52,2394.62,2419.89,2442.6,2466.69,2492.51,2519.23,2540.78,2563.82,2585.14,2607.89,2628.95,
2652.57,2676.55,2700.73,2724,2742.09,2759.06,2778.77,2798.12,2815.01,2834.76,2855.37,2878.56,
    ))

    x0, x = solve(tt_train=tt_train, lai_train=lai_train)
    print(x)
    plot(
        tt_train=tt_train, lai_train=lai_train, tt_test=tt_test, x0=x0, x=x,
    )
    plt.show()


if __name__ == '__main__':
    demo()
[3.43106103e+02 1.01526154e+03 2.22302919e+03 9.30097207e+00
1.50234233e-02 1.29006027e-03 2.13314095e-01]
Comments
- An overflow in (1 + np.exp(-x[4] * (tt - x[0] - x[1]))) ** x[6] does not sound so unexpected: x[6] does not even need to be very large, depending on the other parameters, e.g. np.exp(2**10) = inf. And then 1./np.exp(-2**10) is a divide by zero.
- One suggested set of bounds: bounds=([1, 500, 1500, 3, 0.001, 0.001, 0.5], [50, 2000, 3000, 9, 0.1, 0.1, 1.5]).