I am seeing some strange timings with the RegularGridInterpolator from scipy.
Splitting the interpolation into 100 chunks and concatenating the results seems to be faster for large arrays:
when interpolating 1,000,000 points in 5D, for instance, the chunked interpolation is almost twice as fast as interpolating everything in one go.
from scipy.interpolate import RegularGridInterpolator
import numpy as np
import time
N = [10, 11, 12, 13, 14]
grid = [np.linspace(i, i + 1, n) for i, n in enumerate(N, 0)]
values = np.arange(np.prod(N)).reshape(N)
rgi = RegularGridInterpolator(grid, values)
for N in [10000, 100000, 1000000]:
    xp = np.array([np.random.random(N) + i for i in range(5)]).T  # N random points inside the grid domain
    t = time.time()
    r1 = rgi(xp)  # all points in one call
    t1 = time.time() - t
    t = time.time()
    r2 = np.concatenate([rgi(xp_) for xp_ in np.array_split(xp, 100)])  # same points in 100 chunks
    t2 = time.time() - t
    print(f'{N}: {t1:.3f}, 100x{N/100}: {t2:.3f}')
Output
10000: 0.009, 100x100.0: 0.051
100000: 0.087, 100x1000.0: 0.067
1000000: 1.094, 100x10000.0: 0.594
Versions
Python 3.13.9
numpy 2.3.4
pip 25.3
scipy 1.16.3
1 Answer
This happens because RegularGridInterpolator’s vectorized evaluation becomes memory-bound for large inputs: the cost of allocating and filling huge intermediate arrays dominates the runtime.
When you split the input into smaller chunks, NumPy’s temporary arrays stay smaller and fit better in the CPU cache, so you get less memory pressure and better throughput.
In short:
✅ Small chunks → better cache locality, less memory overhead
❌ One big array → huge temporaries + cache misses
If you want both speed and simplicity, pick a chunk size small enough that the temporaries stay cache-friendly, e.g. a few thousand points per chunk: np.array_split(xp, max(1, len(xp) // 5000)).
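A minimal sketch of a reusable helper (the name evaluate_chunked and the default chunk size of 10,000 points are my own choices, not part of scipy; tune the chunk size for your machine):

import numpy as np

def evaluate_chunked(interp, points, chunk_size=10_000):
    # Evaluate a callable interpolator on `points` in fixed-size chunks.
    # Chunking keeps the intermediate arrays small, then the partial
    # results are concatenated back into a single array.
    n_chunks = max(1, len(points) // chunk_size)
    parts = [interp(chunk) for chunk in np.array_split(points, n_chunks)]
    return np.concatenate(parts)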
Side note: use time.perf_counter() for benchmarking; time.time() is better suited for general wall-clock timekeeping.
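For instance, the benchmark above could be timed like this (a sketch assuming rgi and xp as defined in the question, and the evaluate_chunked helper from above):

from time import perf_counter

t = perf_counter()
r1 = rgi(xp)                     # all points in one call
t1 = perf_counter() - t

t = perf_counter()
r2 = evaluate_chunked(rgi, xp)   # chunked evaluation
t2 = perf_counter() - t

print(f'full: {t1:.3f} s, chunked: {t2:.3f} s')
assert np.allclose(r1, r2)       # chunking must not change the result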