I am trying to plot heatmap with three variables x, y and z, each with vector length of 932826. The time taken with the function below is 28 seconds, with a bin size of 10 on each x and y. I have been using another program where the same plot happens in 6 - 7 seconds. Also, I know that the plot function of that program uses Python in the background
When debugging I see that the maximum time consumed is while using the griddata function. So, I was wondering if there are any alternatives that I can use instead of griddata to make the interpolation quicker
In future I will be handling even bigger data (8 to 10 times the size of current one), so I am in need to finding something robust, which handles all the data, without going into memory overflow
import numpy as np
import matplotlib.pyplot as plt
import time
from scipy.interpolate import griddata
import os
arr_len = 932826
x = np.random.uniform(low=0, high=4496, size=arr_len)
y = np.random.uniform(low=-74, high=492, size=arr_len)
z = np.random.uniform(low=-30, high=97, size=arr_len)
x_linspace = 10
y_linspace = 10
x_lab = 'x_label'
y_lab = 'y_label'
title = 'some title'
plot_name = 'example plot'
xi = np.linspace(min(x), max(x), x_linspace)
yi = np.linspace(min(y), max(y), y_linspace)
start = time.time()
zi = griddata((x, y), z, (xi[None, :], yi[:, None]), method='linear')
print('Time elapsed = ', time.time()-start)
-
\$\begingroup\$ Do you have access to a GPU or to multiples cores to accelerate the execution? \$\endgroup\$IEatBagels– IEatBagels2021年05月21日 13:33:47 +00:00Commented May 21, 2021 at 13:33
-
\$\begingroup\$ @IEatBagels : I am not really sure. How can I find out that ? I tried using interp2d from scipy library, but it seems to give memory overflow error \$\endgroup\$Masoom Kumar– Masoom Kumar2021年05月21日 15:45:20 +00:00Commented May 21, 2021 at 15:45
-
\$\begingroup\$ Line profile the griddata function's code as well. I'm not sure about that particular case, but I've seen long ago cases where error-checking the inputs consume most of a function's run time. IIRC it was np.histogram() vs. np.searchsorted(), back in Python 2 days. Anyway, if something like that is happening for you, modifying griddata to avoid type checking would be easy. \$\endgroup\$Curt F.– Curt F.2025年06月11日 20:05:06 +00:00Commented Jun 11 at 20:05
1 Answer 1
It isn't straightforward. Your example suggests that x, y are irregular; is that actually true? If not then more efficient solutions are available. I proceed by assuming that they are irregular.
You describe that this is for plotting, which means that some amount of quantisation loss is acceptable. The only "easy" way I can think of to make this more efficient is a multi-step process where first, you convert from an irregular coordinate space to a regular coordinate space using nearest neighbour, which is much faster even than linear methods. When doing this, you need to choose between accuracy and speed. Higher pixel counts will be more accurate but slower.
After that first step, you can immediately plot at the intermediate resolution - and maybe that's useful to you, but maybe it isn't.
Your final resolution of 10 pixels by 10 pixels is bizarre. Do you really want to plot at this low of a resolution? If so, performance is not challenging but accuracy requires being careful about aliasing. RegularGridInterpolator
or scipy.ndimage.zoom()
methods for instance would quickly produce the correct resolution but have awful aliasing that will produce essentially meaningless output. You need anti-aliasing of the kind that Scikit offers.
import contextlib
import logging
import time
import typing
import numpy as np
import skimage
from matplotlib import pyplot as plt
from scipy.interpolate import NearestNDInterpolator
logger = logging.getLogger('interp')
@contextlib.contextmanager
def log_time(title: str) -> typing.Iterator:
t0 = time.perf_counter()
try:
yield
finally:
logger.info('%s: %.4f s', title, time.perf_counter() - t0)
def nearest_downsample(
xyz: np.ndarray, # (npoints, 3)
pixels: int = 100_000, # in the output array; approximate
) -> tuple[
np.ndarray, # regularised x (n,)
np.ndarray, # regularised y (m,)
np.ndarray, # regularised z (m,n)
]:
"""
Downsample a (potentially large, irregular) array using fast nearest-neighbour sampling.
The output coordinates are regular.
"""
xy = xyz[:, :2]
z = xyz[:, 2]
xmin, ymin = xy.min(axis=0)
xmax, ymax = xy.max(axis=0)
with log_time('Nearest interpolator construction'):
interpolate = NearestNDInterpolator(xy, z)
# m/n = aspect
# m*n = pixels
# aspect*n = pixels/n
aspect = (ymax - ymin)/(xmax - xmin)
n = round(np.sqrt(pixels/aspect))
m = round(aspect*n)
yi = np.linspace(start=ymin, stop=ymax, num=m)
xi = np.linspace(start=xmin, stop=xmax, num=n)
xx, yy = np.meshgrid(xi, yi, sparse=True)
with log_time('Nearest interpolation'):
return xi, yi, interpolate(xx, yy)
def demo() -> None:
rand = np.random.default_rng(seed=0)
xyz_irregular = rand.uniform(
low=(0, -74, -30),
high=(4496, 492, 97),
size=(932_826, 3),
)
# Slow, but nowhere near as slow as doing a linear interpolation on the irregular data
xr, yr, zzr = nearest_downsample(xyz_irregular)
extents = (xr[0], xr[-1], yr[0], yr[-1])
fig, ax = plt.subplots()
with log_time('Intermediate-resolution plot'):
ax.set_title('Intermediate nearest-neighbour sampling')
image = ax.imshow(zzr, aspect='auto', extent=extents)
fig.colorbar(mappable=image, ax=ax)
# From here RegularGridInterpolator would be accessible and fast but the output would be
# completely meaningless due to aliasing. Use a resize with anti-aliasing instead.
with log_time('Spline resize'):
out = skimage.transform.resize(
image=zzr, output_shape=(10, 12), order=1, anti_aliasing=True,
)
fig, ax = plt.subplots()
ax.set_title('Spline resize with anti-aliasing')
image = ax.imshow(out, aspect='auto', extent=extents)
fig.colorbar(mappable=image, ax=ax)
plt.show()
if __name__ == '__main__':
logging.basicConfig(level=logging.INFO)
demo()
INFO:interp:Nearest interpolator construction: 0.3400 s
INFO:interp:Nearest interpolation: 0.1031 s
INFO:interp:Intermediate-resolution plot: 0.0096 s
INFO:interp:Spline resize: 0.0384 s