6
\$\begingroup\$

I am trying to plot heatmap with three variables x, y and z, each with vector length of 932826. The time taken with the function below is 28 seconds, with a bin size of 10 on each x and y. I have been using another program where the same plot happens in 6 - 7 seconds. Also, I know that the plot function of that program uses Python in the background

When debugging I see that the maximum time consumed is while using the griddata function. So, I was wondering if there are any alternatives that I can use instead of griddata to make the interpolation quicker

In future I will be handling even bigger data (8 to 10 times the size of current one), so I am in need to finding something robust, which handles all the data, without going into memory overflow

import numpy as np
import matplotlib.pyplot as plt
import time
from scipy.interpolate import griddata
import os
arr_len = 932826
x = np.random.uniform(low=0, high=4496, size=arr_len)
y = np.random.uniform(low=-74, high=492, size=arr_len)
z = np.random.uniform(low=-30, high=97, size=arr_len)
x_linspace = 10
y_linspace = 10
x_lab = 'x_label'
y_lab = 'y_label'
title = 'some title'
plot_name = 'example plot'
xi = np.linspace(min(x), max(x), x_linspace)
yi = np.linspace(min(y), max(y), y_linspace)
start = time.time()
zi = griddata((x, y), z, (xi[None, :], yi[:, None]), method='linear')
print('Time elapsed = ', time.time()-start)
asked May 21, 2021 at 13:16
\$\endgroup\$
3
  • \$\begingroup\$ Do you have access to a GPU or to multiples cores to accelerate the execution? \$\endgroup\$ Commented May 21, 2021 at 13:33
  • \$\begingroup\$ @IEatBagels : I am not really sure. How can I find out that ? I tried using interp2d from scipy library, but it seems to give memory overflow error \$\endgroup\$ Commented May 21, 2021 at 15:45
  • \$\begingroup\$ Line profile the griddata function's code as well. I'm not sure about that particular case, but I've seen long ago cases where error-checking the inputs consume most of a function's run time. IIRC it was np.histogram() vs. np.searchsorted(), back in Python 2 days. Anyway, if something like that is happening for you, modifying griddata to avoid type checking would be easy. \$\endgroup\$ Commented Jun 11 at 20:05

1 Answer 1

4
\$\begingroup\$

It isn't straightforward. Your example suggests that x, y are irregular; is that actually true? If not then more efficient solutions are available. I proceed by assuming that they are irregular.

You describe that this is for plotting, which means that some amount of quantisation loss is acceptable. The only "easy" way I can think of to make this more efficient is a multi-step process where first, you convert from an irregular coordinate space to a regular coordinate space using nearest neighbour, which is much faster even than linear methods. When doing this, you need to choose between accuracy and speed. Higher pixel counts will be more accurate but slower.

After that first step, you can immediately plot at the intermediate resolution - and maybe that's useful to you, but maybe it isn't.

Your final resolution of 10 pixels by 10 pixels is bizarre. Do you really want to plot at this low of a resolution? If so, performance is not challenging but accuracy requires being careful about aliasing. RegularGridInterpolator or scipy.ndimage.zoom() methods for instance would quickly produce the correct resolution but have awful aliasing that will produce essentially meaningless output. You need anti-aliasing of the kind that Scikit offers.

import contextlib
import logging
import time
import typing
import numpy as np
import skimage
from matplotlib import pyplot as plt
from scipy.interpolate import NearestNDInterpolator
logger = logging.getLogger('interp')
@contextlib.contextmanager
def log_time(title: str) -> typing.Iterator:
 t0 = time.perf_counter()
 try:
 yield
 finally:
 logger.info('%s: %.4f s', title, time.perf_counter() - t0)
def nearest_downsample(
 xyz: np.ndarray, # (npoints, 3)
 pixels: int = 100_000, # in the output array; approximate
) -> tuple[
 np.ndarray, # regularised x (n,)
 np.ndarray, # regularised y (m,)
 np.ndarray, # regularised z (m,n)
]:
 """
 Downsample a (potentially large, irregular) array using fast nearest-neighbour sampling.
 The output coordinates are regular.
 """
 xy = xyz[:, :2]
 z = xyz[:, 2]
 xmin, ymin = xy.min(axis=0)
 xmax, ymax = xy.max(axis=0)
 with log_time('Nearest interpolator construction'):
 interpolate = NearestNDInterpolator(xy, z)
 # m/n = aspect
 # m*n = pixels
 # aspect*n = pixels/n
 aspect = (ymax - ymin)/(xmax - xmin)
 n = round(np.sqrt(pixels/aspect))
 m = round(aspect*n)
 yi = np.linspace(start=ymin, stop=ymax, num=m)
 xi = np.linspace(start=xmin, stop=xmax, num=n)
 xx, yy = np.meshgrid(xi, yi, sparse=True)
 with log_time('Nearest interpolation'):
 return xi, yi, interpolate(xx, yy)
def demo() -> None:
 rand = np.random.default_rng(seed=0)
 xyz_irregular = rand.uniform(
 low=(0, -74, -30),
 high=(4496, 492, 97),
 size=(932_826, 3),
 )
 # Slow, but nowhere near as slow as doing a linear interpolation on the irregular data
 xr, yr, zzr = nearest_downsample(xyz_irregular)
 extents = (xr[0], xr[-1], yr[0], yr[-1])
 fig, ax = plt.subplots()
 with log_time('Intermediate-resolution plot'):
 ax.set_title('Intermediate nearest-neighbour sampling')
 image = ax.imshow(zzr, aspect='auto', extent=extents)
 fig.colorbar(mappable=image, ax=ax)
 # From here RegularGridInterpolator would be accessible and fast but the output would be
 # completely meaningless due to aliasing. Use a resize with anti-aliasing instead.
 with log_time('Spline resize'):
 out = skimage.transform.resize(
 image=zzr, output_shape=(10, 12), order=1, anti_aliasing=True,
 )
 fig, ax = plt.subplots()
 ax.set_title('Spline resize with anti-aliasing')
 image = ax.imshow(out, aspect='auto', extent=extents)
 fig.colorbar(mappable=image, ax=ax)
 plt.show()
if __name__ == '__main__':
 logging.basicConfig(level=logging.INFO)
 demo()
INFO:interp:Nearest interpolator construction: 0.3400 s
INFO:interp:Nearest interpolation: 0.1031 s
INFO:interp:Intermediate-resolution plot: 0.0096 s
INFO:interp:Spline resize: 0.0384 s

intermediate plot

final plot

answered Jun 9 at 19:47
\$\endgroup\$

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.