I have a NetCDF database (link to file) obtained from Copernicus Climate Data containing ~43k unevenly spaced values around the world. Instead of being indexed by (lat, lon), it uses a sequence of 'stations'.
My understanding is that station_x_coordinate and station_y_coordinate are not treated as dimension coordinates by xarray (according to the xarray docs).
I have a list of my own locations for which I want to get the closest value (i.e., my locations do not necessarily match one of the data points in the NetCDF database).
I would like to use xarray's many selection and interpolation methods (e.g. xarray.Dataset.sel) to get values at locations that do not match the data points. But I get errors or problems which are, I guess, related to the fact that I don't have real dimension coordinates, only plain variables.
For example, I can use xarray.Dataset.sel(stations=11.5, method='nearest'), which returns the closest value (in this case, station = 12), but this is very unintuitive, as I have no idea how the stations are ordered in the grid. I would rather search using the more sensible (lat, lon) indexing of my target locations. To be clear, I would like to call something like xarray.Dataset.sel(x=0, y=0, method='nearest') to find the closest value to the (lat = 0, lon = 0) location.
So far I have been able to get all the data points of the NetCDF file into a regular pandas DataFrame (columns = ['lat', 'lon', 'value']) and save it as CSV. I could write my own function to find the nearest neighbours in 2D space for a given target location (one surely exists in another library, maybe even NumPy or GeoPandas)... but as I said, I would like to use xarray's built-in methods to get values at 'unmatched' locations, interpolate, etc.
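For reference, this is roughly what the "own function" approach would look like with scipy.spatial.cKDTree (assuming SciPy is acceptable; the toy dataset below only mimics the structure of the real file, with made-up values):

```python
# Sketch: nearest-station lookup with a KD-tree over (lon, lat) pairs.
# Toy stand-in for the real file; variable names match its repr.
import numpy as np
import xarray as xr
from scipy.spatial import cKDTree

ds = xr.Dataset(
    {"return_mean_surge_level": ("stations", [0.5, 1.1, 0.9, 2.3, 1.7])},
    coords={
        "stations": ("stations", np.arange(5)),
        "station_x_coordinate": ("stations", [-49.1, 2.1, 179.1, 10.0, -3.5]),
        "station_y_coordinate": ("stations", [45.6, 34.1, 78.2, 50.2, 40.0]),
    },
)

# Build the tree once, then query it for any number of target locations.
tree = cKDTree(
    np.column_stack(
        [ds["station_x_coordinate"].values, ds["station_y_coordinate"].values]
    )
)
_, idx = tree.query([(2.0, 34.0)])  # target as (lon, lat)
nearest = ds.isel(stations=idx)
print(nearest["return_mean_surge_level"].values)  # [1.1]
```

This works, but it lives outside xarray's selection machinery, which is why I'm asking about restructuring the file instead.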
What would you do? For example, could I generate a new NetCDF file whose structure promotes station_x_coordinate and station_y_coordinate of my current file to real dimension coordinates? I am no expert in NetCDF files, but I assume that would let me use the methods mentioned above. Does that make sense?
This is the structure of the NetCDF file (output of xarray.Dataset):
<xarray.Dataset>
Dimensions: (stations: 43119)
Coordinates:
* stations (stations) uint16 0 1 2 3 ... 43731 43732 43733
station_x_coordinate (stations) float64 ...
station_y_coordinate (stations) float64 ...
Data variables:
return_mean_surge_level (stations) float64 ...
Attributes: (12/34)
Conventions: CF-1.6
featureType: timeSeries
id: GTSMv3_extreme_value_analysis
naming_authority: https://deltares.nl/en
Metadata_Conventions: Unidata Dataset Discovery v1.0
title: relative change in return values for surge...
... ...
geospatial_vertical_max: 18.564
geospatial_vertical_units: m
geospatial_vertical_positive: up
time_coverage_start: 1985
time_coverage_end: 2050
experiment: highres-future
1 Answer
You could do something like below. First create some test data:
import xarray as xr
import numpy as np

ds = xr.Dataset(
    {"data": (["stations"], [1.2, 23.7, 77.8])},
    coords={
        "stations": ("stations", ["la", "lo", "li"]),
        "lat": ("stations", [45.6, 34.1, 78.2]),
        "lon": ("stations", [-49.1, 2.1, 179.1]),
    },
)
lat = 36.3
lon = 3.8
Then calculate the distances from your stations to your point of interest and find the station where the distance is smallest:
selector = np.sqrt((ds["lat"] - lat)**2 + (ds["lon"] - lon)**2).idxmin()
Then use that to filter your dataset:
ds.sel(stations = selector)["data"]
Very similarly, you could find the index of the station for which the distance is smallest and use that to index your dataset:
selector = np.sqrt((ds["lat"] - lat)**2 + (ds["lon"] - lon)**2).argmin()
ds.isel(stations = selector)["data"]
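If you have many target locations, the same idea broadcasts: wrap the targets in DataArrays that share a new "points" dimension, then argmin over stations yields one station index per point (a sketch using the same test data as above):

```python
import numpy as np
import xarray as xr

ds = xr.Dataset(
    {"data": (["stations"], [1.2, 23.7, 77.8])},
    coords={
        "stations": ("stations", ["la", "lo", "li"]),
        "lat": ("stations", [45.6, 34.1, 78.2]),
        "lon": ("stations", [-49.1, 2.1, 179.1]),
    },
)

# Targets share a new "points" dimension, so the subtraction below
# broadcasts to shape (stations, points).
lats = xr.DataArray([36.3, 77.0], dims="points")
lons = xr.DataArray([3.8, 178.0], dims="points")

dist = np.sqrt((ds["lat"] - lats) ** 2 + (ds["lon"] - lons) ** 2)
nearest = ds.isel(stations=dist.argmin(dim="stations"))
print(nearest["data"].values)  # [23.7 77.8]
```

The integer-index DataArray passed to isel triggers xarray's vectorized (pointwise) indexing, so the result has one value per target point rather than the full cross product.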