I have a NetCDF database (link to file) obtained from Copernicus Climate Data containing ~43k unevenly spaced values around the world. Instead of being indexed by (lat, lon), it uses a sequence of 'stations'.
My understanding is that station_x_coordinate and station_y_coordinate are not treated as dimension coordinates by xarray (according to the xarray docs).
I have a list of my own locations for which I want to get the closest value (i.e., my locations do not necessarily match one of the data points in the NetCDF database).
I would like to use xarray's many selection and interpolation methods (e.g. xarray.Dataset.sel) to get values at locations that do not match the data points. But I get errors or problems which are, I guess, related to the fact that I don't have real dimension coordinates, only plain variables.
For example, I can use xarray.Dataset.sel(stations=11.5, method='nearest'), which returns the closest value (in this case, station = 12), but this is very unintuitive, as I have no idea how the stations are ordered in the grid. I would rather search using the more sensible (lat, lon) indexing of my target locations. To be clear, I would like to call something like xarray.Dataset.sel(x=0, y=0, method='nearest') to find the closest value to the (lat = 0, lon = 0) location.
So far I have been able to get all the data points of the NetCDF file into a regular pandas DataFrame (columns = ['lat', 'lon', 'value']) and save it as CSV. I could write my own function to find the nearest neighbours in 2D space for a given target location (one surely exists in another library, maybe even NumPy or GeoPandas)... but as I said, I would like to use xarray's built-in methods to get values at 'unmatched' locations, interpolate, etc.
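For reference, this is roughly what the "own function" approach would look like with scipy.spatial.cKDTree (assuming SciPy is acceptable; the toy dataset below only mimics the structure of the real file, with made-up values):

```python
# Sketch: nearest-station lookup with a KD-tree over (lon, lat) pairs.
# Toy stand-in for the real file; variable names match its repr.
import numpy as np
import xarray as xr
from scipy.spatial import cKDTree

ds = xr.Dataset(
    {"return_mean_surge_level": ("stations", [0.5, 1.1, 0.9, 2.3, 1.7])},
    coords={
        "stations": ("stations", np.arange(5)),
        "station_x_coordinate": ("stations", [-49.1, 2.1, 179.1, 10.0, -3.5]),
        "station_y_coordinate": ("stations", [45.6, 34.1, 78.2, 50.2, 40.0]),
    },
)

# Build the tree once, then query it for any number of target locations.
tree = cKDTree(
    np.column_stack(
        [ds["station_x_coordinate"].values, ds["station_y_coordinate"].values]
    )
)
_, idx = tree.query([(2.0, 34.0)])  # target as (lon, lat)
nearest = ds.isel(stations=idx)
print(nearest["return_mean_surge_level"].values)  # [1.1]
```

This works, but it lives outside xarray's selection machinery, which is why I'm asking about restructuring the file instead.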
What would you do? For example, could I generate a new NetCDF file whose structure promotes station_x_coordinate and station_y_coordinate of my current file to real dimension coordinates? I am no expert in NetCDF files, but I assume that would let me use the methods mentioned above. Does that make sense?
This is the structure of the NetCDF file (output of xarray.Dataset):
<xarray.Dataset>
Dimensions: (stations: 43119)
Coordinates:
* stations (stations) uint16 0 1 2 3 ... 43731 43732 43733
station_x_coordinate (stations) float64 ...
station_y_coordinate (stations) float64 ...
Data variables:
return_mean_surge_level (stations) float64 ...
Attributes: (12/34)
Conventions: CF-1.6
featureType: timeSeries
id: GTSMv3_extreme_value_analysis
naming_authority: https://deltares.nl/en
Metadata_Conventions: Unidata Dataset Discovery v1.0
title: relative change in return values for surge...
... ...
geospatial_vertical_max: 18.564
geospatial_vertical_units: m
geospatial_vertical_positive: up
time_coverage_start: 1985
time_coverage_end: 2050
experiment: highres-future
1 Answer
You could do something like below. First create some test data:
import xarray as xr
import numpy as np

ds = xr.Dataset(
    {"data": (["stations"], [1.2, 23.7, 77.8])},
    coords={
        "stations": ("stations", ["la", "lo", "li"]),
        "lat": ("stations", [45.6, 34.1, 78.2]),
        "lon": ("stations", [-49.1, 2.1, 179.1]),
    },
)
lat = 36.3
lon = 3.8
Then calculate the distances from your stations to your point of interest and find the station where the distance is smallest:
selector = np.sqrt((ds["lat"] - lat)**2 + (ds["lon"] - lon)**2).idxmin()
Then use that to filter your dataset:
ds.sel(stations = selector)["data"]
Very similarly, you could find the index of the station for which the distance is smallest and use that to index your dataset:
selector = np.sqrt((ds["lat"] - lat)**2 + (ds["lon"] - lon)**2).argmin()
ds.isel(stations = selector)["data"]
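If you have many target locations, the same idea broadcasts: wrap the targets in DataArrays that share a new "points" dimension, then argmin over stations yields one station index per point (a sketch using the same test data as above):

```python
import numpy as np
import xarray as xr

ds = xr.Dataset(
    {"data": (["stations"], [1.2, 23.7, 77.8])},
    coords={
        "stations": ("stations", ["la", "lo", "li"]),
        "lat": ("stations", [45.6, 34.1, 78.2]),
        "lon": ("stations", [-49.1, 2.1, 179.1]),
    },
)

# Targets share a new "points" dimension, so the subtraction below
# broadcasts to shape (stations, points).
lats = xr.DataArray([36.3, 77.0], dims="points")
lons = xr.DataArray([3.8, 178.0], dims="points")

dist = np.sqrt((ds["lat"] - lats) ** 2 + (ds["lon"] - lons) ** 2)
nearest = ds.isel(stations=dist.argmin(dim="stations"))
print(nearest["data"].values)  # [23.7 77.8]
```

The integer-index DataArray passed to isel triggers xarray's vectorized (pointwise) indexing, so the result has one value per target point rather than the full cross product.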