1

I have read other solutions for NetCDF data but my data are a little different and I do not know how to extract data from NetCDF and save them in CSV files based on stations. Data include the maximum temperature values for stations. I just need stations located in latitudes:25.74 to 49.05 and longitude: -93.44 to -116.0. The format of time is different and I just need time[7518:43947190] which include data from 1948. I want to create multiple csv files. every file have to be data for one single station that include time, tmax, and quality flag of data.

from netCDF4 import Dataset
dataset=Dataset("D:/ushcn_tmax.nc")
#### Print dimentions #####
print dataset.file_format
print dataset.dimensions.keys()
print dataset.dimensions['name_strlen']
print dataset.dimensions['obs']
print dataset.dimensions['station']
#### Print variables ####
print dataset.variables.keys()
print dataset.variables['LON']
print dataset.variables['LAT']
print dataset.variables['ELEVATION']
print dataset.variables['STATION_NAME']
print dataset.variables['STATION_INDEX']
print dataset.variables['TIME']
print dataset.variables['TMAX']
print dataset.variables['TMAX_MFLAG']
print dataset.variables['TMAX_QFLAG']
print dataset.variables['TMAX_SFLAG']

Dimension and variables of my data can bee seen here:

NETCDF3_CLASSIC
[u'name_strlen', u'obs', u'station']
<type 'netCDF4._netCDF4.Dimension'>: name = 'name_strlen', size = 50
<type 'netCDF4._netCDF4.Dimension'>: name = 'obs', size = 43947189
<type 'netCDF4._netCDF4.Dimension'>: name = 'station', size = 1218
[u'LON', u'LAT', u'ELEVATION', u'STATION_NAME', u'STATION_INDEX', u'TIME', u'TMAX', u'TMAX_MFLAG', u'TMAX_QFLAG', u'TMAX_SFLAG']
<type 'netCDF4._netCDF4.Variable'>
float32 LON(station)
 standard_name: longitude
 long_name: station longitude
 units: degrees_east
unlimited dimensions: 
current shape = (1218,)
filling off
<type 'netCDF4._netCDF4.Variable'>
float32 LAT(station)
 standard_name: latitude
 long_name: station latitude
 units: degrees_north
unlimited dimensions: 
current shape = (1218,)
filling off
<type 'netCDF4._netCDF4.Variable'>
float64 ELEVATION(station)
 long_name: elevation above the sea level
 standard_name: elevation
 units: m
 positive: up
 axis: Z
unlimited dimensions: 
current shape = (1218,)
filling off
<type 'netCDF4._netCDF4.Variable'>
|S1 STATION_NAME(station, name_strlen)
 long_name: USHCN station name
 cf_role: timeseries_id
unlimited dimensions: 
current shape = (1218, 50)
filling off
<type 'netCDF4._netCDF4.Variable'>
int32 STATION_INDEX(obs)
 long_name: which station this obs is for
 instance_dimension: station
unlimited dimensions: 
current shape = (43947189,)
filling off
<type 'netCDF4._netCDF4.Variable'>
float64 TIME(obs)
 standard_name: time
 long_name: Time
 units: decimal day
 _FillValue: -9999.0
 comment: time calculeted as: year + day_of_year/days_in_year
unlimited dimensions: 
current shape = (43947189,)
filling off
<type 'netCDF4._netCDF4.Variable'>
int32 TMAX(obs)
 standard_name: TMAX
 long_name: maximum temperature
 units: degrees F
 coordinates: time lat lon elevation
 _FillValue: -9999
unlimited dimensions: 
current shape = (43947189,)
filling off
<type 'netCDF4._netCDF4.Variable'>
|S1 TMAX_MFLAG(obs)
 standard_name: TMAX_MFLAG
 long_mane: measurement flag for TMAX
 flag_values: BDLT
 flag_meanings: Blank = no measurement information applicable; B = precipitation total formed from two 12-hour totals; D = precipitation total formed from four six-hour totals; L = temperature appears to be lagged with respect to reported hour of OBServation; T = trace of precipitation, snowfall, or snow depth
unlimited dimensions: 
current shape = (43947189,)
filling off
<type 'netCDF4._netCDF4.Variable'>
|S1 TMAX_QFLAG(obs)
 standard_name: TMAX_QFLAG
 long_mane: quality flag for TMAX
 flag_values: ADGIKMNORSTWX
 flag_meanings: Blank = did not fail any quality assurance check; A = failed accumulation total check; D = failed duplicate check; G = failed gap check; I = failed internal consistency check; K = failed streak/frequent-value check; M = failed megaconsistency check; N = failed naught check; O = failed climatological outlier check; R = failed lagged range check; S = failed spatial consistency check; T = failed temporal consistency check; W = temperature too warm for snow; X = failed bounds check;
unlimited dimensions: 
current shape = (43947189,)
filling off
<type 'netCDF4._netCDF4.Variable'>
|S1 TMAX_SFLAG(obs)
 standard_name: TMAX_SFLAG
 long_mane: source flag for TMAX
 flag_values: 0126ABFGHIMQRSX
 flag_meanings: Blank = No source (i.e., data value missing); 0 = U.S. Cooperative Summary of the Day (NCDC DSI-3200); 1 = U.S. Preliminary Cooperative Summary of the Day -- Transmitted; 2 = U.S. Preliminary Cooperative Summary of the Day -- Keyed from paper forms; 6 = CDMP Cooperative Summary of the Day (NCDC DSI-3206); A = U.S. Automated Surface Observing System (ASOS) real-time data (since January 1, 2006); B = U.S. ASOS data for October 2000-December 2005 (NCDC DSI-3211); F = U.S. Fort data; G = Official Global Climate Observing System (GCOS) or other government-supplied data; H = High Plains Regional Climate Center real-time data; I = International collection (non U.S. data received through personal contacts); M = Monthly METAR Extract (additional ASOS data); Q = Data from several African countries that had been 'quarantined', that is, withheld from public release until permission was granted from the respective meteorological services; R = NCDC Reference Network Database (Climate Reference Network and Historical Climatology Network-Modernized); S = Global Summary of the Day (NCDC DSI-9618), NOTE: 'S' values are derived from hourly synoptic reports exchanged on the Global Telecommunications System (GTS).Daily values derived in this fashion may differ significantly from 'true' daily data, particularly for precipitation (i.e., use with caution); X = U.S. First-Order Summary of the Day (NCDC DSI-3210)
unlimited dimensions: 
current shape = (43947189,)
filling off

I have tried to read data with :

xr.open_dataset("D:/ushcn_tmax.nc")
df=dataset.sel(lon=-99.30,lat=32.73,method='nearest')

while the mentione lat and lon belong to one station and I received error "KeyError: 'lat'". Is there any way I can convert variables (lat, lon, and time) to dimentions to make it easier to work with? Or any way I can extract data based on station as dimension?

Vince
20.5k16 gold badges49 silver badges65 bronze badges
asked Feb 21, 2019 at 20:37
1
  • Hello, can you update the question with the result of dataset = xr.open_dataset("D:/ushcn_tmax.nc") >>>dataset ? Commented Mar 2, 2019 at 1:05

1 Answer 1

3

Note, I can update the answer with the correct variables if you include in the question the variables as read by xarray (as I asked you in the comment).

You are using two different packages when xarray alone would do the work just fine (xarray has a netCDF4 backend, but wraps to it more human accessible, understandable and readable methods.. and much more!).

import xarray as xr
dataset = xr.open_dataset(r"C:\path_to_ds\dataset.nc")

Inspect the variables

Example with one file I had on pc:

dataset
>>> <xarray.Dataset>
>>> Dimensions: (time: 63, x: 4000, y: 4000)
>>> Coordinates:
>>> * y (y) float64 -4.2e+06 -4.2e+06 -4.2e+06 ... -4.3e+06 -4.3e+06
>>> * x (x) float64 1.5e+06 1.5e+06 1.5e+06 ... 1.6e+06 1.6e+06 1.6e+06
>>> * time (time) datetime64[ns] 2013年06月21日T23:53:00 ... 2019年01月13日T23:50:56
>>> Data variables:
>>> relative_humidity (time, y, x) float32 ...#show descriptive data about the whole dataset

Select location

Use the xarray.Dataset.sel() method with the name of the coordinates that you found above. With my example, I will go with "x" and "y":

subset = dataset.sel(x=1.504e+06, y=-4.202e+06, method='nearest')

Note, you are assuming that the coordinates are in decimal degrees but you should check the projection attribute to be sure you are querying in the correct reference system Depending on how well documented is the metadata of the file, you can access the attribute of the coordinate variable by:

dataset.y
>>> <xarray.DataArray 'y' (y: 4000)>
>>> array([-4200012.5, -4200037.5, -4200062.5, ..., -4299937.5, -4299962.5,
 -4299987.5])
>>> Coordinates:
>>> * y (y) float64 -4.2e+06 -4.2e+06 -4.2e+06 ... -4.3e+06 -4.3e+06
>>> Attributes:
>>> units: metre
>>> standard_name: projection_y_coordinate
>>> long_name: y coordinate of projection 

Or you might need to see the global attribute of the file, that might contain a projection field by

dataset.attrs

Select the time

As for the time dimension, xarray converts automatically the difficult to read "seconds past" format of netCDF4 in a more human readable np.datetime64. However, I prefer to select dates with the help of pandas.to_datetime() method, that easily converts strings to datetime.datetime (which is even more human readable than np.datetime). I am sure there are similar methods in xarray, or even the exact same since xarray inherits many functions from pandas and numpy, but I can never be bothered looking for them.

import pandas.to_datetime as to_datetime
all_dates = dataset.time.values
sel_dates = [date for date in all_dates if to_datetime(date) >= to_datetime("01/01/1948")] #select dates past 1st Jan 1948
result = subset.sel(time=sel_dates)

all done! just select the variable you are querying, in my case relative_humidity

result.relative_humidity.values 
>>> array([0.1,...,0.2], dtype=float32) 

It is a np.ndarray of shape=(n_selected_dates, n_selected pixels); in this case 1 pixel as we passed only one value for x, y.

answered Mar 2, 2019 at 2:03

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.