
I am trying to convert portions of a netCDF (.nc) file to CSV, and I am running into problems when I try to combine the portions I separated out into one matrix. This is the first time I have ever worked with a data file of this type, so I may be handling something incorrectly. The portion I am combining first is not even all of what I need, which is worrying: I still have to append two more variables after this combination. The file can be found here: https://www.ncei.noaa.gov/thredds-ocean/catalog/aoml/tsg/2018/RT_QC_RAW_nc/catalog.html?dataset=aoml/tsg/2018/RT_QC_RAW_nc/WTDO_2018_08_04.nc

The downloaded file is 2,178 KB. Each separated vector has 24,905 rows (one column each before combining) and is 199,456 bytes. I have four columns in total like this, but only three go into the as.matrix command. Even after clearing all unused data and values and closing everything else on the computer, I still cannot free enough memory. The memory usage report says the objects use 227 MiB, the session 433 MiB, and my whole system ~10,000 MiB, with 5,975 MiB free (about the same as when everything except R is closed). The computer has 16 GB of RAM and I am running 64-bit R.

Here is the code I am using to separate the pieces I need (I found it in a tutorial). I'm using the ncdf4 package, plus lubridate to convert the dates.

library(ncdf4)
library(lubridate)
#open the downloaded netCDF file
nc_ds <- nc_open("WTDO_2018_08_04.nc")
#grab some data
dim_lon <- ncvar_get(nc_ds, "LON")
dim_lat <- ncvar_get(nc_ds, "LAT")
dim_time <- ncvar_get(nc_ds, "Time")
#convert time 
t_units <- ncatt_get(nc_ds, "Time", "units")
t_ustr <- strsplit(t_units$value, " ")
t_dstr <- strsplit(unlist(t_ustr)[3], "-")
date <- ymd(t_dstr) + dseconds(dim_time)
date
#make coordinate matrix
coords <- as.matrix(expand.grid(dim_lon, dim_lat, date))

Making the coordinate matrix gives me this error:

Error: cannot allocate vector of size 57546.6 Gb
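That number is not random: expand.grid builds every combination of its inputs, so three vectors of 24,905 entries each yield 24905^3 rows. Back-of-the-envelope arithmetic (a sketch; assuming R's "Gb" here means GiB and the failing allocation holds 4 bytes per element) reproduces the figure in the error:

```r
n <- 24905       # records per extracted vector
rows <- n^3      # combinations expand.grid would create, ~1.54e13 rows
# one 4-byte vector of that length, in GiB:
rows * 4 / 2^30  # ~57546.6 -- the exact size in the error message
```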

I have also run the following (code and output), based on other forum answers and the documentation for these functions. I'm very confused about what it means and how to solve this issue; any help would be appreciated.

if (.Platform$OS.type == "windows") withAutoprint({
  memory.size()
  memory.size(TRUE)
  memory.limit(size = 500000)
})

memory.size()
[1] 342.65
memory.size(TRUE)
[1] 2541.38
memory.limit(size = 5e+05)
[1] 1e+09

I have also tried increasing the memory limit based on forum answers (the first thing I tried), and I believe I succeeded, but it has not solved my issue. When I repeat that command, it seems I may have reached the maximum limit, since it gives me this warning.

Warning message: In memory.limit(size = 5e+05) : cannot decrease memory limit: ignored

I think this is a fairly simple task (combining columns of data), even though the columns are large, so I don't quite understand why it needs so much memory.
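For what it's worth, column-binding equal-length vectors really is cheap; the blow-up comes entirely from expand.grid forming the cross product of its inputs. A minimal, self-contained sketch with toy stand-ins for the extracted vectors (the real ones each have 24,905 entries):

```r
# Toy stand-ins for the vectors pulled out of the netCDF file,
# one value per record:
date    <- as.POSIXct(c(0, 30, 60), origin = "1950年01月01日", tz = "UTC")
dim_lon <- c(-80.1, -80.2, -80.3)
dim_lat <- c(25.1, 25.2, 25.3)

# Side-by-side binding: n records stay n rows
df <- data.frame(time = date, lon = dim_lon, lat = dim_lat)
nrow(df)  # 3, not 3^3 = 27 as expand.grid would give
# Extra same-length variables become extra columns, then:
# write.csv(df, "output.csv", row.names = FALSE)
```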

I hope that's enough info, but please let me know if more is needed.

Thanks, Megan

asked May 23, 2025 at 21:25
2 Comments

  • As your data has 24905 rows, the expand.grid call with three variables will create an object with 24905^3 rows, which takes up a lot of memory. You will need to rethink the expand.grid. Commented May 23, 2025 at 22:31
  • Perhaps you need to use expand.grid on sort(unique(dim_lon)), sort(unique(dim_lat)) and sort(unique(date)). That should give you an object with the correct dimensions that should fit into memory. Commented May 24, 2025 at 8:34

1 Answer


Your file contains thermosalinograph data from a moving ship, in this case the "Oregon II". These data are trajectories, not grids, so the tutorial you found online does not apply here.

With packages ncdfCF and CFtime I get the following:

library(ncdfCF)
library(CFtime)
url <- "https://www.ncei.noaa.gov/thredds-ocean/dodsC/aoml/tsg/2018/RT_QC_RAW_nc/WTDO_2018_08_04.nc"
ds <- open_ncdf(url)
# List the data variables in the netCDF file
(vars <- names(ds))
#> [1] "Time" "LAT" "LON" "INT" "SAL" "COND" "EXT" "SST" "A" "B" 
#> [11] "C" "D" "E" "F" "G" "H" "I" "J" "K" "L"
# Let's have a look at the "Time" data variable:
ds[["Time"]]
#> <Variable> Time 
#> Long name: time 
#> 
#> Axes:
#> id name length values 
#> 0 records 10409 [1 ... 10409]
#> 
#> Attributes:
#> id name type length value 
#> 0 units NC_CHAR 33 seconds since 1950年01月01日 00:00:00
#> 1 instrument NC_CHAR 3 GPS 
#> 2 long_name NC_CHAR 4 time

A couple of things to note in the above:

  • Latitude, longitude and time are not axes of a matrix; instead they are columns in a table, each having a length of "records" = 10,409.
  • The "Time" data variable is recorded as an offset in seconds from 1950年01月01日. Use package CFtime to turn that into a POSIXct date-time object. Interestingly, the attribute "instrument" has a value of "GPS", so is this GPS time (18 seconds ahead of UTC currently) or regular (UTC?) time derived from a GPS instrument? That is not clear from the data.
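If you want a base-R cross-check of that conversion without pulling in CFtime, as.POSIXct can apply the "seconds since 1950年01月01日" offset directly (a sketch; it assumes the offsets are plain UTC seconds, which, per the GPS note above, the file does not actually guarantee; the offset values are hypothetical, chosen to match the first timestamp shown below):

```r
# Offsets in seconds since the epoch named in the "units" attribute:
offsets <- c(0, 2166048009)
as.POSIXct(offsets, origin = "1950年01月01日", tz = "UTC")
# [1] "1950年01月01日 00:00:00 UTC" "2018年08月22日 00:00:09 UTC"
```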

To get all the data into a data.frame, which is easily exported to a CSV file, you need a little more code:

# Loop over the data variables
data <- lapply(vars, function(v) {
  # Get the data variable
  dv <- ds[[v]]
  # The actual data
  values <- dv$data()$raw()
  # Use the "units" attribute to convert any time coordinates from
  # offsets to a POSIXct. If that fails, just return the raw values.
  units <- dv$attribute("units")
  if (is.na(units) ||
      inherits(t <- try(CFtime::CFtime(units, "standard", values),
                        silent = TRUE), "try-error"))
    values
  else
    t$as_timestamp(asPOSIX = TRUE)
})
# Convert into a data.frame
data <- as.data.frame(data, col.names = vars)
head(data)
#> Time LAT LON INT SAL COND EXT SST A B C D E F G H I
#> 1 2018年08月22日 00:00:09 NaN NaN 303.407 33.705 NA NaN 304.05 1 1 0 0 0 1 1 1 1
#> 2 2018年08月22日 00:00:39 NaN NaN 303.389 33.704 NA NaN 304.05 1 1 0 0 0 1 1 1 1
#> 3 2018年08月22日 00:01:09 NaN NaN 303.387 33.701 NA NaN 304.05 1 1 0 0 0 1 1 1 1
#> 4 2018年08月22日 00:01:39 NaN NaN 303.375 33.706 NA NaN 304.05 1 1 0 0 0 1 1 1 1
#> 5 2018年08月22日 00:02:09 NaN NaN 303.351 33.712 NA NaN 303.95 1 1 0 0 0 1 1 1 1
#> 6 2018年08月22日 00:02:39 NaN NaN 303.335 33.710 NA NaN 303.95 1 1 0 0 0 1 1 1 1
#> J K L
#> 1 1 0 0
#> 2 1 0 0
#> 3 1 0 0
#> 4 1 0 0
#> 5 1 0 0
#> 6 1 0 0
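From there, the CSV export the question set out to do is a one-liner (a sketch with a toy frame standing in for the data.frame assembled above; the filename is illustrative):

```r
# Toy stand-in for the `data` data.frame built above:
data <- data.frame(Time = as.POSIXct("2018年08月22日 00:00:09", tz = "UTC"),
                   SAL = 33.705, SST = 304.05)
write.csv(data, "WTDO_2018_08_04.csv", row.names = FALSE)
```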
answered May 25, 2025 at 8:06

3 Comments

  • Out of interest, are LAT and LON all NaN?
  • In this particular file, yes. That's a bit odd, obviously. Maybe the ship was stationary during these four days of observation?
  • Thank you: using a data frame worked well! I think the tutorial I was following built a matrix because sometimes (often?) these files are 3-D and require more than just adding a few columns. Also, yes, some of the data are missing in these files; I am not the one who can answer as to why.
