I am working with location data for 22 different individuals (id). The data contains an id column and coordinates (UTM_x and UTM_y). Across all individuals there are 991,099 locations in total. I am trying to extract raster values (a 1x1 m vegetation classification) for each point location, and I am running into problems with both the speed of the extraction (it takes an extremely long time) and memory. Here is what I have done so far:
First I create a spatial points dataframe from the UTM coordinates and assign the correct CRS:
coordinates(data) <- c("UTM_lon", "UTM_lat")
proj4string(data) <- CRS("+proj=utm +zone=11 +datum=WGS84 +units=m +no_defs +ellps=WGS84 +towgs84=0,0,0")
Then I load the raster file
veg.r <- raster("C:/Users/veg.ras.tif")
The raster file was projected in ArcMap. Checking to make sure the projections are the same:
proj4string(data) == proj4string(veg.r)
[1] TRUE
Here are the details of the raster:

veg.r
class       : RasterLayer
dimensions  : 81299, 87251, 7093419049  (nrow, ncol, ncell)
resolution  : 1, 1  (x, y)
extent      : 606777.517, 694028.517, 4751626.24, 4832925.24  (xmin, xmax, ymin, ymax)
coord. ref. : +proj=utm +zone=11 +datum=WGS84 +units=m +no_defs +ellps=WGS84 +towgs84=0,0,0
data source : C:\Users\veg.ras.tif
names       : veg.ras
values      : 1, 11  (min, max)
Now extract the raster cell values for each point:
ext <- extract(veg.r, data, df=TRUE)
I have waited 24+ hours for the extraction with no results. I know there isn't an issue with the actual code, because I can perform this function with smaller subsets of the data.
I have tried using a multicore approach, as suggested HERE, with the code below:
library(snowfall)
data.sp <- SpatialPoints(data, proj4string = CRS("+proj=utm +zone=11 +datum=WGS84 +units=m +no_defs +ellps=WGS84 +towgs84=0,0,0"))
Now, create an R cluster using all of the machine's cores minus one:
sfInit(parallel=TRUE, cpus=parallel::detectCores()-1)
sfLibrary(raster)
Library raster loaded.
Library raster loaded in cluster.
sfLibrary(sp)
Library sp loaded.
Library sp loaded in cluster.
data.df <- sfSapply(veg.r, extract, y=data.sp)
Error: cannot allocate vector of size 26.4 Gb
sfStop()
Stopping cluster
As you can see, I get an error due to memory issues.
Are there any suggestions on why the "multicore approach" is not working?
1 Answer
What happens if you run the following?

data.df <- sfSapply(list(veg.r), extract, y=data.sp)

The question you link to is set up to extract data from a list of rasters, not just one. Passing the bare RasterLayer to sfSapply likely makes sapply try to coerce the whole raster into an in-memory vector; holding all ~7.1 billion cells at once is roughly the 26.4 Gb allocation the error reports.
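If you want to keep the snowfall approach with a single raster, an alternative (not from the linked question) is to parallelise over chunks of points rather than over a list of rasters. A rough sketch, assuming the veg.r and data.sp objects from the question; the helper names (n.pts, n.cores, chunks, ras.file, ext.list) are made up for illustration:

# Sketch only: split the point indices across workers and extract one chunk per worker.
library(snowfall)
library(raster)
library(sp)

n.pts   <- nrow(coordinates(data.sp))
n.cores <- parallel::detectCores() - 1
chunks  <- split(seq_len(n.pts), cut(seq_len(n.pts), n.cores, labels = FALSE))

sfInit(parallel = TRUE, cpus = n.cores)
sfLibrary(raster)
sfLibrary(sp)
sfExport("data.sp")                 # the points are small enough to copy to every worker
ras.file <- filename(veg.r)         # path to the GeoTIFF; each worker re-opens it from disk
sfExport("ras.file")

ext.list <- sfLapply(chunks, function(i) {
  r <- raster(ras.file)             # file-backed, so the full grid is never loaded into RAM
  extract(r, data.sp[i, ])
})
sfStop()

ext <- unlist(ext.list)             # one vegetation value per point, in the original order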
Otherwise, how's performance if you rasterize your points layer and use it to extract values? Pseudocode:
# rasterize the points onto the vegetation grid: cells containing a point get that point's ID, all other cells are NA
presabs <- rasterize(points, vegraster, field = 'ID')
# set every occupied cell to 0, so the sum below simply returns the vegetation value
presabs[!is.na(values(presabs))] <- 0
# NA + value = NA and 0 + value = value, so only cells that contain points keep a vegetation value
xtrct <- presabs + vegraster
# coordinates and vegetation value for every cell that contained at least one point
vals <- rasterToPoints(xtrct)
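With the objects from the question this would presumably look something like the line below (assuming the id column can serve as the rasterize field):

presabs <- rasterize(data, veg.r, field = 'id')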
Then you could use a spatial join to append the extracted data back to your original points; one simple, untested way is sketched below. I suspect it might not be foolproof (multiple points in the same cell might cause issues) but it may be worth a shot.
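A minimal sketch of that join, assuming the xtrct raster from above and the original data points; the cell-index lookup and the veg column name are illustrative, and every point falling in the same cell will get that cell's single value:

# Hypothetical join-back: look up each point's cell number and pull the value from the summed raster
cell.idx <- cellFromXY(xtrct, coordinates(data))
data$veg <- xtrct[cell.idx]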
Thanks. I subset my data to contain only 222,156 observations and clipped my raster in ArcMap in an attempt to speed up processing time, but had no success with either option. Not sure what to try next. – j.wes519, Dec 27, 2019 at 19:20
The velox package is no longer supported by CRAN, and its functions have been moved to other packages. prioritizr has a function fast_extract, which worked for me extracting ~1 million points from a ~2 GB raster (it took ~9 minutes and didn't slow down my computer the way raster::extract did).
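For reference, a hedged sketch of that route, assuming the veg.r raster and data points from the question; the sf conversion and the exact fast_extract call should be checked against the current prioritizr documentation:

# Sketch only: extract with prioritizr::fast_extract instead of raster::extract
library(prioritizr)
library(sf)

pts.sf   <- st_as_sf(data)                # convert the SpatialPointsDataFrame to sf
veg.vals <- fast_extract(veg.r, pts.sf)   # one extracted value per point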