5

I am creating IDF Curves for work using Python and 30-minute rainfall rate rasters. There are 24 rasters per day and about two decades worth of data. I basically input lat/lon coordinates and my python script grabs the pixel value from each raster throughout the period and stores it in a DataFrame to use for statistical analysis. This process currently takes about 5 hours to run per lat/lon coordinate with the majority of the time coming from opening the raster and extracting the pixel value.

Are there any ways to optimize this process for speed?

I am completely stuck and can't find if there is any speed efficiencies native to GDAL. I've submitted the function I use for extracting the pixel value.

def extract_point_values_from_raster(raster_filepath, lat_lon_str):
 lat_lon = lat_lon_str.split(',')
 lat = float(lat_lon[0])
 lon = float(lat_lon[1])
 try:
 src_ds = gdal.Open(raster_filepath)
 gt = src_ds.GetGeoTransform()
 rb = src_ds.GetRasterBand(1)
 if (lat < gt[3] and lat > (gt[4] - gt[3]) and 
 lon > gt[0] and lon < (gt[2] - gt[0])):
 
 px = int((lon-gt[0])/gt[1])
 py = int((lat-gt[3])/gt[5])
 intval = rb.ReadAsArray(px,py,1,1)
 if intval == None:
 raster_val = None
 else:
 raster_val = round(intval[0][0],2)
 del src_ds
 except:
 raster_val = None
 return(raster_val)
 
PolyGeo
65.5k29 gold badges115 silver badges349 bronze badges
asked Mar 25, 2020 at 18:56
7
  • 1
    Are you doing this for multiple coordinates? One approach would be sampling all the coordinates for the raster you are opening at the time. I can post a numpy solution that might be helpful if that is the case. Commented Mar 25, 2020 at 21:11
  • thank you @MarceloVilla, but currently i am only analyzing one coordinate at a time! Commented Mar 25, 2020 at 21:36
  • You could look into: gis.stackexchange.com/questions/358036/…, but I am not sure that it would be faster or not. Commented Apr 25, 2020 at 4:17
  • Did you flush the cache? Might free up memory. Commented Apr 29, 2021 at 7:10
  • 1
    What dimesions are the rasters? Commented May 15 at 5:39

1 Answer 1

1

Some of my initial questions/thoughts/things to experiment with:

  • Every time you call the function it's re-opening the raster, then performing an intersection check. Presumably there may be multiple lat/lon queries against a given raster

    • Can you pre-index all of the the raster bounds (e.g. with an rtree), then query all of a raster's intersecting points at once? This index could be built once and then saved for later queries.
    • Or, can you lazily open a raster only when needed, but then keep it open in case it's needed again later?
  • Since it seems to be IO bound, make sure the rasters are located on the fastest storage you have access to, ideally local solid state drives

  • The raster format can have an impact on performance. For example, if a raster is interleaved by line, I believe that GDAL has to read the entire line to retreive your pixel of interest. If you're repeatedly reading this dataset, consider trying

    • Converting to an internally-tiled format, if not already
    • Stacking so that you have one 24-band raster per day and experimenting to find the best interleave (BIP perhaps?)
  • Have you profiled the code to ensure that it's slowest where you think it is, and that there are no other significant bottlenecks?

answered Mar 25, 2020 at 19:24
2
  • 2
    Thanks for the reply! So I'm attempting to read through about 350,000 different raster files and getting the value at the same lat/lon for each file. I don't use the same raster twice. This also is where the problem comes in with being unable to store the rasters on my fastest storage as I just don't have the space on that drive, so I have them saved to a synology. I have profiled the code, though, and this is the spot taking the most time! Commented Mar 25, 2020 at 19:34
  • 3
    I see. For any given point query will there be valid data in every one of those 350,000 rasters, or only a subset of them? In other words, does every file cover the exact same area? Commented Mar 26, 2020 at 18:35

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.