10

I am working with a large (30GB) 3 band geotiff. I am using the following code to attempt to import the bands into python

import gdal
import numpy as np
raster_path = "C:/Path/To/Image.tif"
raster_dataset = gdal.OpenEx(raster_path, gdal.GA_ReadOnly)
geo_transform = raster_dataset.GetGeoTransform()
proj = raster_dataset.GetProjectionRef()
bands_data = []
for b in range(1, raster_dataset.RasterCount+1):
 band = raster_dataset.GetRasterBand(b)
 bands_data.append(band.ReadAsArray())

However, when I try to iterate over the bands, all my memory gets used and it won't load for hours.

Any suggestions regarding other ways to handle large spatial data in python? I have used ArcGIS and R in the past, but am relatively new to python. I would prefer to work from command line, if possible.

Antonio Falciano
14.5k2 gold badges38 silver badges68 bronze badges
asked Jan 18, 2018 at 14:34
0

2 Answers 2

9

When you ReadAsArray(), you're creating a numpy array, which essentially means you're loading it to memory. For a 30GB TIF, expect something around that ballpark. Meaning, you'll only even be able to work with it if you have a 64GB+ workstation or server. And even then, it's not desirable.

For larger rasters, consider working with blocks. Here is a good tutorial on this (starting from page 24). Notice, however, that this document was writen for an older version of GDAL. Since version 2.0, getting the custom block size of a raster is done with GDALRasterBand::GetBlockSize() method.

Also take in mind that a raster is most efficiently iterated over its default block size (each raster format will have its own). However, the block type of a TIFF can be tile or strip. Tile means a rectangular block, and TIFFs created with GDAL default to that (a 256x256 tile, specifically), but TIFFs with strip type have blocks that span the entire width of the raster, and may even be only one row high. This not only difficults certain processes (such as kernel), but may also be too memory-intensive. Check your block size beforehand, and, if a strip, choose a personalized block size.

answered Jan 18, 2018 at 16:30
4
  • 1
    This explanation and the tutorial provided in the link provide a clear methodology for dealing with larger rasters in python. Commented Jan 18, 2018 at 20:32
  • Here is a link to another question that provides code that will assist in implementing what @Roberto Ribeiro suggests: gis.stackexchange.com/questions/172666/… Commented Jan 18, 2018 at 20:56
  • I am still confused as to how to work with a large TIFF using numpy. For example, if I want to perform an unsupervised classification, I can't really do it with blocks or chunks of the TIFF. How do I work with the whole TIFF? Commented Jan 18, 2018 at 21:40
  • @user44796 You work just like that. The idea of a "whole TIFF" is arbitrary - that is, the Earth is continuous, any division in scenes is for convenience's sake. Imagine you have 4 Pleiades scenes of W x H dimension each. That same area with SPOT 6 would be only 1 W x H scene. So making a process in the entire SPOT raster or splitting it into 4 chunks is the same, given that if you were working with Pleiades you'd be working on 4 rasters anyway. Commented Jan 18, 2018 at 22:09
7

A great feature of raster data is that it often allows block-wise processing. You can "break" the raster up into rectangular windows to reduce the memory footprint of your process, or to process blocks in parallel and get results faster.

The documentation for GDAL's Python bindings is thin and examples of getting raster windows using ReadAsArray() are scarce, but I found this in the GDAL tests: https://github.com/OSGeo/gdal/blob/77544764f51420e42468641fc3d5a087f8ea6d8f/autotest/gcore/numpy_rw.py#L115. Pass in some offsets and width and height of your window and you have a band subset as a numpy array.

To apply this in your program, loop over blocks of the raster, and within that loop over the bands of the raster.

answered Jan 18, 2018 at 15:55
2

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.