I work with large drone image raster files, on the order of 40,000 x 40,000 pixels and above. I have a large uncompressed GeoTIFF file and I want to use rasterio to rewrite it in compressed format. I can do this by loading all of the data into memory, but is there a way to write the compressed file without loading everything into memory?

For my original file I can open it with:

 dat = rasterio.open("grid_001.tif")

Then to rewrite the file with compression I tried:

profile = dat.profile.copy()
profile.update(compress='lzw')
with rasterio.open("grid_001_compressed.tif", 'w', **profile) as dst:
    dst.write(dat)

This will give me an error:

 ValueError: Source shape (44134, 44162) is inconsistent with given indexes 3

This is an expected error, because when I open the dataset it creates an iterator or lazy object without actually accessing the data. Now, if I did a command like:

 dat = dat.read()

This will load all of the data from the file into memory, and I get an array of dimensions [3, 44134, 44162] that I can write. But this takes a lot of memory.
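To put a rough number on "a lot of memory", here is a small back-of-the-envelope sketch. It assumes one byte per sample (uint8), which is not stated above; the helper name `raster_bytes` and the 1024 x 1024 tile used for comparison are purely illustrative.

```python
def raster_bytes(bands: int, rows: int, cols: int, bytes_per_sample: int = 1) -> int:
    """Size in bytes of an uncompressed in-memory array of the given shape."""
    return bands * rows * cols * bytes_per_sample


# Shape taken from the array dimensions above; uint8 is an assumption.
full = raster_bytes(3, 44134, 44162)
# A single hypothetical 1024 x 1024 tile across the same three bands.
tile = raster_bytes(3, 1024, 1024)

print(f"full array: {full / 2**30:.2f} GiB")
print(f"one tile:   {tile / 2**20:.2f} MiB")
```

With a wider data type such as uint16 or float32, the full-array figure scales by 2 or 4 respectively.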

Hence, is there a way to perform the same operation without loading everything into memory? I am not sure whether windowed reads would help in this case.

Vince
asked Jul 27, 2020 at 21:32
  • Slightly off topic, but I would run gdal_translate -of GTiff -co COMPRESS=LZW grid_001.tif grid_001_compressed.tif via subprocess.Popen followed by wait(). If all you are doing is rewriting the entire raster with compression, this skips the overhead of converting objects to and from Python, and you no longer need to worry about memory management, because gdal_translate manages all of that. Commented Jul 28, 2020 at 7:04
  • @MichaelStimson thanks for the suggestion. Yes, that makes sense; using gdal_translate would work. I was looking for something more Python and less command line, but your suggestion would work. Commented Jul 28, 2020 at 14:21
  • If you want to stick to Python, there is always gdal.Translate('output.tif', openDataset, creationOptions=['COMPRESS=LZW']). Commented Jul 28, 2020 at 21:25

1 Answer

You can use rasterio's windowed (block-by-block) reading and writing, so that only one block is held in memory at a time:

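import rasterio

with rasterio.open("grid_001.tif") as dat:
    profile = dat.profile.copy()
    profile.update(compress='lzw')
    with rasterio.open("grid_001_compressed.tif", 'w', **profile) as dst:
        # Iterate over the block windows of band 1 and copy one block at a time
        for ji, window in dat.block_windows(1):
            # read() without a band index returns all bands for this window
            dst.write(dat.read(window=window), window=window)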
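If the source file is stored in thin strips rather than tiles, its native block windows can be very small, so a variation is to copy in fixed-size tiles instead. The following is only a sketch under assumptions: the helper names `tile_bounds` and `copy_compressed` and the 1024-pixel tile size are hypothetical, and the output is written as a tiled GeoTIFF whose block size matches the copy window (GeoTIFF tile dimensions need to be multiples of 16).

```python
from typing import Iterator, Tuple


def tile_bounds(height: int, width: int, tile: int = 1024) -> Iterator[Tuple[int, int, int, int]]:
    """Yield (row_off, col_off, rows, cols) tiles covering a height x width grid."""
    for row_off in range(0, height, tile):
        for col_off in range(0, width, tile):
            # Edge tiles are clipped so they never extend past the raster.
            yield (row_off, col_off,
                   min(tile, height - row_off),
                   min(tile, width - col_off))


def copy_compressed(src_path: str, dst_path: str, tile: int = 1024) -> None:
    """Rewrite src_path as an LZW-compressed, tiled GeoTIFF, one tile at a time."""
    # Imported here so tile_bounds can be used without rasterio installed.
    import rasterio
    from rasterio.windows import Window

    with rasterio.open(src_path) as src:
        profile = src.profile.copy()
        # Assumption: align the output block size with the copy window.
        profile.update(compress="lzw", tiled=True,
                       blockxsize=tile, blockysize=tile)
        with rasterio.open(dst_path, "w", **profile) as dst:
            for row_off, col_off, rows, cols in tile_bounds(src.height, src.width, tile):
                # Window takes (col_off, row_off, width, height).
                window = Window(col_off, row_off, cols, rows)
                dst.write(src.read(window=window), window=window)
```

It would be called as, for example, `copy_compressed("grid_001.tif", "grid_001_compressed.tif")`. Peak memory is then roughly one tile across all bands rather than the whole raster.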
answered Jul 27, 2020 at 21:50
  • This is great, thanks for the direction. Will this work for multiband rasters too? I ask because it uses dat.block_windows(1). My raster has 3 bands, so would I need to iterate over the bands, or could I just use dat.block_windows(3)? Commented Jul 27, 2020 at 21:53
  • Assuming each band has the same block size (as per the documentation I linked to), this will read and write all bands at the same time. Commented Jul 27, 2020 at 21:56
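The assumption in the last comment can be checked before copying. The argument to block_windows is a 1-based band index, not a band count, and a dataset's block_shapes attribute lists one (rows, cols) block shape per band. A minimal sketch, where the helper names `bands_share_blocks` and `check_dataset` are hypothetical:

```python
from typing import Sequence, Tuple


def bands_share_blocks(block_shapes: Sequence[Tuple[int, int]]) -> bool:
    """Return True when every band reports the same (rows, cols) block shape."""
    return len({tuple(shape) for shape in block_shapes}) == 1


def check_dataset(path: str) -> bool:
    """Open a raster and report whether all of its bands share one block shape."""
    # Imported here so bands_share_blocks can be used without rasterio installed.
    import rasterio

    with rasterio.open(path) as src:
        return bands_share_blocks(src.block_shapes)
```

If `check_dataset("grid_001.tif")` returns True, the windows from block_windows(1) line up with the blocks of every band, so a single loop covers all three bands.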
