Split a GeoTIFF (BigTIFF) file into subsets with a specific overlap in Python

Question 1

I have a BigTIFF file that I need to split into tiles with a set tile size and overlap. I have a script for this using PIL:

import json
import os

import rasterio  # assumption: only used to read the CRS written to data.json
from PIL import Image

# hypothetical placeholders (undefined in the original): set these for your data
infile = "input.tif"
img_dir = "tiles"
prj_name = "project"

tile_height = tile_width = 1000
overlap = 80
stride = tile_height - overlap  # 920 px between tile origins
start_num = 0


def crop(infile, tile_height, tile_width, stride, img_dict, prj_name):
    im = Image.open(infile)
    img_width, img_height = im.size
    print(im.size)
    # number of full tiles the loops below will yield
    n_rows = len(range(0, img_height - tile_height + 1, stride))
    n_cols = len(range(0, img_width - tile_width + 1, stride))
    print(n_rows * n_cols)
    count = 0
    for r in range(0, img_height - tile_height + 1, stride):
        for c in range(0, img_width - tile_width + 1, stride):
            box = (c, r, c + tile_width, r + tile_height)  # (left, upper, right, lower)
            img_dict[prj_name + "---" + str(count) + ".png"] = [c, r]  # upper-left pixel
            count += 1
            yield im.crop(box)


img_dict = {}
# create the dir if it doesn't already exist
os.makedirs(img_dir, exist_ok=True)
# break it up into crops; k matches the img_dict keys only while start_num == 0
for k, piece in enumerate(crop(infile, tile_height, tile_width, stride, img_dict, prj_name), start_num):
    # white canvas; PIL sizes are (width, height)
    img = Image.new('RGB', (tile_width, tile_height), (255, 255, 255))
    img.paste(piece)
    image_name = prj_name + "---%s.png" % k
    img.save(os.path.join(img_dir, image_name))

# add a json file with all image names and geospatial metadata
# (`dataset` was never defined; assumed here to be a rasterio dataset)
with rasterio.open(infile) as dataset:
    crs = str(dataset.crs)
full_dict = {"image_name": infile,
             "image_locations": img_dict,
             "crs": crs}
with open(os.path.join(img_dir, "data.json"), "w") as fp:
    json.dump(full_dict, fp)

I can't use PIL on my other rasters, because they are BigTIFF files, which PIL does not support. I am looking for a way to port this script to another library while keeping these exact parameters. The parameters and naming scheme must stay exactly the same, because I'm feeding these tiles to a deep learning model I have already built.

I have never used GDAL before, but I've read that it may be my best bet for tiling BigTIFF files. I would really like to do this in Python.
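Whichever library replaces PIL, the tile positions and names depend only on the image size, the tile size and the stride, so they can be reproduced without opening the image at all. A minimal sketch (the helper name `tile_grid` is hypothetical) that mirrors the script's loops and returns the same `img_dict`, which makes it easy to check that a port produces identical names:

```python
def tile_grid(img_width, img_height, tile_width=1000, tile_height=1000,
              overlap=80, prj_name="project"):
    """Reproduce the {name: [col, row]} mapping built by the PIL crop() loop."""
    # the script uses tile_height - overlap as the stride on both axes
    stride = tile_height - overlap
    grid = {}
    count = 0
    # identical loop bounds: only full tiles, row-major order
    for r in range(0, img_height - tile_height + 1, stride):
        for c in range(0, img_width - tile_width + 1, stride):
            grid[prj_name + "---" + str(count) + ".png"] = [c, r]
            count += 1
    return grid
```

For example, `tile_grid(2000, 1000)` gives `{"project---0.png": [0, 0], "project---1.png": [920, 0]}`: the second tile starts 920 px to the right, so it overlaps the first by 80 px.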

Question 2

rasterio should drop into your existing script fairly easily, since it returns NumPy arrays. It wraps GDAL but is much more convenient to work with than GDAL's own Python bindings. Whether it supports BigTIFF depends on how it was built, I believe, but it normally does; if you install it from conda-forge, it should at least.
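A minimal sketch of what a rasterio port could look like, assuming 8-bit imagery with either one band or at least three bands; the function names `tile_with_rasterio` and `to_rgb8` are hypothetical. It keeps the 1000 px tiles, the 80 px overlap, the loop order and the `prj_name---N.png` names from the question, but reads each tile through a window instead of loading the whole raster:

```python
import json
import os

import numpy as np
from PIL import Image


def to_rgb8(arr):
    """(bands, rows, cols) -> (rows, cols, 3) uint8.

    ASSUMES 1 band (grey) or >= 3 bands with values already in 0-255.
    """
    if arr.shape[0] == 1:
        arr = np.repeat(arr, 3, axis=0)  # grey -> RGB
    rgb = np.transpose(arr[:3], (1, 2, 0))  # channels last, as PIL expects
    return np.clip(rgb, 0, 255).astype(np.uint8)


def tile_with_rasterio(infile, img_dir, prj_name, tile=1000, overlap=80):
    """Sketch of the PIL script ported to rasterio windowed reads."""
    import rasterio  # imported here so to_rgb8 is usable without rasterio
    from rasterio.windows import Window

    stride = tile - overlap
    os.makedirs(img_dir, exist_ok=True)
    img_dict = {}
    with rasterio.open(infile) as src:  # a BigTIFF opens like any other GeoTIFF
        count = 0
        # same loop bounds as the PIL script: full tiles only, row-major
        for r in range(0, src.height - tile + 1, stride):
            for c in range(0, src.width - tile + 1, stride):
                # Window(col_off, row_off, width, height); only this window is read
                arr = src.read(window=Window(c, r, tile, tile))
                name = prj_name + "---" + str(count) + ".png"
                Image.fromarray(to_rgb8(arr)).save(os.path.join(img_dir, name))
                img_dict[name] = [c, r]
                count += 1
        crs = str(src.crs)
    with open(os.path.join(img_dir, "data.json"), "w") as fp:
        json.dump({"image_name": infile, "image_locations": img_dict, "crs": crs}, fp)
    return img_dict
```

Call it as `tile_with_rasterio("big.tif", "tiles", "project")`. Because only the current window is read, memory use stays around one tile regardless of the raster size. If the data is not 8-bit, `to_rgb8` would need a real rescaling step rather than clipping.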


Question 4

GDAL's gdal_retile.py command-line utility can also be called from Python, as below; pass your own tile and overlap sizes.

import subprocess
from pathlib import Path

def generate_tiles(input_geotiff, tile_size=512, overlap=128):
    # "raster.tif" -> "raster_tiles"
    target_dir = Path(input_geotiff).stem + "_tiles"
    Path(target_dir).mkdir(parents=True, exist_ok=True)
    command = ["gdal_retile.py", "-ps", str(tile_size), str(tile_size),
               "-overlap", str(overlap), "-targetDir", target_dir, input_geotiff]
    print(subprocess.run(command, capture_output=True, text=True, check=True).stdout)
    return target_dir
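gdal_retile.py chooses its own output file names, so its tiles will not follow the question's `prj_name---N.png` scheme. If the names must match, one option that stays inside GDAL's Python bindings is to call `gdal.Translate` once per tile with a source window (`srcWin`). A sketch under stated assumptions: the helper names are hypothetical, and the data is assumed to be Byte (or UInt16) with at most four bands, since that is what GDAL's PNG driver can write:

```python
import os


def tile_origins(width, height, tile=1000, stride=920):
    """Upper-left (col, row) of every full tile, row-major, like the PIL script."""
    for r in range(0, height - tile + 1, stride):
        for c in range(0, width - tile + 1, stride):
            yield c, r


def export_png_tiles(infile, img_dir, prj_name, tile=1000, overlap=80):
    """Sketch: write prj_name---N.png tiles with gdal.Translate."""
    from osgeo import gdal  # lazy import: only needed for the actual export

    gdal.UseExceptions()  # raise Python exceptions instead of returning None
    os.makedirs(img_dir, exist_ok=True)
    src = gdal.Open(infile, gdal.GA_ReadOnly)
    origins = tile_origins(src.RasterXSize, src.RasterYSize, tile, tile - overlap)
    for count, (c, r) in enumerate(origins):
        out = os.path.join(img_dir, prj_name + "---" + str(count) + ".png")
        # srcWin = [xoff, yoff, xsize, ysize] in pixels
        gdal.Translate(out, src, format="PNG", srcWin=[c, r, tile, tile])
    src = None  # close the dataset
```

GDAL may also write a `.png.aux.xml` sidecar next to each tile to hold georeferencing that PNG itself cannot store.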

Marcelo Villa · Accepted Answer · 2020-04-05 01:25:26Z

You can use a for loop and read one tile at a time with the dataset's ReadAsArray() method, passing both an offset and a window size as arguments. It returns a NumPy array, which you can then easily export to an image file such as a JPG.

Your code could look something like this:

from osgeo import gdal

# open the TIFF file in read-only mode and get its dimensions
ds = gdal.Open(r'C:\path\to\your\raster.tif', gdal.GA_ReadOnly)
width = ds.RasterXSize
height = ds.RasterYSize

# define tile size and number of pixels to move in each direction
tile_size_x = 256
tile_size_y = 256
stride_x = 128
stride_y = 128

for x_off in range(0, width, stride_x):
    for y_off in range(0, height, stride_y):
        # clip the window so the read never runs past the raster edge
        win_x = min(tile_size_x, width - x_off)
        win_y = min(tile_size_y, height - y_off)
        # read tile: (bands, rows, cols) array, or (rows, cols) for one band
        arr = ds.ReadAsArray(x_off, y_off, win_x, win_y)
        # export image using either PIL, gdal or some other library

Of course, you'll need to handle the edge cases where there are not enough pixels left along the x or y axis.
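One way to handle those edge cases is to read a clipped window (xsize = min(tile_size_x, width - x_off), and likewise for y, so the request never runs past the raster) and then pad the result back to the full tile size. A minimal NumPy sketch; `pad_tile` is a hypothetical helper, and the white fill value mirrors the white canvas in the question's PIL script:

```python
import numpy as np


def pad_tile(arr, tile_rows, tile_cols, fill=255):
    """Pad a clipped edge tile to the full tile size (bottom/right only).

    Works for (rows, cols) single-band and (bands, rows, cols) multi-band
    arrays, which is how GDAL's ReadAsArray() returns them.
    """
    rows, cols = arr.shape[-2:]
    # no padding on the band axis, if there is one
    pad = [(0, 0)] * (arr.ndim - 2) + [(0, tile_rows - rows), (0, tile_cols - cols)]
    return np.pad(arr, pad, mode="constant", constant_values=fill)
```

Alternatively, mirror the question's PIL script, whose loop bounds (`range(0, size - tile + 1, stride)`) only ever produce full tiles and simply skip the partial ones at the edges; that also keeps the tile numbering identical.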
