I'm trying to run a CNN model on raster files of environmental observations, and I want to import a folder of TIFF files as the model input. I'm relatively new to Python and ML and need help reading the folder of rasters into a 3D numpy array. The code below is my attempt at reading the folder with gdal:
flix = r'/Desktop/x'
fliy = r'/Desktop/y'
# in_directory = r'C:\Data'
files_to_process = glob(os.path.join(flix, '*.tif')
for data_path in files_to_process:
    raster_dataset = gdal.Open(data_path, gdal.GA_ReadOnly)
Note: I'm also getting a syntax error at the for loop definition itself.
2 Answers
As per the comment of @user2856.
import os
from glob import glob

from osgeo import gdal

flix = r'/Desktop/x'
fliy = r'/Desktop/y'
# in_directory = r'C:\Data'
files_to_process = glob(os.path.join(flix, '*.tif'))  # closing bracket for glob()
for data_path in files_to_process:
    raster_dataset = gdal.Open(data_path, gdal.GA_ReadOnly)
    # Reads the file as a numpy array. Note: `array` is overwritten on each iteration of the loop.
    array = raster_dataset.ReadAsArray()
You will also need to read the labels (fliy) and match each one to the corresponding image (flix).
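One way to do that matching, as a minimal sketch: it assumes each label file in fliy has the same filename as its image in flix, which the question does not state. The helper name pair_by_name is hypothetical, not part of gdal or any other library.

```python
from pathlib import Path


def pair_by_name(image_dir, label_dir, pattern="*.tif"):
    """Return sorted (image_path, label_path) pairs that share a filename.

    Assumption: an image and its label have identical filenames.
    """
    # Index the label files by filename for quick lookup.
    labels = {p.name: p for p in Path(label_dir).glob(pattern)}
    pairs = []
    # Sort the images so the order is the same on every run.
    for image in sorted(Path(image_dir).glob(pattern)):
        label = labels.get(image.name)
        if label is None:
            raise FileNotFoundError(f"no label found for {image.name}")
        pairs.append((image, label))
    return pairs
```

Calling pair_by_name(flix, fliy) would then give the pairs in a fixed order, and each path can be passed to gdal.Open as str(path). Sorting matters because glob does not guarantee any particular file order, so images and labels read in two separate loops could otherwise end up misaligned.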
I have three tif files in my folder:
import os
import rasterio
import numpy as np

file_suffix = ".tif"
raster_folder = r"C:\my_folder"
array_list = []
for root, folder, files in os.walk(raster_folder):
    for file in files:
        if file.endswith(file_suffix):
            with rasterio.open(os.path.join(root, file)) as dataset:
                array_list.append(dataset.read(1))  # read band 1 as a 2D array
arr = np.array(array_list)  # stack into one 3D array: (files, rows, cols)
#print(arr.shape)
#(3, 10980, 10980)
first_array = arr[0, :, :]  # first raster in the stack
#print(first_array.shape)
#(10980, 10980)
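Stacking only gives a regular 3D array when every raster has the same number of rows and columns. The sketch below makes that check explicit before stacking; stack_bands is a hypothetical helper, and the small arrays are stand-ins for the output of dataset.read(1). The trailing channel axis at the end is an assumption about what the CNN framework expects, so check the input layout your model needs.

```python
import numpy as np


def stack_bands(arrays):
    """Stack equally sized 2D arrays into one (n, rows, cols) array."""
    shapes = {a.shape for a in arrays}
    if len(shapes) != 1:
        raise ValueError(f"expected one common raster shape, got: {sorted(shapes)}")
    return np.stack(arrays, axis=0)


# Stand-in data; in practice these would be the arrays read from the tif files.
demo = [np.zeros((4, 5)), np.ones((4, 5)), np.full((4, 5), 2.0)]
stacked = stack_bands(demo)
print(stacked.shape)       # (3, 4, 5)
print(stacked[0].shape)    # (4, 5)

# Assumption: the model wants a trailing channel axis, i.e. (n, rows, cols, 1).
with_channel = stacked[..., np.newaxis]
print(with_channel.shape)  # (3, 4, 5, 1)
```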
A closing ) is missing on the files_to_process = glob(os.path.join(flix, '*.tif') line, hence the SyntaxError. To read into a numpy array, use raster_dataset.ReadAsArray(): gdal.org/api/python/osgeo.gdal.html