I am relatively new to coding and am working on a script to average 24 hourly GRIB2 files to daily files and output to a TIF. I am currently doing this in Google Colab by calling a GDAL python script called gdal_calc.py (downloadable from GDAL), which I have uploaded to my drive.
My code looks like this (for November 29):
import os
! python /content/drive/MyDrive/NOAA/grib2_files_test/gdal_calc.py \
-A /content/drive/MyDrive/NOAA/grib2_files/202011/2020112900/postprd/nearsfc_smoke_00.grib2 \
-B /content/drive/MyDrive/NOAA/grib2_files/202011/2020112900/postprd/nearsfc_smoke_01.grib2 \
-C /content/drive/MyDrive/NOAA/grib2_files/202011/2020112900/postprd/nearsfc_smoke_02.grib2 \
etc (D-V)
-W /content/drive/MyDrive/NOAA/grib2_files/202011/2020112900/postprd/nearsfc_smoke_22.grib2 \
-X /content/drive/MyDrive/NOAA/grib2_files/202011/2020112900/postprd/nearsfc_smoke_23.grib2 \
--outfile=/content/drive/MyDrive/NOAA/grib2_files/averaged2/20201129.tiff \
--calc="(A+B+C+D+E+F+G+H+I+J+K+L+M+N+O+P+Q+R+S+T+U+V+W+X)/24"
I have data for all of 2020, but there are missing days so I haven't tried to make a monthly loop. I would however love to create a "dd" variable that I could change for each day instead of manually changing the day value in each of the file names and output name. I think my attempts to do this have not worked because I am calling an external script. Any ideas on how to create this variable?
2 Answers 2
Here you have two solutions stay with the external script or use a python lib to manipulate you data
Python script
I suggest you use os.system to call you command instead of the magic !
character. It will be more Pythonic and easier to manipulate with variables.
import os
from string import ascii_uppercase as letter
dd = 20201129 # change this variable
nb_file = 24 # the number of files you want to use
gdal_calc = "/content/drive/MyDrive/NOAA/grib2_files_test/gdal_calc.py"
smoke_path = "/content/drive/MyDrive/NOAA/grib2_files/202011/{}00/postprd/nearsfc_smoke_{}.grib2"
output = "/content/drive/MyDrive/NOAA/grib2_files/averaged2/{}.tiff"
command = ['python', gdal_calc] #start of the command
for i in range(24): # add every line
command.append(f'-{letter[i]}') # add the letter
command.append(smoke_path.format(dd, f'{i:02d}'))
command.append(f'--outfile={output}')
command.append(f'--calc=\"({"+".join(letter[:nb_files])}/{nb_file})\"')
# execute the command
os.system(" ".join(command))
Using rasterio
I highly suggest you have a look to the rasterio lib that will help you perform numerous image manipulations in Python. What you want to do is simply compute the mean of all these files so instead of using gdal_calc I would do the following
import rasterio as rio
import numpy as np
dd = 20201129 # change this variable
nb_file = 24 # the number of files you want to use
smoke_path = "/content/drive/MyDrive/NOAA/grib2_files/202011/{}00/postprd/nearsfc_smoke_{}.grib2"
output = "/content/drive/MyDrive/NOAA/grib2_files/averaged2/{}.tiff"
# create a reading function that open and cloe the file
def read_file(file):
with rio.open(file) as src:
return(src.read(1))
# create the data list
array_list = [read_file(smoke_path.format(dd, f'{i:02d}')) for i in range(nb_file)]
# compute the mean
output_array = np.mean(array_list, axis=0)
# write the final file
with rio.open(smoke_path.format(dd, '00')) as src:
profile = src.profile
profile.update(count=1) # I'm not sure that there is only 1 band in your source file
with rio.open(output, 'w', **profile) as dst:
dst.write(output_array, 1)
thank you @pierrick-rambaud! I modified your first code like this and it worked:
from string import ascii_uppercase as letter
import os
# main
if __name__ == "__main__":
gdal_calc = '/content/drive/MyDrive/NOAA/grib2_files_test/gdal_calc.py'
command = ['python', gdal_calc]
dd = '20201129'
nb_file = 24
smoke_path = "/content/drive/MyDrive/NOAA/grib2_files/202011/{}00/postprd/nearsfc_smoke_{}.grib2"
smoke_path1 = "/content/drive/MyDrive/NOAA/grib2_files/202011/"
smoke_path2 = "00/postprd/nearsfc_smoke_"
smoke_path3 = ".grib2"
for i in range(nb_file):
# print("i=" + str(i) + " letter=" + letter[i]);
command.append('-' + letter[i])
stri = str(i)
if i<10:
stri = '0' + stri
command.append(smoke_path1 + dd + smoke_path2 + stri + smoke_path3)
output = "/content/drive/MyDrive/NOAA/grib2_files/averaged/" + dd + ".tiff"
command.append("--outfile=" + output)
letrList = '+'.join(letter)
command.append('--calc="(' + letrList + ')/' + str(nb_file) + '"')
# command.append('--calc="(A+B+C+D+E+F+G+H+I+J+K+L+M+N+O+P+Q+R+S+T+U+V+W+X)/24"')
theCommand = ' '.join(command)
print('command=[' + theCommand + ']')
os.system(theCommand)
The second option you posted output to file, but I couldn't upload to GEE due to this error: "Error: Projection for 20201129_3.tiff could not be determined. Make sure both CRS and affine transform are present."
-
cool it worked. As I'm using your initial image profile, I think the crs is not set (and gdal default to 4326). If the first solution work, keep it simple and ask about GEE upload problems in another questionPierrick Rambaud– Pierrick Rambaud2021年05月25日 07:49:59 +00:00Commented May 25, 2021 at 7:49