1

I am relatively new to coding and am working on a script to average 24 hourly GRIB2 files to daily files and output to a TIF. I am currently doing this in Google Colab by calling a GDAL python script called gdal_calc.py (downloadable from GDAL), which I have uploaded to my drive.

My code looks like this (for November 29):

import os
! python /content/drive/MyDrive/NOAA/grib2_files_test/gdal_calc.py \
 -A /content/drive/MyDrive/NOAA/grib2_files/202011/2020112900/postprd/nearsfc_smoke_00.grib2 \
 -B /content/drive/MyDrive/NOAA/grib2_files/202011/2020112900/postprd/nearsfc_smoke_01.grib2 \
 -C /content/drive/MyDrive/NOAA/grib2_files/202011/2020112900/postprd/nearsfc_smoke_02.grib2 \
etc (D-V)
 -W /content/drive/MyDrive/NOAA/grib2_files/202011/2020112900/postprd/nearsfc_smoke_22.grib2 \
 -X /content/drive/MyDrive/NOAA/grib2_files/202011/2020112900/postprd/nearsfc_smoke_23.grib2 \
 --outfile=/content/drive/MyDrive/NOAA/grib2_files/averaged2/20201129.tiff \
 --calc="(A+B+C+D+E+F+G+H+I+J+K+L+M+N+O+P+Q+R+S+T+U+V+W+X)/24" 

I have data for all of 2020, but there are missing days so I haven't tried to make a monthly loop. I would however love to create a "dd" variable that I could change for each day instead of manually changing the day value in each of the file names and output name. I think my attempts to do this have not worked because I am calling an external script. Any ideas on how to create this variable?

gene
55.8k3 gold badges115 silver badges196 bronze badges
asked May 10, 2021 at 1:43

2 Answers 2

1

Here you have two solutions stay with the external script or use a python lib to manipulate you data

Python script

I suggest you use os.system to call you command instead of the magic !character. It will be more Pythonic and easier to manipulate with variables.

import os 
from string import ascii_uppercase as letter
dd = 20201129 # change this variable 
nb_file = 24 # the number of files you want to use 
gdal_calc = "/content/drive/MyDrive/NOAA/grib2_files_test/gdal_calc.py"
smoke_path = "/content/drive/MyDrive/NOAA/grib2_files/202011/{}00/postprd/nearsfc_smoke_{}.grib2"
output = "/content/drive/MyDrive/NOAA/grib2_files/averaged2/{}.tiff"
command = ['python', gdal_calc] #start of the command 
for i in range(24): # add every line 
 command.append(f'-{letter[i]}') # add the letter 
 command.append(smoke_path.format(dd, f'{i:02d}'))
command.append(f'--outfile={output}')
command.append(f'--calc=\"({"+".join(letter[:nb_files])}/{nb_file})\"')
# execute the command 
os.system(" ".join(command))

Using rasterio

I highly suggest you have a look to the rasterio lib that will help you perform numerous image manipulations in Python. What you want to do is simply compute the mean of all these files so instead of using gdal_calc I would do the following


import rasterio as rio 
import numpy as np 
dd = 20201129 # change this variable 
nb_file = 24 # the number of files you want to use 
smoke_path = "/content/drive/MyDrive/NOAA/grib2_files/202011/{}00/postprd/nearsfc_smoke_{}.grib2"
output = "/content/drive/MyDrive/NOAA/grib2_files/averaged2/{}.tiff"
# create a reading function that open and cloe the file 
def read_file(file):
 with rio.open(file) as src:
 return(src.read(1))
# create the data list 
array_list = [read_file(smoke_path.format(dd, f'{i:02d}')) for i in range(nb_file)]
# compute the mean 
output_array = np.mean(array_list, axis=0)
# write the final file 
with rio.open(smoke_path.format(dd, '00')) as src:
 profile = src.profile
 profile.update(count=1) # I'm not sure that there is only 1 band in your source file
 with rio.open(output, 'w', **profile) as dst:
 dst.write(output_array, 1)
answered May 13, 2021 at 6:25
0
0

thank you @pierrick-rambaud! I modified your first code like this and it worked:

from string import ascii_uppercase as letter
import os
# main
if __name__ == "__main__":
 gdal_calc = '/content/drive/MyDrive/NOAA/grib2_files_test/gdal_calc.py'
 command = ['python', gdal_calc]
 dd = '20201129'
 nb_file = 24
 smoke_path = "/content/drive/MyDrive/NOAA/grib2_files/202011/{}00/postprd/nearsfc_smoke_{}.grib2"
 smoke_path1 = "/content/drive/MyDrive/NOAA/grib2_files/202011/"
 smoke_path2 = "00/postprd/nearsfc_smoke_"
 smoke_path3 = ".grib2"
 for i in range(nb_file):
 # print("i=" + str(i) + " letter=" + letter[i]);
 command.append('-' + letter[i])
 stri = str(i)
 if i<10:
 stri = '0' + stri
 command.append(smoke_path1 + dd + smoke_path2 + stri + smoke_path3)
 output = "/content/drive/MyDrive/NOAA/grib2_files/averaged/" + dd + ".tiff"
 command.append("--outfile=" + output)
 letrList = '+'.join(letter)
 command.append('--calc="(' + letrList + ')/' + str(nb_file) + '"')
 # command.append('--calc="(A+B+C+D+E+F+G+H+I+J+K+L+M+N+O+P+Q+R+S+T+U+V+W+X)/24"')
 theCommand = ' '.join(command)
 print('command=[' + theCommand + ']')
 os.system(theCommand)

The second option you posted output to file, but I couldn't upload to GEE due to this error: "Error: Projection for 20201129_3.tiff could not be determined. Make sure both CRS and affine transform are present."

answered May 23, 2021 at 17:59
1
  • cool it worked. As I'm using your initial image profile, I think the crs is not set (and gdal default to 4326). If the first solution work, keep it simple and ask about GEE upload problems in another question Commented May 25, 2021 at 7:49

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.