I have a shell script on Linux to extract NetCDF data for each grid into a .txt file.
This is the annually averaged NetCDF data that I used in this conversion (it was originally daily data); the file size is 85.1 MB: https://drive.google.com/file/d/1Nud5b3KvjtE8H4Ol73zdnNSb9eeItaY7/view?usp=sharing
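For reference, the grid layout of this file can be confirmed from its metadata with cdo (assuming cdo is installed):

# Print variable/time info and the grid description of the input file
cdo sinfon canesm5_r1i1p1f1_w5e5_ssp126_tas_global_daily_2015_2100_05gc.nc
cdo griddes canesm5_r1i1p1f1_w5e5_ssp126_tas_global_daily_2015_2100_05gc.nc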
The script runs correctly without any errors, and it has been checked with https://www.shellcheck.net/ with no issues reported.
This is my script, named extract_global.sh:
#!/bin/bash
# Define input file, log file, and output directory
input_file="canesm5_r1i1p1f1_w5e5_ssp126_tas_global_daily_2015_2100_05gc.nc"
log_file="log.txt"
output_dir="/path/output/" # Replace with the desired output directory
export PATH=/home/appl/cdo-1.9.8/bin:$PATH # the directory that contains the cdo binary
# Remove existing log file if it exists
rm -f "$log_file"
# Create the output directory if it doesn't exist
#mkdir -p $output_dir
# Function to extract data for a specific latitude
extract_latitude() {
    lat=1ドル
    lon_index=1
    max_jobs=20
    job_count=0
    pids=()
    # Loop through longitudes from -180 to 179.5 in 0.5 degree increments
    for lon in $(seq -180 0.5 179.5); do
        # Define the output file name based on the current latitude and longitude index
        output_file="${output_dir}${lat}_${lon_index}.txt"
        # Extract data for the grid point at the current latitude and longitude,
        # keep only the values, and remove the header line
        (/home/appl/cdo-1.9.8/bin/cdo -outputtab,value -remapnn,lon="${lon}"_lat="${lat}" "$input_file" | sed '1d' > "$output_file") &>> "$log_file" &
        # Store the PID of the background job
        pids+=($!)
        # Increment the longitude index and job count
        lon_index=$((lon_index + 1))
        job_count=$((job_count + 1))
        # Once the maximum number of jobs has been reached, wait for the whole batch to finish
        if [ "$job_count" -ge "$max_jobs" ]; then
            for pid in "${pids[@]}"; do
                wait "$pid"
            done
            pids=()      # Reset the PID array
            job_count=0  # Reset the job count
        fi
    done
    # Wait for any remaining background processes to finish
    for pid in "${pids[@]}"; do
        wait "$pid"
    done
}
# Loop through latitudes from 90N to 90S in 0.5 degree increments
for lat in $(seq 90 -0.5 -90); do
    echo "Extracting data for latitude $lat"
    extract_latitude "$lat"
done
echo "Data extraction completed. Check $log_file for details."
If I run ./extract_global.sh directly, the program runs correctly: it extracts 20 grid points at a time, waits until the current batch of jobs has finished before starting the next loop iteration, and writes the correct extracted data into each .txt file it produces.
By default, all jobs on our Linux server run on the main CPU, which is shared by several members, so running heavy jobs there can slow the server down for everyone. I therefore want to move this job to a different queue. That is also why, in the code above, I capped the number of concurrent jobs at 20 and processed them in batches; however, this makes the whole extraction very long and inefficient.
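As a side note, the batch-of-20 pattern could also be written so that a new extraction starts as soon as any running one finishes, instead of waiting for a whole batch to drain. A minimal sketch, assuming bash 4.3+ (for wait -n); extract_one is a hypothetical helper wrapping the cdo | sed pipeline above:

#!/bin/bash
max_jobs=20
for lon in $(seq -180 0.5 179.5); do
    # If max_jobs extractions are already running, wait for any one of them to finish (bash 4.3+)
    while [ "$(jobs -rp | wc -l)" -ge "$max_jobs" ]; do
        wait -n
    done
    extract_one "$lon" &  # hypothetical helper wrapping the cdo pipeline
done
wait  # wait for the remaining jobs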
I tried to solve this by creating a go.bat file to set up the queue and other configurations. The contents of go.bat are as follows:
#!/bin/sh
#$ -S /bin/sh
# Specify the queue
#$ -q ib.q@node02
# Define the job name
#$ -N SSP_kzm
# Write standard output and error messages to the same file
#$ -j y
# Specify the error file
#$ -e /path/log.txt
#$ -M [email protected]
# Send email at the beginning and end of the job
#$ -m be
# Load necessary modules
# module load cdo
# Or, if cdo is in a custom directory, add that directory to the PATH
export PATH=/home/appl/cdo-1.9.8/bin:$PATH
cd /path/
./extract_global.sh > log.txt 2>&1
Finally, I submitted the job with the settings in the run.sh file, shown below:
#!/bin/bash
# Submit the conversion program and wait for it to complete
qsub go.bat
echo "Waiting for conversion computation"
date
# Wait for the job to finish
while qstat | grep -q kzm; do
sleep 5
done
date
echo "Job completed"
I started it with:
nohup ./run.sh >& log.txt &
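As an aside, grepping qstat for "kzm" would match any job whose name or owner contains that string. A more targeted wait is sketched below, assuming this SGE installation supports qsub -terse, which prints only the numeric job id:

#!/bin/bash
# Submit and capture only the numeric job id (SGE's -terse option)
jobid=$(qsub -terse go.bat)
echo "Waiting for job $jobid"
# qstat -j <id> exits non-zero once the job is no longer known to the scheduler
while qstat -j "$jobid" > /dev/null 2>&1; do
    sleep 5
done
echo "Job $jobid completed"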
Then I checked with qstat -f and got this information:
queuename qtype resv/used/tot. np_load arch states
----------------------------------------------------------------------
ib.q@node02 BIP 0/1/20 0.05 lx-amd64
19847 0.50500 SSP_kzm kzm r 10/10/2024 09:28:04 1
----------------------------------------------------------------------
Results:
- The program ignores the wait commands inside the extract_global.sh script and produces the output .txt files very quickly.
- The program generated all of the .txt files for the grid area of the NetCDF file, but the files were empty. In addition, the script produced numerous files named "cores.number", each around 4 MB, containing unreadable characters when opened directly in a text editor.
- No errors are reported in log.txt, but no values are stored in the .txt output files.
- I'm not sure whether the method I'm using is correct. I've looked for information and tried similar solutions, but I haven't been able to fix it yet.
The main point is that I want to run the ./extract_global.sh script on the queue named ib.q@node02 and allocate all of the available CPUs on that node (20 CPUs) to speed up the conversion.
Should I use MPI parallel computation to be able to use all of the CPUs on that node?
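From what I have read, a grid engine usually gives a job multiple cores through a parallel environment requested with -pe, rather than through MPI itself, so perhaps a directive like the following belongs in go.bat. The PE name smp is only a guess; the names actually configured on the cluster should be listable with qconf -spl:

# Request 20 slots on one node ('smp' is a guessed PE name; list real ones with: qconf -spl)
#$ -pe smp 20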
If someone is able to help me, I would be very grateful.
Thank you.
Comments:
- ...the bash tag, as mentioned above (this advice applies to all languages, not just bash). You should try to boil your problem down to code that readers can copy/paste into their environment and see the same problem. Not voting to close as you are an early poster, and you did post a very nice clean entry. Good luck!
- In your extract_...sh script, try something like export PATH="/full/path/to:$PATH", where the /full/path/to directory contains the cdo cmd. Not elegant, but this should get you thinking about the differences between your terminal environment and the environment where the command is failing. I mentioned "How to turn a bad script ..." because your problem (as described now) is on one line of code. A much smaller script could have demonstrated this problem. Well, going to bed. Good luck!