Is it possible to abort a job run by GNU parallel if it exceeds an estimated runtime? For example, I have a handler for recon-all processing:
while [ -n "${ids[0]}" ] ; do
    printf 'Processing ID: %s\n' "${ids[@]}" >&2
    /usr/bin/time -f "$timefmt" \
    printf '%s\n' "${ids[@]}" | parallel --jobs 0 recon-all -s {.} -all -qcache -parallel -openmp 8
    n=$(( n + 1 ))
    ids=( "${all_ids[@]:n*4:4}" ) # pick out the next eight IDs
done
For some patients, the recon-all job inside parallel never completes for some reason (it can run for several days, which is abnormal). Could I limit the runtime inside parallel to 9 hours, so that the loop moves on to the next group?
- Not really sure why you want to run in batches of 8. Why not just use -j8 and run all ids with 8 running constantly? – Ole Tange, Mar 27, 2019 at 14:41
1 Answer
You are looking for --timeout.
You can do --timeout 9h or you can do --timeout 1000%. The latter measures the median runtime of the jobs that succeed and sets the timeout to 1000% of that median.
The neat thing about using a percentage is that if the compute program gets faster or slower for the normal case, you will not need to change the timeout.
See it in action:
parallel --timeout 300% 'sleep {}; echo {}' ::: 100 2 3 1 50 2 3 1 2 1 3 2 1 4 2 1 2 3
# Compute program gets 10 times faster
parallel --timeout 300% 'sleep {=$_ /= 10 =}; echo {}' ::: 100 2 3 1 50 2 3 1 2 1 3 2 1 4 2 1 2 3
The median (not average) runtime is measured over the successfully completed jobs (though minimum 3). So if you have 8 jobs and job 5 never finishes, it will be killed when its runtime hits the given percentage of the median runtime:
parallel --timeout 300% 'sleep {}; echo {}' ::: 1 2 1 2 100 2 1 2
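The arithmetic behind that example can be sketched in plain bash. This is only an illustration of "percentage of the median", not GNU parallel's actual implementation: median_timeout is a hypothetical helper, and taking the mean of the two middle values for an even count is an assumption. GNU parallel also updates its estimate as jobs finish, whereas this sketch assumes the seven short jobs are already done.

```bash
#!/usr/bin/env bash
# Illustrative only: derive a percentage-based timeout from the runtimes
# (in seconds) of jobs that have already finished successfully.
# median_timeout is a hypothetical helper, not part of GNU parallel.
median_timeout() {
    local percent=$1; shift
    printf '%s\n' "$@" | sort -n | awk -v pct="$percent" '
        { v[NR] = $1 }
        END {
            # Odd count: middle value.
            # Even count: mean of the two middle values (an assumption).
            if (NR % 2) m = v[(NR + 1) / 2]
            else        m = (v[NR / 2] + v[NR / 2 + 1]) / 2
            print m * pct / 100
        }'
}

# Runtimes of the seven jobs that finish in the example above.
median_timeout 300 1 2 1 2 2 1 2   # prints 6
```

With a median of 2 seconds and 300%, the threshold is 6 seconds, so the job sleeping for 100 seconds would be far past it.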
This also works if the first job is the one that is stuck:
parallel --timeout 300% 'sleep {}; echo {}' ::: 100 2 1 2 1 2 1 2
The only situation in which it does not work is when all job slots are stuck on their first job:
parallel -j4 --timeout 300% 'sleep {}; echo {}' ::: 100 100 100 100 1 2 1 2
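Applied to the loop in the question, a minimal sketch might look like the following. It assumes the batching loop is replaced by a single parallel call with 8 job slots, and that the recon-all flags are the ones shown in the question; the subject IDs are placeholders.

```bash
#!/usr/bin/env bash
# Placeholder subject IDs; in the question these come from all_ids.
all_ids=(subj01.nii subj02.nii subj03.nii)

# One parallel invocation: 8 jobs at a time, each killed after 9 hours.
cmd=(parallel --jobs 8 --timeout 9h
     recon-all -s '{.}' -all -qcache -parallel -openmp 8)

if command -v parallel >/dev/null && command -v recon-all >/dev/null; then
    printf '%s\n' "${all_ids[@]}" | "${cmd[@]}"
else
    # Tools missing: just show what would be run.
    printf '%s ' "${cmd[@]}"; echo
fi
```

Because a new job starts as soon as a slot frees up, one stuck subject occupies a single slot for at most 9 hours instead of holding up a whole batch.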
- The problem is that issues can occur in the first batch of patients: 7 of 8 finish, but e.g. the 5th runs forever (because it is stuck at "CORRECTING DEFECT 5", for example), so parallel can't measure the average processing time needed for --timeout with %. – Relyativist, Mar 27, 2019 at 14:05
- In most cases it can. See edit. – Ole Tange, Mar 27, 2019 at 14:36
- GNU parallel doesn't work with 9h, right? Should you write it in seconds? 6000? – Relyativist, Mar 27, 2019 at 16:11
- Of course it works with 9h. – Ole Tange, Mar 27, 2019 at 20:42
- I tried with the --timeout 11h flag, and it raised an error about an unknown argument. – Relyativist, Mar 28, 2019 at 7:53
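If an installed version rejects a suffix such as 11h, one possible workaround is to pass the duration in plain seconds, which --timeout accepts. The sketch below assumes whole-number durations only; to_seconds is a hypothetical helper, not part of GNU parallel.

```bash
#!/usr/bin/env bash
# to_seconds is a hypothetical helper, not part of GNU parallel.
# It converts a whole-number duration such as 9h, 30m, 45s or 2d
# into plain seconds.
to_seconds() {
    local value=${1%[smhd]}     # numeric part, suffix stripped
    local unit=${1#"$value"}    # suffix, empty if none was given
    case $unit in
        ''|s) echo "$(( value ))" ;;
        m)    echo "$(( value * 60 ))" ;;
        h)    echo "$(( value * 3600 ))" ;;
        d)    echo "$(( value * 86400 ))" ;;
    esac
}

to_seconds 9h    # prints 32400
to_seconds 11h   # prints 39600
```

The result can then be used as, for example, parallel --timeout "$(to_seconds 9h)" followed by the rest of the command.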