gnu parallel exit process with timeout

Question 1

Is it possible to make GNU parallel abort a job if it exceeds an estimated runtime? For example, I have this handler for recon-all processing:

while [ -n "${ids[0]}" ] ; do
    printf 'Processing ID: %s\n' "${ids[@]}" >&2
    /usr/bin/time -f "$timefmt" \
    printf '%s\n' "${ids[@]}" | parallel --jobs 0 recon-all -s {.} -all -qcache -parallel -openmp 8
    n=$(( n + 1 ))
    ids=( "${all_ids[@]:n*4:4}" ) # pick out the next batch of IDs (4 at a time)
done

For some patients, the recon-all process inside parallel never completes for some reason (it can run for several days, which is abnormal). Can I limit the runtime inside parallel to 9 hours, so that the loop moves on to the next group?

Question 2

Not really sure why you want to run in batches of 8. Why not just use -j8 and run all ids with 8 running constantly?

Question 3

You are looking for --timeout.

You can use --timeout 9h or --timeout 1000%. The latter measures the median time a job takes to succeed and sets the timeout to 1000% of that median runtime.

The neat thing about using a percentage is that if the compute program gets faster or slower for the normal case, you will not need to change the timeout.

See it in action:

parallel --timeout 300% 'sleep {}; echo {}' ::: 100 2 3 1 50 2 3 1 2 1 3 2 1 4 2 1 2 3
# Compute program gets 10 times faster
parallel --timeout 300% 'sleep {=$_ /= 10 =}; echo {}' ::: 100 2 3 1 50 2 3 1 2 1 3 2 1 4 2 1 2 3

The median (not the average) runtime is computed over the successfully completed jobs (at least 3 are needed). So if you have 8 jobs and job 5 never finishes, it is killed once its runtime reaches the given percentage of the median runtime:

parallel --timeout 300% 'sleep {}; echo {}' ::: 1 2 1 2 100 2 1 2
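
To make the cutoff in the example above concrete, here is a rough sketch of the arithmetic in plain shell. It assumes the seven short jobs take exactly their sleep time; real measured runtimes include some overhead, and GNU parallel's internal computation may differ in detail. The variable names are only for illustration.

```shell
#!/bin/sh
# Runtimes (seconds) of the jobs that finish in the example above;
# the 100-second job is left out because it never completes in time.
runtimes="1 2 1 2 2 1 2"
percent=300

# Sort numerically and pick the middle value (odd count, so no averaging is needed).
sorted=$(printf '%s\n' $runtimes | sort -n)
count=$(printf '%s\n' "$sorted" | wc -l)
median=$(printf '%s\n' "$sorted" | sed -n "$(( (count + 1) / 2 ))p")

# Scale the median by the percentage to get the approximate cutoff.
cutoff=$(( median * percent / 100 ))
echo "median=${median}s cutoff=${cutoff}s"
```

Under these assumptions the median is 2 seconds, so the 100-second job would be killed after roughly 6 seconds.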

This also works if the first job is the one that is stuck:

parallel --timeout 300% 'sleep {}; echo {}' ::: 100 2 1 2 1 2 1 2

The only situation where it does not work is when all job slots are stuck on their first job:

parallel -j4 --timeout 300% 'sleep {}; echo {}' ::: 100 100 100 100 1 2 1 2
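
For the case where every slot might get stuck, a fixed timeout avoids relying on the median. The following is a minimal sketch, not a tested recon-all setup: sleep stands in for recon-all, the numbers stand in for patient IDs, and the 2-second limit is chosen only so the demo finishes quickly. The --timeout duration is normally given in seconds, so the sketch also shows 9 hours written that way, and it uses --joblog to record what happened to each job.

```shell
#!/bin/sh
# Hypothetical stand-ins: "sleep" plays the role of recon-all, and the
# numbers play the role of patient IDs. Replace both for real use.
timeout_s=$(( 9 * 60 * 60 ))   # 9 hours expressed in seconds
echo "9h = ${timeout_s}s"

# Short demo timeout so the example finishes quickly.
demo_timeout=2
log=$(mktemp)

if command -v parallel >/dev/null 2>&1; then
    # The 1-second jobs finish; the 30-second job exceeds the limit and is killed.
    parallel --jobs 4 --timeout "$demo_timeout" --joblog "$log" \
        'sleep {}; echo "done {}"' ::: 1 30 1
    # The job log has one row per job; the Exitval and Signal columns
    # show which jobs did not finish normally.
    cat "$log"
fi
rm -f "$log"
```

For the real workload, the same idea would look something like printf '%s\n' "${all_ids[@]}" | parallel --jobs 8 --timeout "$timeout_s" recon-all -s {.} -all, which keeps 8 jobs running constantly instead of processing fixed batches. Treat that command line as an untested assumption based on the question's code.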

Question 4

The problem is that issues can occur in the first batch of patients: 7 of 8 finish, but e.g. the 5th runs forever (because it is stuck at "CORRECTING DEFECT 5", for example), so parallel can't measure the average processing time needed for --timeout with %.

Question 5

In most cases it can. See edit.

Question 6

GNU parallel doesn't work with 9h, right? Should it be written in seconds? 6000?

Question 7

Of course it works with 9h.

Question 8

I tried the --timeout 11h flag, and it gave an error about an unknown argument.

Ole Tange · Accepted Answer · 2019-03-26 13:55:58Z

