12
\$\begingroup\$

I am using a Bash script to execute a Python script multiple times. In order to speed up the execution, I would like to execute these (independent) processes in parallel. The code below does so:

#!/usr/bin/env bash
script="path_to_python_script"
N=16 # number of processors
mkdir -p data/
for i in `seq 1 1 100`; do
 for j in {1..100}; do
 ((q=q%N)); ((q++==0)) && wait
 if [ -e data/file_$i-$j.txt ]
 then
 echo "data/file_$i-$j.txt exists"
 else
 ($script -args_1 $i > data/file_$i-$j.txt ;
 $script -args_1 $i -args_2 value -args_3 value >> data/file_$i-$j.txt) &
 fi
 done
done

However, I am wondering whether this code follows common best practices for parallelizing for loops in Bash. Are there ways to improve its efficiency?

200_success
asked Jun 12, 2019 at 10:06
\$\endgroup\$
1
  • 6
    \$\begingroup\$ You can use GNU Parallel; it's very helpful for executing a number of tasks in a controlled way. \$\endgroup\$ Commented Jun 12, 2019 at 20:55

2 Answers

12
\$\begingroup\$

Some suggestions:

  • The trailing slash in the mkdir command is redundant.
  • $(...) is preferred over backticks for command substitution.
  • Why use seq in one loop and brace expansion in the other? Both produce the same sequence, so you might as well use {1..100} in both places.
  • Semicolons are unnecessary in the vast majority of cases. Simply use a newline to achieve the same separation between commands.
  • Use More Quotes™
  • set -o errexit -o noclobber -o nounset at the start of the script will be helpful. It'll exit the script instead of overwriting any files, for example, so you can get rid of the inner if statement if it's OK that the script stops when the file exists.
  • [[ is preferred over [.
  • The whole exercise is probably easier to achieve with some standard pattern like GNU parallel. Currently the script starts N commands, then waits for all of them to finish before starting any more. Unless the processes take very similar time this is going to waste a lot of time waiting.
  • N (or for example processors for readability) should be determined dynamically, using for example nproc --all, rather than hardcoded.
  • If you're worried about speed you should probably not create a subshell for your two script commands. { and } will group commands without creating a subshell.
  • For the same reason you probably want to do a single redirection like { "$script" ... && "$script" ...; } > "data/file_${i}-${j}.txt"
  • Since you're "only" counting to 10,000 you don't need to reset q every time. You can for example set process_count=0 outside the outer loop and check the modulo in a readable way such as:

    if (( process_count % processors == 0 ))
    then
     wait
    fi
    
  • The inner code (from the line starting with ((q=q%N))) should be indented one more time.
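Put together, the suggestions above might look roughly like the sketch below. It is not the asker's script: run_script is a hypothetical stand-in for the real Python script, the loops are shortened to 3×3, and output goes to a temporary directory instead of data/ so that the example runs as-is. It keeps the existence check with continue rather than relying on noclobber alone, and it still waits in batches as the original does.

```bash
#!/usr/bin/env bash
set -o errexit -o noclobber -o nounset

# Hypothetical stand-in for the real Python script, so the sketch is runnable.
run_script() { printf 'ran with: %s\n' "$*"; }

processors="$(nproc --all)"   # detected at run time instead of a hardcoded N
out_dir="$(mktemp -d)"        # assumption: the original writes to data/
process_count=0

for i in {1..3}; do           # {1..100} in the original
    for j in {1..3}; do
        out="${out_dir}/file_${i}-${j}.txt"
        if [[ -e "$out" ]]; then
            echo "$out exists"
            continue
        fi
        # Every $processors launches, wait for the whole batch to finish.
        if (( process_count % processors == 0 )); then
            wait
        fi
        process_count=$(( process_count + 1 ))
        # One group, one redirection. Backgrounding with & still forks once,
        # but there is no extra ( ... ) subshell on top of that.
        { run_script -args_1 "$i" &&
          run_script -args_1 "$i" -args_2 value -args_3 value; } > "$out" &
    done
done
wait   # let the final batch finish before the script exits
```

The counter is advanced with an assignment rather than ((process_count++)) because the latter returns a non-zero status when the old value is 0, which would trip errexit on the first iteration.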
answered Jun 12, 2019 at 10:17
\$\endgroup\$
2
  • 4
    \$\begingroup\$ [ is more portable and more consistent than [[. So your preference is certainly arguable. \$\endgroup\$ Commented Jun 12, 2019 at 11:41
  • 3
    \$\begingroup\$ @TobySpeight 1) Writing truly portable scripts is an absolute nightmare and not a good idea in terms of maintainability. 2) OP asked about Bash specifically. 3) The accepted answer (above the one you linked to) has about eight times more votes, so I would say the community has spoken. \$\endgroup\$ Commented Jun 12, 2019 at 20:58
5
\$\begingroup\$

Using GNU Parallel, your code will look something like this:

#!/usr/bin/env bash
export script="path_to_python_script"
doit() {
 i="$1"
 j="$2"
 $script -args_1 "$i"
 $script -args_1 "$i" -args_2 value -args_3 value
}
export -f doit
parallel --resume --results data/file_{1}-{2}.txt doit ::: {1..100} ::: {1..100}

In your original code if one job in a batch of 16 takes longer than the other 15, then you will have 15 cores sitting idle waiting for the last to finish.

Compared to your original code this will use the CPUs better because a new job is started as soon as a job finishes.
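To illustrate the difference between batch waiting and starting a new job as soon as one finishes, here is a minimal plain-Bash sketch of a rolling pool. It is not how GNU Parallel is implemented and is not a substitute for it; it relies on wait -n, which needs Bash 4.3 or later. The task function, max_jobs, and the sleep durations are all made up for the demonstration.

```bash
#!/usr/bin/env bash
# Rolling pool: at most max_jobs tasks run at once (needs Bash 4.3+ for `wait -n`).
max_jobs=2                 # small on purpose; the question used 16
results="$(mktemp)"        # temp file collecting one line per finished task

# Hypothetical stand-in for one unit of work of varying duration.
task() {
    sleep "0.$1"
    echo "task $1 done" >> "$results"
}

running=0
for n in 3 1 2 1; do
    if (( running >= max_jobs )); then
        wait -n            # block until any ONE background job exits
        running=$(( running - 1 ))
    fi
    task "$n" &
    running=$(( running + 1 ))
done
wait                       # drain the jobs still running
```

With a batch-style wait, the short tasks would sit behind the longest task in their batch; here a slot is refilled as soon as any task ends.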

answered Jun 15, 2019 at 22:41
\$\endgroup\$
