The short answer is just to use mpirun as normal.
When properly configured, Open MPI obtains both the list of hosts and how many
processes to start on each host from Torque / PBS Pro directly.
Hence, it is unnecessary to specify the --hostfile, --host, or
-np options to mpirun. Open MPI will use PBS/Torque-native
mechanisms to launch and kill processes (rsh and/or ssh are not
required).
For example:
    # Allocate a PBS job with 4 nodes
    shell$ qsub -I -lnodes=4

    # Now run an Open MPI job on all the nodes allocated by PBS/Torque
    # (starting with Open MPI v1.2; you need to specify -np for the 1.0
    # and 1.1 series).
    shell$ mpirun my_mpi_application
This will run 4 MPI processes on the nodes that were allocated by PBS/Torque. Or, if submitting a script:
    shell$ cat my_script.sh
    #!/bin/sh
    mpirun my_mpi_application
    shell$ qsub -l nodes=4 my_script.sh
As of this writing, Open PBS is so ancient that we are not aware of any sites running it. As such, we have never tested Open MPI with Open PBS and therefore do not know if it would work or not.
Open MPI has changed how it obtains hosts from Torque / PBS Pro over time:
It is possible that future versions of Open MPI may switch back to using the TM API in a more scalable fashion, but there isn't currently a huge demand for it (reading the $PBS_NODEFILE works just fine).
Note that the TM API is used to launch processes in all versions of Open MPI; the only thing that has changed over time is how Open MPI obtains hostnames.
If the $PBS_NODEFILE is modified, Bad Things will happen.
We've had reports from some sites that system administrators modify the $PBS_NODEFILE in each job according to local policies. This will currently cause Open MPI to behave in an unpredictable fashion. As long as no new hosts are added to the hostfile, it usually means that Open MPI will incorrectly map processes to hosts, but in some cases it can cause Open MPI to fail to launch processes altogether.
The best course of action is to not modify the $PBS_NODEFILE.
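If the goal is only to see what the scheduler allocated, the file can be inspected without touching it. The following is a minimal sketch, assuming the node file lists one hostname per line and repeats a host once per slot (typical for Torque; local configurations may differ). The function name count_slots is hypothetical and not part of Open MPI or PBS.

```shell
#!/bin/sh
# Print "<slots> <host>" for each distinct host in a PBS node file.
# The file is only read; it is never modified.
count_slots() {
    # $1: path to a node file (assumed: one hostname per line, one line per slot)
    sort "$1" | uniq -c | awk '{ print $1, $2 }'
}

# PBS_NODEFILE is set by the scheduler inside a Torque / PBS Pro job.
# Outside a job this block does nothing.
if [ -n "${PBS_NODEFILE:-}" ] && [ -r "$PBS_NODEFILE" ]; then
    count_slots "$PBS_NODEFILE"
fi
```

Run inside an interactive or batch job, this prints one line per allocated host, which can help when comparing the allocation against where mpirun actually placed processes.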
Prior to v1.3, a hostfile or the --host option cannot be used with TM jobs.
Open MPI versions before v1.3 will fail to launch processes properly when a
hostfile is specified on the mpirun command line, or if the mpirun --host
option is used.
As of v1.3, Open MPI can use the --hostfile and --host options in
conjunction with TM jobs.
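One plausible use of --host inside a TM job is to restrict a run to a subset of the allocated hosts. The sketch below assumes Open MPI v1.3 or later and a node file with one hostname per line; the helper name first_hosts is hypothetical, and the choice of two hosts is arbitrary. It selects hosts from the allocation rather than adding new ones.

```shell
#!/bin/sh
# Build a comma-separated list of the first N distinct hosts in a node file,
# preserving the order in which they appear.
first_hosts() {
    # $1: number of hosts wanted, $2: path to the node file
    awk '!seen[$0]++' "$2" | head -n "$1" | paste -sd, -
}

# Only attempt a launch inside a Torque / PBS Pro job, where the scheduler
# sets PBS_NODEFILE. my_mpi_application is the example program name.
if [ -n "${PBS_NODEFILE:-}" ] && [ -r "$PBS_NODEFILE" ]; then
    mpirun --host "$(first_hosts 2 "$PBS_NODEFILE")" my_mpi_application
fi
```

The node file itself is only read here; the subset is passed to mpirun on the command line.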
If you are configuring and installing Open MPI yourself, and you want
to ensure that you are building the components of Open MPI required for
Torque/PBS Pro support, include the --with-tm option on the configure
command line. Run ./configure --help for further information about this
configure option.
The ompi_info command can be used to determine whether or not an
installed Open MPI includes Torque/PBS Pro support:
    shell$ ompi_info | grep ras
If the Open MPI installation includes support for Torque/PBS Pro, you should see a line similar to the one below. Note that the MCA version information varies depending on which version of Open MPI is installed.
    MCA ras: tm (MCA v2.1.0, API v2.0.0, Component v3.0.0)
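The same check can be scripted, for example at the top of a job script. This is a minimal sketch that assumes the ompi_info output format matches the sample line shown here; the function name has_tm_ras is hypothetical, and the pattern may need adjusting for other Open MPI versions.

```shell
#!/bin/sh
# Exit status 0 if the ompi_info output on stdin lists the "tm" ras component.
has_tm_ras() {
    grep -Eq 'MCA[[:space:]]+ras:[[:space:]]+tm([[:space:]]|$)'
}

# Run the check only if ompi_info is on the PATH.
if command -v ompi_info >/dev/null 2>&1; then
    if ompi_info | has_tm_ras; then
        echo "Torque/PBS Pro support: present"
    else
        echo "Torque/PBS Pro support: not found (see the --with-tm configure option)"
    fi
fi
```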