Job Limits
Each LC platform is a shared resource. Users are expected to adhere to the following usage policies to ensure that the resources can be used effectively and productively by everyone. You can view the policies on the system itself by running:
news job.lim.rzwhippet
Web Version of RZWhippet Job Limits
There are two login nodes, 28 pdebug nodes, and 8 phighmem nodes (no batch nodes). Each pdebug node has two Intel Sapphire Rapids processors (56 cores per socket, 112 cores per node) and 256 GB of DDR5 memory. Each phighmem node has two Intel Sapphire Rapids processors (56 cores per socket, 112 cores per node) and 128 GB of HBM memory.
There are 2 scheduling pools:
- pdebug—3136 cores (28 nodes)
- phighmem—896 cores (8 nodes)
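You can confirm the partitions and their current node counts directly with Slurm; for example (a sketch, assuming the pool names above are also the Slurm partition names):
sinfo -s -p pdebug,phighmem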
Scheduling
RZWhippet jobs are scheduled using Slurm, and jobs are scheduled per core. Scheduling limits are not technically enforced, so users are expected to monitor their own behavior and keep themselves within the current limits while following these policies:
- Users will not compile on the login nodes during daytime hours
- Users may use at most one phighmem node at a time
- During the day, a user can have a maximum of 336 processors in the queue with a runtime of up to 4 hours, with the following exception (an example request is shown after this list):
  - An occasional debugging job of at most one hour may use 337-560 processors, as long as it is the user's only job in the queue.
- Daytime is 0800-2000, Monday-Friday, not including holidays
- No production runs allowed, only development and debugging
- Users won't run computationally intensive work on the login nodes
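For reference, a daytime debugging request that stays within these limits might look like the following (a sketch; the application name is hypothetical):
srun -p pdebug -n 336 -t 4:00:00 ./my_debug_app
srun -p phighmem -N 1 -t 1:00:00 ./my_debug_app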
We are all family and expect developers to play nice. However, if someone's job(s) have taken over the machine:
- Call them or send them an email.
- Email ramblings-help@llnl.gov with a screenshot so we can take care of the situation by killing work that violates policy.
This approach will be revisited later and additional limits will be set if necessary. If someone monopolizes the machine, developers can always shift to other RZ resources.
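To see whose jobs are currently occupying the pools before reaching out, a standard Slurm query can be used, for example:
squeue -p pdebug,phighmem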
Documentation
- Linux Clusters Tutorial Part One | Linux Clusters Part Two
- Slurm Tutorial (formerly Slurm and Moab)
- TCE Home
Contact
Please call or send email to the LC Hotline if you have questions. LC Hotline | phone: 925-422-4531 | email: lc-hotline@llnl.gov
See the Compilers page.