Job Limits
Each LC platform is a shared resource. Users are expected to adhere to the following usage policies to ensure that the resources can be used effectively and productively by everyone. You can view these policies on the system itself by running:
news job.lim.MACHINENAME
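For example, assuming this cluster's news item follows its lowercase cluster name (an assumption, not a value given on this page):

    news job.lim.rzhound      # "rzhound" is an assumed name for this system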
Hardware
Each RZHound node is based on the Intel Sapphire Rapids processor, with 56 cores per socket, 2 sockets per node (112 cores per node), and 256 GB of DDR5 memory.
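To confirm this layout from a node you are allocated on, standard Linux tools report the socket, core, and memory configuration (a sketch; these are generic commands, not output captured from this system):

    lscpu | grep -E 'Socket|Core|Model name'   # sockets, cores per socket, CPU model
    free -g                                    # total memory in GB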
Scheduling
Batch jobs are scheduled through SLURM.
- pdebug: 16 nodes (1792 cores), interactive use only.
- pbatch: 358 nodes (40096 cores), batch use only.
Pools     Max nodes/job    Max runtime
---------------------------------------------------
pdebug    4(*)             1 hour
pbatch    32(**)           24 hours
---------------------------------------------------
(*) Please limit the use of pdebug to 8 nodes on a PER USER basis, not a PER JOB basis, so that other users retain access. Pdebug is scheduled using fairshare, and jobs are core-scheduled, not node-scheduled. To allocate whole nodes, add the '--exclusive' flag to your sbatch or salloc command (see the examples below).
(**) In addition to the max nodes/job limit, there is a limit of 32 nodes per user per bank across all of an individual user's jobs.
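As an illustration only (the bank name, script name, and executable below are placeholders, not values defined on this page), a pbatch job that stays within the 32-node, 24-hour limits could be submitted with a script such as:

    #!/bin/bash
    #SBATCH --partition=pbatch        # batch-only pool
    #SBATCH --nodes=32                # at or below the 32 nodes/job limit
    #SBATCH --time=24:00:00           # at or below the 24-hour limit
    #SBATCH --account=mybank          # placeholder bank name
    srun ./my_app                     # placeholder executable

    sbatch my_job.sh                  # submit the script above (placeholder file name)

Similarly, an interactive pdebug allocation that stays within the 4-node, 1-hour limit and requests whole nodes looks like:

    salloc --partition=pdebug --nodes=2 --time=1:00:00 --exclusive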
Do NOT run computationally intensive work on the login nodes. There are a limited number of login nodes, and they are meant primarily for editing files and launching jobs. When a login node is sluggish, it is usually because a user has started a compile on it.
Pdebug is intended for debugging, visualization, and other inherently interactive work. It is not intended for production work. Do not use pdebug to run batch jobs, and do not chain jobs to run one after the other. Individuals who misuse the pdebug queue in this or any similar manner may be denied access to it.
Interactive access to a batch node is allowed while you have a batch job running on that node, and only for the purpose of monitoring your job. When logging into a batch node, be mindful of the impact your work has on the other jobs running on the node.
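For example (a sketch; the node name shown is hypothetical), you can find the node(s) your job is running on and then log in to check on it:

    squeue -u $USER                   # the NODELIST column shows where your jobs are running
    ssh rzhound1234                   # hypothetical node name taken from NODELIST
    top -u $USER                      # watch only your own processes, then log out when done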
Scratch Disk Space: Consult CZ File Systems Web Page: https://lc.llnl.gov/fsstatus/fsstatus.cgi
Documentation
- Linux Clusters Tutorial Part One | Linux Clusters Tutorial Part Two
- Slurm Tutorial (formerly Slurm and Moab)
- Compilers: see the Compilers page
Contact
Please call or send email to the LC Hotline if you have questions. LC Hotline | phone: 925-422-4531 | email: lc-hotline@llnl.gov