-
@reida and I have cobbled together a working Snakefile and cluster config for this lesson, using smk-simple-slurm for the profile skeleton. Note that some options in config.yaml are carried over, and there are likely extraneous parameters; streamlining both files would be worthwhile, but requires more Snakemake expertise than I currently have.
Profile (cluster/config.yaml)
This profile assumes you have a Conda environment named "amdahl" in which the amdahl package is installed.
```yaml
# Cluster profile for Snakemake used in the HPC Workflows lessons
#
# This is a YAML file (<https://yaml.org>), a hierarchical plain-text
# data storage format for lists, key-value pairs, and nested instances
# of either or both. Indentation and punctuation matter!
#
# Use a YAML linter to check for syntax errors after editing:
# yamllint config.yaml
---
# --------------------------------------
# Scheduler settings (cluster-dependent)
# --------------------------------------

# A full listing is in the Snakemake distribution's top-level `__init__.py`
# file. You can view the latest version on the official repository:
# <https://github.com/snakemake/snakemake/blob/main/snakemake/__init__.py>

cluster:
  sbatch
    --job-name={rule}-np{resources.tasks}
    --partition={resources.partition}
    --nodes={resources.nodes}
    --ntasks={resources.tasks}
    --time={resources.time}
    --output=slurm_{rule}_np{resources.tasks}.log
default-resources:
  - partition=rack6  # name of partition (or queue) on which jobs will run
  - nodes=1          # number of cluster nodes to reserve
  - tasks=1          # number of cluster cores to reserve (total)
  - time=5           # maximum expected runtime of each job, in minutes

# ------------------------------------------
# Global job settings (platform independent)
# ------------------------------------------

# directory where Conda or Mamba is installed
# default: none
conda-base-path: "~/mambaforge"

# use a Conda environment
# default: false
use-conda: true

# run at most N CPU cluster jobs in parallel
# default: number of cores on host machine
jobs: 500

# max frequency of job status checks
# default: 10
max-status-checks-per-second: 1

# use at most N cores of the host machine in parallel
# (the cores are used to execute local rules)
# default: number of cores on host machine
local-cores: 1

# how many seconds to wait for an output file
# to appear after the execution of a rule
# (cluster filesystem latency hurts!)
# default: 3
latency-wait: 60

# keep going with independent jobs if a job fails?
# default: false
keep-going: false

# print the shell command of each job
# default: false
printshellcmds: true
```
Snakefile
"Tasks" is a list of the number of CPU cores to run the amdahl program with.
```python
# Snakefile to run Amdahl's Law for HPC Workflows

TASKS = [5, 6, 7, 8]

rule plot:
    input:
        expand("amdahl_np{task}.json", task=TASKS)
    output:
        "plot.log"
    log:
        "smk_plot.log"
    resources:
        tasks=1
    conda:
        "amdahl"
    shell:
        "echo {input} > {output}"

rule amdahl:
    output:
        "amdahl_np{sample}.json"
    log:
        "smk_np{sample}.log"
    resources:
        tasks=lambda wildcards: int(wildcards.sample)
    shell:
        "mpirun amdahl --terse > {output}"

rule clean:
    resources:
        tasks=1
    conda:
        "amdahl"
    shell:
        "rm *.log *.json"
```
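Since the end goal is a scaling study, it may help learners to see the arithmetic the collected timings should follow. A minimal Python sketch (the timing numbers are made up; it assumes only that you have already parsed each `amdahl_np{n}.json` into a task-count-to-wall-time mapping, and makes no claims about the exact JSON schema of `amdahl --terse`):

```python
# Sketch: compare measured timings from the workflow against Amdahl's law.
# The `timings` dict below is illustrative data, not real output.

def amdahl_speedup(p: float, n: int) -> float:
    """Predicted speedup for parallel fraction p on n tasks (Amdahl's law)."""
    return 1.0 / ((1.0 - p) + p / n)

def measured_speedup(times: dict[int, float]) -> dict[int, float]:
    """Speedup of each run relative to the smallest task count present."""
    base = min(times)
    return {n: times[base] / t for n, t in times.items()}

# Example with made-up wall times (seconds):
timings = {1: 10.0, 2: 6.0, 4: 4.0}
print(measured_speedup(timings))  # speedups relative to the 1-task run
print(amdahl_speedup(0.8, 4))     # 2.5
```

Plotting measured against predicted speedup for a few values of `p` is a natural extension of the `plot` rule.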
Usage
Assuming your cluster uses Slurm, launch the workflow using

```shell
snakemake --profile cluster/
```
We can work during the CarpentryCon Sprint today and Friday to incorporate this content (or something like it) into the lesson.
Edited with new scripts that use Snakemake's built-in Conda facilities
Replies: 11 comments 5 replies
-
Interesting that invoking `snakemake --profile directory/` changes the semantics of the Snakefile: the `shell` directive doesn't literally run that `mpirun` line in the local shell; instead, the command becomes the body of the batch script submitted to the scheduler.
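One way to demystify this for learners is to show the placeholder expansion with plain `str.format`. A toy illustration only, not Snakemake's actual implementation (`SimpleNamespace` stands in for the rule's resolved resources object):

```python
# Toy illustration: expanding a profile's `cluster` command template into a
# concrete sbatch invocation. This mimics, but is not, Snakemake's real
# substitution machinery.
from types import SimpleNamespace

template = (
    "sbatch --job-name={rule}-np{resources.tasks}"
    " --ntasks={resources.tasks} --time={resources.time}"
)

# Stand-in for the resources resolved for one job of the `amdahl` rule.
resources = SimpleNamespace(tasks=4, time=5)
command = template.format(rule="amdahl", resources=resources)
print(command)
# sbatch --job-name=amdahl-np4 --ntasks=4 --time=5
```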
Possibly important to be clear about this.
-
Something to consider: does this Snakefile cover the important parts, and enough of the important topics, to say we have "taught" Snakemake to the learners?
Likewise, Snakemake is not the only workflow management tool, and we want our learners to be able to evaluate other tools based on their perceived needs and the tools' capabilities.
This Snakefile represents an end-stage of the lesson, and is expected to start out with more naive stanzas to introduce topics and evolve toward more succinct rules at the end. Our goal for the lesson is to teach workflow management, more than simply "here's how to do a scaling study using Snakemake."
-
Implicit setup:
- Create a Conda environment named "amdahl" on the head node.
  - The environment must also be available on the cluster nodes!
- Activate the "amdahl" environment.
- Install Snakemake (`conda install -c bioconda snakemake`)
- Install `amdahl` (`pip install amdahl`)
- Write the `Snakefile`
- Write the cluster config (`cluster/config.yaml`)
- Launch the job!
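Since so much of this setup is implicit, a small preflight check can catch missing pieces before anything is submitted. A sketch (the tool list is an assumption; adjust it to match your environment):

```python
# Sketch: verify that the commands the workflow relies on are resolvable on
# PATH before submitting anything. The REQUIRED list is an assumption about
# this particular setup, not something the workflow enforces.
import shutil

REQUIRED = ["snakemake", "mpirun", "amdahl"]

def missing_tools(required=REQUIRED):
    """Return the subset of `required` commands not found on PATH."""
    return [tool for tool in required if shutil.which(tool) is None]

missing = missing_tools()
if missing:
    print("missing:", ", ".join(missing))
else:
    print("all dependencies found")
```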
-
(we should use a workflow manager for this workflow)
-
`bioconda`, not `bio-conda`. I'm also getting package conflicts between the current version of snakemake and whatever else is in my Amdahl environment. Will try to rectify that before anything else.
-
First crack at the setup (testing on macOS, will test on Linux later):

```shell
conda create -n ENV_NAME -c bioconda mpi4py matplotlib snakemake
conda activate ENV_NAME
pip install amdahl
```
-
Learners will not have seen YAML before! Uh oh!
-
link to a YAML linter prominently (and repeatedly?) in the lesson
-
A point from the Sprint discussion: `config.yaml` is indeed a YAML file, and has the syntax constraints that come with that. It may be important to distinguish between Snakemake rule syntax and YAML syntax.
-
Shell operators (pipelines, `&&`, etc.) may not have been covered in The UNIX Shell.
-
When the program runs, it's useful to have it print which host it's on: we develop the Snakefile locally or on the head node, then jump to the cluster, so the machinery should be explicit about where each job executes. That way learners are a little less lost about where their code is running.
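A quick way to surface this is to have each job report its hostname. A minimal Python sketch (the log wording is illustrative):

```python
# Minimal sketch: report which machine this process is executing on, so a
# job's log distinguishes the head node from the compute nodes.
import socket

def where_am_i() -> str:
    """Return the hostname of the machine running this process."""
    return socket.gethostname()

if __name__ == "__main__":
    print(f"Running on host: {where_am_i()}")
```

In a Snakemake rule this could be as simple as prefixing the shell command with `hostname;` so the host appears at the top of each log.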
-
Thanks @tobyhodges!
-
Instructor & sysadmin onboarding to make sure the dependencies are satisfied
-
Self-document the YAML file with more & better comments
-
Mapping exercise: which Snakemake concepts are covered in each episode? Which are present in the "final" Snakefile, and which require intermediate versions?
See #15!
-
Snakemake supports Conda and modules -- simplified scripts in the OP now include more comments as well!