This repository was archived by the owner on Aug 14, 2024. It is now read-only.

Slurm Snakefile & profile #25

tkphd started this conversation in Ideas
Aug 9, 2022 · 11 comments · 5 replies

@reida and I have cobbled together a working Snakefile and cluster config for this lesson, using smk-simple-slurm for the profile skeleton. Note that some options in config.yaml are carried over, and there are likely extraneous parameters; streamlining both files would be worthwhile, but requires more Snakemake expertise than I currently have.

Profile (cluster/config.yaml)

This profile assumes you have a Conda environment named "amdahl" in which the amdahl package is installed.

# Cluster profile for Snakemake used in the HPC Workflows lessons
#
# This is a YAML file (<https://yaml.org>), a hierarchical plain-text
# data storage format for lists, key-value pairs, and nested instances
# of either or both. Indentation and punctuation matter!
#
# Use a YAML linter to check for syntax errors after editing:
# yamllint config.yaml
---
# --------------------------------------
# Scheduler settings (cluster-dependent)
# --------------------------------------
# A full listing is in the Snakemake distribution's top-level `__init__.py`
# file. You can view the latest version on the official repository:
# <https://github.com/snakemake/snakemake/blob/main/snakemake/__init__.py>
cluster:
 sbatch
 --job-name={rule}-np{resources.tasks}
 --partition={resources.partition}
 --nodes={resources.nodes}
 --ntasks={resources.tasks}
 --time={resources.time}
 --output=slurm_{rule}_np{resources.tasks}.log
default-resources:
 - partition=rack6 # name of partition (or queue) on which jobs will run
 - nodes=1 # number of cluster nodes to reserve
 - tasks=1 # number of cluster cores to reserve (total)
 - time=5 # maximum expected runtime of each job, in minutes
# ------------------------------------------
# Global job settings (platform independent)
# ------------------------------------------
# directory where Conda or Mamba is installed
# default: none
conda-base-path: "~/mambaforge"
# use a Conda environment
# default: false
use-conda: true
# run at most N CPU cluster jobs in parallel
# default: number of cores on host machine
jobs: 500
# max frequency of job status checks
# default: 10
max-status-checks-per-second: 1
# use at most N cores of the host machine in parallel
# (the cores are used to execute local rules)
# default: number of cores on host machine
local-cores: 1
# how many seconds to wait for an output file
# to appear after the execution of a rule
# (cluster filesystem latency hurts!)
# default: 3
latency-wait: 60
# keep going with independent jobs if a job fails?
# default: false
keep-going: false
# print the shell command of each job
# default: false
printshellcmds: true
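
To make the `cluster:` entry less opaque, the Python sketch below mimics how its placeholders might be filled in for a single job. It is only an illustration built on `str.format`, not Snakemake's own implementation; `render_submit_command`, `CLUSTER_TEMPLATE`, and `DEFAULT_RESOURCES` are hypothetical names, and the values are copied from the profile above.

```python
from types import SimpleNamespace

# Submit-command template, copied from the `cluster:` entry of the profile.
CLUSTER_TEMPLATE = " ".join([
    "sbatch",
    "--job-name={rule}-np{resources.tasks}",
    "--partition={resources.partition}",
    "--nodes={resources.nodes}",
    "--ntasks={resources.tasks}",
    "--time={resources.time}",
    "--output=slurm_{rule}_np{resources.tasks}.log",
])

# Fallback values, copied from `default-resources:` in the profile.
DEFAULT_RESOURCES = {"partition": "rack6", "nodes": 1, "tasks": 1, "time": 5}


def render_submit_command(rule, **rule_resources):
    """Fill the template for one job (hypothetical helper, for illustration)."""
    # Resources declared by a rule take precedence over the profile defaults.
    merged = {**DEFAULT_RESOURCES, **rule_resources}
    return CLUSTER_TEMPLATE.format(rule=rule, resources=SimpleNamespace(**merged))


if __name__ == "__main__":
    print(render_submit_command("amdahl", tasks=8))
```

Under these assumptions, a rule that declares `tasks=8` would be submitted with `--ntasks=8`, while a rule that declares nothing falls back to the defaults.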

Snakefile

TASKS is a list of CPU core counts; the workflow runs the amdahl program once with each.

# Snakefile to run Amdahl's Law for HPC Workflows
TASKS = [5, 6, 7, 8]

rule plot:
    input:
        expand("amdahl_np{task}.json", task=TASKS)
    output:
        "plot.log"
    log:
        "smk_plot.log"
    resources:
        tasks=1
    conda:
        "amdahl"
    shell:
        "echo {input} > {output}"

rule amdahl:
    input:
    output:
        "amdahl_np{sample}.json"
    log:
        "smk_np{sample}.log"
    resources:
        tasks=lambda wildcards: int(wildcards.sample)
    shell:
        "mpirun amdahl --terse > {output}"

rule clean:
    input:
    output:
    resources:
        tasks=1
    conda:
        "amdahl"
    shell:
        "rm *.log *.json"

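For readers new to wildcards, this plain-Python sketch imitates what the two main rules do with file names: `expand` turns TASKS into the list of files that `plot` needs, and the `lambda` in `amdahl` recovers the core count from the requested file name. The helpers `expand_names` and `tasks_for` are hypothetical stand-ins written for this illustration, not Snakemake functions.

```python
import re

TASKS = [5, 6, 7, 8]


def expand_names(pattern, tasks):
    """Stand-in for expand(pattern, task=TASKS): one file name per value."""
    return [pattern.format(task=n) for n in tasks]


def tasks_for(filename):
    """Stand-in for wildcard matching on "amdahl_np{sample}.json"."""
    match = re.fullmatch(r"amdahl_np(?P<sample>.+)\.json", filename)
    if match is None:
        raise ValueError(f"{filename!r} does not match the output pattern")
    # Same conversion as `lambda wildcards: int(wildcards.sample)`.
    return int(match.group("sample"))


targets = expand_names("amdahl_np{task}.json", TASKS)
print(targets)
print([tasks_for(name) for name in targets])
```

Note that the placeholder name in `expand` (`task`) does not need to match the wildcard name in the `amdahl` rule (`sample`); only the resulting file names have to match the output pattern.
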
Usage

Assuming your cluster uses Slurm, launch the workflow using

snakemake --profile cluster/

We can work during the CarpentryCon Sprint today and Friday to incorporate this content (or something like it) into the lesson.

Edited with new scripts that use Snakemake's built-in Conda facilities

Interesting that invoking snakemake --profile directory/ changes the semantics of the Snakefile: the shell directive doesn't literally run that mpirun line in the shell; instead, the line becomes the content of the submit file.

Possibly important to be clear about this.

tkphd
Aug 9, 2022
Maintainer Author

Something to consider: does this Snakefile cover the important parts, and enough of the important topics, to say we have "taught" Snakemake to the learners?

Likewise, Snakemake is not the only workflow management tool, and we want our learners to be able to evaluate other tools based on their perceived needs and the tools' capabilities.

This Snakefile represents the end stage of the lesson, which is expected to start out with more naive stanzas to introduce topics and evolve toward more succinct rules at the end. Our goal for the lesson is to teach workflow management, more than simply "here's how to do a scaling study using Snakemake."

tkphd
Aug 9, 2022
Maintainer Author

Implicit setup:

  1. Create a Conda environment named "amdahl" on the head node.
    • The environment must also be available on the cluster nodes!
  2. Activate the "amdahl" environment.
  3. Install Snakemake (conda install -c bioconda snakemake)
  4. Install amdahl (pip install amdahl)
  5. Write the Snakefile
  6. Write the cluster config (cluster/config.yaml)
  7. Launch the job!
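
Before step 7, it may help to confirm that the commands the workflow relies on are actually visible on the current machine. The sketch below is one possible pre-flight check, not part of the lesson; the tool list is an assumption inferred from the setup steps and the Snakefile, and `missing_tools` is a hypothetical helper.

```python
import shutil

# Commands the workflow appears to rely on (an assumption based on the setup
# steps and the Snakefile): snakemake, amdahl, mpirun, plus Slurm's sbatch.
REQUIRED_TOOLS = ["snakemake", "amdahl", "mpirun", "sbatch"]


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the commands that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All required commands were found.")
```

Running it with the "amdahl" environment activated, on both the head node and a compute node, would show whether the environment is available in both places.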

tkphd Aug 9, 2022
Maintainer Author

(we should use a workflow manager for this workflow)

bioconda, not bio-conda. Also getting package conflicts for current versions of snakemake and whatever else is in my Amdahl environment. Will try to rectify that before anything else.

First crack at the setup (testing on macOS, will test on Linux later):

conda create -n ENV_NAME -c bioconda mpi4py matplotlib snakemake
conda activate ENV_NAME
pip install amdahl

tkphd
Aug 9, 2022
Maintainer Author

Learners will not have seen YAML before! Uh oh!

link to a YAML linter prominently (and repeatedly?) in the lesson

Point from the Sprint discussion: config.yaml is indeed a YAML file, and has syntax constraints that arise from that. It may be important to distinguish between Snakefile syntax and YAML syntax.
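
One YAML constraint that tends to catch newcomers is that indentation must use spaces, never tabs. As a small illustration (not a replacement for a real linter such as yamllint), this sketch reports lines whose indentation contains a tab; `lines_with_tab_indent` is a hypothetical helper and the sample text is made up for the demonstration.

```python
def lines_with_tab_indent(text):
    """Return 1-based numbers of lines whose leading whitespace has a tab."""
    flagged = []
    for number, line in enumerate(text.splitlines(), start=1):
        # Leading whitespace is whatever lstrip() would remove.
        indent = line[: len(line) - len(line.lstrip(" \t"))]
        if "\t" in indent:
            flagged.append(number)
    return flagged


# Made-up fragment: the third line is indented with a tab.
sample = "default-resources:\n  - nodes=1\n\t- tasks=1\n"
print(lines_with_tab_indent(sample))
```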

tkphd
Aug 9, 2022
Maintainer Author

Pipelines in the shell (&&, etc) may not have been covered in The UNIX Shell.

tkphd
Aug 9, 2022
Maintainer Author

It would be useful for the program to print which host it is running on. We develop the Snakefile locally or on the head node, then jump to the cluster; the machinery should be explicit about where we are, so learners are less likely to get lost about where the work is being executed.
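
A minimal sketch of what such a report could look like, assuming it is acceptable to call a small Python helper from the job. `where_am_i` is a hypothetical name; `SLURM_JOB_ID` is the environment variable Slurm sets inside a job, so it is expected to be absent on a laptop or head node.

```python
import os
import socket


def where_am_i():
    """Describe the machine (and Slurm job, if any) running this process."""
    host = socket.gethostname()
    # Slurm sets SLURM_JOB_ID inside a job; outside a job it is not defined.
    job_id = os.environ.get("SLURM_JOB_ID", "none")
    return f"host={host} slurm_job_id={job_id}"


if __name__ == "__main__":
    print(where_am_i())
```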

tkphd Aug 9, 2022
Maintainer Author

Thanks @tobyhodges !

tkphd
Aug 9, 2022
Maintainer Author

Instructor & sysadmin onboarding to make sure the dependencies are satisfied

tkphd
Aug 9, 2022
Maintainer Author

Self-document the YAML file with more & better comments

tkphd
Aug 9, 2022
Maintainer Author

Mapping exercise: which Snakemake concepts are covered in each episode? Which are present in the "final" Snakefile, and which require intermediate versions?

See #15!

tkphd
Aug 10, 2022
Maintainer Author

Snakemake supports Conda and modules -- simplified scripts in the OP now include more comments as well!
