-
@reida and I have cobbled together a working Snakefile and cluster config for this lesson, using smk-simple-slurm for the profile skeleton. Note that some options in config.yaml are carried over, and there are likely extraneous parameters; streamlining both files would be worthwhile, but requires more Snakemake expertise than I currently have.
Profile (cluster/config.yaml)
This profile assumes you have a Conda environment named "amdahl" in which the amdahl package is installed.
```yaml
# Cluster profile for Snakemake used in the HPC Workflows lessons
#
# This is a YAML file (<https://yaml.org>), a hierarchical plain-text
# data storage format for lists, key-value pairs, and nested instances
# of either or both. Indentation and punctuation matter!
#
# Use a YAML linter to check for syntax errors after editing:
# yamllint config.yaml
---
# --------------------------------------
# Scheduler settings (cluster-dependent)
# --------------------------------------

# A full listing is in the Snakemake distribution's top-level `__init__.py`
# file. You can view the latest version on the official repository:
# <https://github.com/snakemake/snakemake/blob/main/snakemake/__init__.py>

cluster:
  sbatch
    --job-name={rule}-np{resources.tasks}
    --partition={resources.partition}
    --nodes={resources.nodes}
    --ntasks={resources.tasks}
    --time={resources.time}
    --output=slurm_{rule}_np{resources.tasks}.log
default-resources:
  - partition=rack6  # name of partition (or queue) on which jobs will run
  - nodes=1          # number of cluster nodes to reserve
  - tasks=1          # number of cluster cores to reserve (total)
  - time=5           # maximum expected runtime of each job, in minutes

# ------------------------------------------
# Global job settings (platform independent)
# ------------------------------------------

# directory where Conda or Mamba is installed
# default: none
conda-base-path: "~/mambaforge"

# use a Conda environment
# default: false
use-conda: true

# run at most N CPU cluster jobs in parallel
# default: number of cores on host machine
jobs: 500

# max frequency of job status checks
# default: 10
max-status-checks-per-second: 1

# use at most N cores of the host machine in parallel
# (the cores are used to execute local rules)
# default: number of cores on host machine
local-cores: 1

# how many seconds to wait for an output file
# to appear after the execution of a rule
# (cluster filesystem latency hurts!)
# default: 3
latency-wait: 60

# keep going with independent jobs if a job fails?
# default: false
keep-going: false

# print the shell command of each job
# default: false
printshellcmds: true
```
Snakefile
"Tasks" is a list of the number of CPU cores to run the amdahl program with.
```python
# Snakefile to run Amdahl's Law for HPC Workflows

TASKS = [5, 6, 7, 8]

rule plot:
    input:
        expand("amdahl_np{task}.json", task=TASKS)
    output:
        "plot.log"
    log:
        "smk_plot.log"
    resources:
        tasks=1
    conda:
        "amdahl"
    shell:
        "echo {input} > {output}"

rule amdahl:
    output:
        "amdahl_np{sample}.json"
    log:
        "smk_np{sample}.log"
    resources:
        tasks=lambda wildcards: int(wildcards.sample)
    shell:
        "mpirun amdahl --terse > {output}"

rule clean:
    resources:
        tasks=1
    conda:
        "amdahl"
    shell:
        "rm *.log *.json"
```
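Since the end goal is a scaling study, it may help learners to see the arithmetic the collected timings should follow. A minimal Python sketch (the timing numbers are made up; it assumes only that you have already parsed each `amdahl_np{n}.json` into a task-count-to-wall-time mapping, and makes no claims about the exact JSON schema of `amdahl --terse`):

```python
# Sketch: compare measured timings from the workflow against Amdahl's law.
# The `timings` dict below is illustrative data, not real output.

def amdahl_speedup(p: float, n: int) -> float:
    """Predicted speedup for parallel fraction p on n tasks (Amdahl's law)."""
    return 1.0 / ((1.0 - p) + p / n)

def measured_speedup(times: dict[int, float]) -> dict[int, float]:
    """Speedup of each run relative to the smallest task count present."""
    base = min(times)
    return {n: times[base] / t for n, t in times.items()}

# Example with made-up wall times (seconds):
timings = {1: 10.0, 2: 6.0, 4: 4.0}
print(measured_speedup(timings))  # speedups relative to the 1-task run
print(amdahl_speedup(0.8, 4))     # 2.5
```

Plotting measured against predicted speedup for a few values of `p` is a natural extension of the `plot` rule.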
Usage
Assuming your cluster uses Slurm, launch the workflow using

```shell
snakemake --profile cluster/
```
We can work during the CarpentryCon Sprint today and Friday to incorporate this content (or something like it) into the lesson.
Edited with new scripts that use Snakemake's built-in Conda facilities
Replies: 11 comments 5 replies
-
Interesting that invoking `snakemake --profile directory/` changes the semantics of the Snakefile: the `shell` directive doesn't literally run that `mpirun` line in the local shell; instead, the command becomes the body of the batch script submitted to the scheduler.
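One way to demystify this for learners is to show the placeholder expansion with plain `str.format`. A toy illustration only, not Snakemake's actual implementation (`SimpleNamespace` stands in for the rule's resolved resources object):

```python
# Toy illustration: expanding a profile's `cluster` command template into a
# concrete sbatch invocation. This mimics, but is not, Snakemake's real
# substitution machinery.
from types import SimpleNamespace

template = (
    "sbatch --job-name={rule}-np{resources.tasks}"
    " --ntasks={resources.tasks} --time={resources.time}"
)

# Stand-in for the resources resolved for one job of the `amdahl` rule.
resources = SimpleNamespace(tasks=4, time=5)
command = template.format(rule="amdahl", resources=resources)
print(command)
# sbatch --job-name=amdahl-np4 --ntasks=4 --time=5
```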
Possibly important to be clear about this.
-
Something to consider: does this Snakefile cover the important parts, and enough of the important topics, to say we have "taught" Snakemake to the learners?
Likewise, Snakemake is not the only workflow management tool, and we want our learners to be able to evaluate other tools based on their perceived needs and the tools' capabilities.
This Snakefile represents an end-stage of the lesson, and is expected to start out with more naive stanzas to introduce topics and evolve toward more succinct rules at the end. Our goal for the lesson is to teach workflow management, more than simply "here's how to do a scaling study using Snakemake."
-
Implicit setup:
- Create a Conda environment named "amdahl" on the head node.
  - The environment must also be available on the cluster nodes!
- Activate the "amdahl" environment.
- Install Snakemake (`conda install -c bioconda snakemake`)
- Install `amdahl` (`pip install amdahl`)
- Write the `Snakefile`
- Write the cluster config (`cluster/config.yaml`)
- Launch the job!
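Since so much of this setup is implicit, a small preflight check can catch missing pieces before anything is submitted. A sketch (the tool list is an assumption; adjust it to match your environment):

```python
# Sketch: verify that the commands the workflow relies on are resolvable on
# PATH before submitting anything. The REQUIRED list is an assumption about
# this particular setup, not something the workflow enforces.
import shutil

REQUIRED = ["snakemake", "mpirun", "amdahl"]

def missing_tools(required=REQUIRED):
    """Return the subset of `required` commands not found on PATH."""
    return [tool for tool in required if shutil.which(tool) is None]

missing = missing_tools()
if missing:
    print("missing:", ", ".join(missing))
else:
    print("all dependencies found")
```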
-
(we should use a workflow manager for this workflow)
-
`bioconda`, not `bio-conda`. I'm also getting package conflicts between the current version of snakemake and whatever else is in my Amdahl environment. Will try to rectify that before anything else.
-
First crack at the setup (testing on macOS, will test on Linux later):

```shell
conda create -n ENV_NAME -c bioconda mpi4py matplotlib snakemake
conda activate ENV_NAME
pip install amdahl
```
-
Learners will not have seen YAML before! Uh oh!
-
link to a YAML linter prominently (and repeatedly?) in the lesson
-
A point from the Sprint discussion: `config.yaml` is indeed a YAML file, and has the syntax constraints that come with that. It may be important to distinguish between Snakemake rule syntax and YAML syntax.
-
Shell operators (pipelines, `&&`, etc.) may not have been covered in The UNIX Shell.
-
When the program runs, it's useful to have it print which host it's on: we develop the Snakefile locally or on the head node, then jump to the cluster, so the machinery should be explicit about where each job executes. That way learners are a little less lost about where their code is running.
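A quick way to surface this is to have each job report its hostname. A minimal Python sketch (the log wording is illustrative):

```python
# Minimal sketch: report which machine this process is executing on, so a
# job's log distinguishes the head node from the compute nodes.
import socket

def where_am_i() -> str:
    """Return the hostname of the machine running this process."""
    return socket.gethostname()

if __name__ == "__main__":
    print(f"Running on host: {where_am_i()}")
```

In a Snakemake rule this could be as simple as prefixing the shell command with `hostname;` so the host appears at the top of each log.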
-
Thanks @tobyhodges!
-
Instructor & sysadmin onboarding to make sure the dependencies are satisfied
-
Self-document the YAML file with more & better comments
-
Mapping exercise: which Snakemake concepts are covered in each episode? Which are present in the "final" Snakefile, and which require intermediate versions?
See #15!
-
Snakemake supports Conda and modules -- simplified scripts in the OP now include more comments as well!