circRNA detection pipeline built on oxo-flow.

    # 1. Setup
    ./scripts/setup.sh

    # 2. Edit config.toml with your reference paths
    # Just set reference_dir="/path/to/your/GRCh38" and you're done!

    # 3. Run
    oxo-flow run circrna.oxoflow -j 16

- 4 detection methods: CIRIquant, CIRCexplorer2, find_circ, circRNA_finder
- Ensemble aggregation: High-confidence calls detected by ≥2 methods
- Single config file: No need to edit multiple configuration files
- Auto environment setup: One script creates all conda environments
- Comprehensive reports: HTML reports with statistics and visualizations
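
The ensemble rule in the list above (keep a circRNA when at least two callers report it) can be sketched with standard tools. This is only an illustration and not the pipeline's own aggregation step: `ensemble_calls` is a hypothetical helper, and it assumes one BED file per caller with chromosome, start and end in columns 1 to 3, strand in column 6, and coordinates already on a common convention.

```shell
# Hypothetical helper (not part of oxo-flow).
# Usage: ensemble_calls MIN_METHODS caller1.bed caller2.bed ...
# Prints: chrom, start, end, strand, number of supporting callers.
ensemble_calls() {
    min_methods=$1
    shift
    awk -v min="$min_methods" '
        BEGIN { OFS = "\t" }
        /^#/ || NF < 3 { next }              # skip comments and short lines
        {
            strand = (NF >= 6) ? $6 : "."    # assume strand in column 6 if present
            key = $1 OFS $2 OFS $3 OFS strand
            # Count each input file (one per caller) at most once per junction.
            if (!((key, FILENAME) in seen)) {
                seen[key, FILENAME] = 1
                count[key]++
            }
        }
        END {
            for (key in count)
                if (count[key] >= min) print key, count[key]
        }
    ' "$@" | sort -k1,1 -k2,2n -k3,3n
}
```

Called as `ensemble_calls 2` with the four per-caller BED files of one sample, this would list the junctions supported by two or more callers. Callers that differ in coordinate convention would need normalizing first, which this sketch does not attempt.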
Create a directory with this structure:

    /data/references/GRCh38/
    ├── genome.fa            # Reference FASTA
    ├── genome.fa.fai        # FASTA index
    ├── genes.gtf            # Gene annotation (GENCODE)
    ├── hg38_ref.txt         # CIRCexplorer2 reference (run: fetch_ucsc.py hg38 ref hg38_ref.txt)
    ├── bwa/                 # BWA index
    │   └── genome.fa.bwt
    ├── bowtie2/             # Bowtie2 index
    │   └── genome.fa.1.bt2
    ├── star/                # STAR index
    │   └── Genome
    └── hisat2/              # HISAT2 index (for CIRIquant)
        └── genome.fa.1.ht2

Set reference_dir = "/data/references/GRCh38" in config.toml.

To build the indices (the commands use a relative reference/ directory; substitute the path you set as reference_dir):

    # Create index directories
    mkdir -p reference/{bwa,bowtie2,star,hisat2}

    # BWA (for CIRIquant and CIRCexplorer2)
    bwa index -p reference/bwa/genome.fa reference/genome.fa

    # Bowtie2 (for find_circ)
    bowtie2-build reference/genome.fa reference/bowtie2/genome.fa

    # STAR (for circRNA_finder)
    STAR --runMode genomeGenerate --genomeDir reference/star \
        --genomeFastaFiles reference/genome.fa --runThreadN 8

    # HISAT2 (for CIRIquant)
    hisat2-build reference/genome.fa reference/hisat2/genome.fa

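
Since every index must exist before a run, a quick check of the reference directory can save a failed job. The function below is a minimal sketch: `check_reference_dir` is a hypothetical helper, not something shipped with the pipeline, and it only tests for one representative file per index. It assumes the file names that the index builders normally write, for example `genome.fa.1.ht2` for HISAT2.

```shell
# Hypothetical helper (not part of oxo-flow).
# Usage: check_reference_dir /path/to/reference_dir
# Prints each missing file to stderr; returns 1 if anything is missing.
check_reference_dir() {
    ref_dir=$1
    missing=0
    for rel in \
        genome.fa genome.fa.fai genes.gtf hg38_ref.txt \
        bwa/genome.fa.bwt \
        bowtie2/genome.fa.1.bt2 \
        star/Genome \
        hisat2/genome.fa.1.ht2
    do
        if [ ! -e "$ref_dir/$rel" ]; then
            echo "missing: $ref_dir/$rel" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

A typical call would be `check_reference_dir /data/references/GRCh38 && echo "reference OK"`. The check confirms presence only; it does not verify that an index is complete or matches the FASTA.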
config.toml:

    [config]
    reference_dir = "/data/references/GRCh38"
    samples = "samples.csv"

    [defaults]
    threads = 8
    memory = "16G"

samples.csv:

    sample,r1_fastq,r2_fastq
    SAMPLE_01,raw/SAMPLE_01_1.fastq.gz,raw/SAMPLE_01_2.fastq.gz
    SAMPLE_02,raw/SAMPLE_02_1.fastq.gz,raw/SAMPLE_02_2.fastq.gz

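
A sample sheet with a wrong header or a mistyped FASTQ path is easy to catch before launching. The sketch below is one possible pre-flight check, assuming the three-column header shown above and FASTQ paths that are relative to the directory you run it from; `check_samples_csv` is a hypothetical name, not a pipeline command.

```shell
# Hypothetical helper (not part of oxo-flow).
# Usage: check_samples_csv samples.csv
# Returns 1 on an unreadable file, unexpected header, or missing FASTQ.
check_samples_csv() {
    csv=$1
    status=0
    if [ ! -r "$csv" ]; then
        echo "cannot read: $csv" >&2
        return 1
    fi
    {
        IFS= read -r header
        if [ "$header" != "sample,r1_fastq,r2_fastq" ]; then
            echo "unexpected header: $header" >&2
            return 1
        fi
        # The extra test keeps a final row that lacks a trailing newline.
        while IFS=, read -r sample r1 r2 || [ -n "$sample" ]; do
            [ -n "$sample" ] || continue     # skip blank lines
            for fq in "$r1" "$r2"; do
                if [ ! -f "$fq" ]; then
                    echo "$sample: missing FASTQ $fq" >&2
                    status=1
                fi
            done
        done
    } < "$csv"
    return "$status"
}
```

Running `check_samples_csv samples.csv` from the project directory reports each missing file by sample name. It does not handle quoted CSV fields or Windows line endings.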
| File | Description |
|---|---|
| results/{sample}.CIRI.bed | CIRIquant calls |
| results/{sample}.circexplorer2.bed | CIRCexplorer2 calls |
| results/{sample}.find_circ.bed | find_circ calls |
| results/{sample}.circRNA_finder.bed | circRNA_finder calls |
| results/{sample}.aggr.txt | Aggregated calls per sample |
| results/all_circRNA.tsv.gz | Combined circRNAs across all samples |
| results/circrna_report.html | HTML report |
| results/multiqc_report.html | QC summary |


    FASTQ → fastp → [4 callers in parallel] → aggregate → report
              ↓
           MultiQC

- oxo-flow >= 0.5.0
- Conda or Mamba
- Memory: 32 GB recommended (CIRIquant and circRNA_finder each need 32 GB)
- Disk: 50 GB or more for indices; output size varies with the data

If you see "may OOM" warnings, reduce the memory setting in config.toml:

    [defaults]
    memory = "24G"

If the pipeline reports a missing index, run the index building commands above. All indices must exist before running the pipeline.

If a conda environment is missing or broken, run ./scripts/setup.sh again or create the environment manually:

    conda env create -f envs/ciriquant.yaml -n circrna_ciriquant

Apache 2.0