A fast, parallel Rust implementation of Prodigal, a tool for finding protein-coding genes in microbial genomes.
Orphos is a high-performance reimplementation of Prodigal, the widely used prokaryotic gene prediction tool. Written in Rust, Orphos implements the same gene-finding algorithm with improved performance and modern language features.
- Faster Performance: Multi-threaded processing using Rayon for parallel genome analysis
- Memory Efficient: Optimized memory usage for handling large genomes and metagenomes
- Browser Support: WebAssembly build that runs in web browsers
- 100% Compatible: Output formats fully compatible with original Prodigal (GFF3, GenBank, etc.)
- Modern Codebase: Written in safe Rust with excellent error handling
- Multiple Interfaces: CLI, Rust library, Python bindings, and WebAssembly
- Easy Installation: Available via Cargo, pip, Homebrew, and Conda
Orphos is available in multiple forms:
- orphos-cli: Command-line interface for gene prediction
- orphos-core: Rust library for integrating into your own projects
- orphos-python: Python bindings (via PyO3)
- orphos-wasm: WebAssembly module for browser/Node.js usage
- High Performance: Multi-threaded processing using Rayon
- Memory Efficient: Optimized memory usage for large genomes
- Compatible: Output format compatible with original Prodigal
- Cross-Platform: Works on Linux, macOS, and Windows
brew install FullHuman/tap/orphos
cargo install orphos-cli
git clone https://github.com/FullHuman/orphos.git
cd orphos
cargo install --path orphos-cli
pip install orphos
Add to your Cargo.toml:
[dependencies]
orphos-core = "0.1.0"
# Basic gene prediction
orphos -i input.fasta -o output.gbk

# Output in GFF format
orphos -i input.fasta -f gff -o output.gff

# Use custom training file
orphos -i input.fasta -t training.trn -o output.gbk

# Metagenomic mode
orphos -i metagenome.fasta -p meta -o output.gff
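When Orphos is driven from a script, it can help to assemble the command line programmatically. The sketch below uses only the flags shown above (`-i`, `-o`, `-f`, `-p`, `-t`). The helper name `build_orphos_command` is hypothetical and not part of Orphos, and the sketch assumes the `orphos` binary is on your `PATH`.

```python
import shlex
from typing import List, Optional


def build_orphos_command(
    input_path: str,
    output_path: str,
    fmt: Optional[str] = None,
    mode: Optional[str] = None,
    training_file: Optional[str] = None,
) -> List[str]:
    """Build an argument list for the orphos CLI (hypothetical helper)."""
    cmd = ["orphos", "-i", input_path, "-o", output_path]
    if fmt is not None:
        cmd += ["-f", fmt]            # output format, e.g. "gff"
    if mode is not None:
        cmd += ["-p", mode]           # procedure, e.g. "meta"
    if training_file is not None:
        cmd += ["-t", training_file]  # custom training file
    return cmd


if __name__ == "__main__":
    cmd = build_orphos_command(
        "metagenome.fasta", "output.gff", fmt="gff", mode="meta"
    )
    # Print the command as a shell-quoted string for inspection.
    print(shlex.join(cmd))
```

The returned list can be passed directly to `subprocess.run(cmd, check=True)`, which avoids shell-quoting problems with unusual file names.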
import orphos

# Analyze a FASTA file
result = orphos.analyze_file("genome.fasta")
print(f"Found {result.gene_count} genes")
print(result.output)  # GenBank formatted output

# Analyze a sequence string
fasta_string = """>seq1
ATGCGATCGATCGATCGATCG...
"""
result = orphos.analyze_sequence(fasta_string)

# Customize options
options = orphos.OrphosOptions(
    mode="meta",           # Use metagenomic mode
    format="gff",          # Output in GFF format
    closed_ends=True,      # Don't allow genes off edges
    translation_table=11   # Use translation table 11
)
result = orphos.analyze_file("genome.fasta", options)
use orphos_core::{OrphosAnalyzer, config::OrphosConfig};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Create analyzer with default configuration
    let mut analyzer = OrphosAnalyzer::new(OrphosConfig::default());

    // Analyze a genome sequence
    let results = analyzer.analyze_sequence(
        "ATGCGATCGATCG...",
        Some("MyGenome".to_string())
    )?;

    println!("Found {} genes", results.genes.len());
    Ok(())
}
For more advanced usage with type-safe training:
use orphos_core::engine::{UntrainedOrphos, Orphos, Untrained};
use orphos_core::config::OrphosConfig;
use orphos_core::sequence::encoded::EncodedSequence;

// Create an untrained analyzer
let mut untrained = UntrainedOrphos::with_config(OrphosConfig::default())?;

// Encode the sequence
let encoded = EncodedSequence::without_masking(b"ATGCGATCGATCG...");

// Train on the genome (type changes to TrainedOrphos)
let trained = untrained.train_single_genome(&encoded)?;
Orphos supports multiple output formats:
- GenBank (GBK): Rich feature annotation format (default)
- GFF3: General Feature Format version 3
- GCA: Gene coordinate annotation
- SCO: Simple coordinate output
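For downstream processing, GFF3 is usually the easiest of these formats to consume, because it is a plain tab-separated format with nine fixed columns. The following is a minimal sketch of a reader for such output. The `Feature` type and `parse_gff3` function are hypothetical helpers, and the sample record is invented for illustration rather than taken from a real Orphos run.

```python
from typing import Dict, Iterator, NamedTuple


class Feature(NamedTuple):
    seqid: str
    feature_type: str
    start: int          # 1-based, inclusive (GFF3 convention)
    end: int            # 1-based, inclusive
    strand: str
    attributes: Dict[str, str]


def parse_gff3(text: str) -> Iterator[Feature]:
    """Yield one Feature per 9-column GFF3 record, skipping comments."""
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # blank lines, directives, and comments
        cols = line.split("\t")
        if len(cols) != 9:
            continue  # not a well-formed feature line
        seqid, _source, ftype, start, end, _score, strand, _phase, attrs = cols
        # Column 9 is a semicolon-separated list of key=value pairs.
        attributes = dict(
            pair.split("=", 1) for pair in attrs.split(";") if "=" in pair
        )
        yield Feature(seqid, ftype, int(start), int(end), strand, attributes)


if __name__ == "__main__":
    # Invented sample record, for illustration only.
    sample = "##gff-version 3\nseq1\texample\tCDS\t3\t98\t10.5\t+\t0\tID=1_1;partial=00\n"
    for feature in parse_gff3(sample):
        length = feature.end - feature.start + 1
        print(feature.seqid, feature.strand, length, feature.attributes["ID"])
```

Because GFF3 coordinates are 1-based and inclusive, the feature length is `end - start + 1`.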
- Single Genome Mode: Train on a complete genome for optimal gene prediction (default)
- Metagenomic Mode: Predict genes in fragmented or mixed sequences
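Choosing between the two modes mostly comes down to how much sequence is available for training. As a rough sketch, the helper below counts the bases in a FASTA string and falls back to metagenomic mode for short inputs. The 20,000 bp default mirrors the minimum training length used by the original Prodigal; whether Orphos enforces the same limit is an assumption, as are the helper names and the `"single"` mode string (only `"meta"` appears in the examples here).

```python
def total_bases(fasta_text: str) -> int:
    """Count sequence characters in a FASTA string, ignoring header lines."""
    return sum(
        len(line.strip())
        for line in fasta_text.splitlines()
        if not line.startswith(">")
    )


def choose_mode(fasta_text: str, min_training_bp: int = 20_000) -> str:
    """Pick a mode name from input size (hypothetical heuristic).

    min_training_bp is assumed from the original Prodigal, not
    confirmed for Orphos.
    """
    if total_bases(fasta_text) >= min_training_bp:
        return "single"  # enough sequence to train on the genome itself
    return "meta"        # too short to train; use metagenomic mode


if __name__ == "__main__":
    short_input = ">contig1\nATGCGATCGATCG\n"
    print(choose_mode(short_input))
```

The returned string could then be passed as the `mode` argument of `orphos.OrphosOptions` or to the `-p` flag of the CLI, assuming those accept it.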
- Parallel Processing: Multi-threaded execution using Rayon
- Memory Efficient: Optimized memory usage for large genomes
- High Performance: Significantly faster than the original C implementation
# Run all tests
cargo test

# Run with coverage
cargo install cargo-tarpaulin
cargo cov-fast

# Run benchmarks
cargo bench
We welcome contributions! Please see CONTRIBUTING.md for guidelines.
This project is licensed under the GPL-3.0 License - see the LICENSE file for details.
This project is based on the original Prodigal by Doug Hyatt (license).