For many years we have iterated on our technology to improve performance. We continue to improve the nanopore sensing system through updates to analytical methods and new chemistries. This page explains what to expect from the nanopore sequencing system, and which tools to choose to achieve these results.
What is sequencing accuracy?
Accuracy is a generic term that might refer to different aspects of DNA and RNA sequencing performance. Typically, it refers to the accuracy at a single read level or at the consensus level, combining the information from multiple reads of a DNA/RNA region into a single high-quality sequence. Depending on the application, other relevant factors to consider are the proportion of the genome covered and the ability to detect epigenetic modifications. Usually, genomic research focuses either on resequencing and mapping to a reference genome or on reconstructing unknown genomes through de novo assembly. For mapping-based projects, changes compared with the reference sequence are used for inference, hence variant calling becomes the main focus. For de novo assembly, quality is estimated by the accuracy of the reconstructed sequence and other metrics such as N50.
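As an illustration of the N50 metric mentioned above, the following is a minimal sketch, assuming the common definition: the length L such that sequences of length L or longer contain at least half of the total bases. The function name `n50` and the example lengths are hypothetical and are not taken from any Oxford Nanopore tool or dataset.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Assumed definition: sort lengths from longest to shortest and
    return the length at which the running total first reaches
    half of the summed length.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("n50 requires at least one length")
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical contig lengths (total 300, so half is 150).
contigs = [80, 70, 50, 40, 30, 20, 10]
contig_n50 = n50(contigs)  # 80 + 70 = 150 reaches half, so N50 is 70
```

The same calculation applies to read lengths, which is how a read N50 such as the one quoted for a sequencing dataset would typically be derived.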
Variant calling accuracy
Variant calling identifies differences from a reference sequence and is crucial in understanding how genotypes drive phenotypes. Oxford Nanopore technology can sequence any length of DNA and RNA molecule, offering unprecedented resolution of complex structural variants and efficient haplotype phasing of variants.
Measuring the accuracy of variant calling is critical to ensure that the genetic variants identified are biological differences and not artefacts. Accuracy is commonly measured with the so-called F1 score, the harmonic mean of precision (the proportion of called variants that are true variants) and sensitivity or recall (the proportion of all true variants that are correctly called). This metric is especially useful when you want to balance the trade-off between identifying as many variants as possible (high sensitivity) and ensuring the variants identified are truly variants (high precision).
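The relationship between precision, recall, and F1 can be sketched as follows. This is a minimal illustration, assuming variant calls have already been compared against a truth set and counted as true positives, false positives, and false negatives; the function name and the counts are hypothetical and do not come from Figure 1 or any benchmarking tool.

```python
def variant_calling_metrics(true_positives, false_positives, false_negatives):
    """Compute precision, recall and F1 from comparison counts.

    precision = TP / (TP + FP): fraction of called variants that are real
    recall    = TP / (TP + FN): fraction of real variants that were called
    F1        = harmonic mean of precision and recall
    """
    called = true_positives + false_positives
    real = true_positives + false_negatives
    precision = true_positives / called if called else 0.0
    recall = true_positives / real if real else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts for illustration only.
metrics = variant_calling_metrics(
    true_positives=990, false_positives=10, false_negatives=30
)
# precision = 0.99, recall is about 0.9706, F1 is about 0.9802
```

Because F1 is a harmonic mean, it is pulled towards the lower of the two values, so a caller cannot score well by maximising only precision or only recall.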
Our SNP calls produced with [nanopore sequencing] ... were comparable to state-of-the-art short-read-based methods
Kolmogorov et al. Nat. Methods (2023)
[With nanopore Q20 chemistry] It is now realistic to use long read sequencers to systematically analyze a wider range of cancerous mutations
Sakamoto et al. Nat. Commun. (2022)
Figure 1. Accuracy data obtained from a dataset of 29 kb N50 prepared with the Ligation Sequencing Kit V14 (with enzyme E8.2.1) and a PromethION R10.4.1 Flow Cell. Accuracy is measured as F1 score for variant calling, using nanopore sequencing data for the human genome (HG002 cell lines) at several read depths. Variant calling was performed with the wf-human-variation workflow (v2.2.6), and variants were compared against the Genome In A Bottle consortium’s HG002 truth-set (SNVs and indels v4.2.1, SVs v0.6). SNVs and indels (< 50 bp) are represented with dark colours, while indels (< 50 bp) in coding regions (CDS) are displayed with lighter colours. F1 score for Dorado basecalling models of A) high accuracy (HAC, v5.2.0) and B) super accuracy (SUP, v5.2.0).
Base modification accuracy
The four DNA bases (A, C, G, T) and RNA bases (A, C, G, U) can undergo biological modifications such as methylation, impacting gene expression and contributing to diseases such as cancer. Oxford Nanopore’s technology allows for direct, real-time sequencing and detection of these modifications for both DNA and RNA (e.g. 5mC, 5hmC, 6mA, and 4mC for DNA; m6A and pseudoU for RNA) without additional experiments or preparation, unlike legacy methods such as bisulphite sequencing, which have several limitations.
Read more about direct DNA and RNA base modifications detection
Figure 2. Bisulfite sequencing data. Basecalling of 5mC on synthetic strands with known composition is extremely accurate, with precision, recall, and F1 score all above 99%. Oxford Nanopore data for the human sample HG002 shows much higher confidence CpGs (>90%) at a much lower depth than bisulfite whole-genome sequencing (WGS). All data reported in this figure was generated with the Ligation Sequencing Kit V14 and PromethION R10.4.1 Flow Cells using SUP basecalling models.
Our methylation calls [with nanopore sequencing] were highly concordant with the standard bisulfite sequencing, but in addition had haplotype-specific resolution
Kolmogorov et al. Nat. Methods (2023)
| Molecule | Modification | Molecular context | Raw read accuracy (SUP) |
|---|---|---|---|
| DNA | 5mC | CpG | 99.5% |
| DNA | 5mC | All | 99.4% |
| DNA | 5mC/5hmC | CpG | 99.2% |
| DNA | 5mC/5hmC | All | 98.7% |
| DNA | 6mA | All | 99.7% |
| DNA | 4mC/5mC | All | 97.6% |
| RNA | m6A | DRACH | 99.7% |
| RNA | m6A | All | 98.7% |
| RNA | pseU | All | 97.6% |
| RNA | m5C | All | 97.9% |
| RNA | Inosine | All | 98.8% |
| RNA | 2’OMe-A | All | 99.2% |
| RNA | 2’OMe-C | All | 98.7% |
| RNA | 2’OMe-G | All | 98.2% |
| RNA | 2’OMe-U | All | 96.7% |
Table 1. Currently supported models for DNA and RNA modification basecalling, available in Dorado standalone on GitHub. Accuracy values were generated on a synthetic truth-set using v5.2 SUP basecalling models for both DNA and RNA. All modification models except RNA 2’OMe are currently available in the latest MinKNOW version; 2’OMe models will be integrated in later MinKNOW versions.
Assembly accuracy
Assembly accuracy refers to the degree to which a reconstructed sequence of DNA or RNA matches the true biological sequence from which it was derived. This involves building a consensus sequence from multiple DNA/RNA reads, enhancing accuracy and creating a reliable sequence for further analysis.
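To illustrate why combining reads raises accuracy, here is a toy sketch, assuming reads that are already aligned to the same coordinates and have equal length, with a simple per-position majority vote. This is a deliberate simplification for illustration only: it is not how production assemblers or polishing tools build a consensus, and the function names and sequences are hypothetical.

```python
from collections import Counter


def majority_consensus(aligned_reads):
    """Build a consensus by taking the most common base in each column.

    Assumes all reads are pre-aligned and have the same length.
    """
    if not aligned_reads:
        raise ValueError("at least one read is required")
    width = len(aligned_reads[0])
    if any(len(read) != width for read in aligned_reads):
        raise ValueError("reads must be aligned to the same length")
    consensus = []
    for column in zip(*aligned_reads):
        base, _count = Counter(column).most_common(1)[0]
        consensus.append(base)
    return "".join(consensus)


def identity(sequence, truth):
    """Fraction of positions at which two equal-length sequences match."""
    if len(sequence) != len(truth) or not truth:
        raise ValueError("sequences must be non-empty and equal in length")
    matches = sum(a == b for a, b in zip(sequence, truth))
    return matches / len(truth)


# Hypothetical truth sequence and reads, two of which carry one error each.
truth = "ACGTACGT"
reads = ["ACGTACGT", "ACGAACGT", "ACGTACCT"]

read_identities = [identity(read, truth) for read in reads]
consensus = majority_consensus(reads)
consensus_identity = identity(consensus, truth)
```

In this example the two erroneous reads each match the truth at 7 of 8 positions, while the consensus recovers the full sequence because the errors fall at different positions and are outvoted.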
Find out more about assembly & whole-genome sequencing.