File Formats for Illumina Sequencing
Numerous options are available for converting data to compatible sequence file formats such as FASTQ files, and for downstream analysis of sequencing data. Illumina sequencers are designed so data can be easily streamed into Illumina Connected Analytics and BaseSpace Sequence Hub for cloud-based data management, analysis, and collaboration.
Raw data files are provided in sequence file formats that are compatible with, or easily converted to, standardized data formats for streamlined aggregation and mining of large cohorts. The newest file format, FASTQ.ORA, is available with the DRAGEN Bio-IT Platform. FASTQ.ORA uses lossless compression, which reduces file size, transfer time, and storage cost.
FASTQ Sequence File Format
FASTQ is a text-based sequencing data file format that stores both raw sequence data and quality scores. FASTQ files have become the standard format for storing NGS data from Illumina sequencing systems, and can be used as input for a wide variety of secondary data analysis solutions.
The MiniSeq and MiSeq Sequencing Systems provide the option to automatically convert data from BCL to FASTQ format, so separate conversion software is not required.
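As an illustration of the FASTQ layout described above, here is a minimal Python sketch that reads the common four-line record form: a header starting with "@", the base calls, a "+" separator, and one quality character per base. It assumes uncompressed text and Phred+33 quality encoding (the encoding used by current Illumina pipelines). The names FastqRecord and parse_fastq are hypothetical and not part of any Illumina software.

```python
import io
from dataclasses import dataclass
from typing import Iterator, List, TextIO


@dataclass
class FastqRecord:
    read_id: str          # header line without the leading '@'
    sequence: str         # base calls
    qualities: List[int]  # Phred quality scores, one per base


def parse_fastq(handle: TextIO, offset: int = 33) -> Iterator[FastqRecord]:
    """Yield records from an uncompressed, four-line-per-record FASTQ stream.

    `offset` is the ASCII offset of the quality encoding (assumed Phred+33).
    """
    while True:
        header = handle.readline()
        if not header:
            return  # end of input
        sequence = handle.readline().rstrip("\r\n")
        separator = handle.readline()
        quality = handle.readline().rstrip("\r\n")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError("malformed FASTQ record")
        if len(sequence) != len(quality):
            raise ValueError("sequence and quality lengths differ")
        yield FastqRecord(
            read_id=header[1:].strip(),
            sequence=sequence,
            qualities=[ord(char) - offset for char in quality],
        )


# A single made-up record, used only to exercise the parser.
example = io.StringIO("@read1\nGATTACA\n+\nIIIIHHH\n")
records = list(parse_fastq(example))
```

With this input, records holds one entry whose quality characters "I" and "H" decode to Phred scores 40 and 39.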
FASTQ ORA Sequence File Format
FASTQ ORA is a binary compressed form of the text-based FASTQ sequencing data file format. fastq.ora files are up to 5x smaller than their corresponding fastq.gz files, with no loss of data integrity. All fastq.ora files can be read using the free decompression software from Illumina. Once the software is installed, a single command can pipe the decompressed output on the fly into a wide range of popular mapping tools such as BWA [1], STAR [2], and Bowtie [3]. DRAGEN ORA compression is available with the DRAGEN server and onboard the NextSeq 1000/2000.
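The on-the-fly piping described above can be sketched in Python as reading a decompressor's standard output while it runs, so the uncompressed FASTQ never has to be written to disk. This sketch assumes only that the chosen command writes four-line FASTQ text to stdout. It does not assume any particular ORA decompressor flags, and stream_fastq and demo_command are hypothetical names. The demo uses a child Python process as a stand-in for a real decompressor.

```python
import subprocess
import sys
from itertools import islice
from typing import Iterator, List, Tuple


def stream_fastq(command: List[str]) -> Iterator[Tuple[str, str, str]]:
    """Run `command` and yield (read_id, sequence, quality) as stdout arrives."""
    with subprocess.Popen(command, stdout=subprocess.PIPE, text=True) as proc:
        assert proc.stdout is not None
        while True:
            # Pull the next four lines (one FASTQ record) from the pipe.
            block = list(islice(proc.stdout, 4))
            if not block:
                break
            if len(block) < 4:
                raise ValueError("truncated FASTQ record")
            header, sequence, _, quality = (line.rstrip("\r\n") for line in block)
            yield header[1:], sequence, quality
    # Leaving the `with` block waits for the child, so returncode is set here.
    if proc.returncode != 0:
        raise RuntimeError(f"command exited with status {proc.returncode}")


# Stand-in for a real decompressor: a child process that prints one record.
demo_command = [sys.executable, "-c", r"print('@read1\nGATTACA\n+\nIIIIHHH')"]
reads = list(stream_fastq(demo_command))
```

In practice the same effect is usually achieved with a shell pipe from the decompressor into the aligner. The Python form is useful when reads need to be filtered or inspected in between.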
Lossless Compression Cuts Costs and Time
Illumina's DRAGEN ORA (Original Read Archive) files are compressed to as little as one-fifth the size of the corresponding fastq.gz files, significantly reducing storage cost and transfer time.
BCL Sequence File Format
The binary base call (BCL) sequence file format requires conversion to FASTQ format for use with user-developed or third-party data analysis tools. The NextSeq, HiSeq, and NovaSeq 6000 Sequencing Systems generate raw data files in BCL format.
The DRAGEN Bio-IT Platform offers rapid BCL conversion to FASTQ files as part of its suite of pipelines.
Illumina also offers BCL Convert, a standalone software solution that demultiplexes data and converts BCL files to standard FASTQ file formats for downstream analysis.
Other Sequence File Formats
FASTQ files are the typical starting format for sequencing data analysis. However, BaseSpace Sequence Hub can create other file formats that are common to secondary and tertiary analysis programs.
During secondary or tertiary analysis of NGS data, Illumina informatics platforms and apps often convert raw sequence files from FASTQ to other sequence file formats (eg, .vcf, .bam) as part of the analysis workflow.
References
1. Li H. and Durbin R. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics. 2009 Jul 15; 25(14): 1754–1760.
2. Dobin A. et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics. 2013 Jan; 29(1): 15–21.
3. Langmead B. et al. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biology. 2009; 10: R25.