Introduction
The FASTX-Toolkit is a collection of command-line tools for preprocessing short-read FASTA/FASTQ files.
Next-generation sequencing machines usually produce FASTA or FASTQ files containing multiple short-read sequences (possibly with quality information).
The main processing of such FASTA/FASTQ files is mapping (aka aligning)
the sequences to reference genomes or other databases using specialized
programs. Examples of such mapping programs are:
Blat,
SHRiMP,
LastZ,
MAQ
and many others.
However, it is sometimes more productive to preprocess the FASTA/FASTQ files
before mapping the sequences to the genome, manipulating the sequences to
produce better mapping results.
The FASTX-Toolkit tools perform some of these preprocessing tasks.
Available Tools
-
FASTQ-to-FASTA converter
Convert FASTQ files to FASTA files.
-
FASTQ Information
Chart Quality Statistics and Nucleotide Distribution
-
FASTQ/A Collapser
Collapsing identical sequences in a FASTQ/A file into a single sequence (while maintaining read counts)
-
FASTQ/A Trimmer
Shortening reads in a FASTA or FASTQ file (removing barcodes or noise).
-
FASTQ/A Renamer
Renames the sequence identifiers in a FASTQ/A file.
-
FASTQ/A Clipper
Removing sequencing adapters / linkers
-
FASTQ/A Reverse-Complement
Producing the Reverse-complement of each sequence in a FASTQ/FASTA file.
-
FASTQ/A Barcode splitter
Splitting a FASTQ/FASTA file containing multiple samples
-
FASTA Formatter
Changes the width of sequence lines in a FASTA file
-
FASTA Nucleotide Changer
Converts FASTA sequences from/to RNA/DNA
-
FASTQ Quality Filter
Filters sequences based on quality
-
FASTQ Quality Trimmer
Trims (cuts) sequences based on quality
-
FASTQ Masker
Masks nucleotides with 'N' (or other character) based on quality
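As a rough illustration of what a few of the operations listed above do to a sequence, here is a minimal Python sketch. It is not the toolkit's own code or interface: the function names (`reverse_complement`, `collapse`, `mask_low_quality`) and their parameters are hypothetical, chosen only to mirror the descriptions of the Reverse-Complement, Collapser and Masker tools.

```python
from collections import Counter

# Hypothetical helpers for illustration only; these are NOT part of the
# FASTX-Toolkit and do not reproduce its implementation or options.

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse-complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def collapse(sequences):
    """Collapse identical sequences, keeping a read count for each.

    Returns a list of (sequence, count) pairs, most frequent first.
    """
    return Counter(sequences).most_common()


def mask_low_quality(seq, quals, min_quality, mask_char="N"):
    """Replace every base whose quality is below min_quality with mask_char."""
    return "".join(
        base if q >= min_quality else mask_char
        for base, q in zip(seq, quals)
    )
```

For example, `collapse(["ACGT", "ACGT", "TTTT"])` yields one entry per distinct sequence together with the number of reads that carried it, which is the idea behind "maintaining read counts".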
These tools can be used in two forms:
-
Web-based (with Galaxy).
Galaxy's Test website already contains some of the FASTX-toolkit tools.
-
Command-line:
Running the tools from the command line (or as part of a script).
Tools demonstration
Visit the Hannon lab public Galaxy server to see a demonstration of these (and other) tools.
News
02-Feb-2010 - Version 0.0.13
New tools:
fastq_masker (suggested by Ben Bimber)
New features:
fastx_trimmer can trim N nucleotides from the end of the sequences (a new command line option, and a separate tool in Galaxy)
fastx_clipper accepts minimum adapter length to clip (requested by Erick Antezana, command line only)
Improved Galaxy integration:
Almost all tools have working functional tests (except the plotting tools and barcode splitter).
Plotting tools (nucleotide distribution and quality boxplot) detect input file type and show a detailed warning if given a FASTA/Q file as input
(hopefully reducing bug reports).
Tools read the input FASTQ type (sanger or solexa) and use the correct quality ASCII offset (33 for sanger, 64 for solexa).
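The quality ASCII offset mentioned above can be illustrated with a short Python sketch. It only shows how an offset turns the characters of a FASTQ quality line into numeric scores, using the two offsets stated above (33 for sanger, 64 for solexa). The names `QUALITY_OFFSETS` and `decode_qualities` are hypothetical, and this is not how the toolkit itself is implemented.

```python
# Offsets as stated in the release note: 33 for sanger, 64 for solexa.
# The dictionary and function names are assumptions made for this sketch.
QUALITY_OFFSETS = {"sanger": 33, "solexa": 64}


def decode_qualities(quality_line, fastq_type="sanger"):
    """Convert a FASTQ quality string into a list of integer scores.

    Each character's ASCII code minus the offset for the given FASTQ
    type gives the numeric quality score for that base.
    """
    offset = QUALITY_OFFSETS[fastq_type]
    return [ord(ch) - offset for ch in quality_line]
```

Decoding the same line with the wrong offset shifts every score by 31, which is why reading the input type matters.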
Dec-2009 - Version 0.0.12
never officially released
24-Nov-2009 - Version 0.0.11
New tools: fastx_uncollapser, fastq_quality_filter.
New features: fastx_collapser can re-collapse an already-collapsed FASTA file; fastx_trimmer can trim N bases from the end of the sequence.
Minor compilation bug-fixes.
10-Aug-2009 - Version 0.0.10
Bug fix on Mac OS X (reported by Joshua Waterfall).
New tool: FASTX-Renamer (based on suggestion+patch by Charles Plessy).
New undocumented command line argument:
-Q NN handles FASTQ ASCII quality with a user-specified offset (was hard-coded as 64 in previous versions). Requested by Erick Antezana.
Barcode-Splitter: improved galaxy integration - stores output files directly into galaxy's files database; no need for external webserver anymore.
Uses libgtextutils-0.5 library (as a dynamic library).
Version 0.0.9
Never released.
12-Mar-2009 - Version 0.0.8
Minor changes to compilation stage, as suggested by users.
FASTX-toolkit should now compile cleanly on Mac OS X.
No new features were added.
Using libgtextutils-0.3 library.
24-Mar-2009 - Version 0.0.7
Added Fasta-Formatter and Fasta-Nucleotide-Changer tools.
Using libgtextutils-0.1 library.
25-Feb-2009 - Version 0.0.6
Initial public release.