A post-processing tool to reclassify Kraken 2 output based on the confidence score and/or minimum minimizer hit groups.

danisven/StringMeUp

StringMeUp

A post-processing tool for Kraken 2 read classifications. Run Kraken 2 once, then reclassify the reads afterwards at any confidence score cutoff you choose, saving a lot of compute time. StringMeUp creates Kraken 2-style report and read classification files.

For additional insight into your Kraken 2 classifications, try KrakMeOpen, a downstream analysis toolkit for Kraken 2 classification quality metrics.

Installation

StringMeUp is available to install through conda. Simply run the following command to install it:

conda install -c conda-forge -c bioconda stringmeup

Usage

A good start is to run stringmeup --help.

About the confidence score

The confidence score (CS) for a read R classified to a node J is the number of k-mers that hit any node in the clade rooted at J (N), divided by the total number of k-mers that were queried against the database (M). A k-mer containing an ambiguous nucleotide is not queried against the database, and is therefore not counted in M.

CS = N / M
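To make the formula concrete, here is a minimal Python sketch (not StringMeUp's own code) that derives N and M from the k-mer string of a single Kraken 2 classification line. It assumes the standard Kraken 2 per-read layout: space-separated taxid:count pairs, A:count for k-mers with ambiguous nucleotides, and |:| separating the two mates of a paired read. The function names and the clade argument (a set of taxIDs making up the clade rooted at J) are hypothetical.

```python
from collections import Counter


def parse_kmer_string(kmer_string: str) -> tuple[Counter, int]:
    """Return per-taxID k-mer hit counts and M, the number of queried k-mers.

    Assumes Kraken 2's "taxid:count" tokens, "A:count" for ambiguous k-mers
    and "|:|" as the separator between paired mates.
    """
    hits: Counter = Counter()
    queried = 0
    for token in kmer_string.split():
        if token == "|:|":          # mate separator, carries no k-mers
            continue
        taxid, count = token.rsplit(":", 1)
        if taxid == "A":            # ambiguous k-mers are never queried, so not in M
            continue
        n = int(count)
        queried += n                # k-mers that hit nothing (taxID 0) still count toward M
        hits[int(taxid)] += n
    return hits, queried


def confidence(hits: Counter, queried: int, clade: set[int]) -> float:
    """CS = N / M, where N counts k-mers hitting any taxID in the clade."""
    if queried == 0:
        return 0.0
    n = sum(count for taxid, count in hits.items() if taxid in clade)
    return n / queried
```

For example, with the k-mer string 0:10 562:6 A:4 561:4, M is 20 (the 4 ambiguous k-mers are excluded), so the clade {562} gives CS = 6/20 = 0.3 and the clade {561, 562} gives CS = 10/20 = 0.5.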

If the CS of read R at node J is equal to or larger than the specified cutoff, R is classified to J. If not, the CS of R is calculated for the parent of J. This is repeated until CS >= cutoff or until the root of the taxonomy is reached. If CS < cutoff at the root, the read is deemed unclassified.
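The walk up the taxonomy described above can be sketched in Python as follows. This is an illustration of the rule, not StringMeUp's implementation: it assumes the per-taxID hit counts and M are already known, and that the taxonomy is given as a child-to-parent map in which the root (taxID 1) is its own parent, as in NCBI nodes.dmp. The function names are hypothetical, and taxID 0 is returned for unclassified reads, following the Kraken 2 convention.

```python
def clade_hits(node: int, hits: dict[int, int], parent: dict[int, int]) -> int:
    """N: the number of k-mers that hit node or any of its descendants."""
    total = 0
    for taxid, count in hits.items():
        t = taxid
        while True:
            if t == node:              # taxid lies in the clade rooted at node
                total += count
                break
            p = parent.get(t)
            if p is None or p == t:    # unknown taxID (e.g. 0) or reached the root
                break
            t = p
    return total


def reclassify(start: int, cutoff: float, hits: dict[int, int],
               queried: int, parent: dict[int, int]) -> int:
    """Walk from start towards the root; return the first taxID with CS >= cutoff, else 0."""
    if queried == 0:
        return 0                       # nothing was queried, treat the read as unclassified
    node = start
    while True:
        # CS at the current node; recomputed per level for clarity, not speed.
        if clade_hits(node, hits, parent) / queried >= cutoff:
            return node
        p = parent.get(node)
        if p is None or p == node:     # CS < cutoff at the root: unclassified
            return 0
        node = p
```

Using a simplified, hypothetical lineage parent = {1: 1, 2: 1, 561: 2, 562: 561} with hits = {0: 10, 562: 6, 561: 4} and M = 20, a read originally at taxID 562 stays at 562 with a cutoff of 0.1, moves up to 561 (CS = 0.5) with a cutoff of 0.4, and becomes unclassified with a cutoff of 0.6.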

Reclassifying Kraken 2 output

To reclassify reads classified by Kraken 2 with a confidence cutoff of 0.1:

stringmeup --names <names.dmp> --nodes <nodes.dmp> 0.1 <original_classifications.kraken2>

Where:

  • original_classifications.kraken2 is the output file from Kraken 2 that contains the read-by-read classifications.
  • names.dmp and nodes.dmp are the same NCBI taxonomy files that were used to build the database that produced the classifications in original_classifications.kraken2.

By default, this command writes a Kraken 2-style report to stdout. Add --output_report <FILE> to save the report to a file.

To save the read-by-read classifications, add --output_classifications <FILE> to the command.
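If you drive StringMeUp from a Python pipeline, a small helper can assemble the command line from the options shown so far. This is a hypothetical sketch: the function name and file names are made up, it uses only the flags documented here, and placing the options before the positional cutoff and input file mirrors the example command above (whether other orderings are accepted is not assumed).

```python
from typing import Optional


def stringmeup_argv(names: str, nodes: str, cutoff: float, classifications: str,
                    output_report: Optional[str] = None,
                    output_classifications: Optional[str] = None) -> list[str]:
    """Build an argument list for one StringMeUp reclassification run."""
    argv = ["stringmeup", "--names", names, "--nodes", nodes]
    if output_report is not None:
        argv += ["--output_report", output_report]                    # report to a file
    if output_classifications is not None:
        argv += ["--output_classifications", output_classifications]  # read-by-read output
    # Positional arguments last, as in the example: the cutoff, then the Kraken 2 output.
    argv += [str(cutoff), classifications]
    return argv
```

The resulting list can be passed to subprocess.run(argv, check=True), provided stringmeup is installed and on your PATH.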

To save a verbose version of the read-by-read classifications, add --output_verbose <FILE> to the command. The verbose version of the read-by-read classifications will contain the following columns:

Column                  Explanation
----------------------  ----------------------------------------------------------------
READ_ID                 The ID of the read
READ_LENGTH             The length of the read (same as Kraken 2 output)
MINIMIZER_HIT_GROUPS*   The number of minimizer hit groups found during Kraken 2 classification
TAX_LVL_MOVES           How many levels in the taxonomy the read moved during reclassification
ORIGINAL_TAXID          The taxID the read was originally classified to
NEW_TAXID               The taxID the read was reclassified to
ORIGINAL_CONFIDENCE     The original confidence score
NEW_CONFIDENCE          The confidence score at the taxID the read was reclassified to
MAX_CONFIDENCE          The maximum confidence score the read can have
ORIGINAL_TAX_LVL        The taxonomic rank of the originally classified taxID
NEW_TAX_LVL             The taxonomic rank of the reclassified taxID
ORIGINAL_NAME           The scientific name of the original taxID
NEW_NAME                The scientific name of the reclassified taxID
KMER_STRING             The k-mer string (same as Kraken 2 output)

* Only present if the forked version of Kraken 2 was used for the initial classification.
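As one way to use the verbose output, the following Python sketch tallies how many reads moved a given number of taxonomy levels. It assumes (not confirmed here) that the verbose file is tab-separated and that its first row contains the column names listed above; check your own --output_verbose file before relying on it. The function name is hypothetical.

```python
import csv
from collections import Counter
from typing import Iterable


def summarize_moves(lines: Iterable[str]) -> Counter:
    """Count reads by their TAX_LVL_MOVES value.

    Assumes tab-separated rows with a header row naming the columns.
    Accepts an open file handle or any iterable of text lines.
    """
    reader = csv.DictReader(lines, delimiter="\t")
    return Counter(int(row["TAX_LVL_MOVES"]) for row in reader)
```

Typical use is summarize_moves(open("verbose.tsv", newline="")), where verbose.tsv stands in for whatever file you passed to --output_verbose.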

Reclassifying with minimum hit groups

This option requires an input file that was produced with my fork of Kraken 2.

Add --minimum_hit_groups <INT> to the command. A read can only be considered classified if its number of minimizer hit groups is at or above the --minimum_hit_groups value.
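The hit-group rule acts as a gate on top of the confidence-based reclassification. The minimal Python sketch below shows the rule as stated; it is not StringMeUp's code, and it assumes the hit-group count per read has already been read from the forked Kraken 2 output. Function names are hypothetical, and taxID 0 denotes an unclassified read, as in Kraken 2.

```python
UNCLASSIFIED = 0  # Kraken 2 reports unclassified reads with taxID 0


def apply_minimum_hit_groups(taxid: int, hit_groups: int, minimum_hit_groups: int) -> int:
    """Keep taxid only if the read has at least minimum_hit_groups hit groups."""
    if hit_groups >= minimum_hit_groups:   # "at or above" the setting
        return taxid
    return UNCLASSIFIED


def filter_reads(reads: list[tuple[str, int, int]],
                 minimum_hit_groups: int) -> list[tuple[str, int]]:
    """reads holds (read_id, taxid, hit_groups); returns (read_id, final_taxid)."""
    return [(read_id, apply_minimum_hit_groups(taxid, groups, minimum_hit_groups))
            for read_id, taxid, groups in reads]
```

Because the hit-group count is a property of the whole read rather than of a taxonomy node, a read that fails the check is unclassified no matter which node its confidence score would have allowed.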
