About KEGG Syntax
KEGG Syntax (Synteny and taxonomy) is a new resource for using KOs (K numbers) and modules (M numbers), as well as computationally generated ortholog groups such as VOGs, in the taxonomy-based analysis of conserved genes, conserved gene sets and conserved gene orders (conserved synteny) in cellular organisms and viruses.
An example of using KEGG Mapper and KEGG Syntax
- Use BlastKOALA or GhostKOALA to assign KOs to the gene set in the user's genome.
- Use the KEGG Mapper Reconstruct tool to obtain functional implications from reconstructed pathways, etc.
- KO assignment and functional inference may be improved by using KEGG Syntax tools, especially when the gene set is ordered according to the chromosomal position.
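As a rough illustration of the last step, the sketch below turns a KO assignment list into an ordered sequence of KO identifiers. It assumes a two-column, tab-separated text file (gene identifier, then K number, with the second column absent for unassigned genes) whose rows are already in chromosomal order. The function names, the placeholder symbol and the example gene identifiers are hypothetical and are not part of any KEGG tool.

```python
def parse_ko_assignments(text):
    """Return (gene_id, ko) pairs in file order; ko is None when unassigned.

    Assumes one gene per line: gene_id, optionally followed by a tab and a K number.
    """
    pairs = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        gene_id = fields[0]
        ko = fields[1] if len(fields) > 1 and fields[1] else None
        pairs.append((gene_id, ko))
    return pairs


def ko_sequence(pairs, placeholder="-"):
    """Represent the genome as a list of KO identifiers, keeping gene order.

    Unassigned genes are kept as a placeholder so that positions are preserved.
    """
    return [ko if ko is not None else placeholder for _, ko in pairs]


if __name__ == "__main__":
    # Hypothetical gene identifiers; the K numbers are used only as labels.
    example = "gene_0001\tK00001\ngene_0002\ngene_0003\tK00002\n"
    print(ko_sequence(parse_ko_assignments(example)))
```

Keeping a placeholder for unassigned genes, rather than dropping them, preserves the spacing between assigned genes, which matters when gene order is compared later.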
Conserved genes
Manually defined KOs
In KEGG, conserved genes are represented by KOs (KEGG Orthology groups) for both cellular organisms and viruses. KOs are manually defined from functionally characterized genes and proteins in specific organisms, but they are expanded to other organisms based on sequence similarity.
Computationally generated ortholog groups
For selected sets of proteins, ortholog grouping is computationally performed using the SSEARCH comparison results in order to verify, improve and expand manually defined KO grouping. For viruses, VOGs (virus ortholog groups) are made available.
Conserved gene sets
Manually defined KO sets (modules)
KEGG modules are conserved functional units of genes. They are manually defined sets of KOs that are involved, for example, in successive reaction steps in conserved subpathways of metabolic pathways. They sometimes contain positionally correlated genes, called gene clusters, as in operon structures of prokaryotic genomes.
Computationally generated KO clusters
Here "KO clusters" represent conserved gene sets along chromosomal positions, implying conserved synteny. Once KOs have been assigned to the genes in a genome, the genome can be treated as a sequence of KO identifiers, and similar gene orders can be found by a sequence alignment method. A new genome comparison method has been developed using the Goad-Kanehisa algorithm, enabling a comprehensive analysis of syntenic regions.
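To make the idea of aligning genomes as sequences of KO identifiers concrete, here is a minimal sketch. It is not the Goad-Kanehisa algorithm used by KEGG Syntax; it substitutes a plain Smith-Waterman-style local alignment with a linear gap penalty, and the function name and scoring values are assumptions chosen only for illustration.

```python
def local_align(a, b, match=2, mismatch=-1, gap=-1):
    """Best local alignment of two KO identifier lists (Smith-Waterman style).

    Returns (score, pairs), where pairs lists aligned positions as
    (ko_from_a, ko_from_b), with None marking a gap.
    """
    n, m = len(a), len(b)
    H = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0, H[i - 1][j - 1] + s, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the highest-scoring cell until the score drops to zero.
    i, j = best_pos
    pairs = []
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            pairs.append((a[i - 1], b[j - 1]))
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            pairs.append((a[i - 1], None))
            i -= 1
        else:
            pairs.append((None, b[j - 1]))
            j -= 1
    pairs.reverse()
    return best, pairs


if __name__ == "__main__":
    # Two hypothetical genomes written as ordered KO identifiers.
    genome_a = ["K00010", "K00020", "K00030", "K00040", "K00090"]
    genome_b = ["K00070", "K00020", "K00030", "K00040", "K00080"]
    score, pairs = local_align(genome_a, genome_b)
    print(score, pairs)
```

In this toy input the shared run K00020, K00030, K00040 is recovered as the conserved block. A sketch like this reports only the single best block and ignores gene orientation, so it should be read as an illustration of the representation rather than of the actual method.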
Computationally generated VOG clusters
Furthermore, this algorithm can be used to analyze VOG clusters by considering the genome as a sequence of VOG identifiers, for a better understanding of gene set transfer between viruses and cellular organisms.
Taxonomy files
The KEGG database uses the NCBI taxonomy for classification of cellular organisms and viruses. For cellular organisms, the three- or four-letter KEGG organism codes are classified somewhat differently in the following Brite hierarchy files.
08601 is a manually created taxonomy file using the simple hierarchy defined in the KEGG organism groups and the predefined order of organism codes with hsa (Homo sapiens) at the top.
08610 is computationally generated using the abbreviated lineage of the NCBI taxonomy keeping the order of organism codes defined in 08601. In addition, 08610 contains taxonomy IDs for GENES Addendum (ag) entries. 08611 is another computationally generated file for the KEGG organisms with fixed levels of taxonomic ranks: phylum, class, order, family, genus and species.
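The difference between a full or abbreviated lineage and a fixed-rank layout can be sketched as follows. This is only an illustration of the concept, not the procedure KEGG uses to generate 08611; the function name and the input format (a list of rank and name pairs) are assumptions.

```python
# The fixed ranks named above for cellular organisms.
FIXED_RANKS = ["phylum", "class", "order", "family", "genus", "species"]


def to_fixed_ranks(lineage, ranks=FIXED_RANKS, missing=None):
    """Project a lineage of (rank, name) pairs onto a fixed list of ranks.

    Ranks absent from the lineage are filled with `missing`; ranks that are
    not in `ranks` (for example kingdom or subfamily) are dropped.
    """
    by_rank = {}
    for rank, name in lineage:
        by_rank.setdefault(rank, name)  # keep the first name seen for a rank
    return [by_rank.get(rank, missing) for rank in ranks]


if __name__ == "__main__":
    # Lineage for hsa (Homo sapiens), written out by hand for illustration.
    human = [
        ("kingdom", "Metazoa"),
        ("phylum", "Chordata"),
        ("class", "Mammalia"),
        ("order", "Primates"),
        ("family", "Hominidae"),
        ("genus", "Homo"),
        ("species", "Homo sapiens"),
    ]
    print(to_fixed_ranks(human))
```

A fixed-rank layout gives every organism the same depth in the hierarchy, which makes organisms directly comparable level by level, at the cost of leaving empty slots where a lineage lacks a given rank.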
For viruses, the taxonomy IDs of KEGG Viruses (GENOME vtax category and GENES vg category) are classified according to the NCBI taxonomy, which is based on the ICTV taxonomy, with the Baltimore classification added by KEGG.
The two Brite hierarchy files for viruses, 08620 and 08621, are both computationally generated, and their lowest-level taxonomy IDs are linked to GENOME vtax entries. In the 08620 file the taxonomy IDs are shown in the full lineage of the NCBI virus taxonomy, while the 08621 file is organized in the fixed levels of taxonomic ranks: realm, kingdom, phylum, class, order, family, genus and species.
Last updated: December 1, 2025