NONCODE (current version v6.0) is an integrated knowledge database dedicated to non-coding RNAs (excluding tRNAs and rRNAs).
NONCODE currently covers 39 species: 16 animals and 23 plants.
NONCODE draws its data from the literature and from other public databases.
We searched PubMed using the keywords ‘ncrna’, ‘noncoding’, ‘non-coding’, ‘no code’, ‘non-code’, ‘lncrna’ or ‘lincrna’.
We retrieved the newly identified lncRNAs and their annotations from the supplementary material or websites of these articles.
These data, together with the newest data from Ensembl, RefSeq, lncRNAdb and GENCODE, were processed through a standard pipeline for each species.
The pipeline includes seven steps:
- Format normalization. All input data were converted to BED or GTF format on a single assembly version per species. For example, TAIR10 and TAIR9 are two different assembly versions of A. thaliana; all related data were converted to TAIR10.
- Multi-source data combination. All of the normalized data files were combined using the Cuffcompare program in the Cufflinks suite.
- Protein-coding RNA filtration. We filtered out protein-coding RNAs using two methods. First, all RNAs were compared with the coding RNAs in RefSeq and Ensembl. Second, CNIT (Coding-NonCoding Identifying Tool) was applied, and only RNAs that CNIT classified as noncoding were kept.
- General information presentation. The location, exons, length, assembly sequence and source are listed for each transcript.
- Expression profile and function prediction in plants. The corresponding information is shown for four common plants out of the 23. Their expression profiles were curated from multiple tissues; detailed data sources are listed in Supplementary Table 1. Functions of lncRNAs were predicted by co-expression with coding genes.
- Conservation analysis at the transcript level. Plant lncRNA conservation analysis was conducted with BLAST, using an E-value cutoff of 1e-10. Each transcript in a plant species was searched against every transcript in the other 22 plant species.
- Web presence. New web pages dedicated to plants were built in NONCODEV6, and more annotation information has been added.
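Two of the steps above can be illustrated with a short Python sketch. It is not the NONCODE pipeline code. It only shows the coordinate shift involved in moving an exon record from GTF to BED, and how BLAST tabular output could be filtered with the E-value cutoff from the conservation step (read here as 1e-10). The function names, the sample transcript IDs, the choice of tabular BLAST output and the inclusive treatment of the cutoff are all assumptions made for illustration.

```python
import io

# Cutoff quoted in the conservation step, read here as 1e-10.
E_VALUE_CUTOFF = 1e-10


def gtf_exon_to_bed(gtf_line):
    """Return (chrom, start, end, strand) in BED coordinates for a GTF exon line.

    GTF positions are 1-based and inclusive; BED positions are 0-based and
    half-open, so only the start shifts by one. Non-exon features give None.
    """
    fields = gtf_line.rstrip("\n").split("\t")
    chrom, _source, feature, start, end, _score, strand = fields[:7]
    if feature != "exon":
        return None
    return (chrom, int(start) - 1, int(end), strand)


def hits_passing_cutoff(blast_lines, cutoff=E_VALUE_CUTOFF):
    """Yield (query, subject, evalue) from BLAST tabular output (-outfmt 6).

    The default tabular layout has 12 tab-separated columns with the
    E-value in column 11. Treating the cutoff as inclusive is an assumption.
    """
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        evalue = float(cols[10])
        if evalue <= cutoff:
            yield cols[0], cols[1], evalue


# Hypothetical records, for illustration only.
exon = 'Chr1\texample\texon\t101\t200\t.\t+\t.\ttranscript_id "tx1";'
print(gtf_exon_to_bed(exon))

blast = io.StringIO(
    "ath_tx1\tosa_tx9\t91.2\t210\t18\t0\t1\t210\t5\t214\t3e-42\t170\n"
    "ath_tx1\tzma_tx4\t78.0\t60\t13\t0\t1\t60\t9\t68\t0.002\t38.1\n"
)
print(list(hits_passing_cutoff(blast)))
```

In the sample, only the first BLAST hit is kept, because its E-value is below the cutoff. The second hit (0.002) is discarded.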
In sum, NONCODE aims to present the most complete collection and annotation of non-coding RNAs.
It provides not only basic information on lncRNAs, such as location, strand, exon number, length and sequence, but also advanced information such as expression profiles, exosome expression profiles, conservation, predicted function and disease associations.
Genome version of each species in the current NONCODE release
Species Genome Version Abbreviation Phylum
Chimp panTro4 PTR Animal
Gorilla gorGor3 GGO Animal
Opossum monDom5 MDO Animal
Orangutan ponAbe2 PPY Animal
Platypus ornAna1 OAN Animal
Rhesus rheMac3 MML Animal
Human hg38 HSA Animal
Mouse mm10 MMU Animal
C. elegans ce10 CEL Animal
Cow bosTau6 BTA Animal
Chicken galGal4 GGA Animal
Fruitfly dm6 DME Animal
Rat rn6 RNO Animal
Yeast sacCer3 SCE Animal
Zebrafish danRer10 DRE Animal
Pig susScr3 SUS Animal
A. thaliana TAIR10 ATH Plant
B. napus AST_PRJEB5043_v1 BNA Plant
B. rapa IVFCAASv1 BRA Plant
Quinoa ASM168347v1 CQU Plant
C. reinhardtii Chlamydomonas_reinhardtii_v5.5 CRE Plant
Cucumber ASM407v2 CSA Plant
Soybean Glycine_max_v1.0 GMA Plant
G. raimondii Graimondii2_0 GRA Plant
Apple ASM211411v1 MAL Plant
Cassava Manihot_esculenta_v6 MES Plant
M. truncatula MedtrA17_4.0 MTR Plant
Banana MA1 MAC Plant
O. rufipogon OR_W1943 ORU Plant
O. sativa IRGSP-1.0 OSA Plant
P. patens Phypa_V3 PPA Plant
P. trichocarpa JGI2.0 POP Plant
Tomato 390_v2.5 SLY Plant
Potato SolTub_3.0 STU Plant
Cacao Theobroma_cacao_20110822 TCA Plant
Trefoil Trpr TPR Plant
Wheat IWGSC TAE Plant
Grape IGGP_12x VVI Plant
Maize AGPv4 ZMA Plant