Abstract: Gene Ontology (GO) annotations are predicted for CGD proteins based on protein domains and motifs. These predictions are made in CGD on a periodic basis to keep up to date with additions to the mapping file and refinements of the gene annotation set. The procedure is as follows. First, we make
orthology-based GO term predictions. Subsequently, we make the domain-based predictions. The domain assignments are determined using the InterProScan software from the European Bioinformatics Institute (EBI). These protein domain data in CGD may be downloaded in bulk from the
domains directory on our web site. The InterPro-to-GO mappings are acquired from the interpro2go file, which is downloaded from the GO Consortium web site. Each protein in CGD is evaluated for the presence of InterPro domains, and any GO terms associated with those domains are then considered candidates for assignment. A new domain-based GO prediction is NOT made if it is redundant with (or a parental term of) an existing manually curated or orthology-based GO term that is already assigned to the gene product. All new domain-based GO predictions are assigned in CGD as Computational Predictions with "Inferred From Electronic Annotation" (IEA) evidence.
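
The domain-based step described above can be illustrated with a minimal Python sketch. It is not CGD's actual pipeline code. It assumes that interpro2go lines have the shape shown in the comment, that lines beginning with "!" are comments, and that "parental term" can be read as any ancestor in the GO hierarchy. The function names and the child-to-parents dictionary are hypothetical and exist only for illustration.

```python
import re
from collections import defaultdict

# Assumed shape of one interpro2go mapping line:
#   InterPro:IPR000003 Retinoid X receptor/HNF4 > GO:DNA binding ; GO:0003677
# Lines starting with "!" are treated as comments (assumption).
MAPPING_LINE = re.compile(r"^InterPro:(IPR\d+)\s.*;\s*(GO:\d{7})\s*$")


def parse_interpro2go(lines):
    """Return {InterPro accession: set of GO IDs} from interpro2go-style lines."""
    mapping = defaultdict(set)
    for line in lines:
        if line.startswith("!"):
            continue
        match = MAPPING_LINE.match(line)
        if match:
            mapping[match.group(1)].add(match.group(2))
    return dict(mapping)


def ancestors(term, parents):
    """Collect every ancestor of `term`, given a {child: set of parents} map."""
    seen = set()
    stack = [term]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def predict_domain_terms(domains, mapping, existing, parents):
    """Candidate GO IDs for one protein, minus those redundant with `existing`.

    `existing` holds the GO IDs already assigned by manual curation or by
    orthology. A candidate is dropped if it equals one of them or is an
    ancestor of one of them.
    """
    candidates = set()
    for domain in domains:
        candidates |= mapping.get(domain, set())
    redundant = set(existing)
    for term in existing:
        redundant |= ancestors(term, parents)
    return sorted(candidates - redundant)
```

Each term returned by `predict_domain_terms` would then be recorded with IEA evidence. Computing the ancestor set from the existing annotations, rather than walking upward from each candidate, keeps the redundancy check to a single set difference per protein.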