doi: 10.7717/peerj.6830. eCollection 2019.

FunPred 3.0: improved protein function prediction using protein interaction network

Sovan Saha et al. PeerJ. 2019.

Abstract

Proteins are the most versatile macromolecules in living systems and perform crucial biological functions. With the advent of the post-genomic era, next-generation sequencing is performed routinely at the population scale for a variety of species. The challenge is to determine, at scale, the functions of proteins that have not yet been characterized by detailed experimental studies. Identifying protein functions experimentally is laborious, time-consuming and resource-intensive. We therefore propose an automated protein function prediction methodology using in silico algorithms trained on carefully curated experimental datasets. We present the improved protein function prediction tool FunPred 3.0, an extended version of our previous methodology FunPred 2, which exploits neighborhood properties in the protein-protein interaction network (PPIN) and physicochemical properties of amino acids. Our method is validated using the available functional annotations in the PPIN of Saccharomyces cerevisiae in the latest Munich Information Center for Protein Sequences (MIPS) dataset. The PPIN data of S. cerevisiae in the MIPS dataset include 4,554 unique proteins in 13,528 protein-protein interactions after the elimination of self-replicating and self-interacting protein pairs. Using FunPred 3.0, we achieve mean precision, recall and F-score values of 0.55, 0.82 and 0.66, respectively. FunPred 3.0 is then used to predict the functions of unpredicted protein pairs (incomplete and missing functional annotations) in the MIPS dataset of S. cerevisiae. The method is also capable of predicting the subcellular localization of proteins along with their corresponding functions. The code and the complete prediction results are freely available at: https://github.com/SovanSaha/FunPred-3.0.git.
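
As a quick consistency check on the reported metrics, the sketch below recomputes the F-score from the stated precision and recall. It assumes the standard F1 definition (the harmonic mean of precision and recall). The abstract does not spell out the formula, and the function name `f_score` is illustrative, not taken from the FunPred 3.0 code.

```python
def f_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (standard F1; assumed definition)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Mean precision and recall as reported in the abstract.
reported_precision = 0.55
reported_recall = 0.82

value = f_score(reported_precision, reported_recall)
print(f"F-score: {value:.2f}")
```

Under that assumption, the result rounds to the reported 0.66. A mean of per-protein F-scores need not equal the F-score of the mean precision and recall, so the agreement here is only approximate evidence.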

Keywords: MIPS Database; Neighborhood approach; Physico-chemical properties; Protein function prediction; Protein interaction networks; Protein–protein interactions.

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Figure 1. Filtering of PPIN.
Application of node weight and edge weight at three levels of threshold: High, Medium and Low in FunPred 3.0_Clust.
Figure 2. Cluster formations.
Formation of clusters from refined network after application of three levels of node and edge weight threshold in FunPred 3.0_Clust.
Figure 3. FunPred 3.0_Pred.
Working model of FunPred 3.0_Pred. A: Selected test protein. B: Formation of the PPIN of the test protein. C: Formation of clusters. D: Computation of the distance of the test protein from each formed cluster. E: Allocation of the test protein to the cluster at minimum distance, along with all its functions.
Figure 4. Categorization of proteins based on subcellular localization.
PPIN of yeast (Saccharomyces cerevisiae): cytoplasm proteins (red), nuclear proteins (green), interface proteins (blue), unpredicted localization proteins (orange).
Figure 5. Network view of PPIN of yeast.
Sequential formation of cytoplasm proteins (red), nuclear proteins (green), interface proteins (blue), unpredicted localization proteins (orange) in PPIN of yeast.
Figure 6. Disintegrated network views of PPIN of yeast.
Separate PPINs of cytoplasm proteins (red), nuclear proteins (green), interface proteins (blue), unpredicted localization proteins (orange) and their interactions.
Figure 7. Nuclear PPIN of yeast.
Candidate (green) and test (yellow) proteins in nuclear PPIN (green and yellow) of yeast (violet: other nodes in the network).
Figure 8. Cytoplasm PPIN of yeast.
Candidate (red) and test (yellow) proteins in cytoplasm PPIN (red and yellow) of yeast (violet: other nodes in the network).
Figure 9. Interface PPIN of yeast.
Candidate (blue) and test (yellow) proteins in Interface PPIN (blue and yellow) of yeast (violet: other nodes in the network).
Figure 10. Network view.
PPI network of Yeast (Saccharomyces cerevisiae).
Figure 11. Selected candidate and test proteins.
PPIN of annotated (red circle) and test/unannotated proteins (yellow circle) of the yeast network (Saccharomyces cerevisiae).
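
The allocation step described in the Figure 3 caption (compute the distance of a test protein from each cluster, then assign it to the nearest cluster along with that cluster's functions) can be sketched as follows. This is a minimal illustration, not the FunPred 3.0 implementation: the captions do not define the distance measure, so the sketch assumes a Jaccard-style distance between the test protein's interaction partners and each cluster's members. All names (`cluster_distance`, `assign_functions`, the toy protein and cluster identifiers) are hypothetical.

```python
from typing import Dict, Set, Tuple


def cluster_distance(neighbors: Set[str], cluster: Set[str]) -> float:
    """ASSUMED distance: 1 - Jaccard similarity between the test protein's
    interaction partners and the cluster members (not the paper's metric)."""
    union = neighbors | cluster
    if not union:
        return 1.0
    return 1.0 - len(neighbors & cluster) / len(union)


def assign_functions(
    test_neighbors: Set[str],
    clusters: Dict[str, Set[str]],
    cluster_functions: Dict[str, Set[str]],
) -> Tuple[str, Set[str]]:
    """Pick the cluster at minimum distance and return its name together
    with all functions annotated to that cluster."""
    best = min(
        clusters,
        key=lambda name: cluster_distance(test_neighbors, clusters[name]),
    )
    return best, cluster_functions[best]


# Toy data (hypothetical identifiers, for illustration only).
clusters = {
    "c1": {"P1", "P2", "P3"},
    "c2": {"P4", "P5"},
}
cluster_functions = {
    "c1": {"transport"},
    "c2": {"metabolism"},
}
test_neighbors = {"P1", "P2"}  # interaction partners of the test protein

best_cluster, predicted = assign_functions(test_neighbors, clusters, cluster_functions)
print(best_cluster, sorted(predicted))
```

Any other distance (for example one built from node and edge weights or physicochemical features) could be substituted for `cluster_distance` without changing the allocation logic.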
