doi: 10.3390/foods12071498.

A Machine Learning Method to Identify Umami Peptide Sequences by Using Multiplicative LSTM Embedded Features

Jici Jiang et al. Foods.

Abstract

Umami peptides enhance the umami taste of food and have good food processing properties, nutritional value, and numerous potential applications. Wet testing for the identification of umami peptides is a time-consuming and expensive process. Here, we report iUmami-DRLF, which applies logistic regression (LR) to features extracted from peptide sequences solely by a pre-trained deep learning model, unified representation (UniRep, based on multiplicative LSTM). The findings demonstrate that deep representation learning significantly enhanced the models' ability to identify umami peptides, and their predictive precision, using peptide sequence information alone. Newly validated taste sequences were also used to test iUmami-DRLF and other predictors, and the results indicate that iUmami-DRLF has better robustness and accuracy and remains valid at higher probability thresholds. The iUmami-DRLF method can aid further studies on enhancing the umami flavor of food to meet the demand for an umami-flavored diet.

Keywords: ANOVA; SMOTE; deep representation learning; light gradient boosting; multiplicative LSTM; mutual information; umami peptide.


Conflict of interest statement

The authors declare no conflict of interest.

Figures

Figure 1
Overview of model development. The pre-trained UniRep sequence embedding model was used to embed the peptide sequences as 1900-dimensional (D) UniRep feature vectors. The synthetic minority over-sampling technique (SMOTE) was used to balance the imbalanced data. These features were used as inputs to the k-nearest neighbors (KNN), logistic regression (LR), support vector machine (SVM), random forest (RF), and light gradient boosting machine (LGBM) predictor algorithms. Feature selection was performed for model optimization using analysis of variance (ANOVA), LGBM, and mutual information (MI). The selected feature sets were subjected to another round of analysis using the three feature selection algorithms and various hyperparameters. The final optimized model was chosen by comparing model performance in 10-fold cross-validation and independent tests. Using the 91 wet-test validated umami peptide sequences reported in the latest research (UMP-VERIFIED), we evaluated iUmami-DRLF in comparison to state-of-the-art methods.
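As an illustration of the ANOVA step named in this caption, the following is a minimal sketch of ranking features by a one-way ANOVA F statistic and keeping the top-scoring ones. The function names (`anova_f_score`, `top_features`) and the toy 3-dimensional data are hypothetical; the source does not state how the authors computed or thresholded their scores, and real UniRep vectors have 1900 dimensions.

```python
def anova_f_score(values, labels):
    """One-way ANOVA F statistic for a single feature.

    Assumes at least two classes and more samples than classes.
    """
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(y, []).append(v)
    n = len(values)
    k = len(groups)
    grand_mean = sum(values) / n
    # Variation of the class means around the overall mean
    ss_between = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups.values()
    )
    # Variation of the samples around their own class mean
    ss_within = sum(
        (v - sum(g) / len(g)) ** 2 for g in groups.values() for v in g
    )
    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def top_features(X, labels, top_n):
    """Return the indices of the top_n features, ranked by F statistic."""
    n_features = len(X[0])
    scores = [
        anova_f_score([row[j] for row in X], labels) for j in range(n_features)
    ]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return ranked[:top_n]


# Toy data (hypothetical): 4 peptides, 3 embedding dimensions, 1 = umami
X = [
    [0.9, 0.10, 0.50],
    [1.1, 0.20, 0.40],
    [0.1, 0.15, 0.60],
    [-0.1, 0.12, 0.45],
]
labels = [1, 1, 0, 0]
selected = top_features(X, labels, top_n=2)
print(selected)
```

A feature whose class means differ strongly relative to its within-class spread gets a high score, which is why the first column ranks highest in the toy data.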
Figure 2
Results of 10-fold cross-validation (A) and independent testing (B) of the five ML models trained with and without SMOTE balancing. As illustrated in Figure 2 and Supplementary Table S1, the models trained on SMOTE-balanced data clearly outperformed the models trained without SMOTE. Using the LR-based prediction model as an example, the LR-SMOTE model outperformed or equaled the LR model without SMOTE in 66.7% of the metrics in 10-fold cross-validation and independent tests. Of the SVM-based models, the SVM-SMOTE model outperformed the SVM model without SMOTE in 83.3% of the metrics.
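For readers unfamiliar with SMOTE, the balancing step compared here can be sketched as follows: each synthetic minority sample is an interpolation between a real minority sample and one of its nearest minority-class neighbours. This is a simplified standard-library sketch, not the authors' implementation; the function name `smote`, the value of `k`, the seed, and the toy vectors are all assumptions.

```python
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic samples from the minority class.

    Requires at least two minority samples.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of the base sample, excluding itself
        others = [m for j, m in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda m: euclidean(base, m))[:k]
        neighbour = rng.choice(neighbours)
        # Random point on the segment between base and neighbour
        gap = rng.random()
        synthetic.append([b + gap * (nb - b) for b, nb in zip(base, neighbour)])
    return synthetic


# Toy minority class (hypothetical 2-D feature vectors)
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_samples = smote(minority, n_new=6, k=2, seed=42)
print(len(new_samples))
```

Because every synthetic point lies on a segment between two real minority samples, the new samples stay inside the region the minority class already occupies.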
Figure 3
UMAP was used for visualizing the dimension-reduced features. (A) UniRep features without SMOTE balancing, (B) UniRep features following SMOTE balancing, (C) data of the top 177 features selected from the SMOTE-balanced UniRep feature set, and (D) data obtained using the top 121 features selected from the SMOTE-balanced UniRep feature set.
Figure 4
Comparison of the results of independent testing of the models with selected features and the models without selected features.
Figure 5
Prediction results of iUmami-DRLF (this work), UMPred-FRL, and iUP-BERT on the UMP-VERIFIED dataset under varying probability thresholds. (A) Prediction accuracy as a function of the probability threshold. (B) Cross-entropy loss of the predicted outcome as a function of the probability threshold. The smaller the cross-entropy loss, the better the robustness and accuracy of the model. Note that at the probability thresholds of 95% and 99%, the prediction accuracy of iUP-BERT and UMPred-FRL is 0; their corresponding cross-entropy losses can still be calculated, but they are not meaningful.
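The two quantities plotted in this figure can be sketched as follows. The caption does not say exactly how the loss is tied to the threshold, so this sketch computes the standard binary cross-entropy on the raw predicted probabilities and, separately, the accuracy after thresholding. The function names and the probabilities are hypothetical and are not results from the paper.

```python
import math


def accuracy_at_threshold(probs, labels, threshold):
    """Fraction of samples whose thresholded prediction matches the label."""
    correct = 0
    for p, y in zip(probs, labels):
        predicted = 1 if p >= threshold else 0
        correct += int(predicted == y)
    return correct / len(labels)


def cross_entropy(probs, labels, eps=1e-12):
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(labels)


# Hypothetical predicted umami probabilities for four verified umami peptides
probs = [0.99, 0.97, 0.90, 0.60]
labels = [1, 1, 1, 1]

for t in (0.5, 0.95, 0.99):
    print(t, accuracy_at_threshold(probs, labels, t))
print(cross_entropy(probs, labels))
```

When every test peptide is a verified positive, as in a set like UMP-VERIFIED, accuracy at a threshold is simply the share of peptides scored at or above it, so a predictor whose scores never reach 0.95 has an accuracy of 0 at that threshold.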

