Proteins. 2021 Dec;89(12):1959-1976.
doi: 10.1002/prot.26246. Epub 2021 Oct 19.

Assessment of prediction methods for protein structures determined by NMR in CASP14: Impact of AlphaFold2

Yuanpeng Janet Huang et al. Proteins. 2021 Dec.

Abstract

NMR studies can provide unique information about protein conformations in solution. In CASP14, three reference structures determined by solution NMR methods were available (T1027, T1029, and T1055), as well as a fourth data set of NMR-derived contacts for an integral membrane protein (T1088). For the three targets with NMR-based structures, the best prediction results ranged from very good (GDT_TS = 0.90, for T1055) to poor (GDT_TS = 0.47, for T1029). We explored the basis of these results by comparing all CASP14 prediction models against experimental NMR data. For T1027, NMR data reveal extensive internal dynamics, presenting a unique challenge for protein structure prediction methods. The analysis of T1029 motivated exploration of a novel method of "inverse structure determination," in which an AlphaFold2 model was used to guide NMR data analysis. NMR data provided to CASP predictor groups for target T1088, a 238-residue integral membrane porin, were also used to assess several NMR-assisted prediction methods. Most groups involved in this exercise generated similar beta-barrel models that agreed well with the experimental data. However, as was also observed in CASP13, some pure prediction groups that did not use any NMR data generated models for T1088 that fit the NMR data better than the models generated using these experimental data. These results demonstrate the remarkable power of modern methods to predict protein structures with accuracies rivaling those of solution NMR structures, and show that it is now possible to reliably use prediction models to guide and complement experimental NMR data analysis.

Keywords: MipA; integral membrane proteins; machine learning; protein dynamics; protein structure prediction; solution NMR; structure determination.


Conflict of interest statement

The authors declare that there is no conflict of interest. G.T.M. is a founder of Nexomics Biosciences, Inc. and G.L. is Chief Scientific Officer of Nexomics Biosciences, Inc. These roles do not represent a conflict of interest for this study.

Figures

FIGURE 1
DP scores in CASP14. Schematic description of RPF‐DP scores. In this analysis, the graph G (left), with nodes corresponding to all assigned ¹H atoms and edges representing all short (<5 Å) ¹H–¹H distances in a structure model, is compared with a graph GANOE (right), in which nodes again correspond to all assigned ¹H atoms and edges describe all possible assignments for each NOESY cross peak. True positives (TPs) are edges common to both G and GANOE, false positives (FPs) are edges present in G but not in GANOE, and false negatives (FNs) are the sets of edges in GANOE representing the multiple possible assignments of a NOESY cross peak, none of which are present in G. These metrics are used to compute recall (R), precision (P), and F‐measure as shown in the figure and outlined in the Methods section. The F‐measure is the harmonic mean of recall and precision. The Discriminating Power (DP) is a normalized F‐measure, scaled relative to the F‐measure expected for a random‐coil chain (DP = 0) and the best F‐measure possible given the completeness of the NMR data (DP = 1.0). Accurate structures generally have DP > 0.60 for individual models.
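The recall, precision, F‐measure, and DP definitions in the caption above can be illustrated with a short Python sketch. It is not the RPF implementation used in the study. As a simplifying assumption, each NOESY peak is represented as a set of candidate ¹H–¹H edges, recall is taken as the fraction of peaks with at least one candidate edge present in G, precision as the fraction of edges in G that also appear in GANOE, and DP as a linear rescaling between a supplied random‐coil F‐measure (`f_free`) and a supplied best achievable F‐measure (`f_max`). All function and parameter names are hypothetical.

```python
from typing import FrozenSet, List, Set, Tuple

# An unordered pair of proton labels, e.g. edge("H12", "HA15").
Edge = FrozenSet[str]


def edge(a: str, b: str) -> Edge:
    """Build an undirected 1H-1H edge."""
    return frozenset((a, b))


def rpf_scores(model_edges: Set[Edge], peaks: List[Set[Edge]]) -> Tuple[float, float, float]:
    """Return (recall, precision, F-measure) for one structure model (simplified).

    model_edges: edges of G, i.e. 1H-1H pairs closer than 5 Å in the model.
    peaks: one set of candidate assignments (edges of GANOE) per NOESY peak.
    """
    noe_edges: Set[Edge] = set().union(*peaks)
    # Recall: fraction of peaks with at least one candidate assignment present in G.
    explained = sum(1 for candidates in peaks if candidates & model_edges)
    recall = explained / len(peaks) if peaks else 0.0
    # Precision: fraction of short model distances supported by some peak (TP / (TP + FP)).
    tp = len(model_edges & noe_edges)
    precision = tp / len(model_edges) if model_edges else 0.0
    if recall + precision == 0:
        return recall, precision, 0.0
    # F-measure: harmonic mean of recall and precision.
    f_measure = 2 * recall * precision / (recall + precision)
    return recall, precision, f_measure


def discriminating_power(f: float, f_free: float, f_max: float) -> float:
    """Linear rescaling: F == f_free gives DP 0, F == f_max gives DP 1."""
    return (f - f_free) / (f_max - f_free)
```

For example, with model edges {H1–H2, H2–H3, H3–H4} and three peaks whose candidate sets are {H1–H2}, {H2–H3, H2–H5}, and {H4–H6}, two of three peaks are explained and two of three model edges are supported, so R = P = F = 2/3.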
FIGURE 2
Structural analysis for CASP14 targets 1055 and 1027. (left) Superimposed ensembles of (A) the NMR structure (PDB ID 6zyc) (green) and (B) the AF2 structures (blue) of T1055, illustrating the not‐well‐defined segments (brown) as defined by Cyrange. For the NMR structure, residues 305–426 are well‐defined, while for the AF2 structure residues 310–428 are well‐defined (residues 427 and 428 are part of the linker to the purification tag). (C,D) Comparison of the AF2 conformer with the highest GDT score (blue) with the representative conformer from the NMR structure ensemble with the best DP score, for residues 310–426 of T1055. The well‐defined backbone (N, Cα, C′) atoms are superimposed, and both the backbone superimposition and the associated core sidechains are illustrated. DP versus GDT scores (E) and DP scores versus predictor group (F) for target T1055. (right) Superimposed ensembles of (G) the NMR structure (PDB ID 7d2o) (green) and (H) the AF2 structures (blue) of T1027, illustrating the not‐well‐defined segments (brown). For the NMR structure, residues 10–18, 36–81, and 96–145 are well‐defined, while for the AF2 structure residues 36–75 and 96–164 are well‐defined. (I,J) Comparison of the AF2 conformer with the highest GDT score (blue) with the conformer from the NMR structure ensemble with the highest DP score, for T1027. The well‐defined backbone (N, Cα, C′) atoms are superimposed for residue ranges 36–75 and 96–145. In the NMR structure, the N‐terminal helix (α1) sits in a pocket in the core of the protein and the C‐terminal region is disordered (and therefore not shown in panel I), whereas in the AF2 structure the N‐terminal region is disordered (and not shown in panel I) and the C‐terminal region forms a helix that packs into the core of the protein. The five disulfide bonds of T1027 are illustrated in panel J. DP versus GDT scores (K) and DP scores versus predictor group (L).
The red horizontal lines in (E) and (K) are drawn at the DP scores of the best‐scoring conformation from the ensemble of experimental structures. For both targets, only residues that are well‐defined in both the NMR and AF2 structures were included in superimposition and GDT score calculations. The nine helices of the NMR model, as well as the C‐terminal helix of the AF2 model, are labeled in panel I.
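The GDT comparisons described in this caption can be sketched as follows. This is a simplified illustration, not the LGA program normally used for CASP scoring: it assumes the model Cα coordinates have already been superimposed on the reference (for example, on the well‐defined backbone atoms) and evaluates that single superposition, whereas CASP's GDT_TS searches over many superpositions, so this sketch would typically report an equal or lower value. Scores are on the 0–1 scale used in the abstract; `gdt_ts` and its arguments are hypothetical names.

```python
import math
from typing import Dict, Iterable, Tuple

Coord = Tuple[float, float, float]

# Distance cutoffs (Å) whose coverage fractions are averaged in GDT_TS.
GDT_TS_CUTOFFS = (1.0, 2.0, 4.0, 8.0)


def gdt_ts(model_ca: Dict[int, Coord], ref_ca: Dict[int, Coord], residues: Iterable[int]) -> float:
    """GDT_TS (0-1) over the selected residues, for pre-superimposed Cα coordinates.

    Residues missing from the model count as outside every cutoff;
    residues missing from the reference are ignored.
    """
    selected = [r for r in residues if r in ref_ca]
    if not selected:
        raise ValueError("no selected residues present in the reference")
    dists = [
        math.dist(model_ca[r], ref_ca[r]) if r in model_ca else math.inf
        for r in selected
    ]
    # Fraction of selected residues within each cutoff, then the mean over cutoffs.
    fractions = [sum(d <= c for d in dists) / len(selected) for c in GDT_TS_CUTOFFS]
    return sum(fractions) / len(GDT_TS_CUTOFFS)
```

Passing only the residues that are well defined in both the NMR and AF2 ensembles as `residues` mirrors the restriction described in the caption.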
FIGURE 3
RPF and Talos_N analysis for CASP14 targets 1055 and 1027. (left) Ensemble recall analysis for the NMR structure (A) and AF2 model (B) of T1055. Residues with a few NOEs that are assigned and satisfied in the NMR model, but with recall violations in the AF2 models, are colored light blue in the AF2 model. (C) Numbers of NOEs that are satisfied in the NMR structures but not in the AF2 models (blue), or satisfied in the AF2 models but not in the NMR structures (orange), plotted along the sequence; most NOEs can be explained by both structures. (D,E) Precision analysis for the NMR structure ensemble and AF2 model ensemble of T1055. Residues with modest numbers of precision violations are colored light blue or green, and those with significant numbers of precision violations are colored yellow, orange, or red. (F,G) Talos_N analysis for the ensembles of NMR structures and AF2 models of T1055. No significant violations of dihedral angle restraints derived from backbone chemical shift data are observed in any of the NMR structures or AF2 models. (right) Ensemble recall analysis for the NMR structure (H) and AF2 model (I) of T1027. Residues with NOEs that are assigned and satisfied in the NMR model, but with recall violations in the AF2 models, are colored on the AF2 model as outlined in the text, and vice versa. (J) Numbers of NOEs that are satisfied in the NMR structures but not in the AF2 models (blue), or satisfied in the AF2 models but not in the NMR structures (orange), plotted along the sequence; many NOEs can be explained only by the NMR models. (K,L) Precision analysis for the NMR structure ensemble and AF2 model ensemble of T1027. Residues with modest numbers of precision violations are colored light blue or green, and those with significant numbers of precision violations are colored yellow, orange, or red. (M,N) Talos_N analysis for the ensembles of NMR structures and AF2 models of T1027.
Residues colored yellow are indicated by chemical shift data to be flexible; residues colored red have backbone conformations in well‐defined regions of the models that are inconsistent with the chemical shift data. In all images, dark blue indicates little or no metric violation. In mapping precision violations onto the models (e.g., panels K and L), regions of the structure that are not converged are not shown, because precision violations in these regions can arise simply from conformational variability.
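Panels C and J of this figure plot, residue by residue, the NOEs satisfied by one set of structures but not the other. A minimal sketch of that bookkeeping follows. It assumes the per‐ensemble sets of satisfied NOE identifiers have already been computed (how "satisfied" is decided for an ensemble is not specified here) and that each NOE is credited to both residues it connects. The function name and data layout are hypothetical.

```python
from collections import Counter
from typing import Dict, Set, Tuple


def differential_noe_counts(
    noe_residues: Dict[str, Tuple[int, int]],
    satisfied_a: Set[str],
    satisfied_b: Set[str],
) -> Tuple[Counter, Counter]:
    """Count, per residue, NOEs satisfied only in set A and only in set B.

    noe_residues maps an NOE identifier to the two residue numbers it links.
    NOEs satisfied in both sets, or in neither, are not counted.
    """
    only_a: Counter = Counter()
    only_b: Counter = Counter()
    for noe_id, (res_i, res_j) in noe_residues.items():
        in_a = noe_id in satisfied_a
        in_b = noe_id in satisfied_b
        if in_a == in_b:
            continue  # explained by both structures, or by neither
        counts = only_a if in_a else only_b
        counts[res_i] += 1
        if res_j != res_i:  # credit an intraresidue NOE only once
            counts[res_j] += 1
    return only_a, only_b
```

Calling it with the NMR ensemble as A and the AF2 ensemble as B gives the blue and orange series, respectively.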
FIGURE 4
Structural analysis for CASP14 target 1029. Superimposed ensembles of (A) the NMR structure (PDB ID 6uf2) (green) and (B) the AF2 structures (blue) of T1029. For the NMR structure, residues 3–19 and 29–122 are well‐defined, while for the AF2 structure residues 2–46 and 53–123 are well‐defined. (C) Comparison of the AF2 conformer with the highest GDT score (blue) with the representative conformer from the original NMR structure ensemble with the best DP score, for residues 3–19, 29–46, and 53–122. (D,E) DP versus GDT scores and DP scores versus predictor group for the original NMR structure (D) and the revised NMR structure (E). The red horizontal lines in (D) and (E) are drawn at the DP scores of the best‐scoring conformation from the ensemble of experimental structures. (F) Revised NMR structure (PDB ID 7n82) (green), illustrating the not‐well‐defined segments (brown). Residues 3–20 and 26–123 are well‐defined. (G) Comparison of the AF2 conformer with the highest GDT score (blue) with the representative conformer from the original NMR structure ensemble with the best DP score, for residues 3–19, 29–46, and 53–122. The well‐defined backbone (N, Cα, C′) atoms are superimposed, and both the backbone superimposition and the associated core sidechains are illustrated (H). Only residues that are well‐defined in all three structures (original NMR, revised NMR, and AF2) were included in superimposition and GDT score calculations.
FIGURE 5
Inverse structure determination. (left) Flow chart of inverse structure determination of T1029 using the AF2 model as input. The AF2 models, resonance assignments, Talos‐N dihedral restraints, and RDC restraints were combined with the manually refined NOESY peak lists and used as input for NOESY peak assignment with the program ASDP. The recall violation list (NOESY peaks not consistent with the resulting models) was then used to guide further manual refinement of the NOESY peak list, and the process was repeated. Blue and red arrows indicate program input and output, respectively. (right) Plots of calculated versus observed RDCs for HN–N, Hα–Cα, and Cα–C′ bond vectors for the original and revised NMR structures, and of RDCs for Cα–C′ bond vectors for the AF2 models.
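The calculated‐versus‐observed RDC plots in this caption can be summarized numerically. The caption reports only the plots, so the two statistics in this sketch are an assumption: the Pearson correlation coefficient r and the commonly used RDC quality factor Q = rms(D_obs − D_calc) / rms(D_obs). Higher r and lower Q indicate better agreement. `rdc_agreement` is a hypothetical helper, and the calculated RDCs would have to come from fitting an alignment tensor to each model, which is not shown.

```python
import math
from typing import Sequence, Tuple


def rdc_agreement(observed: Sequence[float], calculated: Sequence[float]) -> Tuple[float, float]:
    """Return (Pearson r, Q factor) for paired observed and calculated RDCs."""
    if len(observed) != len(calculated) or len(observed) < 2:
        raise ValueError("need at least two paired RDC values")
    n = len(observed)
    mean_o = sum(observed) / n
    mean_c = sum(calculated) / n
    # Pearson correlation between observed and calculated couplings.
    cov = sum((o - mean_o) * (c - mean_c) for o, c in zip(observed, calculated))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_c = sum((c - mean_c) ** 2 for c in calculated)
    r = cov / math.sqrt(var_o * var_c)
    # Q factor: rms deviation normalized by the rms of the observed couplings.
    rms_diff = math.sqrt(sum((o - c) ** 2 for o, c in zip(observed, calculated)) / n)
    rms_obs = math.sqrt(sum(o * o for o in observed) / n)
    return r, rms_diff / rms_obs
```

Reporting both is deliberate: r is insensitive to a uniform scaling error in the calculated couplings, while Q is not.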
FIGURE 6
NMR‐assisted prediction of an integral membrane protein. Ranking of (A) NMR‐assisted and (B) unassisted (pure prediction) CASP models based on the DP score of the predictor‐defined first model (DP_first) or of the best‐scoring model submitted (DP_best). Scores for the AF2 predictor group are highlighted in red among the unassisted prediction groups. (C) NMR‐assisted model and (D) regular prediction (unassisted) model with the best DP scores. The models are colored with information from Talos_N: blue, residues whose backbone conformation is consistent with chemical shift data; red, residues whose backbone conformation is not consistent with chemical shift data; orange, residues with no consensus dihedral angles predicted by Talos_N; yellow, residues that chemical shift data indicate to be dynamic. Residues 59, 66, 164, and 169 (red) are labeled as reference points. Residues in segments 52–67 and 162–169 that Talos_N identifies as dynamic (yellow), inconsistent (red), or without consensus (orange), but that are located in predicted regular secondary structures, are considered inconsistent with the backbone chemical shift data and may involve multiple conformations.
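The four‐color Talos_N scheme in this caption amounts to a per‐residue classification, sketched below. The tolerance rule is an assumption chosen for illustration, not the criterion used in the study: a model's φ/ψ angles count as consistent when each lies within `n_sd` standard deviations (with a floor of `min_tol` degrees) of the Talos_N prediction, measured with ±180° wraparound. `TalosPrediction` is a hypothetical container, not the Talos_N output format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TalosPrediction:
    """Hypothetical per-residue summary of a Talos_N prediction (degrees)."""
    phi: Optional[float] = None  # None means no consensus prediction
    psi: Optional[float] = None
    phi_sd: float = 0.0
    psi_sd: float = 0.0
    dynamic: bool = False  # chemical shifts indicate a flexible residue


def angle_diff(a: float, b: float) -> float:
    """Smallest absolute difference between two angles in degrees."""
    d = (a - b) % 360.0
    return min(d, 360.0 - d)


def classify_residue(
    model_phi: float,
    model_psi: float,
    pred: TalosPrediction,
    n_sd: float = 2.0,      # hypothetical tolerance, in standard deviations
    min_tol: float = 20.0,  # hypothetical minimum tolerance, in degrees
) -> str:
    """Return the caption's color for one residue: yellow, orange, blue, or red."""
    if pred.dynamic:
        return "yellow"  # dynamic according to chemical shifts
    if pred.phi is None or pred.psi is None:
        return "orange"  # no consensus dihedral angles
    tol_phi = max(n_sd * pred.phi_sd, min_tol)
    tol_psi = max(n_sd * pred.psi_sd, min_tol)
    if (angle_diff(model_phi, pred.phi) <= tol_phi
            and angle_diff(model_psi, pred.psi) <= tol_psi):
        return "blue"  # consistent with chemical shift data
    return "red"  # inconsistent with chemical shift data
```

The wraparound in `angle_diff` matters for extended (β) residues, whose φ/ψ values often sit near ±180°.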
FIGURE 7
DP and GDT scores for NMR structures in CASP14. Plot of the DP score of the best‐scoring experimental model versus the GDT score of the best‐scoring CASP model relative to the coordinates of PDB IDs 7d2o, 6uf2, 7n82, and 6zyc, for targets T1027, T1029, T1029_revised, and T1055, respectively. The horizontal dashed line is an empirical cutoff for an accurate NMR structure model.

