Structure. 2024 Apr 4;32(4):492-504.e4.
doi: 10.1016/j.str.2024年01月01日1. Epub 2024 Feb 16.

EncoMPASS: An encyclopedia of membrane proteins analyzed by structure and symmetry


Antoniya A Aleksandrova et al. Structure.

Abstract

Protein structure determination and prediction, active site detection, and protein sequence alignment techniques all exploit information about protein structure and structural relationships. For membrane proteins, however, there is limited agreement among available online tools for highlighting and mapping such structural similarities. Moreover, no available resource provides a systematic overview of quaternary and internal symmetries, and their orientation relative to the membrane, despite the fact that these properties can provide key insights into membrane protein function and evolution. Here, we describe the Encyclopedia of Membrane Proteins Analyzed by Structure and Symmetry (EncoMPASS), a database for relating integral membrane proteins of known structure from the points of view of sequence, structure, and symmetry. EncoMPASS is accessible through a web interface, and its contents can be easily downloaded. This allows the user not only to focus on specific proteins, but also to study general properties of the structure and evolution of membrane proteins.

Keywords: asymmetry; biological assembly; integral membrane proteins; online database; sequence alignment; structural similarity; structure alignment; symmetry detection.


Conflict of interest statement

Declaration of interests: The authors declare no competing interests.

Figures

Figure 1:
The methodology used to build the EncoMPASS database. The database contains an analysis of every individual membrane-spanning chain as well as of all biologically relevant complexes. In the final database, all structures are aligned relative to the membrane, either according to the OPM database or by running the PPM method. The major steps in constructing the database are: (1) extracting the coordinates and other relevant information from the available databases, OPM and PDB; (2) constructing membrane-aligned assemblies and identifying the transmembrane regions therein; (3) identifying the symmetries both within individual chains and between chains in every complex; and (4) computing structural and sequence alignments of related entries to provide a network of relationships. The resulting database contains the final coordinate sets, symmetry analysis results, and structural relationship data. Full details can be found in the Methods and STAR Methods.
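Step (2) can be illustrated with a minimal sketch. It assumes the chain has already been oriented in the OPM/PPM convention, with the membrane normal along the z axis and the membrane midplane at z = 0, and it calls a residue membrane-embedded when its Cα atom lies inside a slab of assumed half-thickness. The function name, the 15 Å half-thickness and the 10-residue minimum segment length are illustrative choices, not the parameters used to build EncoMPASS.

```python
import numpy as np


def membrane_segments(z_coords, half_thickness=15.0, min_length=10):
    """Return inclusive (start, end) index pairs of runs of residues whose
    C-alpha z-coordinate lies inside the slab |z| <= half_thickness.

    Assumes membrane normal = z axis and membrane midplane at z = 0.
    half_thickness (angstroms) and min_length are illustrative defaults.
    """
    inside = np.abs(np.asarray(z_coords, dtype=float)) <= half_thickness
    segments = []
    start = None
    for i, flag in enumerate(inside):
        if flag and start is None:
            start = i  # a run of membrane-embedded residues begins
        elif not flag and start is not None:
            if i - start >= min_length:  # keep only long enough runs
                segments.append((start, i - 1))
            start = None
    # close a run that extends to the last residue
    if start is not None and len(inside) - start >= min_length:
        segments.append((start, len(inside) - 1))
    return segments
```

In practice the fixed half-thickness would normally be replaced by the per-protein hydrophobic thickness reported by OPM/PPM.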
Figure 2:
Examples of biological assembly issues. (a, b) The contents of the unit cell are incorrectly assigned as the biological unit for (a) outer membrane protein OmpX from E. coli (PDB code 1QJ9) and (b) bovine rhodopsin (PDB code 1F88). For rhodopsin, the second subunit (orange) is inserted in the membrane in the opposite orientation, which is not physiologically meaningful.
Figure 3:
Relationships between the number of protein subunits that span the membrane and the overall size of multi-subunit complexes in the EncoMPASS dataset. (a) The fraction of chains, or subunits, in a membrane protein structure (both monomers and multi-subunit complexes) that are membrane-embedded. Coloring indicates the relative contribution of membrane-spanning subunits or chains in the complex to which each chain belongs. Thus, the first column shows that among all the entries containing just one membrane-spanning subunit or chain, the vast majority are monomers (gray), with smaller percentages of dimers (yellow), trimers (orange), tetramers (dark orange) or higher oligomers (red) containing a single membrane-spanning chain. (b) The same data shown as a heat map against the total number of chains in the complex. Darker blues indicate points with higher populations, as quantified by the histograms above (for membrane-spanning chains) and to the right (for the total number of chains).
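A minimal sketch of how counts like those in panels (a) and (b) could be tabulated, assuming each entry is reduced to a pair (number of membrane-spanning chains, total number of chains). The function name and input format are hypothetical; the raw counts correspond to a heat map as in panel (b), and the per-column fractions to the colored columns of panel (a).

```python
from collections import Counter, defaultdict


def tabulate_entries(entries):
    """entries: iterable of (n_membrane_chains, n_total_chains), one per entry.

    Returns (heat, fractions):
      heat: Counter mapping (n_membrane_chains, n_total_chains) to the number
            of entries, i.e. a 2D histogram suitable for a heat map;
      fractions: for each n_membrane_chains value (one column), the share of
            entries having each total chain count.
    """
    heat = Counter()
    for n_mem, n_tot in entries:
        if not 0 < n_mem <= n_tot:
            raise ValueError("need 0 < membrane-spanning chains <= total chains")
        heat[(n_mem, n_tot)] += 1

    # number of entries in each column (same number of membrane-spanning chains)
    column_totals = Counter()
    for (n_mem, _), count in heat.items():
        column_totals[n_mem] += count

    fractions = defaultdict(dict)
    for (n_mem, n_tot), count in sorted(heat.items()):
        fractions[n_mem][n_tot] = count / column_totals[n_mem]
    return heat, dict(fractions)


# Hypothetical toy input: three monomers, one dimer with one membrane chain,
# and one dimer with two membrane chains.
heat, fractions = tabulate_entries([(1, 1), (1, 1), (1, 1), (1, 2), (2, 2)])
print(fractions[1])  # {1: 0.75, 2: 0.25}
```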
Figure 4:
Relationships between the number of TM regions and the number of membrane-spanning subunits per complex for the α-helical membrane protein structures in the EncoMPASS dataset. (a) Number of transmembrane helices per complex plotted as a histogram. The inset illustrates the number. (b) Division of all membrane-embedded subunits or chains according to the number of transmembrane regions in each chain. Coloring indicates the total number of subunits or chains in the complex to which each chain belongs. For example, the first column shows that among protein chains in the database with 1 TM region, the majority belong to complexes with >4 TM chains. (c) Number of transmembrane helices per subunit. (d) Number of structural comparisons carried out between α-helical membrane protein structures after applying the criteria for selecting structurally related structures described in the STAR Methods. The total number of pairwise alignments is plotted as a function of the number of membrane-spanning regions in the protein chain of interest. Fewer comparisons were carried out for structures with higher numbers of transmembrane helices, reflecting their distribution in the database. To indicate whether each structure is compared with other structures having a similar number of TM helices, the color coding indicates the relative number of TM regions of the second chain in the pair. Structures with 20 to 30 transmembrane helices can undergo thousands of comparisons, while structures with 30 to 55 transmembrane helices undergo <100 comparisons.
Figure 5:
Plots provided in the EncoMPASS database to illustrate the network of sequence and structure relationships of each membrane-spanning chain, in this case one subunit of the tetrameric spinach aquaporin SoPIP2;1 (PDB code 3CN5, chain A). (a) Structure and sequence similarity for all proteins that have been compared to the structure of interest. Each compared protein chain is represented as a point indicating its similarity according to both the MUSCLE sequence alignment (as the fraction of identical residues) and the FrTM-Align structure-based alignment, the latter quantified by the template modeling score (TM-score). The histograms along the top and side show the distributions of the points projected onto each axis. Data points are colored by the difference in the number of TM segments between the reference structure and the structure being compared, following the legend: bluer shades indicate that the structural neighbor has more TM elements, and redder shades that it has fewer. The background contours show the probability density of sequence identity and structural similarity over all pairwise alignments of chains in the EncoMPASS dataset. (b) Polar plot representing the structural relationships between the structure of interest and its neighbors, as well as the similarity among those neighbors. All pairwise structure alignments calculated for the structure of interest with TM-score > 0.6 are represented as points. The structure of interest is placed at the center of the plot, and concentric circles indicate TM-scores from 1.0 to 0.6 (red). Distances between any two points in the graph are proportional to the TM-score distance between the two structures. Colors of individual points follow the legend for panel (a).
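The two similarity measures in panel (a) can be sketched as follows. Sequence identity is computed here from a gapped pairwise alignment (such as one produced by MUSCLE) and normalised by the number of gap-free columns, which is only one of several conventions in use. The TM-score function implements the standard formula for a fixed superposition, TM = (1/L) Σ 1/(1 + (d_i/d0)²) with d0 = 1.24·∛(L − 15) − 1.8, where L is the length of the reference chain and d_i are the distances between aligned Cα pairs; TM-align and FrTM-Align additionally search for the superposition that maximises this score, which this sketch does not do. Both function names are illustrative.

```python
import numpy as np


def sequence_identity(aln_a, aln_b):
    """Fraction of identical residues in a pairwise alignment given as two
    equal-length gapped strings ('-' marks a gap). Normalised by the number
    of columns where neither sequence has a gap."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(aln_a, aln_b) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    return sum(a == b for a, b in pairs) / len(pairs)


def tm_score(distances, reference_length):
    """TM-score of a fixed superposition.

    distances: C-alpha distances (angstroms) between aligned residue pairs.
    reference_length: number of residues in the reference chain (normaliser).
    """
    d0 = 1.24 * np.cbrt(reference_length - 15) - 1.8
    d0 = max(float(d0), 0.5)  # keep d0 positive for very short chains
    d = np.asarray(distances, dtype=float)
    return float(np.sum(1.0 / (1.0 + (d / d0) ** 2)) / reference_length)
```

If the TM-score distance used for the polar plot in panel (b) is taken to be 1 − TM-score (an assumption), it follows directly from this value.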
(c) Regions of structural similarity and variability along the protein sequence between the structural neighbors and the membrane-spanning chain of interest. Cα-Cα distances are averaged over multiple related chains compared with the chain of interest. Grey regions indicate the TM elements in the structure of interest. Each line corresponds to a cluster of related chains: its thickness represents the size of that cluster, and its color corresponds to the data shown in the inset polar plot, which in turn can be traced back to the plot in panel (b). Structural distances are computed after a pairwise fitting of each chain.
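The per-residue profile in panel (c) requires superposing each neighbor onto the chain of interest before measuring Cα-Cα distances. A minimal sketch, assuming the residue correspondence from the structure alignment is already known, fits each neighbor with the Kabsch algorithm (a standard least-squares superposition, used here as a stand-in for whatever fitting procedure EncoMPASS applies) and averages the distances per residue over all neighbors; averaging within each cluster separately would give the individual lines of the plot. The function names and the mapping format are hypothetical.

```python
import numpy as np


def kabsch_fit(mobile, reference):
    """Least-squares superposition (Kabsch) of mobile onto reference.

    Both inputs are N x 3 arrays whose rows are already paired.
    Returns the transformed copy of mobile.
    """
    mob = np.asarray(mobile, dtype=float)
    ref = np.asarray(reference, dtype=float)
    mob_center, ref_center = mob.mean(axis=0), ref.mean(axis=0)
    h = (mob - mob_center).T @ (ref - ref_center)  # 3 x 3 covariance
    u, _, vt = np.linalg.svd(h)
    # flip the last axis if needed so the result is a rotation, not a reflection
    sign = np.sign(np.linalg.det(vt.T @ u.T))
    rotation = vt.T @ np.diag([1.0, 1.0, sign]) @ u.T
    return (mob - mob_center) @ rotation.T + ref_center


def mean_residue_distances(reference, neighbors):
    """Mean C-alpha distance per residue of the chain of interest.

    reference: L x 3 C-alpha coordinates of the chain of interest.
    neighbors: list of (coords, mapping); mapping[i] is the index of the
        neighbor residue aligned to reference residue i, or None.
    Each neighbor is fitted separately on its aligned residues.
    Returns a length-L array; residues never aligned are NaN.
    """
    ref = np.asarray(reference, dtype=float)
    sums = np.zeros(len(ref))
    counts = np.zeros(len(ref))
    for coords, mapping in neighbors:
        coords = np.asarray(coords, dtype=float)
        ref_idx = [i for i, j in enumerate(mapping) if j is not None]
        nbr_idx = [mapping[i] for i in ref_idx]
        fitted = kabsch_fit(coords[nbr_idx], ref[ref_idx])
        sums[ref_idx] += np.linalg.norm(fitted - ref[ref_idx], axis=1)
        counts[ref_idx] += 1
    with np.errstate(invalid="ignore"):  # 0/0 -> NaN for unaligned residues
        return sums / counts
```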
Figure 6:
Multiple levels of symmetry detected in the slow anion channel (SLAC1) homolog TehA from Haemophilus influenzae (PDB code 3M78, chain A), which contains a C5 internal pseudo-symmetry. (a) CE-Symm recognizes two hierarchical levels of symmetry (indicated by the gray background). Level 2 corresponds to the smallest building block present in both levels of symmetry, and thereby also defines the maximum length of the repeats in level 1. For TehA, the building block is a single transmembrane helix; however, an internal symmetry relating such short units is unlikely to be functionally insightful. (b) Applying the CE-Symm-R procedure retains the symmetries described by level 1 while eliminating those in level 2, and also yields a more extensive definition of the level 1 symmetric regions. (c) Using the symmetry results obtained for a structural neighbor (PDB code 3M71, chain A) to infer the repeat boundaries further increases the coverage of the repeat definition.
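Panel (c) relies on transferring repeat boundaries from a structural neighbor. One simple way to do this, sketched here under the assumption that a residue-level correspondence from a structure alignment is available as a dictionary, is to map each neighbor repeat onto the first and last query residues aligned within it. The helper name and input format are hypothetical and are not taken from EncoMPASS.

```python
def transfer_repeats(neighbor_repeats, alignment):
    """Map repeat boundaries defined on a structural neighbor onto the chain
    of interest.

    neighbor_repeats: list of inclusive (start, end) residue indices of the
        repeats detected in the neighbor.
    alignment: dict {neighbor_index: query_index} of structurally aligned
        residue pairs.
    Returns one (start, end) pair per repeat, spanning the first and last
    aligned query residues; repeats with no aligned residues map to None.
    """
    transferred = []
    for start, end in neighbor_repeats:
        hits = [alignment[i] for i in range(start, end + 1) if i in alignment]
        transferred.append((min(hits), max(hits)) if hits else None)
    return transferred


# Hypothetical alignment in which the query is offset by two residues
alignment = {i: i + 2 for i in range(50)}
print(transfer_repeats([(0, 9), (10, 19)], alignment))  # [(2, 11), (12, 21)]
```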
Figure 7:
Symmetry detection results for CE-Symm (blue) and the multi-step symmetry detection (MSSD) method used by EncoMPASS (orange) on a benchmark set of 87 α-helical proteins with distinct folds in MemSTATS v1.2. The methods are evaluated on the symmetries described within all TM chains in the set (internal symmetry) and within all complexes with >1 membrane-spanning chain (quaternary symmetry): 118 and 74 symmetry descriptions in total, respectively. For each method, the percentage of symmetries identified correctly (the sum of the true-positive and true-negative rates; Table 1) and the false-positive rate are shown (Order and Coverage). Where a method detects the symmetry described in the benchmark but reports a repeat that covers only part of the benchmark repeat, i.e., one missing at least 20 consecutive residues, the correct rate is shown in a lighter color (Order only).
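The evaluation criteria in this caption can be expressed as a short sketch. It assumes each benchmark repeat and each predicted repeat is given as a set of residue indices, and that agreement on the symmetry order has been decided separately; a correct prediction is downgraded to "Order only" when it misses at least 20 consecutive residues of the benchmark repeat. The percentage correct is computed as (TP + TN) / N, which assumes the true-positive and true-negative rates in Table 1 are both fractions of the same total N. All names are illustrative.

```python
def longest_missing_run(benchmark_repeat, predicted_repeat):
    """Longest run of consecutive benchmark-repeat residues (index steps of 1)
    that are absent from the predicted repeat. Both arguments are sets of ints."""
    longest = current = 0
    previous = None
    for residue in sorted(benchmark_repeat):
        if residue in predicted_repeat:
            current = 0
        elif current > 0 and residue == previous + 1:
            current += 1  # the missing run continues
        else:
            current = 1  # a new missing run starts here
        longest = max(longest, current)
        previous = residue
    return longest


def classify(order_matches, benchmark_repeat, predicted_repeat, max_missing=20):
    """Label one predicted symmetry against its benchmark description."""
    if not order_matches:
        return "incorrect"
    if longest_missing_run(benchmark_repeat, predicted_repeat) >= max_missing:
        return "order only"
    return "order and coverage"


def percent_correct(true_positives, true_negatives, total):
    """Share of benchmark descriptions handled correctly, in percent."""
    return 100.0 * (true_positives + true_negatives) / total
```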
Figure 8:
The bacterial glutamate transporter homolog GltPh (PDB code 2NWX chain A) contains two small, non-hierarchical symmetries within each protomer, which cannot be captured by CE-Symm using default parameters. The first half of the protomer (a) is C2-pseudo-symmetric, as is the second half of the protomer (b), but the fold of the first two repeats is too distinct from the fold of the other two repeats (c) for CE-Symm to identify any connection. Because the CE-Symm algorithm, by construction, cannot report the two symmetry relationships simultaneously, and because each of the individual symmetries alone scores below the default TM-score threshold, CE-Symm detected no symmetry in this structure.
Figure 9:
Strategies for ordering the chains of a complex to aid detection of their symmetric arrangements. (a) The PDB coordinates of the KcsA channel (PDB code 1R3J) present the chains in an order that is not conducive to detecting their symmetric arrangement because there is no single transformation that can match all subunits. To address this issue, the chains can be re-arranged starting from a chain at one end of the complex and progressively adding the closest unordered neighbor. Repeating this until all chains have been accounted for allows for the C4 quaternary symmetry to be detected. (b) Particulate methane monooxygenase (PDB code 3RFR) contains a C3 symmetry relating chains A-C, E-G and I-K, which the previous strategy fails to detect because chains D and H have no symmetry equivalent. Instead, a SymD self-alignment can be used for re-arranging the order of the chains so that the symmetry can be detected.
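The greedy re-ordering in panel (a) can be sketched in a few lines. The sketch works on per-chain centroids and, by assumption, takes the chain farthest from the centroid of the whole complex as the starting chain "at one end" (for a ring such as KcsA any chain will do, so an explicit start can also be passed). It is a nearest-neighbor heuristic and, as panel (b) notes, it can fail when some chains have no symmetry equivalent; the SymD-based alternative is not sketched. The function name and input format are hypothetical.

```python
import numpy as np


def order_chains(centroids, start=None):
    """Greedy ordering of chains by centroid proximity.

    centroids: dict {chain_id: (x, y, z)} of per-chain centroids.
    start: chain to begin from; by default the chain whose centroid is
        farthest from the centroid of the whole complex (an assumed way of
        picking a chain "at one end").
    Returns chain ids in the order in which they were added.
    """
    ids = list(centroids)
    coords = {c: np.asarray(centroids[c], dtype=float) for c in ids}
    if start is None:
        center = np.mean([coords[c] for c in ids], axis=0)
        start = max(ids, key=lambda c: np.linalg.norm(coords[c] - center))
    order = [start]
    remaining = set(ids) - {start}
    while remaining:
        last = coords[order[-1]]
        # closest chain not yet placed; ties broken by original input order
        nxt = min(remaining,
                  key=lambda c: (np.linalg.norm(coords[c] - last), ids.index(c)))
        order.append(nxt)
        remaining.remove(nxt)
    return order


# Hypothetical square tetramer whose chain labels do not follow the ring
print(order_chains({"A": (10, 0, 0), "B": (-10, 0, 0),
                    "C": (0, 10, 0), "D": (0, -10, 0)}, start="A"))
# ['A', 'C', 'B', 'D']
```

After this re-ordering, consecutive chains are neighbors around the ring, so a single rotation can map each chain onto the next one.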

