Life Science Database Archive
Gclust Server

Cluster based on sequence comparison of homologous proteins of 95 organism species

Data detail

Data name: Cluster based on sequence comparison of homologous proteins of 95 organism species
DOI
Description of data contents

Clusters were built from an all-against-all (round-robin) BLASTP search of the above amino acid sequence data, using the resulting E-values and overlap scores, with a similarity threshold for homologs of each protein estimated heuristically by the entropy-optimized organism count method (Bioinformatics 2009 Mar 1;25(5):599-605). The data are provided as a CSV-format text file.

Data file
File name: gclust_cluster.zip
File size: 8.72 MB
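The archive can be read without unpacking it to disk. The sketch below uses only the Python standard library and yields every CSV row as a list of strings. It assumes that gclust_cluster.zip holds one or more comma-separated text members and that they are UTF-8 encoded; the member names, the encoding, and whether a header row is present are not stated on this page, so no header is interpreted.

```python
import csv
import io
import zipfile


def read_cluster_csv(zip_source):
    """Yield rows (lists of strings) from every file member of the archive.

    zip_source may be a path or a binary file-like object.
    Assumptions: members are comma-separated text in UTF-8; directory
    entries are skipped and no header row is interpreted.
    """
    with zipfile.ZipFile(zip_source) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            with archive.open(info) as raw:
                # Wrap the binary member so the csv module sees text lines.
                text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
                yield from csv.reader(text)
```

For example, `sum(1 for _ in read_cluster_csv("gclust_cluster.zip"))` counts the rows, which can be compared against the stated number of data entries (allowing for a header row, if the file has one).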
Simple search URL
Data acquisition method

Sequence data described in "Amino acid sequences of predicted proteins and their annotation for 95 organism species".

Data analysis method

All-against-all BLASTP search of the above amino acid sequence data, followed by heuristic estimation of a similarity threshold for homologs of each protein using the entropy-optimized organism count method (Bioinformatics 2009 Mar 1;25(5):599-605).
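To make the threshold-then-cluster idea concrete, here is a minimal sketch of grouping all-against-all BLASTP hits by single linkage with a union-find structure. It does not implement the entropy-optimized organism count method: the per-protein thresholds are simply passed in, and the linking rule (a pair is joined when its E-value is at or below the query protein's threshold) and the default threshold of 1e-5 are hypothetical choices for illustration. Hits are read from standard BLAST+ tabular output (`-outfmt 6`), whose eleventh column is the E-value.

```python
from collections import defaultdict


def parse_outfmt6(lines):
    """Yield (query, subject, evalue) from BLAST+ tabular (-outfmt 6) lines."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        # Default columns: qseqid sseqid pident length mismatch gapopen
        # qstart qend sstart send evalue bitscore
        yield fields[0], fields[1], float(fields[10])


def cluster_hits(hits, thresholds, default_threshold=1e-5):
    """Single-linkage clustering of hits with union-find.

    Hypothetical rule: a query-subject pair is linked when its E-value is
    at or below the query's threshold (thresholds supplied by the caller).
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    for query, subject, evalue in hits:
        find(query)  # register both IDs even if they end up unlinked
        find(subject)
        if evalue <= thresholds.get(query, default_threshold):
            union(query, subject)

    groups = defaultdict(set)
    for seq_id in list(parent):
        groups[find(seq_id)].add(seq_id)
    return sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])
```

Letting each protein carry its own threshold is what allows a looser cutoff for divergent families and a stricter one for large, highly similar families; the sketch only shows where such thresholds would plug in.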

Number of data entries

206,764 entries

Data item | Primary key | Foreign key | Description
Cluster ID | Yes | | ID of the cluster
Representative sequence ID | | Yes | ID of the sequence that represents the cluster (references gclust_seq)
Link to cluster sequences | | | Link to the list of sequences belonging to the cluster (empty space)
Link to related sequences | | | Link to the list of sequences that are similar to the cluster but not included in it
Sequence length | | | Amino acid sequence length
Representative annotation | | | Representative annotation of the cluster
Number of Sequences | | | Number of sequences contained in the cluster
Homologs | | | Number of sequences contained in the cluster
Clustering threshold | | | E-value threshold used for clustering
Plants (7 species) (%) | | | Appearance rate of this cluster in the plant and algal group (7 species)
Other bikonts (9 species) (%) | | | Appearance rate of this cluster in the other Bikonta (Chromalveolata, Excavata) group (9 species)
Cyano (25 species) (%) | | | Appearance rate of this cluster in the cyanobacteria group (25 species)
Photo Bact (15 species) (%) | | | Appearance rate of this cluster in the photosynthetic bacteria group (15 species)
Other Bact (31 species) (%) | | | Appearance rate of this cluster in the non-photosynthetic bacteria group (31 species)
Opisthokonts (8 species) (%) | | | Appearance rate of this cluster in the opisthokont group (8 species)
Number of Sequences for each species | | | Number of sequences from each organism species contained in the cluster
Species not appearing in this cluster | | | Organism species with no sequences in the cluster
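The six group sizes in the table add up to the 95 species of the dataset (7 + 9 + 25 + 15 + 31 + 8). The sketch below shows one plausible way to derive the per-group percentages and the list of absent species from the per-species sequence counts. It assumes that the appearance rate is the percentage of species in a group that contribute at least one sequence to the cluster; that definition, the species names, and the species-to-group mapping are hypothetical and not confirmed by this page.

```python
# Group sizes as listed in the table above; they sum to the 95 species.
GROUP_SIZES = {
    "Plants": 7,
    "Other bikonts": 9,
    "Cyano": 25,
    "Photo Bact": 15,
    "Other Bact": 31,
    "Opisthokonts": 8,
}
assert sum(GROUP_SIZES.values()) == 95


def appearance_rates(counts_by_species, group_of_species):
    """Percentage of species per group with >= 1 sequence in the cluster.

    counts_by_species: species -> number of sequences in the cluster.
    group_of_species: species -> group name (hypothetical mapping).
    Assumed definition of "appearance rate"; not confirmed by the source.
    """
    present = {group: 0 for group in GROUP_SIZES}
    for species, count in counts_by_species.items():
        if count > 0:
            present[group_of_species[species]] += 1
    return {g: 100.0 * present[g] / size for g, size in GROUP_SIZES.items()}


def absent_species(counts_by_species, all_species):
    """Species contributing no sequences to the cluster, sorted by name."""
    return sorted(s for s in all_species if counts_by_species.get(s, 0) == 0)
```

Under this assumption, a cluster containing sequences from one of the 25 cyanobacteria would show 4.0 in the Cyano column.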
