Nucleic Acids Res. 2010 Nov 18;39(Database issue):D1114–D1117. doi: 10.1093/nar/gkq1141

PlantTFDB 2.0: update and improvement of the comprehensive plant transcription factor database

He Zhang 1, Jinpu Jin 1, Liang Tang 1, Yi Zhao 1, Xiaocheng Gu 1, Ge Gao 1,*, Jingchu Luo 1,*
1Center for Bioinformatics, National Laboratory of Protein Engineering and Plant Genetic Engineering and College of Life Sciences, Peking University, Beijing, 100871, PR China

*To whom correspondence should be addressed. Tel:/Fax: +86 10 6275 5206; Email: luojc@pku.edu.cn

Correspondence may also be addressed to Ge Gao. Tel:/Fax: +86 10 6275 1861; Email: gaog@mail.cbi.pku.edu.cn

The authors wish it to be known that, in their opinion, the first two authors should be regarded as joint First Authors.

Received 2010 Sep 13; Revised 2010 Oct 19; Accepted 2010 Oct 22; Issue date 2011 Jan; Collection date 2011 Jan.

© The Author(s) 2010. Published by Oxford University Press.

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.5), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

PMCID: PMC3013715 PMID: 21097470

Abstract

We updated the plant transcription factor (TF) database to version 2.0 (PlantTFDB 2.0, http://planttfdb.cbi.pku.edu.cn), which contains 53 319 putative TFs predicted from 49 species. Using a computational approach combined with manual inspection, we classified these TFs into 58 newly defined families and annotated them in detail, including general information, domain features, gene ontology, expression patterns and ortholog groups, as well as cross-references to various databases and literature citations. Multiple sequence alignments for each family can be viewed as WebLogo pictures or downloaded as text files, and phylogenetic trees for each family can be viewed online or downloaded. We have redesigned the user interface in the new version. Users can search TFs much more flexibly through the improved advanced search page, and search results can be exported in various formats for further analysis. In addition, we now provide a web service that gives advanced users more efficient access to PlantTFDB 2.0.

INTRODUCTION

Transcription factors (TFs) are key regulators of gene expression at the transcriptional level (1). In recent years, several databases of plant TFs and other transcription regulators have become publicly available, such as PlnTFDB (2), PlanTAPDB (3), GRASSIUS (4), DATFAP (5), AGRIS (6), RARTF (7), LegumeTFDB (8) and TOBFAC (9). Starting in 2005, we constructed several species-specific plant TF databases based on the available genome sequences of Arabidopsis (DATF) (10), rice (DRTF) (11) and poplar (DPTF) (12), and integrated them into a comprehensive plant TF database (PlantTFDB 1.0) (13) containing 26 402 TFs identified from 22 species. Of these 22 plants, five have complete genome sequences; the others are represented by unique transcripts integrated by PlantGDB (14). PlantTFDB 1.0 has received millions of web hits since it went online in July 2007.

With the rapid increase in the number of plant genome sequences in public databases, we have updated PlantTFDB 1.0 to version 2.0. PlantTFDB 2.0 contains TFs from 49 species covering the main lineages of the plant kingdom: 9 green algae, 1 moss, 1 fern, 3 gymnosperms and 35 angiosperms. Using the refined pipeline, we identified a total of 53 319 TFs from these 49 species and classified them into 58 families. These putative TFs were both annotated computationally and curated manually. To infer the evolutionary relationships among the identified TFs, we constructed phylogenetic trees for each TF family and predicted ortholog groups for TFs from species with complete genome sequences. The web interface of PlantTFDB 2.0 was redesigned to provide more flexible search functionality. In addition to access through a web browser, a standard web service interface now allows advanced users to retrieve data from PlantTFDB 2.0 in batch mode or to integrate PlantTFDB 2.0 data into their own websites. All resources in PlantTFDB 2.0 can be browsed, retrieved and downloaded freely.

RESULTS AND DISCUSSION

Improved identification pipeline for plant TFs

Although annotations generated by genome sequencing projects are the most abundant source of proteome data for a given species, their largely automatic nature often yields incomplete or incorrect annotations (15). Curated sequence databases such as RefSeq (16), by contrast, provide relatively high-quality, curation-based annotation, and expressed sequence tags (ESTs) are another important source that complements genome annotation. By integrating all existing annotations from genome annotation projects, RefSeq, PlantGDB (14) and UniGene (17), we compiled a non-redundant reference proteome data set for all 49 species (Supplementary Table S1, Supplementary Figures S1 and S2) for TF prediction.
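The paper does not spell out how the sources were merged (see Supplementary Figures S1 and S2). As a rough illustration only, the sketch below collapses identical protein sequences reported by several sources into one representative, assuming a hypothetical source ranking that prefers curated RefSeq entries. The source labels and ranking are assumptions, and the actual PlantTFDB procedure may differ, for example by also clustering near-identical sequences.

```python
# Hypothetical source ranking: lower number = preferred when the same
# protein sequence is reported by several sources. This ordering is an
# assumption for illustration, not the documented PlantTFDB rule.
SOURCE_PRIORITY = {"RefSeq": 0, "genome": 1, "UniGene": 2, "PlantGDB": 3}


def merge_nonredundant(records):
    """Collapse identical protein sequences coming from several sources.

    records: iterable of (source, identifier, sequence) tuples.
    Returns a dict mapping each unique sequence to the (source, identifier)
    chosen as its representative.
    """
    best = {}
    for source, ident, seq in records:
        key = seq.upper().rstrip("*")  # ignore case and a trailing stop symbol
        rank = SOURCE_PRIORITY.get(source, len(SOURCE_PRIORITY))
        # Keep the first record seen unless a higher-priority source appears.
        if key not in best or rank < best[key][0]:
            best[key] = (rank, source, ident)
    return {seq: (src, ident) for seq, (_, src, ident) in best.items()}
```

Exact-match deduplication is the simplest possible choice; it keeps every distinct isoform, which matters when different splice forms carry different domains.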

TFs are characterized by their signature DNA-binding domains (DBDs). We used HMMER 3.0 to identify these signature DBDs in the proteome data set described above. In total, 64 profile HMMs were used to identify TF domains (Supplementary Table S2): 53 were taken from Pfam 24.0 (18) and 11 were built from sequences we collected locally. In the previous version, we used an E-value of 0.01 as the threshold for domain identification. Because the E-value depends on the size of the protein data set being searched, the current version instead uses domain-specific bit-score thresholds, set on the basis of manual inspection and literature review (Supplementary Tables S3 and S4).
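In HMMER, a hit's E-value is its P-value multiplied by the number of target sequences searched, so the same domain score can pass a fixed E-value cut-off in a small proteome and fail it in a large one, whereas a bit-score cut-off does not change with proteome size. The sketch below applies per-model bit-score thresholds to the per-domain table written by `hmmsearch --domtblout`. Column positions follow the HMMER 3 domain-table format; the threshold values are hypothetical placeholders, not the curated values in Supplementary Tables S3 and S4.

```python
# Hypothetical per-model bit-score cut-offs; the real PlantTFDB values are in
# Supplementary Tables S3 and S4 and are NOT reproduced here.
BITSCORE_CUTOFF = {"Myb_DNA-binding": 20.0, "AP2": 25.0}


def parse_domtblout(lines):
    """Yield (protein_id, hmm_name, domain_bitscore) from hmmsearch --domtblout text."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment/header lines
        fields = line.split()
        # 1-based columns: 1 = target (protein) name, 4 = query (HMM) name,
        # 14 = bit score of this individual domain.
        yield fields[0], fields[3], float(fields[13])


def filter_hits(lines, cutoffs=BITSCORE_CUTOFF):
    """Keep only domain hits whose bit score meets the model-specific cut-off."""
    kept = []
    for protein, hmm, score in parse_domtblout(lines):
        threshold = cutoffs.get(hmm)
        # Models without a configured cut-off are ignored entirely.
        if threshold is not None and score >= threshold:
            kept.append((protein, hmm, score))
    return kept
```

Splitting on whitespace is safe for the first 22 columns; only the trailing free-text description may contain spaces.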

In PlantTFDB 2.0, we adopted a slightly more stringent definition of TFs as ‘proteins that show sequence-specific DNA binding and are capable of activating or/and repressing transcription’ (19). We carried out an extensive literature review and refined the rule-based classification scheme accordingly (Figure 1 and Supplementary Table S5). Families that do not meet this criterion were excluded (Supplementary Table S6), including transcription cofactors and chromatin-related proteins such as remodeling factors, histone demethylases, DNA methyltransferases and histone acetyltransferases. Families such as TUBBY-like and Alfin-like were also removed because their status as TFs has been questioned or disproved by new experimental evidence. Conversely, five newly identified TF families (DBB, FAR1, LSD, NF-X1 and STAT) were added. Because of differences in domain composition, DNA-binding specificity and function, AP2/ERF and HB were divided into subfamilies. M-type MADS TFs were classified as a separate subfamily, since some M-type MADS-box genes have been reported to be pseudogenes or a new class of transposable element (19). Finally, using the refined pipeline, we predicted 53 319 TFs from 49 species and classified them into 58 families (Tables 1 and 2, Supplementary Tables S7 and S8).

Figure 1.


Family assignment rules used to identify TFs and assign them to families. Green ellipses represent TF families, and red rectangles denote DBDs. Blue and purple rectangles denote auxiliary and forbidden domains, respectively. Green solid lines link families to DBDs or auxiliary domains; the number ‘1’ or ‘2’ on a line indicates the number of DBDs. Red dashed lines link families to forbidden domains.
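The rule types in Figure 1 (required DBDs with a copy number, auxiliary domains and forbidden domains) can be expressed as simple predicates over a protein's domain list. The sketch below uses placeholder family and domain names (FamilyA to FamilyD, DBD_X, DBD_Y, AUX_Z), all hypothetical; the real rules are given in Figure 1 and Supplementary Table S5. Rules are checked in order, so more specific rules (for example, two DBD copies) must come before more general ones.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class FamilyRule:
    """One family-assignment rule in the style of Figure 1 (illustrative only)."""
    name: str
    required: dict                                   # DBD name -> minimum copies
    auxiliary: set = field(default_factory=set)      # must also be present
    forbidden: set = field(default_factory=set)      # must be absent


# Hypothetical rules with placeholder names, ordered most specific first.
RULES = [
    FamilyRule("FamilyA", {"DBD_X": 2}),
    FamilyRule("FamilyB", {"DBD_X": 1}),
    FamilyRule("FamilyC", {"DBD_Y": 1}, auxiliary={"AUX_Z"}),
    FamilyRule("FamilyD", {"DBD_Y": 1}, forbidden={"AUX_Z"}),
]


def assign_family(domains, rules=RULES):
    """Return the first family whose rule matches the protein's domain list."""
    counts = Counter(domains)
    for rule in rules:
        has_dbds = all(counts[d] >= n for d, n in rule.required.items())
        has_aux = all(counts[a] > 0 for a in rule.auxiliary)
        clean = not any(counts[f] > 0 for f in rule.forbidden)
        if has_dbds and has_aux and clean:
            return rule.name
    return None  # no rule matched: the protein is not classified as a TF
```

For example, a protein with two DBD_X copies falls into FamilyA, while one with DBD_Y but no AUX_Z falls into FamilyD.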

Table 1.

Summary of TFs identified from species with genome sequences

| Lineage | Species | Common name | Proteins | TFs | TF (%) | Families | OG (a) | TFOG (a) |
| Monocotyledon | Brachypodium distachyon | Purple False Brome | 30 726 | 1687 | 5.49 | 56 | 1016 | 1271 |
| | Oryza sativa subsp. indica | Indian Rice | 43 027 | 1936 | 4.50 | 56 | 1427 | 1692 |
| | Oryza sativa subsp. japonica | Japanese Rice | 58 760 | 2424 | 4.13 | 56 | 1422 | 1636 |
| | Sorghum bicolor | Sorghum | 35 810 | 1819 | 5.08 | 54 | 1252 | 1583 |
| | Zea mays | Maize | 62 184 | 3355 | 5.40 | 56 | 1208 | 1762 |
| Dicotyledon | Arabidopsis lyrata | Lyrate Rockcress | 32 233 | 1729 | 5.36 | 58 | 1298 | 1604 |
| | Arabidopsis thaliana | Thale Cress | 32 125 | 2016 | 6.28 | 58 | 1297 | 1609 |
| | Carica papaya | Papaya | 27 829 | 1387 | 4.98 | 58 | 881 | 1203 |
| | Cucumis sativus | Cucumber | 27 725 | 1769 | 6.38 | 57 | 894 | 1153 |
| | Glycine max | Soybean | 48 707 | 3546 | 7.28 | 57 | 1148 | 3057 |
| | Lotus japonicus | | 27 974 | 1275 | 4.56 | 56 | 752 | 986 |
| | Manihot esculenta | Cassava | 46 478 | 2201 | 4.74 | 58 | 1084 | 1922 |
| | Medicago truncatula | Barrel Medic | 52 086 | 1605 | 3.08 | 56 | 823 | 1272 |
| | Mimulus guttatus | Spotted Monkey Flower | 27 989 | 1681 | 6.01 | 57 | 863 | 1345 |
| | Populus trichocarpa | Western Balsam Poplar | 45 183 | 2585 | 5.72 | 58 | 1086 | 2195 |
| | Prunus persica | Peach | 28 299 | 1513 | 5.35 | 58 | 1006 | 1380 |
| | Ricinus communis | Castor Bean | 31 953 | 1291 | 4.04 | 57 | 994 | 1170 |
| | Vitis vinifera | Wine Grape | 47 097 | 2436 | 5.17 | 58 | 921 | 1207 |
| Fern | Selaginella moellendorffii | | 32 969 | 971 | 2.95 | 55 | 411 | 856 |
| Moss | Physcomitrella patens subsp. patens | | 40 604 | 1188 | 2.93 | 53 | 322 | 863 |
| Green alga | Chlamydomonas reinhardtii | | 23 042 | 224 | 0.97 | 30 | 123 | 136 |
| | Chlorella sp. NC64A | | 9762 | 163 | 1.67 | 28 | 94 | 120 |
| | Coccomyxa sp. C-169 | | 9900 | 123 | 1.24 | 29 | 82 | 90 |
| | Micromonas pusilla CCMP1545 | | 10 518 | 141 | 1.34 | 32 | 119 | 124 |
| | Micromonas sp. RCC299 | | 10 074 | 153 | 1.52 | 32 | 124 | 134 |
| | Ostreococcus lucimarinus CCE9901 | | 7960 | 118 | 1.48 | 30 | 100 | 103 |
| | Ostreococcus sp. RCC809 | | 7484 | 100 | 1.34 | 29 | 95 | 97 |
| | Ostreococcus tauri | | 7654 | 97 | 1.27 | 26 | 89 | 91 |
| | Volvox carteri | | 15 416 | 168 | 1.09 | 28 | 125 | 137 |

(a) OG: number of ortholog groups containing at least two TFs; TFOG: number of TFs in ortholog groups.
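The footnote's definitions can be made concrete. The sketch below computes OG and TFOG for one species from a pooled list of ortholog-group assignments. It assumes, as an interpretation, that a group qualifies when it contains at least two TFs from any species. This is consistent with Table 1, where TFOG is well below twice OG (for example, 1297 and 1609 for Arabidopsis thaliana), which rules out requiring two TFs from the same species. The input format and identifiers are hypothetical.

```python
from collections import defaultdict


def og_columns(assignments, species):
    """Return (OG, TFOG) for one species.

    assignments: iterable of (tf_id, species_name, group_id) tuples, pooled
    across all species; group_id is None for TFs not placed in any group.
    Assumption: a group qualifies if it holds >= 2 TFs in total, any species.
    """
    rows = list(assignments)
    group_sizes = defaultdict(int)
    for _tf, _sp, group in rows:
        if group is not None:
            group_sizes[group] += 1

    groups_hit = set()  # qualifying groups containing this species' TFs
    tfog = 0            # this species' TFs that sit in qualifying groups
    for _tf, sp, group in rows:
        if sp == species and group is not None and group_sizes[group] >= 2:
            groups_hit.add(group)
            tfog += 1
    return len(groups_hit), tfog
```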

Table 2.

Summary of TFs identified from species without genome sequences

| Lineage | Species | Common name | Proteins | TFs | TF (%) | Families |
| Monocotyledon | Hordeum vulgare | Barley | 24 020 | 778 | 3.24 | 54 |
| | Panicum virgatum | Switchgrass | 30 078 | 1140 | 3.79 | 52 |
| | Saccharum officinarum | Sugarcane | 21 172 | 671 | 3.17 | 48 |
| | Triticum aestivum | Wheat | 20 494 | 746 | 3.64 | 53 |
| Dicotyledon | Arachis hypogaea | Peanut | 7243 | 219 | 3.02 | 39 |
| | Artemisia annua | Sweet Wormwood | 13 062 | 514 | 3.94 | 48 |
| | Brassica napus | Rape | 30 482 | 1334 | 4.38 | 53 |
| | Brassica rapa | Field Mustard | 14 313 | 718 | 5.02 | 49 |
| | Citrus sinensis | Valencia Orange | 13 522 | 534 | 3.95 | 46 |
| | Gossypium hirsutum | Upland Cotton | 20 862 | 1111 | 5.33 | 50 |
| | Helianthus annuus | Sunflower | 8634 | 279 | 3.23 | 44 |
| | Malus × domestica | Apple | 15 173 | 658 | 4.34 | 51 |
| | Nicotiana tabacum | Tobacco | 18 898 | 793 | 4.20 | 52 |
| | Raphanus sativus | Radish | 14 799 | 573 | 3.87 | 45 |
| | Solanum lycopersicum | Tomato | 15 722 | 799 | 5.08 | 54 |
| | Solanum tuberosum | Potato | 17 445 | 776 | 4.45 | 52 |
| | Theobroma cacao | Cocoa | 7493 | 239 | 3.19 | 44 |
| | Vigna unguiculata | Cowpea | 12 205 | 475 | 3.89 | 48 |
| Gymnosperm | Picea glauca | White Spruce | 15 376 | 508 | 3.30 | 48 |
| | Picea sitchensis | Sitka Spruce | 10 989 | 319 | 2.90 | 47 |
| | Pinus taeda | Loblolly Pine | 13 275 | 434 | 3.27 | 47 |
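As a quick consistency check, the TF column of Tables 1 and 2 sums to the 53 319 TFs reported in the text, and the TF (%) column is the TF count divided by the protein count. The two tables have 50 rows for 49 species, presumably because the two Oryza sativa subspecies are listed separately. The snippet below reproduces both calculations with counts copied from the tables; the dictionary layout and function names are illustrative choices.

```python
# TF counts copied from the "TFs" column of Tables 1 and 2, grouped by lineage.
TABLE1_TF = {
    "Monocotyledon": [1687, 1936, 2424, 1819, 3355],
    "Dicotyledon": [1729, 2016, 1387, 1769, 3546, 1275, 2201,
                    1605, 1681, 2585, 1513, 1291, 2436],
    "Fern": [971],
    "Moss": [1188],
    "Green alga": [224, 163, 123, 141, 153, 118, 100, 97, 168],
}
TABLE2_TF = {
    "Monocotyledon": [778, 1140, 671, 746],
    "Dicotyledon": [219, 514, 1334, 718, 534, 1111, 279,
                    658, 793, 573, 799, 776, 239, 475],
    "Gymnosperm": [508, 319, 434],
}


def tf_percent(tf_count, protein_count):
    """TF (%) column: share of the non-redundant proteome predicted to be TFs."""
    return 100.0 * tf_count / protein_count


def total(table):
    """Sum the TF counts over every lineage in one table."""
    return sum(sum(counts) for counts in table.values())


grand_total = total(TABLE1_TF) + total(TABLE2_TF)
print(grand_total)                          # 53319
print(round(tf_percent(1687, 30726), 2))    # 5.49 (Brachypodium distachyon)
```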

Comprehensive annotation for plant TFs

Comprehensive and accurate annotations derived from various sources provide valuable clues for further functional analysis. Using our established annotation pipeline, we systematically annotated each family and each individual TF.

The main page of each family has a chart showing the number of TFs from each species in that family. The brief introduction and key references for each family were updated based on a literature survey. Multiple sequence alignments of the DBDs of each family, either within an individual species or across species, can be viewed as WebLogo pictures or downloaded as text files. Phylogenetic trees can be displayed online or downloaded in Nexus format. Intra-species phylogenetic trees for each TF family were inferred with MrBayes (v3.2) (20) using the Dayhoff substitution model and 50 000 generations, and FastTree 2.1 (21) was used to construct inter-species trees with 100 resamplings. Annotations at the level of individual TFs include general information, domain architecture, gene ontology, PDB hits, expression profiles, cross-references to other databases, ortholog groups, literature citations and links to other useful resources.
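For readers who want to rebuild trees from the downloadable alignments, the sketch below generates a MrBayes command block and a FastTree argument list using only the two parameters stated in the text (Dayhoff model with 50 000 generations; 100 resamplings). All other settings, including the input file name, are assumptions left at program defaults, and the MrBayes block must be appended to a NEXUS file that already contains the alignment's data block.

```python
def mrbayes_block(ngen=50000):
    """Return a MrBayes command block for one family's protein alignment.

    Only the Dayhoff model and the number of generations come from the text;
    all other MCMC settings are left at MrBayes defaults.
    """
    return "\n".join([
        "begin mrbayes;",
        "  set autoclose=yes nowarn=yes;",      # run non-interactively
        "  prset aamodelpr=fixed(dayhoff);",    # fixed Dayhoff amino-acid model
        f"  mcmc ngen={ngen};",                 # number of MCMC generations
        "  sumt;",                              # summarise the sampled trees
        "end;",
    ])


def fasttree_command(alignment_path, resamples=100):
    """Return the FastTree argv for an inter-species tree with support values
    computed from the given number of resamples; the tree goes to stdout."""
    return ["FastTree", "-boot", str(resamples), alignment_path]
```

The FastTree argument list can be passed to `subprocess.run` with stdout redirected to a Newick file; on some systems the executable is installed as `fasttree` rather than `FastTree`.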

Improvement of user interface

We have redesigned the web interface of PlantTFDB 2.0, which now provides a uniform interface for all species. Users can browse the individual TFs of each family in each species simply by clicking the unique ID assigned to each TF. The text search page has been greatly improved, giving users much more flexibility for advanced searches. Users can select several species, from the same or different lineages of the species tree, and search for TFs in one or more families. Several query conditions can be combined in a single search, including general descriptions, protein properties such as a sequence-length range, the tissues in which a gene is expressed, and various annotation fields of TF entries. Users can also customize the search results and save them in various formats for further processing.
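Combining query conditions amounts to a logical AND over optional filters. The sketch below illustrates this with a hypothetical in-memory record type (`TFEntry`) and function (`advanced_search`); neither reflects PlantTFDB's actual schema or implementation.

```python
from dataclasses import dataclass, field


@dataclass
class TFEntry:
    # Hypothetical record layout for illustration; not PlantTFDB's schema.
    tf_id: str
    species: str
    family: str
    length: int
    tissues: set = field(default_factory=set)
    description: str = ""


def advanced_search(entries, species=None, families=None, length_range=None,
                    tissue=None, keyword=None):
    """Return entries satisfying every supplied condition (logical AND).

    A condition left as None is not applied.
    """
    hits = []
    for e in entries:
        if species is not None and e.species not in species:
            continue
        if families is not None and e.family not in families:
            continue
        if length_range is not None and not (length_range[0] <= e.length <= length_range[1]):
            continue
        if tissue is not None and tissue not in e.tissues:
            continue
        if keyword is not None and keyword.lower() not in e.description.lower():
            continue
        hits.append(e)
    return hits
```

For example, `advanced_search(entries, species={"Ath"}, length_range=(200, 350), tissue="root")` keeps only Arabidopsis entries of 200 to 350 residues expressed in root.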

While accessing the resource through a web browser is easy and intuitive for most users, a web service lets advanced users access the data more efficiently and integrate it into their own sites. We implemented a standard web service interface for PlantTFDB 2.0 (http://planttfdb.cbi.pku.edu.cn/webservice/server.php). A demo client implemented in PHP is available to help users become familiar with the web service interface (http://planttfdb.cbi.pku.edu.cn/webservice_client/client.php).
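The text does not name the protocol behind the 'standard web service interface'. Assuming, for illustration only, that it is SOAP-based, a request could be assembled with the Python standard library as below. The operation name `getTFInfo`, its `tf_id` parameter and the `urn:PlantTFDB` namespace are hypothetical; the real operations should be taken from the service description or the PHP demo client.

```python
import urllib.request
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"
SERVICE_URL = "http://planttfdb.cbi.pku.edu.cn/webservice/server.php"  # from the text


def build_soap_request(operation, params, namespace="urn:PlantTFDB"):
    """Build a SOAP 1.1 envelope as a string.

    `operation`, its parameter names and `namespace` are hypothetical
    placeholders, not the documented PlantTFDB API.
    """
    ET.register_namespace("soap", SOAP_ENV)
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{namespace}}}{operation}")
    for name, value in params.items():
        ET.SubElement(call, name).text = str(value)
    return ET.tostring(envelope, encoding="unicode")


def send_request(xml_text, url=SERVICE_URL, soap_action=""):
    """POST the envelope and return the raw response text (network call)."""
    req = urllib.request.Request(
        url,
        data=xml_text.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8", "SOAPAction": soap_action},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return resp.read().decode("utf-8")
```

Building the envelope separately from sending it keeps the request easy to inspect and compare against what the PHP demo client produces.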

FURTHER DIRECTION

In conclusion, PlantTFDB 2.0 is not only an extensive update of the previous version, incorporating 29 newly released complete genomes and updated data sets, but also a substantial improvement of the user interface. The pipeline we developed for genome-scale TF prediction and the scheme we defined for classifying plant TF families may also serve as useful tools for the user community. We will continue to update and improve PlantTFDB in the future.

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online.

FUNDING

China 863 (2007AA02Z165), 973 (2007CB946904) and NSFC (31071160) programs. Funding for open access publication: China NSFC (31071160) program.

Conflict of interest statement. None declared.

ACKNOWLEDGEMENTS

We thank the JGI for the genome annotations of 10 unpublished species and the MGSC for the Medicago truncatula data. We appreciate critical comments from all users.

REFERENCES

1. Riechmann JL, Heard J, Martin G, Reuber L, Jiang C, Keddie J, Adam L, Pineda O, Ratcliffe OJ, Samaha RR, et al. Arabidopsis transcription factors: genome-wide comparative analysis among eukaryotes. Science. 2000;290:2105–2110. doi: 10.1126/science.290.5499.2105.
2. Perez-Rodriguez P, Riano-Pachon DM, Correa LG, Rensing SA, Kersten B, Mueller-Roeber B. PlnTFDB: updated content and new features of the plant transcription factor database. Nucleic Acids Res. 2010;38:D822–D827. doi: 10.1093/nar/gkp805.
3. Richardt S, Lang D, Reski R, Frank W, Rensing SA. PlanTAPDB, a phylogeny-based resource of plant transcription-associated proteins. Plant Physiol. 2007;143:1452–1466. doi: 10.1104/pp.107.095760.
4. Yilmaz A, Nishiyama MY, Jr, Fuentes BG, Souza GM, Janies D, Gray J, Grotewold E. GRASSIUS: a platform for comparative regulatory genomics across the grasses. Plant Physiol. 2009;149:171–180. doi: 10.1104/pp.108.128579.
5. Fredslund J. DATFAP: a database of primers and homology alignments for transcription factors from 13 plant species. BMC Genomics. 2008;9:140. doi: 10.1186/1471-2164-9-140.
6. Palaniswamy SK, James S, Sun H, Lamb RS, Davuluri RV, Grotewold E. AGRIS and AtRegNet. a platform to link cis-regulatory elements and transcription factors into regulatory networks. Plant Physiol. 2006;140:818–829. doi: 10.1104/pp.105.072280.
7. Iida K, Seki M, Sakurai T, Satou M, Akiyama K, Toyoda T, Konagaya A, Shinozaki K. RARTF: database and tools for complete sets of Arabidopsis transcription factors. DNA Res. 2005;12:247–256. doi: 10.1093/dnares/dsi011.
8. Mochida K, Yoshida T, Sakurai T, Yamaguchi-Shinozaki K, Shinozaki K, Tran LS. LegumeTFDB: an integrative database of Glycine max, Lotus japonicus and Medicago truncatula transcription factors. Bioinformatics. 2010;26:290–291. doi: 10.1093/bioinformatics/btp645.
9. Rushton PJ, Bokowiec MT, Laudeman TW, Brannock JF, Chen X, Timko MP. TOBFAC: the database of tobacco transcription factors. BMC Bioinformatics. 2008;9:53. doi: 10.1186/1471-2105-9-53.
10. Guo A, He K, Liu D, Bai S, Gu X, Wei L, Luo J. DATF: a database of Arabidopsis transcription factors. Bioinformatics. 2005;21:2568–2569. doi: 10.1093/bioinformatics/bti334.
11. Gao G, Zhong Y, Guo A, Zhu Q, Tang W, Zheng W, Gu X, Wei L, Luo J. DRTF: a database of rice transcription factors. Bioinformatics. 2006;22:1286–1287. doi: 10.1093/bioinformatics/btl107.
12. Zhu QH, Guo AY, Gao G, Zhong YF, Xu M, Huang M, Luo J. DPTF: a database of poplar transcription factors. Bioinformatics. 2007;23:1307–1308. doi: 10.1093/bioinformatics/btm113.
13. Guo AY, Chen X, Gao G, Zhang H, Zhu QH, Liu XC, Zhong YF, Gu X, He K, Luo J. PlantTFDB: a comprehensive plant transcription factor database. Nucleic Acids Res. 2008;36:D966–D969. doi: 10.1093/nar/gkm841.
14. Duvick J, Fu A, Muppirala U, Sabharwal M, Wilkerson MD, Lawrence CJ, Lushbough C, Brendel V. PlantGDB: a resource for comparative plant genomics. Nucleic Acids Res. 2008;36:D959–D965. doi: 10.1093/nar/gkm1041.
15. Ouyang S, Thibaud-Nissen F, Childs KL, Zhu W, Buell CR. Plant genome annotation methods. Methods Mol. Biol. 2009;513:263–282. doi: 10.1007/978-1-59745-427-8_14.
16. Pruitt KD, Tatusova T, Klimke W, Maglott DR. NCBI Reference Sequences: current status, policy and new initiatives. Nucleic Acids Res. 2009;37:D32–D36. doi: 10.1093/nar/gkn721.
17. Sayers EW, Barrett T, Benson DA, Bolton E, Bryant SH, Canese K, Chetvernin V, Church DM, Dicuccio M, Federhen S, et al. Database resources of the National Center for Biotechnology Information. Nucleic Acids Res. 2010;38:D5–D16. doi: 10.1093/nar/gkp967.
18. Finn RD, Mistry J, Tate J, Coggill P, Heger A, Pollington JE, Gavin OL, Gunasekaran P, Ceric G, Forslund K, et al. The Pfam protein families database. Nucleic Acids Res. 2010;38:D211–D222. doi: 10.1093/nar/gkp985.
19. Riechmann J. Transcription factors of Arabidopsis and rice: a genomic perspective. In: Grasser K, editor. Regulation of Transcription in Plants. Oxford: Wiley-Blackwell; 2006. pp. 28–53.
20. Ronquist F, Huelsenbeck JP. MrBayes 3: Bayesian phylogenetic inference under mixed models. Bioinformatics. 2003;19:1572–1574. doi: 10.1093/bioinformatics/btg180.
21. Price MN, Dehal PS, Arkin AP. FastTree 2–approximately maximum-likelihood trees for large alignments. PLoS One. 2010;5:e9490. doi: 10.1371/journal.pone.0009490.
