PLoS Comput Biol. 2013;9(3):e1003000.
doi: 10.1371/journal.pcbi.1003000. Epub 2013 Mar 28.

Transcriptome profiling of Giardia intestinalis using strand-specific RNA-seq

Oscar Franzén et al. PLoS Comput Biol. 2013.

Abstract

Giardia intestinalis is a common cause of diarrheal disease and it consists of eight genetically distinct genotypes or assemblages (A-H). Only assemblages A and B infect humans and are suggested to represent two different Giardia species. Correlations exist between assemblage type and host-specificity and to some extent symptoms. Phenotypical differences have been documented between assemblages and genome sequences are available for A, B and E. We have characterized and compared the polyadenylated transcriptomes of assemblages A, B and E. Four genetically different isolates were studied (WB (AI), AS175 (AII), P15 (E) and GS (B)) using paired-end, strand-specific RNA-seq. Most of the genome was transcribed in trophozoites grown in vitro, but at vastly different levels. RNA-seq confirmed many of the present annotations and refined the current genome annotation. Gene expression divergence was found to recapitulate the known phylogeny, and uncovered lineage-specific differences in expression. Polyadenylation sites were mapped for over 70% of the genes and revealed many examples of conserved and unexpectedly long 3' UTRs. 28 open reading frames were found in a non-transcribed gene cluster on chromosome 5 of the WB isolate. Analysis of allele-specific expression revealed a correlation between allele-dosage and allele expression in the GS isolate. Previously reported cis-splicing events were confirmed and global mapping of cis-splicing identified only one novel intron. These observations can possibly explain differences in host-preference and symptoms, and they will be the basis for further studies of Giardia pathogenesis and biology.

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Figure 1. RNA-seq technical details.
(A) Insert size histogram of sequenced cDNA fragments inferred from mapped paired-end reads. The plotted data are from the WB isolate. The x- and y-axes show the fragment size in nucleotides and the frequency, respectively. The median length was 250 nt. (B) The relationship between detected transcripts and mapped paired-end reads. The x- and y-axes show the number of mapped reads and the number of detected transcripts, respectively. Colors correspond to: violet (WB), blue (P15), green (GS), yellow (AS175P4), and dotted (AS175P33). The plateau indicates saturation (deeper sequencing does not lead to detection of new transcripts). Since the reference genomes differ slightly in finishing, the plateau y-values are different. (C) Gene expression correlation of technical replicates (WB isolate). Technical replicates 1 and 2 are from the same sequencing library (biological sample) but sequenced independently on different lanes. Dots represent genes. The x- and y-axes show log10-scaled FPKM of technical replicates 1 and 2, respectively (values were incremented by 1 before transformation). The blue line corresponds to equal expression. Colors represent overlap in the plot; i.e., black means a single gene and red means higher plotting density. (D) Correlation of gene expression between in vitro passages 4 and 33 of the AS175 isolate, i.e., correlation of biological replicates.
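The replicate comparison in panels C and D can be illustrated with a minimal sketch. It assumes a Pearson correlation on log10(FPKM + 1) values; the caption does not state which correlation coefficient was used, so that choice, the function names, and the FPKM values below are illustrative assumptions only.

```python
import math


def log10_plus1(values):
    """Return log10(v + 1) for each FPKM value, as described in the caption."""
    return [math.log10(v + 1.0) for v in values]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences.

    Assumption: the source does not name the coefficient it used.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Made-up FPKM values for two replicates (not data from the paper).
replicate_1 = [0.0, 9.0, 99.0, 999.0]
replicate_2 = [0.0, 9.0, 99.0, 999.0]
r = pearson(log10_plus1(replicate_1), log10_plus1(replicate_2))
```

Adding 1 before the transformation keeps genes with zero FPKM on the plot, since log10(0) is undefined.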
Figure 2. A non-transcribed region on chromosome 5.
(A) Correlation of RNA-seq and RT-qPCR gene expression measurements for 49 genes in WB. Black dots are genes. The x- and y-axes show log10 FPKM and −log10 Ct (cycle threshold). Values were incremented by 1 before log10-transformation. Each RT-qPCR reaction was performed in triplicate, and the average Ct was used. The blue line is the linear regression (y = 0.07008x − 1.50276). Included genes (prefix GL50803_): 7766, 9662, 6744, 7760, 112103, 11654, 11118, 2661, 24321, 17121, 17585, 14993, 16924, 13272, 6564, 5800, 3367, 17570, 16343, 93548, 11540, 5435, 15000, 21423, 10297, 114210, 86681, 7573, 7715, 102438, 7243, 16438, 17291, 1903, 17495, 102978, 11642, 17539, 90575, 32674, 13091, 137688, 3666, 25075, 16690, 2633, 92664, 13627, 4431. (B) Sliding window analysis of RNA-seq coverage on scaffold CH991767 (part of chromosome 5; WB). Analyzed windows were 500 bp wide and not overlapping. (X-axis) Position along the genomic segment (start position of the analyzed window). (Y-axis) RNA-seq depth in the window on a logarithmic scale. A drop in RNA-seq coverage is seen at positions 1,340,000 to 1,381,000.
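The windowing in panel B can be sketched as follows. This is a minimal illustration, assuming a per-base depth list as input and the mean depth as the per-window summary; the caption only says "RNA-seq depth in the window", so the summary statistic, the 0-based coordinates, and the function name are assumptions.

```python
def window_depth(depth, window=500):
    """Summarize per-base depth in consecutive, non-overlapping windows.

    Returns a list of (window_start, mean_depth) tuples. Coordinates are
    0-based here (an assumption); a trailing partial window is averaged
    over the bases it actually contains.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    result = []
    for start in range(0, len(depth), window):
        chunk = depth[start:start + window]
        result.append((start, sum(chunk) / len(chunk)))
    return result


# Toy depth profile (not data from the paper): a covered window,
# an uncovered window, and a partial window at the end.
toy_depth = [10] * 500 + [0] * 500 + [4] * 250
windows = window_depth(toy_depth)
```

Windows with zero depth need special handling before plotting on a logarithmic axis, for example by adding a pseudocount.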
Figure 3. Transcription levels at various classes of protein-coding features.
(A) Fraction of the genome occupied by various protein-coding features (WB isolate): uncharacterized genes (hypothetical genes), Protein 21.1, cysteine-rich membrane protein genes (vsp and HCMP genes), Kinase NEK and other genes. The x-axis shows the total protein-coding capacity of the genome for each group of genes. (B) Fraction of the RNA-seq data that mapped on categories in (A). (C) Smooth scatter plot of the relationship between gene expression and ORF length. The x- and y-axes show log10-scaled FPKM and ORF length (bp). ORFs longer than 8000 bp were not plotted (n = 71). Transition towards more intense blue means higher plotting density. (D) Box plots of gene expression of different categories of genes. Black dots represent outliers. One-way ANOVA concluded a significant difference between the groups (p<2.2e-16). The Protein 21.1/HCMP pairwise comparison was significant at p = 0.0008972 and vsp/HCMP was significant at p = 0.04642 (Tukey's HSD test). The groups ‘uncharacterized’ and ‘others’ were ignored in the pairwise statistics. (E) Box plots of genes grouped according to Gene Ontology (GO). Genes were categorized into broader groups by Biological Process using the generic slimmed Gene Ontology. Black dots represent outliers. Numbers to the right indicate how many genes were in the category. Groups are sorted by median expression. The following Gene Ontology categories are shown (GO): 0006412, 0009056, 0006457, 0055085, 0006520, 0005975, 0044281, 0006810, 0006399, 0007165, 0034641, 0008150, 0009058, 0016192, 0006464, 0006629, 0006950, 0006259.
Figure 4. Pairwise comparisons of global transcription levels.
(A) Phylogenetic relationship of the four studied isolates. The phylogeny was inferred from 10 concatenated protein sequences of each genome. The data set was aligned with ClustalW v2.1, and a neighbor-joining tree was constructed using MEGA v.5. The tree is based on the conceptual translations of the following protein-coding genes (only WB locus-tags are shown; prefix GL50803_): 16936, 31340, 15039, 16445, 14972, 16747, 16681, 9072, 17112, and 11384. The scale bar shows the number of substitutions per site. The name of the isolate is shown in blue at the tip of the branches, and the assemblage (genotype) is shown in light grey. Numbers to the right indicate the number of differentially expressed genes compared to the other isolates. (B) Pairwise correlations of global transcription levels. Each dot represents a conserved four-way ortholog. The x- and y-axes show gene expression of isolate 1 and 2 (log10-scaled FPKM; values incremented by 1 before transforming). The colors represent overlap in the plot; i.e., black means a single gene and red means higher plotting density.
Figure 5. Polyadenylation sites.
(A) Histogram of clustered polyadenylation sites (PACs) and their normalized position on ORFs. On the x-axis, 0 and 1 refer to the first and last base of the ORF. Only polyadenylation sites of the sense direction with respect to the ORF are shown. The y-axis shows the frequency. The data are from the WB isolate. (B) Bar-plot of the number of sense PACs per ORF. (C) Example of alternative polyadenylation of the gene encoding 3-hydroxy-3-methylglutaryl-coenzyme A reductase (GL50803_7573). Arrows indicate locations of the polyA site. Blue arrows indicate polyA sites of the same direction as the genes, and the orange arrow indicates a polyA site of the reversed strand. Numbers of supporting reads (polyA tags) are shown on top of the arrows. The polyA sites are separated by 275 bp. (D) Histogram of 3′ UTR length (nt) inferred from mapped polyA sites. Only the WB isolate is shown. (E) Relationship between gene expression signal (GES; log10 FPKM) and 3′ UTR length. The y-axis shows the log10 GES and x-axis shows the 3′ UTR length (nt). Only 3′ UTRs <500 nt are plotted. (F) Nucleotide length differences between orthologous 3′ untranslated regions.
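The normalized position in panel A and the 3′ UTR length in panel D can be sketched as simple coordinate arithmetic. This is an illustrative sketch, not the authors' implementation: it assumes 1-based inclusive ORF coordinates with `orf_start <= orf_end`, and it measures the 3′ UTR from the last ORF base to the polyA site. Whether the published lengths count the stop codon or the site itself is not stated in the caption. All function names are hypothetical.

```python
def normalized_position(site, orf_start, orf_end, strand="+"):
    """Map a polyA site onto the ORF so that 0 is the first base and 1 the last.

    Values above 1 indicate sites downstream of the ORF.
    """
    length = orf_end - orf_start
    if length <= 0:
        raise ValueError("orf_end must be greater than orf_start")
    if strand == "+":
        return (site - orf_start) / length
    # On the minus strand the first ORF base has the highest coordinate.
    return (orf_end - site) / length


def utr3_length(site, orf_start, orf_end, strand="+"):
    """Distance in nt from the last ORF base to a downstream polyA site."""
    if strand == "+":
        return site - orf_end
    return orf_start - site


# Toy coordinates (not from the paper).
pos_first = normalized_position(100, 100, 200)        # first base of the ORF
pos_last = normalized_position(200, 100, 200)         # last base of the ORF
utr_plus = utr3_length(250, 100, 200)                 # plus-strand gene
utr_minus = utr3_length(60, 100, 200, strand="-")     # minus-strand gene
```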
Figure 6. Polyadenylation signals.
(A) Prominent hexamer motifs identified in 3′ transcript fragments (data from the WB isolate). The hexamers were identified using the procedure described by Beaudoing et al. Positions −40 to −1 relative to the polyadenylation site were searched. One 3′ fragment can contain more than one PAS hexamer and therefore be counted twice. Asterisks (*) indicate hexamers that have previously been reported as putative G. intestinalis polyadenylation signals. (B) Nucleotide composition surrounding polyadenylation sites that occur sense, antisense, and intergenic with respect to protein-coding genes. Red, green, blue, and violet correspond to nucleotides A, U, C, and G. The x-axis shows the nucleotide position in relation to the polyA site (black arrow). The y-axis shows the percentage of each base. (C) Frequency of unique hexamers in transcript 3′ fragments (nucleotides −40 to −1 relative to the polyA site). The x-axis shows number of unique motifs found, and the y-axis shows the number of 3′ fragments (split on antisense, sense, and intergenic sites). (D) Histograms of the position of the four most frequent hexamers in relation to the polyA site (black arrow). The position on the x-axis refers to the last nucleotide of the hexamer.
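The search window described above (positions −40 to −1 relative to the polyA site) can be illustrated with a simplified hexamer count. This sketch is not the Beaudoing et al. procedure, which is more involved; it only counts, for each hexamer, the number of fragments containing it, and reports where the last nucleotide of a given hexamer falls. It assumes each input sequence ends at the base immediately upstream of the polyA site. The function names, sequences, and motif are made up for illustration.

```python
from collections import Counter


def hexamer_counts(fragments, upstream=40, k=6):
    """Count, per k-mer, how many fragments contain it in the upstream region.

    A fragment containing two different k-mers contributes to both counts,
    but repeated occurrences of one k-mer in a fragment count once.
    """
    counts = Counter()
    for seq in fragments:
        region = seq[-upstream:].upper()
        seen = {region[i:i + k] for i in range(len(region) - k + 1)}
        counts.update(seen)
    return counts


def hexamer_end_positions(seq, hexamer, upstream=40):
    """Positions (-upstream..-1) of the last nucleotide of each occurrence."""
    region = seq[-upstream:].upper()
    k = len(hexamer)
    n = len(region)
    return [i + k - 1 - n
            for i in range(n - k + 1)
            if region[i:i + k] == hexamer]


# Toy 3' fragments and an arbitrary motif (not data from the paper).
toy_fragments = ["CCCCAGTAAACCCC", "GGAGTAAAGG", "TTTTTTTTTT"]
counts = hexamer_counts(toy_fragments)
positions = hexamer_end_positions("CCCCAGTAAACCCC", "AGTAAA")
```

Reporting the position of the last nucleotide matches the convention stated for panel D.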
Figure 7. Analyses of allele-specific expression.
(A) An example of a heterozygous locus identified from genomic Roche 454 reads (horizontal bars). Colors represent alignments in the forward and reverse directions. The arrow indicates a heterozygous locus. (B) Density plots of allelic expression ratios calculated from simulated reads and RNA-seq reads. The black and red lines correspond to simulated reads containing 0.01 and 0.02 errors/base. The blue line represents RNA-seq data (GS isolate). (C) Histogram of allele ratios (genomic) of heterozygous loci. The superimposed curve (brown) shows the density of the underlying data. The allele ratio was calculated from genomic reads as the fraction of allele A among allele A+B. Grey bars represent loci containing two presumed copies of each allele and blue bars three copies (or vice versa) of each allele. (D) Boxplots of allele expression ratios (y-axis) according to the allele ratio (x-axis). The black line of each box is the median. Dots represent outliers. (E) Allelic expression ratios of heterozygous loci of the same haplotype phase. Each dot represents one pair of linked heterozygous loci (SNP pairs). The allelic expression ratios of SNPs 1 and 2 are shown on the x- and y-axes. Red dots indicate discordant heterozygous loci and black dots represent concordant heterozygous loci; i.e., the direction of gene expression change is the same.
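The allele ratio defined in panel C, the fraction of allele A among alleles A+B, is a one-line calculation. The sketch below applies the same formula to genomic and RNA-seq read counts; the function name and the read counts are hypothetical.

```python
def allele_ratio(count_a, count_b):
    """Fraction of reads supporting allele A among reads supporting A or B."""
    total = count_a + count_b
    if total == 0:
        raise ValueError("no reads cover this locus")
    return count_a / total


# Made-up read counts at one heterozygous locus (not data from the paper).
genomic_ratio = allele_ratio(30, 10)      # genomic reads
expression_ratio = allele_ratio(45, 15)   # RNA-seq reads
```

In this toy case both ratios are 0.75, the kind of agreement between allele dosage and allele expression that panel D examines across loci.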
