H3NGST: a fully automated, web-based platform for end-to-end ChIP-seq analysis
BMC Bioinformatics volume 26, Article number: 243 (2025)
Abstract
Background: Chromatin immunoprecipitation followed by sequencing (ChIP-seq) is a widely used technique for mapping protein-DNA interactions and histone modifications across the genome. Despite its utility, current analysis tools often require manual file processing, rigid input formats, and a significant level of bioinformatics expertise, posing a challenge for many experimental researchers.
Results: We present H3NGST (Hybrid, High-throughput, and High-resolution NGS Toolkit), a fully automated and user-friendly web-based platform for ChIP-seq analysis. H3NGST streamlines the entire analysis workflow, including raw data retrieval via BioProject ID, quality control, adapter trimming, reference genome alignment, peak calling, and genomic annotation. The system also categorizes peaks by genomic region, such as promoters. With minimal user input, H3NGST provides high-resolution, reproducible results for both transcription factor binding and histone modification studies.
Conclusions: H3NGST significantly reduces the technical barriers to ChIP-seq analysis by eliminating the need for local installations, programming skills, or large file uploads. Its intuitive web interface and mobile accessibility extend its usability to researchers with different computational backgrounds. The platform is scalable for high-throughput studies and is freely available at https://ngschiphhh.duckdns.org.
Background
Next-generation sequencing (NGS) technologies have transformed genomic and epigenomic research by enabling high-throughput profiling of gene regulation, chromatin states, and epigenetic modifications [1,2,3,4]. In particular, the development of epigenetic therapeutics, such as histone deacetylase (HDAC) inhibitors (e.g., TSA) and EZH2 inhibitors (e.g., GSK343), has led to an increased demand for efficient, scalable, and accurate genome-wide profiling tools [5, 6].
Among NGS-based approaches, chromatin immunoprecipitation followed by sequencing (ChIP-seq) has become a widely used method for mapping protein-DNA interactions and histone modifications across the genome [7,8,9]. A typical ChIP-seq workflow involves several steps: (i) acquisition of raw sequencing data, (ii) alignment to a reference genome, (iii) peak calling, and (iv) downstream interpretation and annotation [10]. Public repositories such as the Sequence Read Archive (SRA) serve as common sources of raw data, and tools are available for automated data retrieval [11]. Read alignment is a critical step in the workflow, as improper reference selection can lead to inaccurate peak detection. Aligners such as BWA-MEM are commonly used due to their speed, support for paired-end reads, and flexibility for variable read lengths [8, 12, 13]. Peak detection is typically performed using established algorithms such as HOMER [14], MACS2 [15], and SICER [16], with HOMER offering histogram-based peak modeling to reduce false positives. Final analysis steps often include genomic annotation, motif discovery, and functional enrichment to interpret regulatory regions and associated gene networks [17, 18].
While many computational tools and platforms exist for ChIP-seq analysis, few offer a fully automated, upload-free, and end-to-end solution that is simultaneously secure, scalable, and user-friendly. Existing platforms such as Galaxy [19], GenePattern [20], Cistrome Galaxy [21], CSA [22], and commercial services such as Basepair offer varying degrees of web-based functionality, but often require manual data uploads, user registration, or local software installation. In addition, these platforms often lack integrated support for automated data retrieval and full downstream annotation, which poses accessibility challenges for non-specialist users.
To overcome these limitations, we have developed H3NGST (Hybrid, High-throughput, and High-resolution NGS Toolkit), a web-based platform that enables fully automated ChIP-seq analysis from start to finish. By simply entering a BioProject ID, users can initiate a pipeline that performs data retrieval from the SRA, quality control, adapter trimming, genome alignment, peak calling, and genomic annotation. All steps are performed on the server side. No file upload or user authentication is required. All data transmissions are encrypted using SSL/TLS protocols to ensure secure processing and data integrity. H3NGST significantly reduces the technical burden associated with ChIP-seq data analysis, enabling broad access to high-resolution, reproducible results regardless of bioinformatics expertise.
Implementation
Overview of the H3NGST pipeline
H3NGST is a fully automated, web-based platform designed to streamline the analysis of ChIP-seq data from raw data acquisition to peak annotation and visualization. The pipeline consists of four main steps: (1) raw data acquisition, (2) preprocessing and quality control, (3) sequence alignment and file conversion, and (4) peak retrieval with functional annotation (Fig. 1). All processes are performed server-side, without the need for local installation or file uploads.
Fig. 1 Overview of the H3NGST workflow. Users initiate ChIP-seq analysis by providing a BioProject accession and selecting analysis parameters via a web interface. The server-side pipeline automatically performs all computational steps using FastQC, Trimmomatic, BWA-MEM, Samtools, Bedtools, DeepTools, and HOMER, generating comprehensive results without user intervention.
Retrieval of raw data
To initiate analysis, users enter a valid accession number, such as a BioProject (PRJNA), SRA experiment (SRX), GEO sample (GSM), or GEO series (GSE) accession, on the H3NGST interface. The system queries the NCBI Entrez system to resolve the accession into the corresponding SRR identifiers, which are then downloaded using the prefetch utility [11]. The retrieved .sra files are converted to FASTQ format using fasterq-dump. During this step, the system automatically determines whether each dataset is single-end or paired-end based on the metadata in the SRA RunInfo table. This classification is critical, as paired-end data provide enhanced alignment accuracy and resolution for peak detection [23, 24]. All downstream parameters, including those for trimming, alignment, and peak calling, are automatically adjusted based on the library type.
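The layout detection described above can be sketched in a few lines of Python. This is a minimal illustration, not the H3NGST implementation: it assumes the RunInfo table is available as CSV text with the standard `Run` and `LibraryLayout` columns, the function names and the run accessions in the example are hypothetical, and the `fastq` output directory is an arbitrary choice.

```python
import csv
import io


def detect_layouts(runinfo_csv: str) -> dict[str, str]:
    """Map each run accession to 'paired' or 'single' using the RunInfo table."""
    layouts = {}
    for row in csv.DictReader(io.StringIO(runinfo_csv)):
        run = (row.get("Run") or "").strip()
        if not run or run == "Run":  # skip blank rows and repeated header rows
            continue
        layout = (row.get("LibraryLayout") or "").strip().upper()
        layouts[run] = "paired" if layout == "PAIRED" else "single"
    return layouts


def download_commands(run: str, layout: str, outdir: str = "fastq") -> list[list[str]]:
    """Build prefetch and fasterq-dump command lines for one run (not executed here)."""
    dump = ["fasterq-dump", run, "--outdir", outdir]
    if layout == "paired":
        dump.append("--split-files")  # write mate 1 and mate 2 to separate files
    return [["prefetch", run], dump]


# Hypothetical RunInfo excerpt; real tables contain many more columns.
example = (
    "Run,LibraryLayout,BioProject\n"
    "SRR0000001,SINGLE,PRJNA000000\n"
    "SRR0000002,PAIRED,PRJNA000000\n"
)
layouts = detect_layouts(example)
for run, layout in layouts.items():
    print(run, layout, download_commands(run, layout))
```

Returning the commands as argument lists rather than shell strings keeps the sketch free of quoting issues and lets the caller decide how to execute them.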
Quality control and pre-processing
Following data retrieval, the raw FASTQ files are assessed with FastQC [25] to detect adapter contamination and low-quality reads. Trimmomatic [26] is then used to remove adapter sequences and trim low-quality bases using a sliding-window approach. After trimming, FastQC [25] is run again to evaluate the quality of the processed reads, and the resulting high-quality reads are retained for alignment.
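To make the sliding-window idea concrete, the following Python sketch scans a read from the 5' end and cuts it at the first window whose mean Phred quality falls below a threshold. It is a simplified illustration of the technique rather than Trimmomatic's exact implementation, and the window size, quality threshold, and minimum length used here are assumed values, not the settings used by H3NGST.

```python
from typing import Optional


def phred33(quality_string: str) -> list[int]:
    """Decode a FASTQ quality string (Phred+33 encoding) into integer scores."""
    return [ord(char) - 33 for char in quality_string]


def sliding_window_keep(quals: list[int], window: int = 4, min_avg: float = 15.0) -> int:
    """Return how many leading bases to keep.

    The read is cut at the start of the first window whose mean quality
    is below min_avg; reads shorter than one window are kept unchanged.
    """
    for start in range(len(quals) - window + 1):
        if sum(quals[start:start + window]) / window < min_avg:
            return start
    return len(quals)


def trim_read(seq: str, qual: str, window: int = 4, min_avg: float = 15.0,
              min_len: int = 36) -> Optional[tuple[str, str]]:
    """Trim one read; return None if the surviving part is shorter than min_len."""
    keep = sliding_window_keep(phred33(qual), window, min_avg)
    if keep < min_len:
        return None  # the read is dropped
    return seq[:keep], qual[:keep]


# 'I' encodes Phred 40 and '+' encodes Phred 10.
print(trim_read("ACGTACGT", "IIII++++", min_len=4))
```

In this toy read, the first window with a mean below 15 starts at the fifth base, so the first four bases survive.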
Sequence alignment and file conversion
Cleaned reads are aligned to a user-specified reference genome (e.g., hg38, mm10) using BWA-MEM [12], generating SAM files. These are sorted and converted to BAM format using Samtools [27]. The pipeline then uses Bedtools [28] to convert BAM files into BED format for downstream analyses. For genome browser visualization, DeepTools [29] is used to generate BigWig signal tracks from BAM files, providing normalized coverage profiles.
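The chain of tools in this step can be expressed as an ordered list of command lines. The Python sketch below is illustrative only and is not taken from Table 1: the file-naming scheme, thread count, and CPM normalization are assumptions, and the reference FASTA is assumed to have been indexed with `bwa index` beforehand.

```python
import subprocess
from dataclasses import dataclass
from typing import Optional


@dataclass
class Step:
    argv: list[str]
    stdout: Optional[str] = None  # file that captures standard output, if any


def alignment_steps(sample: str, reference_fasta: str, fastqs: list[str],
                    threads: int = 4) -> list[Step]:
    """Build the FASTQ -> SAM -> BAM -> BED/BigWig steps for one sample."""
    if len(fastqs) not in (1, 2):
        raise ValueError("expected one FASTQ (single-end) or two (paired-end)")
    sam, bam = f"{sample}.sam", f"{sample}.sorted.bam"
    bed, bigwig = f"{sample}.bed", f"{sample}.bw"
    return [
        # bwa mem writes SAM records to standard output
        Step(["bwa", "mem", "-t", str(threads), reference_fasta, *fastqs], stdout=sam),
        Step(["samtools", "sort", "-@", str(threads), "-o", bam, sam]),
        Step(["samtools", "index", bam]),  # bamCoverage needs an indexed BAM
        Step(["bedtools", "bamtobed", "-i", bam], stdout=bed),
        # CPM normalization is an assumed choice for this sketch
        Step(["bamCoverage", "-b", bam, "-o", bigwig, "--normalizeUsing", "CPM"]),
    ]


def run(steps: list[Step]) -> None:
    """Execute the steps in order, stopping at the first failure."""
    for step in steps:
        if step.stdout is None:
            subprocess.run(step.argv, check=True)
        else:
            with open(step.stdout, "w") as handle:
                subprocess.run(step.argv, stdout=handle, check=True)


steps = alignment_steps("sample1", "hg38.fa", ["sample1_1.fastq", "sample1_2.fastq"])
for step in steps:
    print(" ".join(step.argv), f"> {step.stdout}" if step.stdout else "")
```

Passing one FASTQ file produces a single-end `bwa mem` call and passing two produces a paired-end call, which mirrors the layout-dependent behavior described for the pipeline.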
Peak calling, annotation, and motif analysis
Peak calling is performed using HOMER, which supports both narrow (e.g., transcription factor binding) and broad (e.g., histone modification) peak profiles [14]. HOMER also performs motif enrichment analysis to aid in the identification of transcriptional regulators. The resulting peaks are annotated with genomic features such as gene names, proximity to transcription start sites (TSS), and functional categories [14, 23, 24]. Output includes peak coordinates, motif occurrences, and summary statistics. A complete list of all command-line tools, their representative commands, and user-defined parameters used in the H3NGST pipeline is summarized in Table 1.
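A typical HOMER run for this step could be assembled as follows. This Python sketch is an assumption-laden illustration, not the commands listed in Table 1: it maps the narrow and broad peak types to HOMER's `factor` and `histone` styles, and the output names, default FDR, and motif window size are hypothetical choices.

```python
from typing import Optional

# Assumed mapping from the platform's peak types to HOMER findPeaks styles.
HOMER_STYLES = {"narrow": "factor", "broad": "histone"}


def homer_steps(sample: str, bam: str, genome: str, peak_type: str,
                fdr: float = 0.001, control_tags: Optional[str] = None):
    """Return (argv, stdout_file) pairs for tag creation, peaks, annotation, motifs."""
    if peak_type not in HOMER_STYLES:
        raise ValueError(f"peak_type must be one of {sorted(HOMER_STYLES)}")
    tags = f"{sample}_tags"
    peaks = f"{sample}_peaks.txt"
    find_peaks = ["findPeaks", tags, "-style", HOMER_STYLES[peak_type],
                  "-fdr", str(fdr), "-o", peaks]
    if control_tags is not None:
        find_peaks += ["-i", control_tags]  # tag directory of the control sample
    return [
        (["makeTagDirectory", tags, bam], None),
        (find_peaks, None),
        # annotatePeaks.pl prints the annotated table to standard output
        (["annotatePeaks.pl", peaks, genome], f"{sample}_annotated.txt"),
        (["findMotifsGenome.pl", peaks, genome, f"{sample}_motifs", "-size", "200"], None),
    ]


for argv, stdout in homer_steps("sample1", "sample1.sorted.bam", "hg38", "broad"):
    print(" ".join(argv), f"> {stdout}" if stdout else "")
```

Supplying `control_tags` adds the control tag directory to peak finding, which corresponds to the optional control datasets that users can select.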
Accessing and downloading results
Upon completion, users can retrieve analysis results from the H3NGST homepage using their assigned nickname. All output files, including SAM, BAM, BED, BigWig, annotated peak tables, motif discovery results, and quality control reports, are available for direct download in standardized formats for further interpretation or publication.
Results and discussion
Functional overview of the H3NGST web platform
H3NGST is a browser-based platform that supports fully automated ChIP-seq analysis, requiring no local installation or command-line interaction. The system is designed for ease of use, allowing users, regardless of bioinformatics expertise, to perform complete analyses by simply entering a public accession number (e.g., PRJNA, SRX, GSM, or GSE), assigning a nickname, and configuring minimal parameters via a guided four-step interface (Fig. 2A–D). The platform stores previously used nicknames locally in the browser and allows users to quickly reuse them via a selectable nickname history panel. Upon submission, the backend pipeline retrieves sample metadata from the NCBI SRA, automatically detects the library layout (single-end or paired-end), and performs a comprehensive analysis using a suite of open-source tools: prefetch, fasterq-dump, FastQC, Trimmomatic, BWA-MEM, Samtools, Bedtools, DeepTools, and HOMER. Parameters are dynamically adjusted based on dataset characteristics such as sequencing layout and selected peak type (narrow or broad). Users can also directly specify key parameters, including reference genome, peak type, promoter region, and FDR threshold, to customize the analysis according to their experimental conditions. The entire workflow is executed server-side, without the need for file uploads or manual software configuration.
Fig. 2 H3NGST web interface for ChIP-seq submission. (A) Users enter a BioProject ID and assign a unique analysis nickname. (B) The system retrieves metadata from the NCBI SRA and presents associated SRR entries for sample selection, including optional control datasets. (C) Users configure analysis parameters such as reference genome, peak type (narrow or broad), FDR threshold, and promoter range. (D) A summary page provides a final review of the selections before starting the analysis.
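The submission form described above can be modeled as a small validated record. The Python sketch below is hypothetical and does not reproduce the platform's own code: the class and field names are invented for illustration, the default FDR and promoter range are assumed values, and only the four accession prefixes and six genome builds named in this article are accepted.

```python
import re
from dataclasses import dataclass

ACCESSION_TYPES = {
    "PRJNA": "BioProject",
    "SRX": "SRA experiment",
    "GSM": "GEO sample",
    "GSE": "GEO series",
}
_ACCESSION_RE = re.compile(r"^(PRJNA|SRX|GSM|GSE)(\d+)$")
SUPPORTED_GENOMES = {"hg18", "hg19", "hg38", "mm9", "mm10", "mm39"}


def classify_accession(text: str) -> str:
    """Return the accession type, or raise ValueError for unrecognized input."""
    match = _ACCESSION_RE.match(text.strip().upper())
    if match is None:
        raise ValueError(f"unrecognized accession: {text!r}")
    return ACCESSION_TYPES[match.group(1)]


@dataclass(frozen=True)
class Submission:
    accession: str
    nickname: str
    genome: str = "hg38"
    peak_type: str = "narrow"
    fdr: float = 0.001                # assumed default
    promoter_upstream: int = 1000     # assumed default, in bp
    promoter_downstream: int = 100    # assumed default, in bp

    def __post_init__(self) -> None:
        classify_accession(self.accession)  # raises on invalid accessions
        if not self.nickname.strip():
            raise ValueError("nickname must not be empty")
        if self.genome not in SUPPORTED_GENOMES:
            raise ValueError(f"unsupported genome: {self.genome}")
        if self.peak_type not in ("narrow", "broad"):
            raise ValueError("peak_type must be 'narrow' or 'broad'")
        if not 0 < self.fdr < 1:
            raise ValueError("fdr must be between 0 and 1")


print(classify_accession("PRJNA136821"))
print(Submission("GSE26439", "demo", genome="hg19", peak_type="broad"))
```

Validating everything at construction time means that an invalid request fails before any download or alignment work is queued.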
Result retrieval and visualization
Upon completion, users can retrieve results by entering their assigned nickname on the results access page (Supplementary Fig. 1A). The results page (Supplementary Fig. 1B) includes an analysis summary table with the analysis date, scheduled deletion date, and associated SRR identifiers, along with output file icons featuring tooltips and clickable links that provide direct access to format descriptions summarized in Table 2. In addition, for the trimming step, a dedicated summary table is provided, reporting key quality metrics such as the number of input reads, surviving reads, dropped reads, and survival percentage (Supplementary Fig. 1C). The platform provides real-time updates on the status of each submitted analysis. It also includes a per-sample analysis status table that visualizes the progress of each processing step in the actual pipeline order: QC → SAM → BAM → BED → BigWig → Motif Finding → Peak Finding. Each output type is marked with a checkmark icon to indicate successful generation, and users can click these icons to directly download the corresponding result files (Supplementary Fig. 1D, top). Putative target genes linked to the identified peaks are explicitly listed in the per-sample analysis status table, enabling direct and easy access to the top candidate genes associated with each dataset (Supplementary Fig. 1D, middle). For other steps such as peak finding, numerical results like the number of peaks can be retrieved from the "findPeaks.log" file located in the logs directory (Supplementary Fig. 1D, bottom). BigWig files can be directly visualized using the UCSC Genome Browser [30] for locus-specific signal inspection (Supplementary Fig. 1E), while the Integrative Genomics Viewer (IGV) [31] allows more detailed exploration of read alignments and enrichment profiles in a local environment (Supplementary Fig. 1F). 
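Two of the summaries described here reduce to simple bookkeeping, sketched below in Python. The function names and the text rendering of the status row are hypothetical; only the step order and the four trimming metrics come from the description above.

```python
# Step order as shown in the per-sample analysis status table.
PIPELINE_STEPS = ["QC", "SAM", "BAM", "BED", "BigWig", "Motif Finding", "Peak Finding"]


def trimming_summary(input_reads: int, surviving_reads: int) -> dict:
    """Derive dropped reads and survival percentage from the trimming counts."""
    if input_reads <= 0 or not 0 <= surviving_reads <= input_reads:
        raise ValueError("need input_reads > 0 and 0 <= surviving_reads <= input_reads")
    return {
        "input_reads": input_reads,
        "surviving_reads": surviving_reads,
        "dropped_reads": input_reads - surviving_reads,
        "survival_pct": round(100 * surviving_reads / input_reads, 2),
    }


def status_row(finished: set[str]) -> str:
    """Render one sample's progress, marking finished steps with a checkmark."""
    return " → ".join(
        f"{step} {'✓' if step in finished else '·'}" for step in PIPELINE_STEPS
    )


print(trimming_summary(1000, 950))
print(status_row({"QC", "SAM", "BAM"}))
```

For example, 950 surviving reads out of 1000 input reads give 50 dropped reads and a survival percentage of 95.0.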
Annotated peak tables include genomic coordinates, associated genes, distances to TSS, peak types, and enrichment scores, providing essential context for downstream interpretation (Supplementary Fig. 1G). User support features include real-time feedback submission, update notifications (Supplementary Fig. 1H), and error reporting forms (Supplementary Fig. 1I). All web traffic is encrypted using SSL/TLS, and the backend is securely deployed using Gunicorn behind a reverse proxy on DuckDNS.
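The distance-to-TSS column in the annotated peak tables, combined with a user-selected promoter range, is what allows a peak to be labeled as promoter-associated. The Python sketch below shows one way to compute this; the function names are hypothetical, the default window of 1000 bp upstream and 100 bp downstream is an assumed setting, and the peak center is used as the reference point.

```python
def signed_distance_to_tss(peak_center: int, tss: int, strand: str) -> int:
    """Distance in transcript orientation: negative is upstream of the TSS."""
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    offset = peak_center - tss
    return offset if strand == "+" else -offset


def classify_peak(peak_start: int, peak_end: int, tss: int, strand: str,
                  upstream: int = 1000, downstream: int = 100) -> tuple[int, str]:
    """Return (distance to TSS, 'promoter' or 'distal') for one peak."""
    center = (peak_start + peak_end) // 2
    distance = signed_distance_to_tss(center, tss, strand)
    region = "promoter" if -upstream <= distance <= downstream else "distal"
    return distance, region


# The same peak is upstream of a plus-strand gene but downstream of a minus-strand gene.
print(classify_peak(900, 1000, 1500, "+"))
print(classify_peak(900, 1000, 1500, "-"))
```

The strand flip matters because "upstream" is defined relative to the direction of transcription, not to genomic coordinates.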
Comparison with existing platforms
Several platforms offer web-based ChIP-seq analysis [19,20,21,22]. However, these tools typically require user login or manual file uploads, or provide only partial workflows that depend on external software. Table 3 provides a comparative overview of the available platforms. Unlike most alternatives, H3NGST supports fully automated analysis starting from a public BioProject accession, without login credentials, data upload, or manual parameter tuning, which significantly lowers the technical barrier for experimental researchers.
Current capabilities and future directions
H3NGST currently supports major human (hg18, hg19, hg38) and mouse (mm9, mm10, mm39) reference genomes, and is optimized for transcription factor and histone modification ChIP-seq analysis. To optimize performance, each analysis session currently supports up to four samples. Concurrent submissions are handled reliably through a queue-based backend system. Ongoing development efforts aim to extend the platform’s capabilities to include support for plant, insect, and other animal genomes, and to integrate additional analysis modules for RNA-seq [32, 33], single-cell RNA-seq [34], and ATAC-seq [35, 36]. These additions will enable multi-omics integration and advance the platform toward comprehensive regulatory genomics studies. By unifying multiple high-throughput sequencing pipelines within a single automated framework, H3NGST is poised to support a broader range of biological applications, including epigenomic drug screening, chromatin state profiling, and transcriptional regulatory network inference [37].
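A queue-based backend with a per-session sample limit can be sketched with Python's standard library. This is an illustrative design under stated assumptions, not the H3NGST backend: the class name, the single worker thread, and the handler callback are all hypothetical, and only the limit of four samples per session comes from the text.

```python
import queue
import threading

MAX_SAMPLES_PER_SESSION = 4  # limit stated for each analysis session


class AnalysisQueue:
    """Accept submissions at any time and process them one by one in arrival order."""

    def __init__(self, handler):
        self._jobs = queue.Queue()
        self._handler = handler
        self.failures = []  # (nickname, exception) for jobs that raised
        threading.Thread(target=self._work, daemon=True).start()

    def submit(self, nickname: str, runs: list[str]) -> None:
        if not 1 <= len(runs) <= MAX_SAMPLES_PER_SESSION:
            raise ValueError(f"a session takes 1 to {MAX_SAMPLES_PER_SESSION} samples")
        self._jobs.put((nickname, list(runs)))

    def _work(self) -> None:
        while True:
            nickname, runs = self._jobs.get()
            try:
                self._handler(nickname, runs)
            except Exception as exc:  # keep the worker alive after a failed job
                self.failures.append((nickname, exc))
            finally:
                self._jobs.task_done()

    def wait(self) -> None:
        """Block until every submitted job has been processed."""
        self._jobs.join()


finished = []
jobs = AnalysisQueue(lambda nickname, runs: finished.append((nickname, len(runs))))
jobs.submit("first", ["SRR0000001"])
jobs.submit("second", ["SRR0000002", "SRR0000003"])
jobs.wait()
print(finished)
```

Because a single worker drains the queue, simultaneous submissions never compete for the same compute resources; adding more workers would trade that simplicity for throughput.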
Conclusion
H3NGST represents a significant advancement in ChIP-seq data analysis by fully automating the end-to-end workflow in an intuitive, web-based environment. By integrating raw data retrieval, quality control, alignment, peak calling, and annotation into a seamless pipeline, the platform removes many of the technical barriers traditionally associated with bioinformatics tools. Users can perform comprehensive analyses without command-line expertise, file uploads, or software installation, and all results are securely delivered through an installation-free interface. The system’s support for dynamic parameter configuration, such as reference genome selection and promoter window size, further enhances its flexibility in different experimental contexts. In addition, its compatibility with mobile devices and lack of user authentication requirements improve accessibility for researchers with varying levels of computational expertise. By promoting reproducibility, scalability, and ease of use, H3NGST serves as a practical solution for large-scale epigenomic studies and broadens access to high-throughput analysis of the regulatory genome.
Requirements
Project name: H3NGST (Hybrid, High-throughput, and High-resolution NGS Toolkit).
Project home page: https://ngschiphhh.duckdns.org
Operating system: Platform independent (web-based).
Programming language: Python, Bash.
Other requirements: Modern web browser (e.g., Google Chrome); backend server runs on Gunicorn; SSL/TLS secured via DuckDNS for free domain and certificate management.
License: Restricted academic use only.
Any restrictions to use by non-academics: Yes (academic use only).
Data availability
The datasets used in this study are available under GEO accession GSE26439 (BioProject PRJNA136821), including runs SRR090229 and SRR090230.
References
Chen X, Xu H, Shu X, Song CX. Mapping epigenetic modifications by sequencing technologies. Cell Death Differ. 2025;32(1):56–65.
Sarda S, Hannenhalli S. Next-generation sequencing and epigenomics research: a hammer in search of nails. Genomics Inform. 2014;12(1):2–11.
Satam H, Joshi K, Mangrolia U, Waghoo S, Zaidi G, Rawool S, et al. Next-generation sequencing technology: current trends and advancements. Biology (Basel). 2023. https://doi.org/10.3390/biology12070997.
Thurman RE, Rynes E, Humbert R, Vierstra J, Maurano MT, Haugen E, et al. The accessible chromatin landscape of the human genome. Nature. 2012;489(7414):75–82.
Yamashita Y, Shimada M, Harimoto N, Rikimaru T, Shirabe K, Tanaka S, Sugimachi K. Histone deacetylase inhibitor trichostatin A induces cell-cycle arrest/apoptosis and hepatocyte differentiation in human hepatoma cells. Int J Cancer. 2003;103(5):572–6.
Yu T, Wang Y, Hu Q, Wu W, Wu Y, Wei W, et al. The EZH2 inhibitor GSK343 suppresses cancer stem-like phenotypes and reverses mesenchymal transition in glioma cells. Oncotarget. 2017;8(58):98348–59.
Barski A, Cuddapah S, Cui K, Roh TY, Schones DE, Wang Z, et al. High-resolution profiling of histone methylations in the human genome. Cell. 2007;129(4):823–37.
Ernst J, Kellis M. Discovery and characterization of chromatin states for systematic annotation of the human genome. Nat Biotechnol. 2010;28(8):817–25.
Djebali S, Davis CA, Merkel A, Dobin A, Lassmann T, Mortazavi A, et al. Landscape of transcription in human cells. Nature. 2012;489(7414):101–8.
Landt SG, Marinov GK, Kundaje A, Kheradpour P, Pauli F, Batzoglou S, et al. ChIP-seq guidelines and practices of the ENCODE and modENCODE consortia. Genome Res. 2012;22(9):1813–31.
Leinonen R, Sugawara H, Shumway M, International Nucleotide Sequence Database Collaboration. The sequence read archive. Nucleic Acids Res. 2011;39(Database issue):D19-21.
Li H, Durbin R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics. 2009;25(14):1754–60.
Yandell M, Ence D. A beginner’s guide to eukaryotic genome annotation. Nat Rev Genet. 2012;13(5):329–42.
Heinz S, Benner C, Spann N, Bertolino E, Lin YC, Laslo P, et al. Simple combinations of lineage-determining transcription factors prime cis-regulatory elements required for macrophage and B cell identities. Mol Cell. 2010;38(4):576–89.
Zhang Y, Liu T, Meyer CA, Eeckhoute J, Johnson DS, Bernstein BE, et al. Model-based analysis of ChIP-Seq (MACS). Genome Biol. 2008;9(9):R137.
Xu S, Grullon S, Ge K, Peng W. Spatial clustering for identification of ChIP-enriched regions (SICER) to map regions of histone methylation patterns in embryonic stem cells. Methods Mol Biol. 2014;1150:97–111.
Tran NT, Huang CH. A survey of motif finding web tools for detecting binding site motifs in ChIP-Seq data. Biol Direct. 2014;9:4.
Welch RP, Lee C, Imbriano PM, Patil S, Weymouth TE, Smith RA, et al. ChIP-Enrich: gene set enrichment testing for ChIP-seq data. Nucleic Acids Res. 2014;42(13):e105.
Goecks J, Nekrutenko A, Taylor J, Galaxy Team. Galaxy: a comprehensive approach for supporting accessible, reproducible, and transparent computational research in the life sciences. Genome Biol. 2010;11(8):R86.
Reich M, Liefeld T, Gould J, Lerner J, Tamayo P, Mesirov JP. GenePattern 2.0. Nat Genet. 2006;38(5):500–1.
Liu T, Ortiz JA, Taing L, Meyer CA, Lee B, Zhang Y, et al. Cistrome: an integrative platform for transcriptional regulation studies. Genome Biol. 2011;12(8):R83.
Li M, Tang L, Wu FX, Pan Y, Wang J. CSA: a web service for the complete process of ChIP-Seq analysis. BMC Bioinformatics. 2019;20(Suppl 15):515.
Pique-Regi R, Degner JF, Pai AA, Gaffney DJ, Gilad Y, Pritchard JK. Accurate inference of transcription factor binding from DNA sequence and chromatin accessibility data. Genome Res. 2011;21(3):447–55.
Li XY, Thomas S, Sabo PJ, Eisen MB, Stamatoyannopoulos JA, Biggin MD. The role of chromatin accessibility in directing the widespread, overlapping patterns of Drosophila transcription factor binding. Genome Biol. 2011;12(4):R34.
Brown J, Pirrung M, McCue LA. FQC dashboard: integrates FastQC results into a web-based, interactive, and extensible FASTQ quality control tool. Bioinformatics. 2017;33(19):3137–9.
Bolger AM, Lohse M, Usadel B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics. 2014;30(15):2114–20.
Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, et al. The sequence alignment/map format and SAMtools. Bioinformatics. 2009;25(16):2078–9.
Quinlan AR, Hall IM. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics. 2010;26(6):841–2.
Ramirez F, Dundar F, Diehl S, Gruning BA, Manke T. deepTools: a flexible platform for exploring deep-sequencing data. Nucleic Acids Res. 2014;42:W187–91.
Kent WJ, Sugnet CW, Furey TS, Roskin KM, Pringle TH, Zahler AM, et al. The human genome browser at UCSC. Genome Res. 2002;12(6):996–1006.
Robinson JT, Thorvaldsdóttir H, Winckler W, Guttman M, Lander ES, Getz G, et al. Integrative genomics viewer. Nat Biotechnol. 2011;29(1):24–6.
Kanitz A, Gypas F, Gruber AJ, Gruber AR, Martin G, Zavolan M. Comparative assessment of methods for the computational inference of transcript isoform abundance from RNA-seq data. Genome Biol. 2015;16(1):150.
Pertea M, Pertea GM, Antonescu CM, Chang TC, Mendell JT, Salzberg SL. StringTie enables improved reconstruction of a transcriptome from RNA-seq reads. Nat Biotechnol. 2015;33(3):290–5.
Stuart T, Satija R. Integrative single-cell analysis. Nat Rev Genet. 2019;20(5):257–72.
Buenrostro JD, Giresi PG, Zaba LC, Chang HY, Greenleaf WJ. Transposition of native chromatin for fast and sensitive epigenomic profiling of open chromatin, DNA-binding proteins and nucleosome position. Nat Methods. 2013;10(12):1213–8.
Buenrostro JD, Wu B, Litzenburger UM, Ruff D, Gonzales ML, Snyder MP, et al. Single-cell chromatin accessibility reveals principles of regulatory variation. Nature. 2015;523(7561):486–90.
Neph S, Vierstra J, Stergachis AB, Reynolds AP, Haugen E, Vernot B, et al. An expansive human regulatory lexicon encoded in transcription factor footprints. Nature. 2012;489(7414):83–90.
Funding
This work was supported in part by the National Research Foundation of Korea grant funded by the Korea government (MSIT) (RS-2025–00562288) and by Korea Basic Science Institute (National Research Facilities and Equipment Center) grant funded by the Ministry of Education (RS-2023-NF001356).
Ethics declarations
Conflict of interest
The authors declare that they have no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Heo, H.H., Um, SJ. H3NGST: a fully automated, web-based platform for end-to-end ChIP-seq analysis. BMC Bioinformatics 26, 243 (2025). https://doi.org/10.1186/s12859-025-06247-5
DOI: https://doi.org/10.1186/s12859-025-06247-5