ABDoc is a comprehensive website documentation downloader and archiver developed by AeonBridge Co. This professional-grade toolkit enables you to recursively download entire websites or documentation sites for offline access, knowledge preservation, compliance auditing, and strategic content archival.
- Recursive Website Downloading - Complete site mirroring with all assets
- Dual-Format Output - HTML preservation + Markdown conversion
- Smart Link Conversion - Automatic offline navigation setup
- Local Server Generation - One-click local hosting with Python HTTP server
- Cross-Platform Support - Linux, macOS, and Windows (WSL)
- High-Quality Conversion - Powered by IBM's docling library
- Enterprise-Ready - Built for compliance, auditing, and knowledge management
- Intelligent Crawling - Respectful delays, retry logic, and user-agent simulation
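The "respectful delays, retry logic, and user-agent simulation" item can be illustrated with a small sketch. This is not ABDoc's actual implementation: `fetch_with_retries`, `http_get`, and the user-agent string are hypothetical names chosen for the example, and the doubling backoff is an assumption.

```python
import time
import urllib.request
from typing import Callable, Optional


def http_get(url: str, user_agent: str = "Mozilla/5.0 (compatible; example-archiver)") -> bytes:
    """Fetch a URL with an explicit User-Agent header (the header value is illustrative)."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read()


def fetch_with_retries(
    fetch: Callable[[str], bytes],
    url: str,
    tries: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[bytes]:
    """Call fetch(url) up to `tries` times, doubling the pause after each failure."""
    for attempt in range(tries):
        try:
            return fetch(url)
        except OSError:
            # urllib.error.URLError and socket timeouts are OSError subclasses.
            if attempt == tries - 1:
                return None
            sleep(base_delay * 2 ** attempt)
    return None
```

Passing `sleep` as a parameter keeps the function easy to exercise without real waiting; in normal use, `fetch_with_retries(http_get, url)` is enough.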
| Tool | Purpose | Best For |
|---|---|---|
| `ab_downloader_html2md.sh` | Full site download + Markdown conversion | Documentation archival |
| `ab_download_only.sh` | HTML-only site mirroring | Quick offline access |
| `simple_downloader.py` | Python fallback downloader | Cross-platform compatibility |
| `website_downloader.sh` | Streamlined wget wrapper | Simple site downloads |
- Evolution API v2 Documentation (`scripts/wget_/samples/evolution_v2/`)
- n8n Workflow Documentation (`scripts/wget_/samples/n8n/`)
System requirements:
- Bash shell (Linux/macOS/WSL)
- wget (primary download engine)
- Python 3.12+ (for server and fallback functionality)

Installation:

    # Clone the repository
    git clone https://github.com/aeonbridge/ABDoc.git
    cd ABDoc

    # Install Python dependencies (for HTML-to-Markdown conversion)
    uv sync
    # OR
    pip install docling requests beautifulsoup4

    # Make scripts executable
    chmod +x scripts/*.sh
    chmod +x scripts/wget_/*.sh

    # Download and convert to Markdown
    ./scripts/wget_/ab_downloader_html2md.sh <URL> <output_folder>

    # Example: Archive Django documentation
    ./scripts/wget_/ab_downloader_html2md.sh https://docs.djangoproject.com/ django_docs

Output Structure:

    django_docs/
    ├── html/                 # Complete HTML mirror
    │   └── docs.djangoproject.com/
    ├── md/                   # Markdown conversions
    │   ├── index.md
    │   ├── tutorial.md
    │   └── ...
    └── launch_server.py


    # Quick HTML mirroring
    ./scripts/wget_/ab_download_only.sh <URL>

    # Example: Mirror React documentation
    ./scripts/wget_/ab_download_only.sh https://react.dev/

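For a mirror to be browsable offline, links between pages have to point at the saved files rather than at the live site (wget offers `--convert-links` for this). The sketch below shows the idea in Python. It is an illustration only: `local_path` and `to_offline_href` are hypothetical helpers, and the `host/path` layout with `index.html` for directory URLs is an assumption about how the mirror is stored. Query strings are ignored.

```python
import posixpath
from urllib.parse import urljoin, urlsplit


def local_path(url: str) -> str:
    """Map a URL to an assumed mirror path: host/path, with index.html for directory URLs."""
    parts = urlsplit(url)
    path = parts.path or "/"
    if path.endswith("/"):
        path += "index.html"
    return parts.netloc + path


def to_offline_href(page_url: str, href: str) -> str:
    """Rewrite an href found on page_url into a relative link between mirrored files.

    Links that leave the page's host (including mailto: links) are returned unchanged.
    """
    target = urljoin(page_url, href)
    if urlsplit(target).netloc != urlsplit(page_url).netloc:
        return href
    start = posixpath.dirname(local_path(page_url))
    relative = posixpath.relpath(local_path(target), start)
    fragment = urlsplit(target).fragment
    return relative + ("#" + fragment if fragment else "")
```

For example, on the page `https://example.com/docs/intro/`, the link `/docs/api/index.html` becomes `../api/index.html`.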

    # When wget is unavailable
    python src/simple_downloader.py

    # Guided download process
    ./scripts/sd.sh

    # OR specify URL directly
    ./scripts/sd.sh https://your-target-site.com

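The core of any Python fallback crawler is deciding which links to follow. The sketch below uses only the standard library and is not taken from `simple_downloader.py`; `LinkCollector` and `same_site_links` are hypothetical names. It collects `<a href>` targets from a page and keeps those on the same host.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlsplit


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def same_site_links(page_url: str, html: str) -> list[str]:
    """Return absolute, de-duplicated http(s) links on the same host as page_url."""
    collector = LinkCollector()
    collector.feed(html)
    host = urlsplit(page_url).netloc
    links: list[str] = []
    for href in collector.hrefs:
        # Resolve relative links and drop the #fragment so each page is queued once.
        absolute, _fragment = urldefrag(urljoin(page_url, href))
        parts = urlsplit(absolute)
        if parts.scheme in ("http", "https") and parts.netloc == host and absolute not in links:
            links.append(absolute)
    return links
```

A crawler would call this on each downloaded page and push the returned URLs onto its queue, skipping any it has already visited.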
After downloading, start the local server:

    cd <output_directory>
    python launch_server.py
    # OR
    ./launch.sh
    # Then visit: http://localhost:8000

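The generated `launch_server.py` is not reproduced here. As a rough idea of what such a script can look like, the following sketch serves a directory with Python's built-in HTTP server; `make_server` is a hypothetical helper, and binding to `127.0.0.1` on port 8000 is an assumption based on the URL above.

```python
import functools
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


def make_server(directory: str, port: int = 8000) -> ThreadingHTTPServer:
    """Create a local-only HTTP server that serves files from `directory`."""
    handler = functools.partial(SimpleHTTPRequestHandler, directory=directory)
    return ThreadingHTTPServer(("127.0.0.1", port), handler)


# Typical use (blocks until interrupted with Ctrl+C):
#     server = make_server("html")
#     server.serve_forever()
```

Binding to `127.0.0.1` keeps the archive reachable only from the local machine. The one-liner `python -m http.server 8000 --directory html` gives similar behavior without a script.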
Edit the scripts to modify download behavior:

    # Common modifications in ab_download_only.sh
    --wait=2                      # Increase delay between requests
    --limit-rate=200k             # Limit download speed
    --accept="html,css,js,png"    # Only download specific file types
    --exclude-directories=admin   # Skip certain directories

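To see how these options combine into one wget invocation, here is a small sketch that assembles the argument list in Python. `build_wget_command` is a hypothetical helper, and the base flags (`--mirror`, `--convert-links`, `--page-requisites`, `--no-parent`) are an assumption about a typical mirroring setup, not a copy of what `ab_download_only.sh` passes.

```python
import shlex


def build_wget_command(url, wait=1, limit_rate=None, accept=None, exclude_dirs=None):
    """Assemble a wget argument list for mirroring `url` with optional throttling and filters."""
    cmd = [
        "wget",
        "--mirror",            # recursive download with timestamping
        "--convert-links",     # rewrite links for offline browsing
        "--page-requisites",   # also fetch CSS, images, and scripts
        "--no-parent",         # stay below the starting directory
        f"--wait={wait}",      # seconds to pause between requests
    ]
    if limit_rate:
        cmd.append(f"--limit-rate={limit_rate}")
    if accept:
        cmd.append("--accept=" + ",".join(accept))
    if exclude_dirs:
        cmd.append("--exclude-directories=" + ",".join(exclude_dirs))
    cmd.append(url)
    return cmd


# Print a copy-pasteable command line for inspection before running it.
print(shlex.join(build_wget_command("https://example.com/docs/", wait=2, limit_rate="200k")))
```

Building the command as a list means it can be handed to `subprocess.run(cmd, check=True)` without shell quoting issues.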
The ab_downloader_html2md.sh script uses IBM's docling library for high-quality conversion:

    # Customize docling behavior in the script
    docling <html_file> --to md --output <md_output_dir>

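The same conversion can be driven from Python through docling's `DocumentConverter`. The sketch below is not the script's actual logic: `markdown_target` and `convert_tree` are hypothetical helpers, and it assumes the Markdown tree should mirror the HTML tree, which may differ from the layout the script produces.

```python
from pathlib import Path


def markdown_target(html_file: Path, html_root: Path, md_root: Path) -> Path:
    """Mirror the HTML file's relative path under md_root, swapping the suffix for .md."""
    return md_root / html_file.relative_to(html_root).with_suffix(".md")


def convert_tree(html_root: Path, md_root: Path) -> list[Path]:
    """Convert every .html file under html_root to Markdown and return the written paths."""
    # Imported lazily so the path helper above works without docling installed.
    from docling.document_converter import DocumentConverter

    converter = DocumentConverter()
    written: list[Path] = []
    for html_file in sorted(html_root.rglob("*.html")):
        target = markdown_target(html_file, html_root, md_root)
        target.parent.mkdir(parents=True, exist_ok=True)
        result = converter.convert(html_file)
        target.write_text(result.document.export_to_markdown(), encoding="utf-8")
        written.append(target)
    return written
```

Calling `convert_tree(Path("django_docs/html"), Path("django_docs/md"))` would process the archive from the earlier example, one file at a time.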
Excellent Support:
- Technical documentation sites (GitBook, MkDocs, Sphinx)
- API documentation (OpenAPI, REST docs)
- Knowledge bases and wikis
- Static content sites

Limited Support:
- Single-page applications (SPA)
- JavaScript-heavy dynamic sites
- Authentication-required content
- Streaming or real-time content
wget not found:

    # Install wget
    # Ubuntu/Debian:
    sudo apt-get install wget
    # macOS:
    brew install wget
    # Or use Python fallback:
    python src/simple_downloader.py

Permission denied:

    chmod +x scripts/*.sh
    chmod +x scripts/wget_/*.sh

Large sites timing out:

    # Increase timeout in script
    --timeout=60
    --tries=5

- Documentation Archival - Preserve technical knowledge for offline access
- Compliance & Auditing - Archive content for regulatory requirements
- Training & Education - Create offline training materials
- Enterprise Knowledge Management - Centralize critical documentation
- Research & Analysis - Systematic content collection and analysis
- Digital Preservation - Long-term content preservation initiatives
We welcome contributions! Here's how you can help:
- Report Bugs - Open an issue with detailed reproduction steps
- Suggest Features - Share your ideas for new functionality
- Submit Pull Requests - Contribute code improvements
- Improve Documentation - Help us make ABDoc more accessible

Development setup:

    git clone https://github.com/aeonbridge/ABDoc.git
    cd ABDoc
    uv sync --dev
    pre-commit install  # If using pre-commit hooks

This project is licensed under the MIT License - see the LICENSE file for details.
PROVIDED "AS IS" - NO EXTENDED SUPPORT
- Open Source & Free - Use freely for any purpose
- No Warranty - Software provided without any guarantees
- Limited Support - Community-driven support only
- No SLA - No service level commitments
- User Responsibility - Ensure compliance with target site terms of service
- Respect robots.txt - Consider ethical crawling practices
- Respect Copyright - Only download content you have permission to access
- Follow Terms of Service - Comply with target website policies
- Be Respectful - Use appropriate delays and don't overload servers
- Legal Compliance - Ensure your use case complies with applicable laws
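One practical way to act on these guidelines is to check a site's robots.txt before crawling it. The sketch below uses Python's standard `urllib.robotparser`. `allowed_by_robots` is a hypothetical helper, and whether ABDoc's scripts perform such a check is not stated here.

```python
from urllib.robotparser import RobotFileParser


def allowed_by_robots(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

The robots.txt text can be downloaded once per site and reused for every URL on that host.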
Strategic Innovation in Data, Knowledge & Artificial Intelligence
Transforming information into intelligence and connecting you to the future.
AeonBridge Co. is dedicated to developing innovative solutions for knowledge management, data intelligence, and AI-powered automation. ABDoc represents our commitment to open-source tools that empower organizations to preserve and leverage their critical information assets.
Learn More: www.aeonbridge.co
- General Questions - Open a GitHub Discussion
- Bug Reports - Create a GitHub Issue
- Feature Requests - Submit via GitHub Issues
- Enterprise Inquiries - Contact AeonBridge Co. directly
Special thanks to:
- IBM Research - For the excellent docling library
- GNU wget team - For the powerful downloading engine
- Python community - For the robust ecosystem
- Open Source Contributors - For inspiration and best practices
Star this repository if ABDoc helps you preserve knowledge!
Last updated: 2024 | Version: 1.0.0 | Maintained by AeonBridge Co.