Advanced Anti-Detection Web Scraping API with Comprehensive Fingerprinting Control
🎯 Unified Solution: Website + API on a single domain
🛡️ Advanced Anti-Detection: Canvas/WebGL/Audio spoofing, behavioral simulation
🧠 Human-like Behavior: Bezier mouse movements, keyboard dynamics, natural scrolling
🚀 Deploy Anywhere: Docker, Node.js+PM2, or Development
The future of intelligent web scraping is here
🎯 Revolutionary Features Coming:
- 🤖 AI-Powered Admin Panel - Intelligent task management & automation
- 🎨 Modern React Frontend - Sleek, responsive dashboard interface
- 🧠 Smart Automation - AI-driven scraping strategies & optimization
- 📊 Advanced Analytics - Real-time insights & performance metrics
- 🔄 Workflow Builder - Visual scraping pipeline creation
- 🎛️ Enterprise Controls - Advanced user management & permissions
Transform your web scraping experience with the next generation of HeadlessX
- Canvas Fingerprinting Control - Dynamic noise injection with consistent seeds
- WebGL Spoofing - GPU vendor/model spoofing with realistic profiles
- Audio Context Manipulation - Hardware audio fingerprint database
- WebRTC Leak Prevention - Complete IP leak protection
- Hardware Fingerprint Spoofing - CPU, memory, and performance masking
- Bezier Mouse Movement - Natural acceleration and deceleration patterns
- Keyboard Dynamics - Realistic dwell time and flight time variations
- Natural Scroll Patterns - Reader, scanner, browser behavioral profiles
- Attention Model Simulation - Human-like focus and interaction patterns
- Micro-movement Injection - Sub-pixel accuracy for maximum realism
- Cloudflare Bypass - Advanced challenge solving and TLS fingerprinting
- DataDome Evasion - Resource blocking and behavioral pattern matching
- Incapsula/Akamai - Generic WAF bypass with adaptive techniques
- HTTP/2 Fingerprinting - Stream prioritization and header ordering
- 50+ Chrome Profiles - Desktop, mobile, and tablet configurations
- Hardware Consistency - CPU, GPU, memory, and sensor correlation
- Geolocation Intelligence - Timezone, language, and locale matching
- Profile Validation - Real-time consistency checking and scoring
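The Bezier mouse movement item above can be illustrated with a small, standalone sketch. It is not HeadlessX's implementation (the project itself is Node.js); the function name `bezier_mouse_path`, the `jitter` parameter, and the smoothstep easing are assumptions chosen for illustration. It samples a cubic Bezier curve between two points and eases the parameter so that movement starts slowly, speeds up, and slows down again.

```python
import random


def bezier_mouse_path(start, end, steps=30, jitter=0.25, seed=None):
    """Return steps + 1 points along a cubic Bezier curve from start to end.

    Hypothetical helper for illustration only; not part of HeadlessX.
    """
    rng = random.Random(seed)  # a fixed seed makes the path reproducible
    (x0, y0), (x3, y3) = start, end
    dx, dy = x3 - x0, y3 - y0

    def control(fraction):
        # Place a control point along the straight line, then push it
        # sideways (perpendicular to the line) by a random fraction.
        offset = rng.uniform(-jitter, jitter)
        return (x0 + dx * fraction - dy * offset,
                y0 + dy * fraction + dx * offset)

    (x1, y1), (x2, y2) = control(1 / 3), control(2 / 3)

    points = []
    for i in range(steps + 1):
        u = i / steps
        t = u * u * (3 - 2 * u)  # smoothstep easing: slow start, slow end
        mt = 1 - t
        x = mt**3 * x0 + 3 * mt**2 * t * x1 + 3 * mt * t**2 * x2 + t**3 * x3
        y = mt**3 * y0 + 3 * mt**2 * t * y1 + 3 * mt * t**2 * y2 + t**3 * y3
        points.append((x, y))
    return points


path = bezier_mouse_path((10, 10), (400, 300), steps=30, seed=7)
```

Each point would then be fed to the browser automation layer as one mouse-move event; the easing means consecutive points sit close together near the start and end of the path and further apart in the middle.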
Choose your deployment:
| Method | Command | Best For |
|---|---|---|
| 🐳 Docker | docker-compose up -d | Production, easy deployment |
| 🔧 Auto Setup | chmod +x scripts/setup.sh && sudo ./scripts/setup.sh | VPS/Server with full control |
| 💻 Development | npm install && npm start | Local development, testing |
Access your HeadlessX v1.3.0:
🌐 Website: https://your-subdomain.yourdomain.com
🔗 API: https://your-subdomain.yourdomain.com/api
🛡️ Stealth: https://your-subdomain.yourdomain.com/api/render/stealth
🧪 Testing: https://your-subdomain.yourdomain.com/api/test-fingerprint
📱 Profiles: https://your-subdomain.yourdomain.com/api/profiles
🔧 Health: https://your-subdomain.yourdomain.com/api/health
📊 Status: https://your-subdomain.yourdomain.com/api/status?token=YOUR_AUTH_TOKEN
HeadlessX v1.3.0 introduces advanced anti-detection capabilities with comprehensive fingerprinting control, behavioral simulation, and WAF bypass techniques while maintaining the modular architecture from v1.2.0.
- 🛡️ Advanced Anti-Detection: Canvas, WebGL, Audio, WebRTC fingerprinting control
- 🎭 Behavioral Simulation: Human-like mouse movement with Bezier curves and keyboard dynamics
- 🌐 WAF Bypass: Cloudflare, DataDome, and advanced evasion techniques
- 📱 Device Profiling: Comprehensive desktop and mobile device profiles with hardware spoofing
- 🧪 Testing Framework: Comprehensive anti-detection testing and validation
- 🔧 Separation of Concerns: Enhanced modules for fingerprinting, behavioral, and evasion services
- 🚀 Better Performance: Optimized browser management with intelligent profile-based pooling
- 🛠️ Developer Experience: Development tools, profile generators, and interactive testing
- 📦 Production Ready: Enhanced error handling, advanced detection analytics, and profile validation
- 🔒 Security: Advanced authentication, profile management, and secure fingerprint storage
- 📊 Monitoring: Real-time detection monitoring, success rate analytics, and performance benchmarking
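To make the keyboard dynamics idea concrete, here is a minimal sketch of generating per-key timings. Dwell time is how long a key is held down; flight time is the gap between releasing one key and pressing the next. The function name, the mean and deviation values, and the clamping ranges are all assumptions for illustration, not values taken from HeadlessX.

```python
import random


def keystroke_timings(text, seed=None,
                      dwell_ms=(95.0, 25.0), flight_ms=(140.0, 45.0)):
    """Return one timing record per character of text.

    dwell_ms and flight_ms are (mean, standard deviation) pairs.
    All numbers here are illustrative assumptions.
    """
    rng = random.Random(seed)

    def sample(mean_sd, low, high):
        mean, sd = mean_sd
        # Draw from a normal distribution, then clamp to a plausible range.
        return min(high, max(low, rng.gauss(mean, sd)))

    events = []
    for index, char in enumerate(text):
        dwell = sample(dwell_ms, 30.0, 250.0)
        # The first key has no preceding key, so its flight time is zero.
        flight = 0.0 if index == 0 else sample(flight_ms, 20.0, 600.0)
        events.append({
            "key": char,
            "flight_ms": round(flight, 1),
            "dwell_ms": round(dwell, 1),
        })
    return events


timings = keystroke_timings("hello", seed=42)
```

A driver would wait `flight_ms` before each key-down and `dwell_ms` before the matching key-up.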
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│ Routes │───▶│ Controllers │───▶│ Services │
│ (api.js) │ │ (rendering.js)│ │ (browser.js) │
│ (admin.js) │ │ (profiles.js) │ │ (stealth.js) │
└─────────────────┘ │ (detection.js)│ │ (interaction.js)
│ └─────────────────┘ └─────────────────┘
▼ │ │
┌─────────────────┐ ▼ ▼
│ Middleware │ ┌─────────────────┐ ┌─────────────────┐
│ (auth.js) │ │ Utils │ │ Config │
│ (error.js) │ │ (logger.js) │ │ (index.js) │
│ (analyzer.js) │ │ (helpers.js) │ │ (browser.js) │
└─────────────────┘ │ (validator.js)│ │ (profiles/) │
│ └─────────────────┘ └─────────────────┘
▼ │ │
┌─────────────────┐ ▼ ▼
│ Fingerprinting │ ┌─────────────────┐ ┌─────────────────┐
│ (canvas-spoof) │ │ Behavioral │ │ Evasion │
│ (webgl-spoof) │ │ (mouse-movement)│ │ (cloudflare) │
│ (audio-context) │ │ (keyboard-dyn) │ │ (datadome) │
│ (webrtc-ctrl) │ │ (scroll-pattern)│ │ (waf-bypass) │
│ (hardware-noise)│ │ (attention-mod) │ │ (tls-fingerpr) │
└─────────────────┘ └─────────────────┘ └─────────────────┘
│ │ │
▼ ▼ ▼
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│ Testing │ │ Development │ │ Profiles │
│ (test-framework)│ │ (dev-tools) │ │ (chrome-prof) │
│ (detection-test)│ │ (profile-gen) │ │ (mobile-prof) │
│ (performance) │ │ (fingerpr-test) │ │ (firefox-prof) │
└─────────────────┘ └─────────────────┘ └─────────────────┘
Migration from v1.2.0:
- All v1.2.0 functionality preserved with enhanced anti-detection capabilities
- New environment variables for fingerprint control and stealth configuration
- Enhanced API endpoints for profile management and detection testing
- Backward compatible with all existing configurations and scripts
📖 Detailed Documentation: MODULAR_ARCHITECTURE.md
# Install Docker (if needed)
curl -fsSL https://get.docker.com | sh
sudo usermod -aG docker $USER

# Deploy HeadlessX
git clone https://github.com/SaifyXPRO/HeadlessX.git
cd HeadlessX
cp .env.example .env
nano .env   # Configure DOMAIN, SUBDOMAIN, AUTH_TOKEN

# Start services
docker-compose up -d

# Optional: Setup SSL
apt install certbot python3-certbot-nginx
certbot --nginx -d your-subdomain.yourdomain.com
Docker Management:
docker-compose ps                # Check status
docker-compose logs headlessx    # View logs
docker-compose restart           # Restart services
docker-compose down              # Stop services

# Automated setup (recommended)
git clone https://github.com/SaifyXPRO/HeadlessX.git
cd HeadlessX
cp .env.example .env
nano .env                  # Configure environment
chmod +x scripts/setup.sh
sudo ./scripts/setup.sh    # Installs dependencies, builds website, starts PM2
🌐 Nginx Configuration (Auto-handled by setup script):
The setup script configures nginx automatically. To configure it manually instead:
# Copy and configure nginx site
sudo cp nginx/headlessx.conf /etc/nginx/sites-available/headlessx

# Replace placeholders with your actual domain
sudo sed -i 's/SUBDOMAIN.DOMAIN.COM/your-subdomain.yourdomain.com/g' /etc/nginx/sites-available/headlessx

# Enable the site
sudo ln -sf /etc/nginx/sites-available/headlessx /etc/nginx/sites-enabled/
sudo rm -f /etc/nginx/sites-enabled/default

# Test and reload nginx
sudo nginx -t && sudo systemctl reload nginx
Manual setup (if not using setup script):
sudo apt update && sudo apt upgrade -y
curl -fsSL https://deb.nodesource.com/setup_20.x | sudo -E bash -
sudo apt install -y nodejs build-essential
npm install && npm run build
sudo npm install -g pm2
npm run pm2:start
PM2 Management:
npm run pm2:status     # Check status
npm run pm2:logs       # View logs
npm run pm2:restart    # Restart server
npm run pm2:stop       # Stop server

git clone https://github.com/SaifyXPRO/HeadlessX.git
cd HeadlessX
cp .env.example .env
nano .env   # Set AUTH_TOKEN, DOMAIN=localhost, SUBDOMAIN=headlessx

# Make scripts executable
chmod +x scripts/*.sh

# Install dependencies
npm install
cd website && npm install && npm run build && cd ..

# Start development server
npm start   # Access at http://localhost:3000
HeadlessX Routes:
├── /favicon.ico → Favicon
├── /robots.txt → SEO robots file
├── /api/health → Health check (no auth required)
├── /api/status → Server status (requires token)
├── /api/render → Full page rendering
├── /api/html → HTML extraction
├── /api/content → Clean text extraction
├── /api/screenshot → Screenshot generation
├── /api/pdf → PDF generation
└── /api/batch → Batch URL processing
🔄 Request Flow:
- Nginx receives request on port 80/443
- Proxies to Node.js server on port 3000
- Server routes based on path:
  - /api/* → API endpoints
  - /* → Website files (built Next.js app)
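The path-based split in the request flow above can be sketched as a single decision function. This is an illustrative model only, assuming that anything under `/api/` goes to the API and everything else is served from the built website; the function name `route_request` is hypothetical and the real routing lives in the project's Node.js route files.

```python
from urllib.parse import urlsplit


def route_request(request_target):
    """Classify a request as 'api' or 'website' based on its path.

    Hypothetical helper; mirrors the /api/* versus /* split described above.
    """
    path = urlsplit(request_target).path  # drop any query string
    if path == "/api" or path.startswith("/api/"):
        return "api"
    return "website"


print(route_request("/api/health"))
print(route_request("/docs/getting-started"))
```

Matching on `/api/` with the trailing slash (rather than the bare prefix `/api`) avoids misrouting website pages such as `/apidocs`.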
curl https://your-subdomain.yourdomain.com/api/health
curl -X POST "https://your-subdomain.yourdomain.com/api/render/stealth?token=YOUR_AUTH_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "profile": "desktop-chrome",
    "stealthMode": "maximum",
    "behaviorSimulation": true,
    "timeout": 30000
  }'

curl -X POST "https://your-subdomain.yourdomain.com/api/render?token=YOUR_AUTH_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "profile": "iphone-14-pro",
    "geolocation": {"latitude": 40.7128, "longitude": -74.0060},
    "behaviorSimulation": true
  }'

curl -X POST "https://your-subdomain.yourdomain.com/api/test-fingerprint?token=YOUR_AUTH_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "profile": "desktop-chrome",
    "testCanvas": true,
    "testWebGL": true,
    "testAudio": true
  }'

curl "https://your-subdomain.yourdomain.com/api/profiles?token=YOUR_AUTH_TOKEN"

curl -X POST "https://your-subdomain.yourdomain.com/api/render?token=YOUR_AUTH_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "profile": "desktop-firefox",
    "cloudflareBypass": true,
    "datadomeBypass": true,
    "mouseMovement": "natural",
    "keyboardDynamics": "human",
    "timeout": 45000
  }'

curl -X POST "https://your-subdomain.yourdomain.com/api/html?token=YOUR_AUTH_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com", "timeout": 30000}'

curl "https://your-subdomain.yourdomain.com/api/screenshot?token=YOUR_AUTH_TOKEN&url=https://example.com&fullPage=true" \
  -o screenshot.png

curl -X POST "https://your-subdomain.yourdomain.com/api/text?token=YOUR_AUTH_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com", "waitForSelector": "main"}'

curl -X POST "https://your-subdomain.yourdomain.com/api/pdf?token=YOUR_AUTH_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com", "format": "A4"}' \
  -o document.pdf
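When a target URL is passed in the query string, as in the screenshot request above, any `?` or `&` inside the target URL needs to be percent-encoded or it will be read as part of the HeadlessX query. The sketch below builds such a URL with the standard library. The helper name `build_get_url` is hypothetical; the `token`, `url`, and `fullPage` parameter names are taken from the screenshot example.

```python
from urllib.parse import urlencode


def build_get_url(base_url, endpoint, token, target_url, **options):
    """Build a GET URL for an endpoint, percent-encoding every parameter.

    Hypothetical helper; parameter names follow the curl examples above.
    """
    params = {"token": token, "url": target_url}
    for name, value in options.items():
        # JSON-style booleans ("true"/"false") rather than Python's "True".
        params[name] = str(value).lower() if isinstance(value, bool) else value
    return f"{base_url.rstrip('/')}/api/{endpoint}?{urlencode(params)}"


screenshot_url = build_get_url(
    "https://your-subdomain.yourdomain.com",
    "screenshot",
    "YOUR_AUTH_TOKEN",
    "https://example.com/search?q=headless&page=2",
    fullPage=True,
)
print(screenshot_url)
```

Without the encoding step, `page=2` in the target URL would be interpreted as a separate query parameter of the screenshot request.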
HTTP Request Module Configuration:
{
  "url": "https://your-subdomain.yourdomain.com/api/html",
  "method": "POST",
  "headers": {
    "Content-Type": "application/json"
  },
  "qs": {
    "token": "YOUR_AUTH_TOKEN"
  },
  "body": {
    "url": "{{url_to_scrape}}",
    "timeout": 30000,
    "waitForSelector": "{{optional_selector}}"
  }
}

Webhooks by Zapier Setup:
- URL: https://your-subdomain.yourdomain.com/api/html?token=YOUR_AUTH_TOKEN
- Method: POST
- Headers: Content-Type: application/json
- Body:
{
  "url": "{{url_from_trigger}}",
  "timeout": 30000,
  "humanBehavior": true
}

HTTP Request Node:
{
  "url": "https://your-subdomain.yourdomain.com/api/html",
  "method": "POST",
  "authentication": "queryAuth",
  "query": {
    "token": "YOUR_AUTH_TOKEN"
  },
  "headers": {
    "Content-Type": "application/json"
  },
  "body": {
    "url": "={{$json.url}}",
    "timeout": 30000,
    "humanBehavior": true
  }
}

Available via n8n Community Node:
- Install: npm install n8n-nodes-headlessx
- GitHub Repository
import requests

def scrape_with_headlessx(url, token):
    response = requests.post(
        "https://your-subdomain.yourdomain.com/api/html",
        params={"token": token},
        json={
            "url": url,
            "timeout": 30000,
            "humanBehavior": True
        }
    )
    return response.json()

# Usage
result = scrape_with_headlessx("https://example.com", "YOUR_TOKEN")
print(result['html'])
const axios = require('axios');

async function scrapeWithHeadlessX(url, token) {
  try {
    const response = await axios.post(
      `https://your-subdomain.yourdomain.com/api/html?token=${token}`,
      {
        url: url,
        timeout: 30000,
        humanBehavior: true
      }
    );
    return response.data;
  } catch (error) {
    console.error('Scraping failed:', error.message);
    throw error;
  }
}

// Usage
scrapeWithHeadlessX('https://example.com', 'YOUR_TOKEN')
  .then(result => console.log(result.html))
  .catch(error => console.error(error));
curl -X POST "https://your-subdomain.yourdomain.com/api/batch?token=YOUR_AUTH_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "urls": [
      "https://example1.com",
      "https://example2.com",
      "https://example3.com"
    ],
    "timeout": 30000,
    "humanBehavior": true
  }'

curl -X POST "https://your-subdomain.yourdomain.com/api/batch?token=YOUR_AUTH_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "urls": ["https://example.com", "https://httpbin.org"],
    "format": "text",
    "options": {"timeout": 30000}
  }'
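For scripted batch jobs, it can help to validate and de-duplicate the URL list before sending it. The following is a minimal sketch using only the standard library. The helper names are hypothetical, the payload fields (`urls`, `timeout`, `humanBehavior`) follow the first batch example above, and the assumption that the endpoint returns a JSON body is not confirmed by this README.

```python
import json
import urllib.request
from urllib.parse import urlsplit


def build_batch_payload(urls, timeout_ms=30000, human_behavior=True):
    """Validate and de-duplicate URLs, then build a /api/batch request body."""
    cleaned = []
    for raw in urls:
        url = raw.strip()
        parts = urlsplit(url)
        if parts.scheme not in ("http", "https") or not parts.netloc:
            raise ValueError(f"not an absolute http(s) URL: {raw!r}")
        if url not in cleaned:  # keep first occurrence, preserve order
            cleaned.append(url)
    if not cleaned:
        raise ValueError("at least one URL is required")
    return {"urls": cleaned, "timeout": timeout_ms, "humanBehavior": human_behavior}


def post_batch(base_url, token, payload):
    """Send the payload to /api/batch; assumes the response body is JSON."""
    request = urllib.request.Request(
        f"{base_url.rstrip('/')}/api/batch",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=120) as response:
        return json.loads(response.read().decode("utf-8"))


payload = build_batch_payload(
    ["https://example1.com", "https://example2.com", "https://example1.com"]
)
```

Calling `post_batch("https://your-subdomain.yourdomain.com", "YOUR_AUTH_TOKEN", payload)` would then submit the cleaned list.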
HeadlessX v1.3.0 - Enhanced Anti-Detection Architecture/
├── 📂 src/ # Modular application source
│ ├── 📂 config/ # Configuration management
│ │ ├── index.js # Main configuration loader
│ │ └── browser.js # Browser-specific settings
│ ├── 📂 utils/ # Utility functions
│ │ ├── errors.js # Error handling & categorization
│ │ ├── logger.js # Structured logging
│ │ └── helpers.js # Common utilities
│ ├── 📂 services/ # Business logic services
│ │ ├── browser.js # Browser lifecycle management
│ │ ├── stealth.js # Anti-detection techniques
│ │ ├── interaction.js # Human-like behavior
│ │ └── rendering.js # Core rendering logic
│ ├── 📂 middleware/ # Express middleware
│ │ ├── auth.js # Authentication
│ │ └── error.js # Error handling
│ ├── 📂 controllers/ # Request handlers
│ │ ├── system.js # Health & status endpoints
│ │ ├── rendering.js # Main rendering endpoints
│ │ ├── batch.js # Batch processing
│ │ └── get.js # GET endpoints & docs
│ ├── 📂 routes/ # Route definitions
│ │ ├── api.js # API route mappings
│ │ └── static.js # Static file serving
│ ├── app.js # Main application setup
│ ├── server.js # Entry point for PM2
│ └── rate-limiter.js # Rate limiting implementation
├── 📂 website/ # Next.js website (unchanged)
│ ├── app/ # Next.js 13+ app directory
│ ├── components/ # React components
│ ├── .env.example # Website environment template
│ ├── next.config.js # Next.js configuration
│ └── package.json # Website dependencies
├── 📂 scripts/ # Deployment & management scripts
│ ├── setup.sh # Automated installation (updated)
│ ├── update_server.sh # Server update script (updated)
│ ├── verify-domain.sh # Domain verification
│ └── test-routing.sh # Integration testing
├── 📂 nginx/ # Nginx configuration
│ └── headlessx.conf # Nginx proxy config
├── 📂 docker/ # Docker deployment (updated)
│ ├── Dockerfile # Container definition
│ └── docker-compose.yml # Docker Compose setup
├── ecosystem.config.js # PM2 configuration (moved to root)
├── .env.example # Environment template (updated)
├── package.json # Server dependencies (updated)
├── docs/
│ └── MODULAR_ARCHITECTURE.md # Architecture documentation
└── README.md # This file
# 1. Install dependencies
npm install

# 2. Build website
cd website
npm install
npm run build
cd ..

# 3. Set environment variables
export AUTH_TOKEN="development_token_123"
export DOMAIN="localhost"
export SUBDOMAIN="headlessx"

# 4. Start server
npm start   # Uses src/app.js

# 5. Access locally
# Website: http://localhost:3000
# API: http://localhost:3000/api/health

# Test server and website integration
bash scripts/test-routing.sh localhost

# Test with environment variables
bash scripts/verify-domain.sh
Create your .env file from the template:
cp .env.example .env
nano .env
Required configuration:
# Security Token (Generate a secure random string)
AUTH_TOKEN=your_secure_token_here

# Domain Configuration
DOMAIN=yourdomain.com
SUBDOMAIN=headlessx

# Optional: Browser Settings
BROWSER_TIMEOUT=60000
MAX_CONCURRENT_BROWSERS=5

# Optional: Server Settings
PORT=3000
NODE_ENV=production
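One way to produce the secure random string for AUTH_TOKEN, and to check that a .env file sets the required keys, is sketched below. The helper names are hypothetical and this is not part of the HeadlessX tooling. The required keys (AUTH_TOKEN, DOMAIN, SUBDOMAIN) are the ones not marked optional in the configuration above, and the parser handles only simple `KEY=value` lines.

```python
import secrets

REQUIRED_KEYS = ("AUTH_TOKEN", "DOMAIN", "SUBDOMAIN")
PLACEHOLDER_TOKEN = "your_secure_token_here"  # value shipped in the template


def generate_auth_token(num_bytes=32):
    """Return a URL-safe random token backed by num_bytes of randomness."""
    return secrets.token_urlsafe(num_bytes)


def parse_env(text):
    """Parse simple KEY=value lines, skipping blanks and # comments."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip().strip('"').strip("'")
    return env


def missing_required(env):
    """List required keys that are absent, empty, or still the placeholder."""
    missing = [key for key in REQUIRED_KEYS if not env.get(key)]
    if env.get("AUTH_TOKEN") == PLACEHOLDER_TOKEN:
        missing.append("AUTH_TOKEN")
    return missing


print(f"AUTH_TOKEN={generate_auth_token()}")
```

Because the token is URL-safe, it can be passed in a `?token=` query parameter without further encoding.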
Option 1: Automatic (Recommended)
# The setup script automatically replaces domain placeholders
sudo ./scripts/setup.sh

Option 2: Manual Configuration
# Copy nginx configuration
sudo cp nginx/headlessx.conf /etc/nginx/sites-available/headlessx

# Replace domain placeholders (replace with your actual domain)
sudo sed -i 's/SUBDOMAIN.DOMAIN.COM/headlessx.yourdomain.com/g' /etc/nginx/sites-available/headlessx

# Example: If your domain is "api.example.com"
sudo sed -i 's/SUBDOMAIN.DOMAIN.COM/api.example.com/g' /etc/nginx/sites-available/headlessx

# Enable site and reload nginx
sudo ln -sf /etc/nginx/sites-available/headlessx /etc/nginx/sites-enabled/
sudo nginx -t && sudo systemctl reload nginx
Your final URLs will be:
- Website: https://your-subdomain.yourdomain.com
- API Health: https://your-subdomain.yourdomain.com/api/health
- API Endpoints: https://your-subdomain.yourdomain.com/api/*
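As a quick sanity check, the final URLs can be derived from the SUBDOMAIN and DOMAIN values in .env. The sketch below assumes the host is formed as `SUBDOMAIN.DOMAIN`, which matches the nginx placeholder `SUBDOMAIN.DOMAIN.COM`; the function name is hypothetical.

```python
def headlessx_urls(subdomain, domain, scheme="https"):
    """Derive the public URLs from the SUBDOMAIN and DOMAIN settings."""
    host = f"{subdomain}.{domain}" if subdomain else domain
    base = f"{scheme}://{host}"
    return {
        "website": base,
        "health": f"{base}/api/health",
        "api": f"{base}/api",
    }


for name, url in headlessx_urls("headlessx", "yourdomain.com").items():
    print(f"{name}: {url}")
```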
| Endpoint | Method | Description | Auth Required |
|---|---|---|---|
| /api/health | GET | Health check | ❌ |
| /api/status | GET | Server status | ✅ |
| /api/render | POST | Full page rendering (JSON) | ✅ |
| /api/html | GET/POST | Raw HTML extraction | ✅ |
| /api/content | GET/POST | Clean text extraction | ✅ |
| /api/screenshot | GET | Screenshot generation | ✅ |
| /api/pdf | GET | PDF generation | ✅ |
| /api/batch | POST | Batch URL processing | ✅ |
All endpoints (except /api/health) require a token via:
- Query parameter: ?token=YOUR_TOKEN
- Header: X-Token: YOUR_TOKEN
- Header: Authorization: Bearer YOUR_TOKEN
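The three ways of passing a token can be modelled as a small lookup followed by a constant-time comparison. This is an illustrative sketch, not the project's `auth.js` middleware: the function names are hypothetical, and the order in which the three locations are checked is an assumption.

```python
import hmac
from urllib.parse import parse_qs, urlsplit


def extract_token(request_target, headers):
    """Return the token from ?token=, X-Token, or Authorization: Bearer."""
    query = parse_qs(urlsplit(request_target).query)
    if query.get("token"):
        return query["token"][0]
    lowered = {name.lower(): value for name, value in headers.items()}
    if lowered.get("x-token"):
        return lowered["x-token"]
    authorization = lowered.get("authorization", "")
    if authorization.lower().startswith("bearer "):
        return authorization[len("bearer "):].strip()
    return None


def is_authorized(request_target, headers, expected_token):
    """Allow /api/health without a token; otherwise require a matching one."""
    if urlsplit(request_target).path == "/api/health":
        return True
    token = extract_token(request_target, headers)
    if token is None:
        return False
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(token.encode(), expected_token.encode())


print(is_authorized("/api/html", {"Authorization": "Bearer secret"}, "secret"))
```

For clients, the header forms have the practical advantage of keeping the token out of URLs, which tend to end up in access logs.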
Visit your HeadlessX website for full API documentation with examples, or check:
curl https://your-subdomain.yourdomain.com/api/health
curl "https://your-subdomain.yourdomain.com/api/status?token=YOUR_TOKEN"

# PM2 logs
npm run pm2:logs
pm2 logs headlessx --lines 100

# Docker logs
docker-compose logs -f headlessx

# Nginx logs
sudo tail -f /var/log/nginx/access.log

git pull origin main
npm run build            # Rebuild website
npm run pm2:restart      # PM2
# OR
docker-compose restart   # Docker
"npm ci" Error (missing package-lock.json):
chmod +x scripts/generate-lockfiles.sh
./scripts/generate-lockfiles.sh   # Generate lock files
# OR
npm install --production          # Use install instead
"Cannot find module 'express'":
npm install   # Install dependencies

System dependency errors (Ubuntu):
sudo apt update && sudo apt install -y \
  libatk1.0-0t64 libatk-bridge2.0-0t64 libcups2t64 \
  libatspi2.0-0t64 libasound2t64 libxcomposite1

PM2 not starting:
sudo npm install -g pm2
chmod +x scripts/setup.sh       # Make script executable
pm2 start ecosystem.config.js   # Config file lives in the project root
pm2 logs headlessx              # Check errors
Script permission errors:
# Make all scripts executable
chmod +x scripts/*.sh

# Or use the quick setup
chmod +x scripts/quick-setup.sh && ./scripts/quick-setup.sh
Playwright browser installation errors:
# Use dedicated Playwright setup script
chmod +x scripts/setup-playwright.sh
./scripts/setup-playwright.sh

# Or install manually:
sudo apt update && sudo apt install -y \
  libgtk-3-0t64 libpangocairo-1.0-0 libcairo-gobject2 \
  libgdk-pixbuf-2.0-0 libdrm2 libxss1 libxrandr2 \
  libasound2t64 libatk1.0-0t64 libnss3

# Install only Chromium (most stable)
npx playwright install chromium

# Alternative: Use Docker (avoids dependency issues)
docker-compose up -d
- Token Authentication: Secure API access with custom tokens
- Rate Limiting: Nginx-level request throttling
- Security Headers: XSS, CSRF, and clickjacking protection
- Bot Protection: Common attack vector blocking
- SSL/TLS: Automatic HTTPS with Let's Encrypt
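To show what request throttling does in principle, here is a minimal token bucket sketch. It is a conceptual model only: in HeadlessX the throttling is configured at the Nginx level, whose `limit_req` module uses the related leaky bucket method. The class name and the rate and burst values are assumptions for illustration.

```python
import time


class TokenBucket:
    """Allow up to `burst` requests at once, refilled at `rate_per_sec`."""

    def __init__(self, rate_per_sec, burst, clock=time.monotonic):
        self.rate = float(rate_per_sec)
        self.capacity = float(burst)
        self.tokens = float(burst)  # start with a full bucket
        self.clock = clock          # injectable so behaviour can be tested
        self.updated = clock()

    def allow(self):
        """Return True and spend one token if the request may proceed."""
        now = self.clock()
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


bucket = TokenBucket(rate_per_sec=2, burst=3)
print([bucket.allow() for _ in range(3)])
```

A server would typically keep one bucket per client IP or per token and answer with HTTP 429 when `allow()` returns False.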
We welcome contributions from the community! Whether you're fixing bugs, adding features, improving documentation, or sharing ideas, your input is valuable.
- 🐛 Report Bugs: Create a bug report
- 💡 Suggest Features: Share your ideas
- 📖 Improve Docs: Help make our documentation better
- 💻 Submit Code: Fork, code, and create a pull request
- Fork the repository
- Create your feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
Please read CONTRIBUTING.md for detailed guidelines.
Join our growing community of developers, data scientists, and automation enthusiasts!
- ❓ Ask Questions: Stuck on something? Start a Q&A discussion
- 💡 Share Ideas: Have a feature idea? Create an idea post
- 🎨 Showcase: Built something cool? Show it off!
- 📢 Announcements: Stay updated with the latest news
- Be respectful and inclusive
- Help others learn and grow
- Share knowledge and experiences
- Report issues constructively
- Follow our Code of Conduct
This project is licensed under the MIT License - see the LICENSE file for details.
| Resource | Description | Link |
|---|---|---|
| 📖 Documentation | Complete API reference & guides | View Docs |
| 🐛 Bug Reports | Found a bug? Report it here | Report Bug |
| 💡 Feature Requests | Suggest new features | Request Feature |
| 🔒 Security | Report security vulnerabilities | Security Policy |
| 💬 Discussions | Community Q&A & discussions | Join Discussions |
| 📊 Project Board | Track development progress | View Board |
| 📝 Changelog | See what's new | View Changes |
| 🗺️ Roadmap | Future plans & features | View Roadmap |
- Installation Help: SETUP.md
- Troubleshooting: TROUBLESHOOTING.md
- API Documentation: GET_ENDPOINTS.md | POST_ENDPOINTS.md
- Architecture Guide: MODULAR_ARCHITECTURE.md
- Ethics & Responsible Use: ETHICS.md
HeadlessX v1.3.0 - The most advanced open-source anti-detection web scraping solution.
Made with 🚀 by developers, for developers
Made with ❤️ for the developer community.