# Apex Data API
Apex Data API is a motorsports data acquisition and API server that combines web scraping, a GraphQL API, and PostgreSQL storage. The system handles historical racing data from multiple eras and series, providing both scraping pipelines and API access to race results, driver statistics, team information, and broadcast metrics.
## Features

- 🏎️ Motorsports Data Scraping: Automated scraping of race results, driver stats, and broadcast metrics
- 📊 GraphQL API: Comprehensive API with optimized queries and DataLoaders
- 🗄️ PostgreSQL Integration: Advanced database pooling and migration system
- 🔒 Security & Validation: Zod-based validation with comprehensive security middleware
- ⚡ Performance: DataLoaders prevent N+1 queries, Redis caching, connection pooling
- 📈 Monitoring: Health checks, performance metrics, structured logging
- 🧪 Testing: Comprehensive test suites with Jest and custom motorsports data matchers
## Data Coverage

- Historical Results: Pre-2000s through modern era (2024+)
- Race Data: Qualifying, practice, race results with detailed metrics
- Driver Statistics: Career stats, team affiliations, performance data
- Broadcast Metrics: TV coverage, sponsor visibility, ROI analysis
- Financial Data: Sponsorship deals, team budgets, revenue tracking
- Performance Analytics: Telemetry, pit stops, strategy analysis
## Project Structure

```
apex-api-archivist/
├── src/                              # Source code
│   ├── server.ts                     # Main API server (Express + Apollo)
│   ├── consolidated-scraper.ts       # Unified scraping orchestrator
│   ├── base/                         # Base classes and managers
│   │   ├── base-scraper.ts           # Abstract scraper foundation
│   │   ├── scraper-manager.ts        # Scraper orchestration
│   │   ├── config-manager.ts         # Configuration management
│   │   └── schema-manager.ts         # Database schema management
│   ├── database/                     # Database layer
│   │   └── advanced-pool.ts          # Connection pooling & monitoring
│   ├── middleware/                   # Express middleware
│   │   ├── enhanced-validation.ts    # Zod validation middleware
│   │   ├── security-middleware.ts    # Security & rate limiting
│   │   ├── correlation-id.ts         # Request correlation
│   │   └── performance-monitoring.ts # Performance tracking
│   ├── validation/                   # Schema validation
│   │   └── schemas.ts                # Zod validation schemas
│   ├── queue/                        # Background job processing
│   │   └── job-processor.ts          # Bull queue management
│   ├── cache/                        # Caching layer
│   │   └── redis-cache.ts            # Redis caching implementation
│   └── scrapers/                     # Individual scraper implementations
│       ├── performance-data-scraper.ts
│       └── broadcast-metrics-scraper.ts
├── tests/                            # Test suites
│   ├── unit/                         # Unit tests
│   ├── integration/                  # Integration tests
│   ├── api/                          # API endpoint tests
│   └── database/                     # Database operation tests
├── migrations/                       # Database migrations & seeds
├── docs/                             # Documentation
│   ├── CLEANUP_REPORT.md             # Recent codebase cleanup
│   ├── ZOD_MIGRATION_COMPLETE.md     # Validation migration report
│   └── SECURITY_AUDIT.md             # Security review
└── scripts/                          # Deployment & utility scripts
    └── deploy-advanced-scrapers.sh
```
## Getting Started

### Prerequisites

- Node.js 24+
- PostgreSQL 17+
- Redis 6+ (for caching and job queues)
- pnpm (preferred package manager)
- Docker & Docker Compose (for containerized development)
### Installation

```bash
gh repo clone sbaldwin24/apex-api-archivist
cd apex-api-archivist
pnpm install
```

Copy the example environment files:

```bash
cp .env.development.example .env
cp .env.production.example .env.production
```
### Configuration

Update `.env` with your configuration:

```bash
# Database Configuration
DB_HOST=localhost
DB_PORT=5432
DB_USER=postgres
DB_PASSWORD=your_secure_password
DB_NAME=apex_data

# Redis Configuration
REDIS_HOST=localhost
REDIS_PORT=6379

# API Configuration
PORT=3000
NODE_ENV=development
LOG_LEVEL=info

# Security
JWT_SECRET=your-256-bit-jwt-secret-key
API_KEY_SECRET=your-api-key-encryption-secret
```
### Database Setup

```bash
# Run migrations
pnpm run migrate:up

# Seed with initial data (optional)
pnpm run seed:run

# Check status
pnpm run migrate:status
```
### Running the Server

```bash
# Start API development server
pnpm run dev:api

# Or build and start production server
pnpm run build
pnpm run start:api
```
The API will be available at `http://localhost:3000`:

- GraphQL Playground: `http://localhost:3000/graphql` (sample query below)
- Health Check: `http://localhost:3000/health`
- API Documentation: `http://localhost:3000/docs`
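As a quick smoke test, you can POST a query to the GraphQL endpoint. A minimal sketch, assuming Node's global `fetch`; the field names are illustrative, so check the Playground schema for the real types:

```typescript
// Illustrative query only - inspect the Playground schema for real fields.
const query = `
  query RecentResults {
    raceResults(season: 2024, limit: 5) {
      driverName
      finishPosition
      points
    }
  }
`;

const res = await fetch("http://localhost:3000/graphql", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({ query }),
});

console.log(JSON.stringify(await res.json(), null, 2));
```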
## Scraping

The scraping system uses a unified orchestrator (`src/consolidated-scraper.ts`):
```bash
# List available scrapers
pnpm run scrape:list

# Run all enabled scrapers
pnpm run scrape

# Run specific scraper
pnpm run scrape broadcast-metrics

# Run in parallel with concurrency control
pnpm run scrape:parallel --concurrency 3

# Validate environment and setup
pnpm run scrape:validate

# Setup database schemas
pnpm run scrape:setup-schemas
```
### Available Scrapers

- `broadcast-metrics`: TV coverage, sponsor visibility, ROI data
- `performance-data`: Telemetry, lap times, pit stop analysis
- `financial-data`: Sponsorship deals, team budgets
- `complete-roi`: Advanced ROI analysis with predictive modeling
### Scraper Configuration

Scrapers are configured via environment variables:

```bash
# Enable/disable specific scrapers
ENABLE_BROADCAST_SCRAPER=true
ENABLE_PERFORMANCE_SCRAPER=true
ENABLE_FINANCIAL_SCRAPER=false

# Scraping parameters
SCRAPE_CONCURRENCY=3
SCRAPE_DELAY=2000
SCRAPE_TIMEOUT=30000

# Proxy configuration (optional)
ENABLE_PROXY_ROTATION=false
PROXY_LIST_URL=https://proxy-service.com/list
```
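New scrapers build on the abstract foundation in `src/base/base-scraper.ts`. The sketch below is an assumption about that contract (the `scrape()` hook and `fetchPage()` helper are hypothetical names); consult the base class for the real interface:

```typescript
// Hypothetical sketch of a custom scraper. `BaseScraper`, `scrape()`, and
// `fetchPage()` are assumed names - see src/base/base-scraper.ts.
import { BaseScraper } from "../base/base-scraper";

interface QualifyingRow {
  driver: string;
  position: number;
  lapTime: string;
}

export class QualifyingScraper extends BaseScraper {
  // Entry point the orchestrator would call for each scheduled run.
  async scrape(): Promise<QualifyingRow[]> {
    const html = await this.fetchPage("https://example.com/qualifying");
    return this.parse(html);
  }

  private parse(html: string): QualifyingRow[] {
    // Real scrapers would parse the DOM here; omitted for brevity.
    return [];
  }
}
```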
## Testing

```bash
# Run all tests
pnpm run test

# Run specific test categories
pnpm run test:unit         # Base classes and utilities
pnpm run test:api          # API endpoint tests
pnpm run test:integration  # Integration tests
pnpm run test:database     # Database operation tests

# Coverage reporting
pnpm run test:coverage
pnpm run test:coverage:open  # Opens HTML report

# Development
pnpm run test:watch  # Watch mode
pnpm run test:debug  # Debug mode
```
- Custom NASCAR matchers: `toBeValidRaceResult()`, `toHaveValidDriverData()` (example below)
- Database isolation: Each test runs with a clean database state
- API testing: Supertest integration for endpoint testing
- Mock factories: Generate realistic NASCAR test data
- Coverage thresholds: 40% minimum coverage enforced
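A hedged example of the custom matchers in use; the race-result shape and the `buildRaceResult` factory are illustrative assumptions, not the project's actual fixtures:

```typescript
// The matcher comes from the project's Jest setup; the factory import and
// the race-result fields shown here are illustrative assumptions.
import { buildRaceResult } from "../factories/race-result-factory";

describe("race result validation", () => {
  it("accepts a well-formed race result", () => {
    const result = buildRaceResult({
      driver: "Example Driver",
      finishPosition: 1,
      lapsLed: 120,
    });

    expect(result).toBeValidRaceResult();
  });
});
```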
## Security

- Zod validation: Type-safe schema validation throughout (sketch below)
- Rate limiting: Configurable per-endpoint rate limits
- Security headers: Helmet.js integration
- Input sanitization: XSS and injection protection
- CORS configuration: Configurable origin whitelist
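A minimal sketch of what Zod-based validation looks like; the field names are illustrative, and the project's real schemas live in `src/validation/schemas.ts`:

```typescript
import { z } from "zod";

// Illustrative query schema - the real ones live in src/validation/schemas.ts.
const raceResultQuery = z.object({
  season: z.coerce.number().int().min(1949), // coerce handles string query params
  series: z.enum(["cup", "xfinity", "truck"]).default("cup"),
  limit: z.coerce.number().int().min(1).max(100).default(20),
});

// safeParse returns a result object instead of throwing on bad input.
const parsed = raceResultQuery.safeParse({ season: "2024", limit: "5" });
if (parsed.success) {
  console.log(parsed.data); // { season: 2024, series: "cup", limit: 5 }
} else {
  console.error(parsed.error.flatten());
}
```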
### Secret Protection

- Enhanced `.gitignore`: Comprehensive secret protection
- Environment templates: Safe `.example` files for setup
- Secret management: Environment-variable-based configuration
- Security audit: Regular vulnerability scanning
```bash
# Verify .gitignore patterns
./verify-gitignore.sh

# Run security linting
pnpm run lint:security

# Check for vulnerabilities
pnpm run audit
```
## Deployment

### Docker

```bash
# Build and run with Docker Compose
docker-compose up -d

# Production deployment
docker-compose -f docker-compose.prod.yml up -d
```
### Manual Deployment

```bash
# Build application
pnpm run build

# Run database migrations
pnpm run migrate:up

# Start production server
NODE_ENV=production pnpm run start:api
```
### Automated Deployment

Use the automated deployment script:
```bash
# Full deployment with validation
./scripts/deploy-advanced-scrapers.sh deploy

# Validate environment only
./scripts/deploy-advanced-scrapers.sh validate

# Monitor deployment
./scripts/deploy-advanced-scrapers.sh monitor

# Rollback if needed
./scripts/deploy-advanced-scrapers.sh rollback
```
## Monitoring

- Basic health: `GET /health` (see the polling sketch below)
- Detailed health: `GET /health/detailed`
- Performance metrics: `GET /metrics/performance`
- Database metrics: `GET /metrics/database`
- Queue status: `GET /admin/queues`
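For scripted checks, a small polling helper can wait for the API to come up; a sketch assuming only that `GET /health` returns HTTP 200 when healthy:

```typescript
// Poll GET /health until the server reports healthy (HTTP 200).
async function waitForHealthy(retries = 10): Promise<void> {
  for (let attempt = 0; attempt < retries; attempt++) {
    try {
      const res = await fetch("http://localhost:3000/health");
      if (res.ok) return;
    } catch {
      // Server not accepting connections yet; fall through to retry.
    }
    await new Promise((resolve) => setTimeout(resolve, 1000));
  }
  throw new Error("API did not become healthy in time");
}

await waitForHealthy();
```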
### Logging

Structured logging with correlation IDs:

```typescript
/** Automatic request correlation */
logger.info('Processing race data', 'scraper', {
  raceId: 'daytona-500-2024',
  driverCount: 40,
  correlationId: 'auto-generated'
});
```
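The correlation middleware lives in `src/middleware/correlation-id.ts`; below is a minimal sketch of the typical pattern, with the header name and `res.locals` storage as assumptions:

```typescript
// Sketch of correlation-id middleware - the header name and res.locals usage
// are assumptions; see src/middleware/correlation-id.ts for the real code.
import { randomUUID } from "node:crypto";
import type { NextFunction, Request, Response } from "express";

export function correlationId(req: Request, res: Response, next: NextFunction): void {
  // Honor an inbound ID from upstream services, otherwise mint a new one.
  const id = req.header("x-correlation-id") ?? randomUUID();
  res.setHeader("x-correlation-id", id);
  res.locals.correlationId = id;
  next();
}
```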
## Performance Optimizations

- Connection pooling: Advanced PostgreSQL connection management
- DataLoaders: N+1 query prevention for GraphQL (see the sketch below)
- Redis caching: Configurable TTL and cache strategies
- Query optimization: Indexed queries and performance monitoring
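To illustrate the N+1 prevention, here is a minimal DataLoader sketch; the `getDriversByIds` batch query and `Driver` shape are assumptions, not the project's actual loaders:

```typescript
import DataLoader from "dataloader";

interface Driver {
  id: string;
  name: string;
}

// Assumed batch query: one SQL round trip for a whole set of IDs.
declare function getDriversByIds(ids: readonly string[]): Promise<Driver[]>;

const driverLoader = new DataLoader<string, Driver | undefined>(async (ids) => {
  const rows = await getDriversByIds(ids);
  const byId = new Map(rows.map((d) => [d.id, d]));
  // DataLoader requires results in the same order as the requested keys.
  return ids.map((id) => byId.get(id));
});

// Resolvers call driverLoader.load(id) freely; loads within one tick are
// coalesced into a single batched query instead of N separate ones.
```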
## Development

### Code Quality

```bash
# Linting and formatting
pnpm run lint          # Fix linting issues
pnpm run lint:check    # Check without fixing
pnpm run format        # Format code
pnpm run format:check  # Check formatting
```
### Database Management

```bash
# Migrations
pnpm run migrate:create  # Create new migration
pnpm run migrate:up      # Apply migrations
pnpm run migrate:down    # Rollback migrations
pnpm run migrate:status  # Check migration status
pnpm run migrate:reset   # Reset database (careful!)

# Seeds
pnpm run seed:run     # Run all seeders
pnpm run seed:status  # Check seed status
pnpm run seed:reset   # Reset seed data
```
### Redis Management

```bash
pnpm run redis:status       # Check Redis status
pnpm run redis:clear-cache  # Clear application cache
pnpm run redis:monitor      # Monitor Redis activity
```
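The caching layer (`src/cache/redis-cache.ts`) wraps Redis; below is a cache-aside sketch using `ioredis`, with the key scheme and TTL as assumptions:

```typescript
import Redis from "ioredis";

const redis = new Redis({ host: "localhost", port: 6379 });

// Cache-aside: return the cached value if present, otherwise compute,
// store with a TTL, and return. Key scheme and TTL are illustrative.
async function getCached<T>(
  key: string,
  ttlSeconds: number,
  fetchFresh: () => Promise<T>,
): Promise<T> {
  const hit = await redis.get(key);
  if (hit !== null) return JSON.parse(hit) as T;

  const fresh = await fetchFresh();
  await redis.set(key, JSON.stringify(fresh), "EX", ttlSeconds);
  return fresh;
}
```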
## API Documentation

- GraphQL Playground: Available at `/graphql` in development
- REST API Docs: Auto-generated Swagger UI at `/docs`
- Schema Documentation: Generated from code comments
## Contributing

- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Write tests for your changes
- Ensure all tests pass (`pnpm test`)
- Lint your code (`pnpm lint`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
### Code Standards

- TypeScript: Strict typing required
- Testing: Comprehensive test coverage
- Validation: Use Zod for all schema validation
- Logging: Structured logging with correlation IDs
- Security: Follow security best practices
- Documentation: Update docs with significant changes
## License

This project is licensed under the ISC License - see the LICENSE file for details.
## NASCAR Data Reference

### Series

- Cup Series: Top-level NASCAR competition
- Xfinity Series: Second-tier series
- Truck Series: NASCAR Camping World Truck Series
### Eras

- Classic Era: Pre-2000s NASCAR data
- Pre-Playoff Era: 2000-2013 season data
- Modern Era: 2014+ with playoffs and stage racing
- Current Season: Real-time 2024+ data
### Data Types

- Race Results: Finish positions, points, laps led
- Qualifying: Grid positions, speeds, lap times
- Practice: Session times, speed charts
- Standings: Points standings throughout season
- Driver Stats: Career statistics and achievements
- Team Data: Manufacturer, crew chief, team affiliations
- Track Information: Layout, banking, surface details
- Weather: Race day conditions and impact
Built with ❤️ for NASCAR fans and data enthusiasts
For support or questions, please open an issue on GitHub.