Advanced Bot Detection Heuristics #209
Draft
12 commits, all by harlan-zw:

- d92f83c chore: wip
- e0994ca chore: progress commit
- c83afbb chore: progress
- 8e39812 chore: progress commit
- cf53c47 Merge main into feat/bot-tracker - sync with latest bot detection fea...
- 287249e Fix TypeScript compilation errors after merge
- b7002bb Minimize PR: Remove duplicated bot detection utilities
- 8c4c9e1 chore: progress
- b1c8e0f Implement major bot detection improvements: performance, security, de...
- 49b2cd2 Fix TypeScript compilation issues and test runtime dependencies
- 69cb31d chore: progress
- 2995289 chore: missing files
9 changes: 9 additions & 0 deletions
.playground/pages/index.vue
@@ -1,9 +1,18 @@

```vue
<script lang="ts" setup>
import { useBotDetection } from '#robots/app/composables/useBotDetection'

const bot = useBotDetection()
</script>

<template>
  <div>
    <div>
      <NuxtLink to="/secret">
        Secret page - not crawlable
      </NuxtLink>
      <div>
        Is Bot: {{ bot }}
      </div>
    </div>
  </div>
</template>
```
61 changes: 61 additions & 0 deletions
CLAUDE.md
@@ -0,0 +1,61 @@

# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Development Commands

- **Build**: `pnpm build` - Builds the module using nuxt-module-build and generates client
- **Development**: `pnpm dev` - Runs playground at `.playground` directory
- **Development Preparation**: `pnpm dev:prepare` - Prepares development environment with stub build
- **Test**: `pnpm test` - Runs vitest test suite
- **Lint**: `pnpm lint` - Runs ESLint with auto-fix using @antfu/eslint-config
- **Type Check**: `pnpm typecheck` - Runs TypeScript compiler for type checking
- **Client Development**: `pnpm client:dev` - Runs devtools UI client on port 3300
- **Release**: `pnpm release` - Builds, bumps version, and publishes

## Architecture Overview

This is a Nuxt module (`@nuxtjs/robots`) that provides robots.txt generation and robot meta tag functionality for Nuxt applications.

### Core Module Structure

- **`src/module.ts`**: Main module entry point with module options and setup logic
- **`src/runtime/`**: Runtime code that gets injected into user applications
  - **`app/`**: Client-side runtime (composables, plugins)
  - **`server/`**: Server-side runtime (middleware, routes, composables)
- **`src/kit.ts`**: Utilities for build-time module functionality
- **`src/util.ts`**: Shared utilities exported to end users

### Key Runtime Components

- **Server Routes**:
  - `/robots.txt` route handler in `src/runtime/server/routes/robots-txt.ts`
  - Debug routes under `/__robots__/` for development
- **Server Composables**: `getSiteRobotConfig()` and `getPathRobotConfig()` for runtime robot configuration
- **Client Composables**: `useRobotsRule()` for accessing robot rules in Vue components
- **Meta Plugin**: Automatically injects robot meta tags and X-Robots-Tag headers

### Build System

- Uses `@nuxt/module-builder` with unbuild configuration in `build.config.ts`
- Exports multiple entry points: main module, `/util`, and `/content`
- Supports both ESM and CommonJS via rollup configuration

### Test Structure

- **Integration Tests**: Test fixtures in `test/fixtures/` with full Nuxt apps
- **Unit Tests**: Focused tests in `test/unit/` for specific functionality
- Uses `@nuxt/test-utils` for testing Nuxt applications
- Test environment automatically set to production mode

### Development Workflow

The module supports a playground at `.playground` for local development and manual testing. The client UI (devtools integration) is developed separately in the `client/` directory.

### I18n Integration

The module has special handling for i18n scenarios, with logic in `src/i18n.ts` for splitting paths and handling localized routes.

### Content Integration

Provides integration with Nuxt Content module via `src/content.ts` for content-based robot configurations.
File renamed without changes.
4 changes: 1 addition & 3 deletions
docs/content/3.api/1.nuxt-hooks.md → docs/content/3.api/robots-config.md
8 changes: 8 additions & 0 deletions
libs/is-bot/.gitignore
@@ -0,0 +1,8 @@

```gitignore
node_modules/
dist/
*.log
.DS_Store
coverage/
.nyc_output/
*.tgz
*.tar.gz
```
162 changes: 162 additions & 0 deletions
libs/is-bot/README.md
@@ -0,0 +1,162 @@

# Bot Detection Library

A framework-agnostic bot detection library with advanced behavioral analysis capabilities.

## Features

- **Advanced Bot Detection**: Multi-layered analysis including user agents, behavioral patterns, and timing analysis
- **Framework Agnostic**: Works with any web framework through a driver pattern
- **H3/Nuxt Ready**: Built-in support for H3 events and Nuxt applications
- **Behavioral Analysis**: Modular system with simple, intermediate, and advanced detection behaviors
- **Flexible Storage**: Supports multiple storage backends through an adapter pattern
- **High Performance**: Optimized with batch operations and intelligent caching
- **Security Focused**: IP allowlists/blocklists, rate limiting, and threat detection

## Installation

```bash
npm install @nuxtjs/robots-bot-detection
```

## Quick Start

### Basic Usage

```typescript
import { BotDetectionEngine, MemoryAdapter, H3SessionIdentifier } from '@nuxtjs/robots-bot-detection'

// Create storage adapter
const storage = new MemoryAdapter()

// Create session identifier
const sessionIdentifier = new H3SessionIdentifier()

// Create engine
const engine = new BotDetectionEngine({
  storage,
  sessionIdentifier,
  config: {
    thresholds: {
      likelyBot: 70,
      definitelyBot: 90
    }
  }
})

// Analyze a request
const request = {
  path: '/api/data',
  method: 'GET',
  headers: {
    'user-agent': 'Mozilla/5.0 ...'
  },
  ip: '192.168.1.1',
  timestamp: Date.now()
}

const result = await engine.analyze(request)
console.log(`Bot score: ${result.score}`)
console.log(`Is bot: ${result.isBot}`)
```
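
The `thresholds` option above implies a score scale on which 70 marks a likely bot and 90 a definite one. How `result.isBot` relates to those two bands is not spelled out, so the following is only a sketch of the mapping, assuming a 0 to 100 score and inclusive thresholds; `classifyScore` and the verdict labels are hypothetical names, not exports of `@nuxtjs/robots-bot-detection`.

```typescript
// Hypothetical verdict labels; the library's real response shape may differ.
type Verdict = 'human' | 'likely-bot' | 'definitely-bot'

interface Thresholds {
  likelyBot: number
  definitelyBot: number
}

// Maps a 0-100 score (higher = more bot-like) onto a verdict.
// Assumes both thresholds are inclusive lower bounds.
function classifyScore(score: number, thresholds: Thresholds): Verdict {
  if (thresholds.likelyBot > thresholds.definitelyBot)
    throw new RangeError('likelyBot must not exceed definitelyBot')
  if (score >= thresholds.definitelyBot)
    return 'definitely-bot'
  if (score >= thresholds.likelyBot)
    return 'likely-bot'
  return 'human'
}

// Same thresholds as the Basic Usage example
const thresholds: Thresholds = { likelyBot: 70, definitelyBot: 90 }
console.log(classifyScore(45, thresholds)) // human
console.log(classifyScore(70, thresholds)) // likely-bot
console.log(classifyScore(95, thresholds)) // definitely-bot
```

Keeping two bands rather than one cut-off lets a caller treat "likely" traffic softly (for example, serving it without blocking) while reserving hard responses for the top band.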

### H3/Nuxt Integration

```typescript
import { BotDetectionEngine, UnstorageBehaviorAdapter, H3SessionIdentifier } from '@nuxtjs/robots-bot-detection'
import { createError, defineEventHandler, getRequestHeaders, getRequestIP, getRequestURL } from 'h3'
import { createStorage } from 'unstorage'
import redisDriver from 'unstorage/drivers/redis'

// Back the behavior store with Redis through unstorage's redis driver
const storage = createStorage({
  driver: redisDriver({ url: 'redis://localhost:6379' })
})
const adapter = new UnstorageBehaviorAdapter(storage)
const sessionIdentifier = new H3SessionIdentifier('your-session-secret')

const engine = new BotDetectionEngine({
  storage: adapter,
  sessionIdentifier
})

// In your H3 handler
export default defineEventHandler(async (event) => {
  // Build the request descriptor from the incoming event
  const request = {
    path: getRequestURL(event).pathname,
    method: event.method,
    headers: getRequestHeaders(event),
    // Only trust X-Forwarded-For behind a reverse proxy you control
    ip: getRequestIP(event, { xForwardedFor: true }),
    timestamp: Date.now()
  }

  const result = await engine.analyze(request, event)

  if (result.isBot) {
    throw createError({
      statusCode: 429,
      statusMessage: 'Too Many Requests'
    })
  }

  // Continue with normal processing
})
```
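
`H3SessionIdentifier` takes a secret, which suggests that session keys are derived with a keyed hash rather than read from a cookie. The README does not say how, so the sketch below shows one plausible approach: an HMAC-SHA256 over the client IP and user agent using Node's `node:crypto`. The helper name `deriveSessionKey` is hypothetical, and keying on IP plus user agent is an assumption.

```typescript
import { createHmac } from 'node:crypto'

// Hypothetical: derives an opaque, stable per-client key. Keying on
// IP + user agent is an assumption, not the library's documented behavior.
function deriveSessionKey(secret: string, ip: string, userAgent: string): string {
  return createHmac('sha256', secret)
    .update(ip)
    // Header values cannot contain newlines, so this keeps field boundaries unambiguous
    .update('\n')
    .update(userAgent)
    .digest('hex')
    .slice(0, 32) // 128 bits is ample for a bucketing key
}

const key = deriveSessionKey('your-session-secret', '192.168.1.1', 'Mozilla/5.0 ...')
console.log(key.length) // 32
```

Hashing keeps raw IPs out of storage, and rotating the secret invalidates every stored session history at once. The trade-off is that clients sharing a NAT address and an identical user agent collapse into one session.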

## API Reference

### BotDetectionEngine

The main engine class for bot detection.

#### Constructor Options

```typescript
interface BotDetectionEngineOptions {
  storage: BehaviorStorage
  sessionIdentifier: SessionIdentifier
  responseStatusProvider?: ResponseStatusProvider
  config?: BotDetectionConfig
}
```

#### Methods

- `analyze(request: BotDetectionRequest, event?: H3Event): Promise<BotDetectionResponse>`
- `updateConfig(config: Partial<BotDetectionConfig>): void`
- `cleanup(): Promise<void>`

### Storage Adapters

#### MemoryAdapter

In-memory storage for development and testing.

#### UnstorageBehaviorAdapter

Production-ready storage adapter using unstorage.
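
The README lists the two adapters but not the contract they share. As a sketch of what an in-memory behavior store typically has to handle, here is a map with per-entry expiry so that stale session history ages out. The `BehaviorStore` interface, the `TtlMemoryStore` class, and the TTL semantics are all assumptions, not the actual implementation of `MemoryAdapter`.

```typescript
// Hypothetical storage contract; the real BehaviorStorage interface may differ.
interface BehaviorStore<T> {
  get(key: string): Promise<T | undefined>
  set(key: string, value: T, ttlMs: number): Promise<void>
  cleanup(): Promise<number>
}

class TtlMemoryStore<T> implements BehaviorStore<T> {
  private readonly entries = new Map<string, { value: T, expiresAt: number }>()
  private readonly now: () => number

  // The clock is injectable so expiry can be tested without real delays.
  constructor(now: () => number = Date.now) {
    this.now = now
  }

  async get(key: string): Promise<T | undefined> {
    const entry = this.entries.get(key)
    if (!entry)
      return undefined
    // Expired entries are treated as missing and dropped lazily on read
    if (entry.expiresAt <= this.now()) {
      this.entries.delete(key)
      return undefined
    }
    return entry.value
  }

  async set(key: string, value: T, ttlMs: number): Promise<void> {
    this.entries.set(key, { value, expiresAt: this.now() + ttlMs })
  }

  // Sweeps all expired entries and returns how many were evicted.
  async cleanup(): Promise<number> {
    const now = this.now()
    let evicted = 0
    for (const [key, entry] of this.entries) {
      if (entry.expiresAt <= now) {
        this.entries.delete(key)
        evicted++
      }
    }
    return evicted
  }
}
```

Lazy eviction keeps `get` cheap, while a periodic sweep stops abandoned sessions from accumulating in memory; the engine's own `cleanup()` method is one plausible place for such a sweep.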

### Behavior Configuration

Configure which detection behaviors to enable:

```typescript
const config = {
  behaviors: {
    simple: {
      pathAnalysis: { enabled: true, weight: 1.0 },
      basicTiming: { enabled: true, weight: 0.8 },
      basicRateLimit: { enabled: true, weight: 1.2 }
    },
    intermediate: {
      burstDetection: { enabled: true, weight: 1.0 },
      headerConsistency: { enabled: true, weight: 0.9 }
    },
    advanced: {
      advancedTiming: { enabled: false, weight: 1.5 },
      browserFingerprint: { enabled: false, weight: 1.3 }
    }
  }
}
```
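
Each behavior above carries an `enabled` flag and a `weight`, but the README does not say how the weighted signals become a single score. One common approach is a weighted average over the enabled behaviors that actually produced a signal. The sketch below assumes each behavior emits a raw 0 to 100 score and that disabled or silent behaviors drop out of both numerator and denominator; `combineScores` and its input shapes are hypothetical.

```typescript
interface BehaviorSetting {
  enabled: boolean
  weight: number
}

// Tiered shape mirroring the `behaviors` config above
type BehaviorConfig = Record<string, Record<string, BehaviorSetting>>

// Raw 0-100 scores keyed by behavior name, as individual detectors might emit them
type BehaviorScores = Record<string, number>

// Weighted average of enabled behaviors that produced a score.
// Disabled, zero-weight, and silent behaviors are ignored entirely.
function combineScores(config: BehaviorConfig, scores: BehaviorScores): number {
  let weightedSum = 0
  let totalWeight = 0
  for (const tier of Object.values(config)) {
    for (const [name, setting] of Object.entries(tier)) {
      const score = scores[name]
      if (!setting.enabled || setting.weight <= 0 || score === undefined)
        continue
      const clamped = Math.min(100, Math.max(0, score))
      weightedSum += clamped * setting.weight
      totalWeight += setting.weight
    }
  }
  return totalWeight === 0 ? 0 : weightedSum / totalWeight
}

const config: BehaviorConfig = {
  simple: {
    pathAnalysis: { enabled: true, weight: 1.0 },
    basicTiming: { enabled: true, weight: 0.8 },
  },
  advanced: {
    advancedTiming: { enabled: false, weight: 1.5 },
  },
}

// advancedTiming is disabled, so its perfect score does not move the result
console.log(combineScores(config, { pathAnalysis: 80, basicTiming: 40, advancedTiming: 100 }).toFixed(1)) // 62.2
```

A weighted average keeps the result on the same 0 to 100 scale as the thresholds no matter how many behaviors are enabled; a plain weighted sum would drift upward as behaviors are added.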

## Testing

```bash
# Run tests once
npm run test:run

# Run tests with coverage
npm run test:coverage

# Run tests in watch mode (vitest runs once instead when CI is set)
npm test
```

## License

MIT License - see LICENSE file for details.
70 changes: 70 additions & 0 deletions
libs/is-bot/package.json
@@ -0,0 +1,70 @@

```json
{
  "name": "@nuxtjs/robots-bot-detection",
  "version": "1.0.0",
  "description": "Framework-agnostic bot detection library",
  "type": "module",
  "main": "./dist/index.js",
  "module": "./dist/index.js",
  "types": "./dist/index.d.ts",
  "exports": {
    ".": {
      "types": "./dist/index.d.ts",
      "import": "./dist/index.js",
      "require": "./dist/index.cjs"
    },
    "./h3": {
      "types": "./dist/drivers/h3.d.ts",
      "import": "./dist/drivers/h3.js",
      "require": "./dist/drivers/h3.cjs"
    },
    "./behaviors": {
      "types": "./dist/behaviors/index.d.ts",
      "import": "./dist/behaviors/index.js",
      "require": "./dist/behaviors/index.cjs"
    }
  },
  "files": [
    "dist",
    "src"
  ],
  "scripts": {
    "build": "tsup",
    "dev": "tsup --watch",
    "test": "vitest",
    "test:run": "vitest run",
    "test:coverage": "vitest run --coverage",
    "typecheck": "tsc --noEmit",
    "lint": "eslint src test --ext .ts,.js",
    "lint:fix": "eslint src test --ext .ts,.js --fix"
  },
  "keywords": [
    "bot-detection",
    "security",
    "web-scraping",
    "rate-limiting",
    "h3",
    "nuxt",
    "nitro"
  ],
  "author": "Nuxt Team",
  "license": "MIT",
  "dependencies": {
    "unstorage": "^1.16.0"
  },
  "peerDependencies": {
    "h3": "^1.0.0"
  },
  "devDependencies": {
    "@types/node": "^20.19.4",
    "eslint": "^9.30.1",
    "h3": "^1.15.3",
    "tsup": "^8.5.0",
    "typescript": "^5.8.3",
    "vitest": "^3.2.4"
  },
  "repository": {
    "type": "git",
    "url": "https://github.com/nuxt-modules/robots.git",
    "directory": "libs/is-bot"
  }
}
```
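
Each subpath in the `exports` map lists `types` before `import` and `require` because resolvers take the first condition, in written order, that they recognize. TypeScript (which recognizes `types`) therefore gets the declaration file, while Node, which ignores `types`, falls through to `import` or `require`. A simplified sketch of that rule follows, limited to flat condition objects (no nesting or `*` patterns); `resolveExport` is a hypothetical helper, not part of any tool. The `.cjs` targets only exist if the `tsup` build emits CommonJS, which depends on a tsup configuration not shown in this diff.

```typescript
// Flat condition objects only, as used in this package.json (no nesting or "*" patterns).
type ConditionMap = Record<string, string>
type ExportsMap = Record<string, ConditionMap>

// Simplified model of conditional-exports resolution: walk the conditions
// in the order they are written and take the first active one.
// "default" always matches.
function resolveExport(exportsMap: ExportsMap, subpath: string, active: ReadonlySet<string>): string | undefined {
  const conditions = exportsMap[subpath]
  if (!conditions)
    return undefined
  for (const [condition, target] of Object.entries(conditions)) {
    if (condition === 'default' || active.has(condition))
      return target
  }
  return undefined
}

const exportsMap: ExportsMap = {
  '.': {
    types: './dist/index.d.ts',
    import: './dist/index.js',
    require: './dist/index.cjs',
  },
  './h3': {
    types: './dist/drivers/h3.d.ts',
    import: './dist/drivers/h3.js',
    require: './dist/drivers/h3.cjs',
  },
}

console.log(resolveExport(exportsMap, '.', new Set(['types', 'import']))) // ./dist/index.d.ts
console.log(resolveExport(exportsMap, '.', new Set(['node', 'import']))) // ./dist/index.js
console.log(resolveExport(exportsMap, './h3', new Set(['node', 'require']))) // ./dist/drivers/h3.cjs
```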