crawlio-js is a Node.js SDK for interacting with the Crawlio web scraping and crawling API. It provides programmatic access to scraping, crawling, and batch processing endpoints with built-in error handling.
```bash
npm install crawlio.js
```
```ts
import { Crawlio } from 'crawlio.js'

const client = new Crawlio({ apiKey: 'your-api-key' })
const result = await client.scrape({ url: 'https://example.com' })
console.log(result.html)
```
Creates a new Crawlio client.
Options:
| Name | Type | Required | Description |
|---|---|---|---|
| apiKey | string | Yes | Your Crawlio API key |
| baseUrl | string | No | API base URL (default: https://crawlio.xyz) |
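For example, to point the client at a different deployment (a minimal sketch; baseUrl can be omitted to use the default):

```ts
import { Crawlio } from 'crawlio.js'

const client = new Crawlio({
  apiKey: 'your-api-key',
  baseUrl: 'https://crawlio.xyz', // optional; this is the default
})
```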
Scrapes a single page.
```ts
await client.scrape({ url: 'https://example.com' })
```
ScrapeOptions:
| Name | Type | Required | Description |
|---|---|---|---|
| url | string | Yes | Target URL |
| exclude | string[] | No | CSS selectors to exclude |
| includeOnly | string[] | No | CSS selectors to include |
| markdown | boolean | No | Convert HTML to Markdown |
| returnUrls | boolean | No | Return all discovered URLs |
| workflow | Workflow[] | No | Custom workflow steps to execute |
| normalizeBase64 | boolean | No | Normalize base64 content |
| cookies | CookiesInfo[] | No | Cookies to include in the request |
| userAgent | string | No | Custom User-Agent header for the request |
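A sketch combining several of these options (selector and User-Agent values are placeholders):

```ts
const result = await client.scrape({
  url: 'https://example.com',
  markdown: true,             // also return a Markdown conversion
  exclude: ['nav', 'footer'], // drop boilerplate elements by CSS selector
  returnUrls: true,           // collect URLs discovered on the page
  userAgent: 'my-crawler/1.0',
})

console.log(result.markdown)
console.log(result.urls)
```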
Initiates a site-wide crawl.
CrawlOptions:
| Name | Type | Required | Description |
|---|---|---|---|
| url | string | Yes | Root URL to crawl |
| count | number | No | Number of pages to crawl |
| sameSite | boolean | No | Limit crawl to same domain |
| patterns | string[] | No | URL patterns to match |
| exclude | string[] | No | CSS selectors to exclude |
| includeOnly | string[] | No | CSS selectors to include |
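A sketch of starting a crawl (it assumes the method is named crawl and resolves with a job id matching the job status shape shown below):

```ts
const job = await client.crawl({
  url: 'https://example.com',
  count: 50,      // stop after 50 pages
  sameSite: true, // stay on the root domain
  patterns: ['/blog/*'],
})

console.log(job.id)
```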
Checks the status of a crawl job.
Gets results from a completed crawl.
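A hypothetical polling loop built on the two calls above (crawlStatus and crawlResults are assumed method names; the status values come from the job status shape shown below):

```ts
let status
do {
  await new Promise((resolve) => setTimeout(resolve, 2000)) // wait between polls
  status = await client.crawlStatus(job.id)
} while (status.status === 'IN_QUEUE' || status.status === 'RUNNING')

if (status.status === 'SUCCESS') {
  const results = await client.crawlResults(job.id)
  console.log(results)
}
```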
Performs a search on scraped content.
SearchOptions:
| Name | Type | Description |
|---|---|---|
| site | string | Limit search to a specific domain |
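A sketch of a search call (it assumes a search method that takes a query string alongside SearchOptions; verify the exact signature against the SDK):

```ts
// Hypothetical signature: search(query, options)
const hits = await client.search('pricing', { site: 'example.com' })
console.log(hits)
```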
Initiates scraping for multiple URLs in one request.
BatchScrapeOptions:
| Name | Type | Description |
|---|---|---|
| url | string[] | List of URLs |
| options | Omit<ScrapeOptions, 'url'> | Common options for all URLs |
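A sketch of a batch request (it assumes the method is named batchScrape and returns a job id, mirroring crawl):

```ts
const batch = await client.batchScrape({
  url: ['https://example.com/a', 'https://example.com/b'],
  options: { markdown: true }, // applied to every URL in the batch
})
```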
Checks the status of a batch scrape job.
Fetches results from a completed batch scrape.
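As with crawls, results are fetched once the job succeeds (batchScrapeStatus and batchScrapeResults are assumed method names; check the SDK's exports for the exact ones):

```ts
const status = await client.batchScrapeStatus(batch.id)
if (status.status === 'SUCCESS') {
  const results = await client.batchScrapeResults(batch.id)
  console.log(results)
}
```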
All Crawlio errors extend from CrawlioError. You can catch and inspect these for more context.
- CrawlioError
- CrawlioRateLimit
- CrawlioLimitExceeded
- CrawlioAuthenticationError
- CrawlioInternalServerError
- CrawlioFailureError
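For example (a sketch; it assumes the error classes are exported from the package root):

```ts
import { CrawlioRateLimit, CrawlioAuthenticationError } from 'crawlio.js'

try {
  await client.scrape({ url: 'https://example.com' })
} catch (err) {
  if (err instanceof CrawlioRateLimit) {
    // back off and retry later
  } else if (err instanceof CrawlioAuthenticationError) {
    // the API key is missing or invalid
  } else {
    throw err
  }
}
```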
Scrape result:

```ts
{
  jobId: string
  html: string
  markdown: string
  meta: Record<string, string>
  urls?: string[]
  url: string
}
```
Job status (returned for crawl and batch scrape jobs):

```ts
{
  id: string
  status: 'IN_QUEUE' | 'RUNNING' | 'LIMIT_EXCEEDED' | 'ERROR' | 'SUCCESS'
  error: number
  success: number
  total: number
}
```
CookiesInfo:

```ts
{
  name: string
  value: string
  path: string
  expires?: number
  httpOnly: boolean
  secure: boolean
  domain: string
  sameSite: 'Strict' | 'Lax' | 'None'
}
```
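A sketch of passing a cookie in this shape to scrape (all values are placeholders):

```ts
const result = await client.scrape({
  url: 'https://example.com/account',
  cookies: [
    {
      name: 'session',
      value: 'abc123',
      path: '/',
      domain: 'example.com',
      httpOnly: true,
      secure: true,
      sameSite: 'Lax',
    },
  ],
})
```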