πŸ•·οΈ Crawlio JS SDK

crawlio-js is a Node.js SDK for interacting with the Crawlio web scraping and crawling API. It provides programmatic access to scraping, crawling, and batch processing endpoints with built-in error handling.

Visit Crawlio Β· See Docs


πŸ“¦ Installation

npm install crawlio.js

πŸš€ Getting Started

import { Crawlio } from 'crawlio.js'
const client = new Crawlio({ apiKey: 'your-api-key' })
const result = await client.scrape({ url: 'https://example.com' })
console.log(result.html)

πŸ”§ Constructor

new Crawlio(options: CrawlioOptions)

Creates a new Crawlio client.

Options:

Name     Type    Required  Description
apiKey   string  βœ…        Your Crawlio API key
baseUrl  string  ❌        API base URL (default: https://crawlio.xyz)
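
For example, pointing the client at a different deployment by overriding baseUrl (the override URL below is a placeholder, not a real endpoint):

import { Crawlio } from 'crawlio.js'

const client = new Crawlio({
  apiKey: process.env.CRAWLIO_API_KEY,          // required
  baseUrl: 'https://crawlio.internal.example',  // optional: placeholder override
})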

πŸ“˜ API Methods

scrape(options: ScrapeOptions): Promise<ScrapeResponse>

Scrapes a single page.

await client.scrape({ url: 'https://example.com' })

ScrapeOptions:

Name             Type           Required  Description
url              string         βœ…        Target URL
exclude          string[]       ❌        CSS selectors to exclude
includeOnly      string[]       ❌        CSS selectors to include
markdown         boolean        ❌        Convert HTML to Markdown
returnUrls       boolean        ❌        Return all discovered URLs
workflow         Workflow[]     ❌        Custom workflow steps to execute
normalizeBase64  boolean        ❌        Normalize base64 content
cookies          CookiesInfo[]  ❌        Cookies to include in the request
userAgent        string         ❌        Custom User-Agent header for the request
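
A sketch combining several of the options above; the selectors and User-Agent string are illustrative:

const result = await client.scrape({
  url: 'https://example.com',
  exclude: ['nav', 'footer'],   // drop these elements before extraction
  markdown: true,               // also return a Markdown rendering
  returnUrls: true,             // collect links discovered on the page
  userAgent: 'my-crawler/1.0',  // custom User-Agent header
})
console.log(result.markdown)
console.log(result.urls)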

crawl(options: CrawlOptions): Promise<CrawlResponse>

Initiates a site-wide crawl.

CrawlOptions:

Name         Type      Required  Description
url          string    βœ…        Root URL to crawl
count        number    βœ…        Number of pages to crawl
sameSite     boolean   ❌        Limit crawl to same domain
patterns     string[]  ❌        URL patterns to match
exclude      string[]  ❌        CSS selectors to exclude
includeOnly  string[]  ❌        CSS selectors to include
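
A minimal sketch of starting a crawl with the options above. The pattern string is illustrative, and the CrawlResponse type is not documented here; crawlStatus below suggests it carries a job id:

const crawlJob = await client.crawl({
  url: 'https://example.com',  // root URL
  count: 25,                   // stop after 25 pages
  sameSite: true,              // stay on example.com
  patterns: ['/blog/*'],       // only follow matching URLs (illustrative)
})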

crawlStatus(id: string): Promise<CrawlStatusResponse>

Checks the status of a crawl job.


crawlResults(id: string): Promise<{ results: ScrapeResponse[] }>

Gets results from a completed crawl.

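Putting status and results together: a hedged polling sketch, assuming crawl() returns an object with an id field (the CrawlResponse type is not documented above):

const { id } = await client.crawl({ url: 'https://example.com', count: 10 })

let status = await client.crawlStatus(id)
while (status.status === 'IN_QUEUE' || status.status === 'RUNNING') {
  await new Promise((resolve) => setTimeout(resolve, 2000)) // wait between polls
  status = await client.crawlStatus(id)
}

if (status.status === 'SUCCESS') {
  const { results } = await client.crawlResults(id)
  console.log(`Crawled ${results.length} pages, ${status.error} errors`)
}
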

search(query: string, options?: SearchOptions): Promise<SearchResponse>

Performs a search on scraped content.

SearchOptions:

Name  Type    Description
site  string  Limit search to a specific domain
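
The SearchResponse shape is not documented above, so this sketch just logs the raw response; the query and site filter are illustrative:

const response = await client.search('pricing plans', { site: 'example.com' })
console.log(response)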

batchScrape(options: BatchScrapeOptions): Promise<BatchScrapeResponse>

Initiates scraping for multiple URLs in one request.

BatchScrapeOptions:

Name     Type                        Description
url      string[]                    List of URLs
options  Omit<ScrapeOptions, 'url'>  Common options for all URLs
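
A sketch of starting a batch job. Note that the field is named url even though it takes an array; the shared options object is applied to every URL:

const batch = await client.batchScrape({
  url: ['https://example.com/a', 'https://example.com/b'],
  options: { markdown: true, exclude: ['nav', 'footer'] },
})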

batchScrapeStatus(batchId: string): Promise<BatchScrapeStatusResponse>

Checks the status of a batch scrape job.


batchScrapeResult(batchId: string): Promise<{ results: { id: string; result: ScrapeResponse }[] }>

Fetches results from a completed batch scrape.
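
Continuing from the batchScrape sketch above, the crawl polling pattern carries over. This assumes BatchScrapeResponse exposes a batchId and that BatchScrapeStatusResponse has a status field comparable to CrawlStatusResponse; neither type is documented above:

const status = await client.batchScrapeStatus(batch.batchId) // batchId field is an assumption
if (status.status === 'SUCCESS') {                           // status values assumed to mirror crawls
  const { results } = await client.batchScrapeResult(batch.batchId)
  for (const { id, result } of results) {                    // assumes results is an array of { id, result }
    console.log(id, result.url)
  }
}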


πŸ›‘ Error Handling

All errors thrown by the SDK extend CrawlioError, so you can catch a specific subclass or fall back to the base class for more context (a sketch follows the list below).

Error Types:

  • CrawlioError
  • CrawlioRateLimit
  • CrawlioLimitExceeded
  • CrawlioAuthenticationError
  • CrawlioInternalServerError
  • CrawlioFailureError

πŸ“„ Types

ScrapeResponse

{
 jobId: string
 html: string
 markdown: string
 meta: Record<string, string>
 urls?: string[]
 url: string
}

CrawlStatusResponse

{
 id: string
 status: 'IN_QUEUE' | 'RUNNING' | 'LIMIT_EXCEEDED' | 'ERROR' | 'SUCCESS'
 error: number
 success: number
 total: number
}

CookiesInfo

{
 name: string
 value: string
 path: string
 expires?: number
 httpOnly: boolean
 secure: boolean
 domain: string
 sameSite: 'Strict' | 'Lax' | 'None'
}
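
For example, passing a session cookie to scrape(); the names and values here are illustrative:

const result = await client.scrape({
  url: 'https://example.com/account',
  cookies: [
    {
      name: 'session',
      value: 'abc123',      // illustrative value
      path: '/',
      domain: 'example.com',
      httpOnly: true,
      secure: true,
      sameSite: 'Lax',
      // expires is optional (assumed to be a Unix timestamp)
    },
  ],
})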
