DiscovAI/DiscovAI-crawl

Folders and files

.vscode
apps
.gitignore
.npmrc
LICENSE
README.md
package.json
pnpm-lock.yaml
pnpm-workspace.yaml
turbo.json

DiscovAI Crawl API 🕷️🔍

One API to scrape everything you need from URLs for your AI tool and vector database.

🚧 Work in Progress 🚧

🌟 Features

Our API provides a comprehensive suite of data extraction and processing capabilities:

🧼 Clean HTML (JavaScript and CSS removed)
📝 LLM-friendly Markdown conversion
🚫 Ad-free, cookie banner-free, and dialog-free content
📸 Website screenshots (auto-saved to AWS S3 or Cloudflare R2)
🤖 LLM-generated SEO-friendly content
🔑 LLM-extracted key information (summary, features, FAQs, etc.)
🧠 Ready-to-use embeddings for vector database integration (auto-saved to the database)
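
As an illustration of how the embeddings in the last feature might be consumed, here is a minimal sketch of cosine similarity, the comparison most vector databases perform internally. It assumes each embedding is a flat array of numbers; the README does not specify the format, and `cosineSimilarity` is a hypothetical helper, not part of this API.

```typescript
// Cosine similarity between two embedding vectors.
// Assumption: embeddings are flat arrays of numbers of equal length.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length === 0 || a.length !== b.length) {
    throw new Error("Vectors must be non-empty and of equal length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // A zero vector has no direction; treat its similarity as 0.
  if (normA === 0 || normB === 0) {
    return 0;
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

Values close to 1 indicate that two pages have similar content, and values near 0 indicate unrelated content.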

🔧 Installation

pnpm i
cd apps/api && pnpm exec playwright install

🚀 Usage

pnpm dev
open http://localhost:3000

📦 API Response Structure

{
  "clean_html": "...",
  "LLM_friendly_markdown": "...",
  "clean_text": "...",
  "screenshot_url": "...",
  "llm_extracts_key_info": {
    "what": "...",
    "summary": "...",
    "features": ["...", "..."],
    "faqs": [{"q": "...", "a": "..."}]
  },
  "llm_summarized_detail": "...",
  "embeddings": [...]
}
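
For TypeScript consumers, the response shape above could be modeled roughly as follows. This is a sketch under assumptions: the field names come from the structure shown, but typing `embeddings` as `number[]` is a guess (the README elides it), and `isCrawlResponse` is a hypothetical helper rather than something the project exports.

```typescript
interface Faq {
  q: string;
  a: string;
}

interface KeyInfo {
  what: string;
  summary: string;
  features: string[];
  faqs: Faq[];
}

interface CrawlResponse {
  clean_html: string;
  LLM_friendly_markdown: string;
  clean_text: string;
  screenshot_url: string;
  llm_extracts_key_info: KeyInfo;
  llm_summarized_detail: string;
  embeddings: number[]; // assumption: a flat vector of numbers
}

function isRecord(v: unknown): v is Record<string, unknown> {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}

// Runtime check that an unknown value (e.g. parsed JSON) matches CrawlResponse.
function isCrawlResponse(value: unknown): value is CrawlResponse {
  if (!isRecord(value)) return false;
  const topLevelStrings = [
    "clean_html",
    "LLM_friendly_markdown",
    "clean_text",
    "screenshot_url",
    "llm_summarized_detail",
  ];
  if (!topLevelStrings.every((k) => typeof value[k] === "string")) return false;

  const embeddings = value["embeddings"];
  if (!Array.isArray(embeddings)) return false;
  if (!embeddings.every((n) => typeof n === "number")) return false;

  const info = value["llm_extracts_key_info"];
  if (!isRecord(info)) return false;
  if (typeof info["what"] !== "string") return false;
  if (typeof info["summary"] !== "string") return false;

  const features = info["features"];
  if (!Array.isArray(features)) return false;
  if (!features.every((f) => typeof f === "string")) return false;

  const faqs = info["faqs"];
  if (!Array.isArray(faqs)) return false;
  return faqs.every(
    (f) => isRecord(f) && typeof f["q"] === "string" && typeof f["a"] === "string",
  );
}
```

Validating at the boundary like this lets downstream code rely on the typed fields without further checks, which is useful while the API is still a work in progress and its shape may change.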

📚 Documentation

TODO

🤝 Contributing

TODO

About

🕷️ DiscovAI Crawl API (🚧 Work in Progress 🚧): A powerful web scraping solution for AI tools and vector databases. Extract clean HTML, generate LLM-friendly content, and create embeddings from any URL.

Releases

No releases published
