Turn any company website into structured B2B data (one API call)

DEV Community

Build your own scraper. Brittle, and every site is different. You'll spend more time maintaining selectors than using the data.
Pay a heavyweight data provider. Expensive, and the data is often a stale snapshot from months ago.
Paste HTML into an LLM and pray. Sometimes you get valid JSON. Sometimes you get a hallucinated CEO email that doesn't exist.

I kept hitting this wall while working with lists of company domains, so I built a small API that does one thing well: send a company URL, get back clean JSON.

The two rules that shaped it

1. It reads the live site at request time. Not a database snapshot from last quarter. If a company rebranded yesterday, you get today's version.

2. It never guesses. This was the hardest constraint to enforce with an LLM in the pipeline. Missing fields come back as null — never invented. If there's no contact email on the site, you get "email": null, not a plausible-looking fake you'd import straight into your CRM.

What a call looks like

curl --request GET \
 --url 'https://ai-live-company-enrichment-tech-detector.p.rapidapi.com/v1/enrich?url=https%3A%2F%2Fstripe.com' \
 --header 'x-rapidapi-host: ai-live-company-enrichment-tech-detector.p.rapidapi.com' \
 --header 'x-rapidapi-key: YOUR_KEY'

And the response:

{"url":"https://stripe.com","cached":false,"data":{"company_name":"Stripe, Inc.","sector":"Financial Technology / Payments","description":"Stripe is a financial infrastructure platform for businesses...","social_links":{"linkedin":"https://www.linkedin.com/company/stripe","twitter":"https://twitter.com/stripe","github":"https://github.com/stripe"},"contact_email":null,"tech_stack":["React","Next.js","Cloudflare","..."]}}

How it works under the hood

A few design decisions, for the curious:

Two-pass tech detection. A fast pattern-matching pass first (think Wappalyzer-style fingerprints), then an LLM enrichment pass only for what patterns can't catch. Cheaper and faster than going full-LLM on everything.
Hard content trimming before the LLM. Page content is capped before any model call. This keeps latency and cost predictable instead of exploding on heavy JS-rendered sites.
Caching with a 14-day TTL. Repeat lookups on the same domain return in ~200 ms instead of re-scraping. The cached field in the response tells you which path you hit.
Strict schema validation. Every response is validated against a strict schema (Pydantic v2) before it leaves the API. Either the JSON conforms, or you get a proper error — never half-broken output.

Use cases I built it for

Lead enrichment: turn a list of prospect domains into CRM-ready records.
Tech-based targeting: filter prospects by their stack ("show me companies running Shopify").
Data hygiene: verify and refresh company records against the live web instead of stale databases.

Try it

There's a free tier (100 requests/month), enough to test it against your own data:

👉 AI Live Company Enrichment & Tech Detector on RapidAPI

I'd genuinely love feedback from other builders — on the positioning, the pricing, and especially: what field would you want it to extract next? Drop a comment below.