No more bookmarking docs you'll forget about. No more 47 browser tabs. No more "I read this somewhere but can't find it."
npm i -g @memvid/maw
Or just npx @memvid/maw if you don't want to install anything.
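For a one-off run, the npx form should take the same arguments as the installed command (a minimal sketch, reusing the React docs URL from below):

npx @memvid/maw https://react.dev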
maw https://react.dev                                          # crawls entire site → maw.mv2
maw find maw.mv2 "useEffect"                                   # instant search
maw ask maw.mv2 "when should I use useCallback vs useMemo?"    # AI answers
That last one needs OPENAI_API_KEY in your env. The first two work out of the box.
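If the key isn't set yet, exporting it in your shell first should be enough (the key value is a placeholder):

export OPENAI_API_KEY=sk-...   # replace with your own key
maw ask maw.mv2 "when should I use useCallback vs useMemo?"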
Basically anything.
maw https://docs.python.org                            # documentation (2,847 pages, ~4 min)
maw https://paulgraham.com                             # blogs
maw .                                                  # your local codebase
maw https://github.com/user/repo                       # any git repo
maw "https://news.ycombinator.com/item?id=12345"       # single pages
It figures out the right approach automatically:
- Single page URL → fetches just that page
- Domain root → crawls everything it can find
- Local path → reads your files directly
- Cloudflare/bot protection → switches to stealth browser
Crawl
maw <url>                              # saves to maw.mv2
maw <url> -o docs.mv2                  # custom output file
maw <url> docs.mv2                     # same (appends if file exists)
maw <url> --depth 5 --max-pages 500    # go deeper
Search
maw find docs.mv2 "authentication"     # keyword search
maw ask docs.mv2 "how do I do X?"      # AI-powered answers
maw list docs.mv2                      # see what's in there
Preview before crawling
maw preview stripe.com                 # shows sitemap, page count estimate

Export
maw export docs.mv2 -f markdown        # dump everything to markdown
maw export docs.mv2 -f json            # or json
By default you get keyword search (BM25). It's fast and works well for most things.
Want semantic search? Add --embed:
maw https://kubernetes.io/docs --embed openai
Costs about 0ドル.01 per 1,000 pages. Searches then match by meaning, not just keywords.
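Put together, a typical flow might look like this (the output filename and question are just for illustration):

maw https://kubernetes.io/docs --embed openai -o k8s.mv2    # crawl + embed
maw ask k8s.mv2 "how does a rolling update replace pods?"   # query by meaning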
fetch (fast) → playwright (real browser) → rebrowser (stealth)
90% of sites work with a simple fetch. The other 10% get a real browser. If that's blocked too, stealth mode usually gets through.
You don't have to think about this. It just tries each approach until something works.
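If you already know a site needs a browser, you can skip the ladder with the flags from the options table below (example.com stands in for any such site):

maw https://example.com --browser    # jump straight to a real browser
maw https://example.com --stealth    # jump straight to stealth mode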
| Flag | What it does | Default |
|---|---|---|
| -o, --output <file> | Output file | maw.mv2 |
| -d, --depth <n> | How deep to crawl | 2 |
| -m, --max-pages <n> | Stop after this many pages | 150 |
| -c, --concurrency <n> | Parallel requests | 10 |
| -r, --rate-limit <n> | Max requests/second | 10 |
| --include <regex> | Only crawl URLs matching this | - |
| --exclude <regex> | Skip URLs matching this | - |
| --browser | Force browser mode | - |
| --stealth | Force stealth mode | - |
| --embed [model] | Enable semantic embeddings | - |
| --no-robots | Ignore robots.txt | - |
| --no-sitemap | Don't use sitemap.xml | - |
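As a sketch of how the filter flags combine (the regex patterns are made up for illustration, not tested against the real site structure):

maw https://docs.python.org --include "/3/library/" --exclude "whatsnew" -o stdlib.mv2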
# grab some docs
maw https://react.dev
maw https://docs.python.org
maw https://stripe.com/docs

# archive a blog
maw https://paulgraham.com/articles.html

# your own code
maw .
maw https://github.com/your/repo

# combine sources into one file
maw https://react.dev https://nextjs.org -o frontend.mv2

# big crawl with semantic search
maw https://kubernetes.io/docs --depth 4 --max-pages 1000 --embed openai
Up to 50MB works without an API key. That's roughly 500-2000 pages depending on how much text is on each page.
Need more? Get a key at memvid.com.
Is this legal?
Respects robots.txt by default. What you do with --no-robots is your business.
What's an .mv2 file?
A memvid file. Single-file database with search built in. Like SQLite but for documents and memory.
Programmatic usage?
import { maw, find, ask } from '@memvid/maw'

await maw(['https://example.com'], { output: 'site.mv2' })
const results = await find('site.mv2', 'search term')
const answer = await ask('site.mv2', 'explain this to me')
Will I get rate limited?
Default is 10 req/sec with backoff. Most sites won't notice. If you're worried, use --rate-limit 2.
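For a gentler crawl, something like this keeps things slow and sequential (the numbers are just conservative picks):

maw https://example.com --rate-limit 2 --concurrency 2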
JS-rendered content?
Works. Falls back to a real browser automatically when needed.
MIT License · Built on memvid