fix(web-search): match Brave's current result-title markup #688

Open
ly-wang19 wants to merge 1 commit into THU-MAIC:main from ly-wang19:fix/brave-scraper-title-markup

ly-wang19 commented Jun 8, 2026

Contributor

What & why

The keyless Brave scrape (parseBraveSearchHtml, lib/web-search/brave.ts) returns 0 results against Brave’s current HTML. Brave moved the result title element:

  • old: <span class="search-snippet-title">Title</span>
  • now: <div class="title search-snippet-title line-clamp-1 ..." title="Title">Title</div>

The title regex matched <span> only, so the if (!title) continue; guard skipped every snippet and the parser returned an empty result list. The existing test stayed green because its fixture still used the old <span> markup: it had drifted from the live page and gave false confidence.

Closes #687.

Verified end-to-end (live scrape)

A real search.brave.com fetch (with the app’s exact BRAVE_HEADERS) returns HTTP 200 with 20 data-type="web" snippet blocks; the current parser extracts 0. With the fix, the same scrape returns real results, e.g. Photosynthesis - Wikipedia -> https://en.wikipedia.org/wiki/Photosynthesis with content.

Heads-up: keyless scraping is best-effort — rapid repeat requests get rate-limited/challenged by Brave (some return an empty page). That’s a separate docs/UX caveat, not addressed here.

Fix

Accept <span> or <div> for the title element, and update tests/web-search/brave.test.ts fixtures to the current markup (keeping one legacy <span> case for back-compat).
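
For readers who want to see the shape of the change, here is a minimal sketch of a title matcher that accepts either element. It is an illustration only: the pattern, the extractTitle function, and the stripTags stand-in are assumptions made for this example, not the code in lib/web-search/brave.ts.

```typescript
// Hypothetical sketch: the real pattern in lib/web-search/brave.ts may differ.
// Accepts <span> or <div> as the element carrying the search-snippet-title class.
const TITLE_RE =
  /<(?:span|div)[^>]*class="[^"]*search-snippet-title[^"]*"[^>]*>([\s\S]*?)<\/(?:span|div)>/i;

// Simplified stand-in for the project's stripHtml helper (assumed behavior).
function stripTags(html: string): string {
  return html.replace(/<[^>]*>/g, "").trim();
}

// Returns the title text of one snippet block, or null when no title is found.
function extractTitle(snippetHtml: string): string | null {
  const match = TITLE_RE.exec(snippetHtml);
  if (!match) return null;
  const title = stripTags(match[1]);
  return title.length > 0 ? title : null;
}

// Legacy markup (old fixture) and current markup (as quoted in this PR).
const legacy =
  '<span class="search-snippet-title">Photosynthesis - Wikipedia</span>';
const current =
  '<div class="title search-snippet-title line-clamp-1" title="Photosynthesis - Wikipedia">Photosynthesis - Wikipedia</div>';

console.log(extractTitle(legacy));
console.log(extractTitle(current));
```

Both sample strings should yield the same title, which is the back-compat property the legacy <span> test case is meant to guard.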

Test plan

npx vitest run tests/web-search/brave.test.ts (2 pass), tsc/prettier/eslint clean.

This un-blocks #642 (keyless Brave as a server default), whose review asked to confirm keyless returns results end-to-end.

Brave moved the web-result title from `<span class="search-snippet-title">`
to `<div class="... search-snippet-title ...">`, so parseBraveSearchHtml hit
`if (!title) continue` for every snippet and returned 0 results against the
live page. The existing test stayed green because its fixture still used the
old <span> markup (drifted from reality).
Accept either <span> or <div> for the title, and update the fixtures to the
current markup (keeping one legacy <span> case). Verified end-to-end against a
real search.brave.com scrape: 0 results before, real results after.
Closes THU-MAIC#687 

Contributor

+1. I independently hit the same bug before finding this PR and can confirm this fix against the live page (my duplicate #719 is closed in favor of this one).

Live data from a scrape of https://search.brave.com/search?q=photosynthesis just now (Jun 10, 2026), using the exact BRAVE_HEADERS from brave.ts: HTTP 200, 20 data-type=web snippets, 0 <span class=...search-snippet-title...> matches, 20 <div class=...search-snippet-title...> matches. So the span-only regex extracts exactly 0 of 20, and this PR's pattern matches all 20.
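
A count like the one above could be reproduced with a small diagnostic along these lines. This is a sketch under assumptions: countMatches is a hypothetical helper, the patterns are illustrative, and the two-snippet sample (including its "snippet" wrapper class) is invented for the example rather than taken from Brave's page.

```typescript
// Hypothetical diagnostic: counts how many times a pattern occurs in an HTML string.
function countMatches(html: string, pattern: RegExp): number {
  // Force the global flag so exec() advances through the whole string.
  const flags = pattern.flags.includes("g") ? pattern.flags : pattern.flags + "g";
  const re = new RegExp(pattern.source, flags);
  let count = 0;
  let match: RegExpExecArray | null;
  while ((match = re.exec(html)) !== null) {
    count += 1;
    // Guard against zero-length matches looping forever.
    if (match[0] === "") re.lastIndex += 1;
  }
  return count;
}

// Invented two-snippet sample using the current <div> title markup.
const sample = [
  '<div class="snippet" data-type="web"><div class="title search-snippet-title line-clamp-1">A</div></div>',
  '<div class="snippet" data-type="web"><div class="title search-snippet-title line-clamp-1">B</div></div>',
].join("\n");

const webSnippets = countMatches(sample, /data-type="web"/);
const spanTitles = countMatches(sample, /<span[^>]*class="[^"]*search-snippet-title/i);
const divTitles = countMatches(sample, /<div[^>]*class="[^"]*search-snippet-title/i);

console.log({ webSnippets, spanTitles, divTitles });
```

Comparing the snippet count with the two title counts makes fixture drift visible at a glance: a non-zero snippet count with zero span titles means a span-only parser cannot return anything.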

One optional nit: <\/(?:span|div)> would also accept a mismatched pair like <span ...>...</div>. A backreference makes the close tag track the open tag: /<(span|div)[^>]*class=[^]*search-snippet-title[^]*[^>]*>([\s\S]*?)<\/\1>/i (title moves to capture group 2). Harmless either way given stripHtml, so feel free to ignore.
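
To make the difference concrete, here is a sketch comparing a loose close-tag alternation with a backreferenced one. Both patterns are written out with quoted class attributes as an assumption for this example; they are not copied from the PR or from brave.ts, and rawTitle is a hypothetical helper.

```typescript
// Loose: any of the two close tags is accepted, regardless of the open tag.
const LOOSE_RE =
  /<(?:span|div)[^>]*class="[^"]*search-snippet-title[^"]*"[^>]*>([\s\S]*?)<\/(?:span|div)>/i;

// Strict: \1 forces the close tag to repeat the open tag name (title is group 2).
const STRICT_RE =
  /<(span|div)[^>]*class="[^"]*search-snippet-title[^"]*"[^>]*>([\s\S]*?)<\/\1>/i;

// Hypothetical helper: returns the raw captured title group, or null on no match.
function rawTitle(html: string, re: RegExp, group: number): string | null {
  const match = re.exec(html);
  return match ? match[group] : null;
}

const matched = '<div class="title search-snippet-title">Title</div>';
const mismatched = '<span class="search-snippet-title">Title</div>';

console.log(rawTitle(matched, LOOSE_RE, 1), rawTitle(matched, STRICT_RE, 2));
console.log(rawTitle(mismatched, LOOSE_RE, 1), rawTitle(mismatched, STRICT_RE, 2));
```

On well-formed markup the two patterns capture the same text; only the mismatched pair is treated differently, with the strict pattern rejecting it.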

Development

Successfully merging this pull request may close these issues.

Brave keyless scraper returns 0 results — result-title markup changed (span→div)
