fix(web-search): match Brave's current result-title markup #688
ly-wang19 wants to merge 1 commit into
Brave moved the web-result title from `<span class="search-snippet-title">` to `<div class="... search-snippet-title ...">`, so `parseBraveSearchHtml` hit `if (!title) continue` for every snippet and returned 0 results against the live page. The existing test stayed green because its fixture still used the old `<span>` markup (drifted from reality). Accept either `<span>` or `<div>` for the title, and update the fixtures to the current markup (keeping one legacy `<span>` case). Verified end-to-end against a real search.brave.com scrape: 0 results before, real results after. Closes THU-MAIC#687
tongshu2023 commented Jun 10, 2026
+1 — independently hit the same bug and can confirm this fix against the live page (before finding this PR; my duplicate #719 is closed in favor of this one).
Live data from a scrape of https://search.brave.com/search?q=photosynthesis just now (June 10, 2026), using the exact `BRAVE_HEADERS` from `brave.ts`: HTTP 200, 20 `data-type=web` snippets, 0 `<span class=...search-snippet-title...>` matches, and 20 `<div class=...search-snippet-title...>` matches. So the span-only regex extracts exactly 0 of 20, and this PR's pattern matches all 20.
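The tally above can be reproduced on HTML you have already fetched. This is a hypothetical diagnostic helper (`countTitleMarkup` is not part of `brave.ts`), and its patterns assume double-quoted `class` attributes like those in the markup quoted in this PR; it only counts elements, it does not parse results.

```typescript
// Hypothetical diagnostic helper: tallies web snippets and title elements
// by tag in already-fetched Brave HTML. Patterns assume double-quoted
// class attributes, as in the markup quoted in this PR.
interface TitleMarkupCounts {
  webSnippets: number;
  spanTitles: number;
  divTitles: number;
}

// String.prototype.match with a /g regex returns every match, or null.
function countMatches(html: string, re: RegExp): number {
  return html.match(re)?.length ?? 0;
}

function countTitleMarkup(html: string): TitleMarkupCounts {
  return {
    webSnippets: countMatches(html, /data-type="web"/g),
    spanTitles: countMatches(html, /<span[^>]*class="[^"]*search-snippet-title/gi),
    divTitles: countMatches(html, /<div[^>]*class="[^"]*search-snippet-title/gi),
  };
}

// Illustrative input in the current (<div>) style; not real scraped data.
const sampleHtml =
  '<div data-type="web"><div class="title search-snippet-title line-clamp-1">A</div></div>' +
  '<div data-type="web"><div class="title search-snippet-title line-clamp-1">B</div></div>';

console.log(countTitleMarkup(sampleHtml));
```

A span count of 0 next to a nonzero div count is the signature of the markup drift described in this PR.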
One optional nit: `<\/(?:span|div)>` would also accept a mismatched pair like `<span ...>...</div>`. A backreference makes the close tag track the open tag: `/<(span|div)[^>]*class="[^"]*search-snippet-title[^"]*"[^>]*>([\s\S]*?)<\/\1>/i` (the title moves to capture group 2). Harmless either way given `stripHtml`, so feel free to ignore.
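A quick way to see the difference between the two close-tag styles. Only the close tags and the backreference come from the comment; the open-tag part is an assumed shape for the PR's pattern, and the test strings are made up.

```typescript
// Assumed shape of the PR's title pattern: either tag may open, either may close.
const TITLE_ANY_CLOSE =
  /<(?:span|div)[^>]*class="[^"]*search-snippet-title[^"]*"[^>]*>([\s\S]*?)<\/(?:span|div)>/i;

// Suggested variant from the comment: \1 repeats the captured open tag,
// so the close tag must match it and the title moves to group 2.
const TITLE_MATCHED_CLOSE =
  /<(span|div)[^>]*class="[^"]*search-snippet-title[^"]*"[^>]*>([\s\S]*?)<\/\1>/i;

// Made-up inputs: current markup, legacy markup, and a mismatched pair.
const samples: { label: string; html: string }[] = [
  { label: "current div", html: '<div class="title search-snippet-title line-clamp-1" title="T">T</div>' },
  { label: "legacy span", html: '<span class="search-snippet-title">T</span>' },
  { label: "mismatched", html: '<span class="search-snippet-title">T</div>' },
];

for (const { label, html } of samples) {
  const anyClose = TITLE_ANY_CLOSE.exec(html)?.[1] ?? null;
  const matchedClose = TITLE_MATCHED_CLOSE.exec(html)?.[2] ?? null;
  console.log(label, { anyClose, matchedClose });
}
```

Only the mismatched input separates the two: the `(?:span|div)` close accepts it, while `<\/\1>` rejects it. Under the `i` flag, JavaScript compares backreferences case-insensitively, so `<DIV ...>...</div>` still matches the backreference form.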
What & why
The keyless Brave scrape (`parseBraveSearchHtml`, `lib/web-search/brave.ts`) returns 0 results against Brave's current HTML. Brave moved the result title element from `<span class="search-snippet-title">Title</span>` to `<div class="title search-snippet-title line-clamp-1 ..." title="Title">Title</div>`. The title regex matched `<span>` only, so `if (!title) continue;` skipped every snippet → empty results. The existing test stayed green because its fixture still used the old `<span>` markup: it had drifted from the live page and gave false confidence.
Closes #687.
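To make the failure mode concrete, here is a simplified, hypothetical stand-in for the parse loop. It is not the real `parseBraveSearchHtml`: splitting blocks on `data-type="web"`, the link regex, and the exact old span-only title pattern are assumptions for illustration. The fixture is modeled on the current `<div>` markup quoted above.

```typescript
// Hypothetical, simplified stand-in for the parse loop. The real
// parseBraveSearchHtml in lib/web-search/brave.ts may split and match differently.
interface ParsedResult {
  title: string;
  url: string;
}

// Assumed pre-fix pattern: only a <span> title is recognized.
const TITLE_SPAN_ONLY =
  /<span[^>]*class="[^"]*search-snippet-title[^"]*"[^>]*>([\s\S]*?)<\/span>/i;
// Assumed post-fix pattern: <span> or <div>.
const TITLE_SPAN_OR_DIV =
  /<(?:span|div)[^>]*class="[^"]*search-snippet-title[^"]*"[^>]*>([\s\S]*?)<\/(?:span|div)>/i;
// Assumed link pattern: first absolute href in the block.
const LINK = /<a[^>]*href="(https?:\/\/[^"]+)"/i;

function parseSnippets(html: string, titleRe: RegExp): ParsedResult[] {
  const results: ParsedResult[] = [];
  // Assumption: each web result block follows a data-type="web" attribute.
  for (const block of html.split('data-type="web"').slice(1)) {
    const title = titleRe.exec(block)?.[1]?.trim();
    if (!title) continue; // same guard as in the bug report: no title, no result
    const url = LINK.exec(block)?.[1];
    if (!url) continue;
    results.push({ title, url });
  }
  return results;
}

// Made-up fixture modeled on the current <div> title markup.
const currentPage =
  '<div class="snippet" data-type="web">' +
  '<a href="https://en.wikipedia.org/wiki/Photosynthesis">' +
  '<div class="title search-snippet-title line-clamp-1" title="Photosynthesis - Wikipedia">' +
  "Photosynthesis - Wikipedia</div></a></div>";

console.log(parseSnippets(currentPage, TITLE_SPAN_ONLY)); // empty array
console.log(parseSnippets(currentPage, TITLE_SPAN_OR_DIV));
```

With the span-only pattern every block hits the `if (!title) continue` guard, so the function returns an empty array without raising any error, which is why the bug surfaced only as missing results.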
Verified end-to-end (live scrape)
A real `search.brave.com` fetch (with the app's exact `BRAVE_HEADERS`) returns HTTP 200 with 20 `data-type="web"` snippet blocks; the current parser extracts 0. With the fix, the same scrape returns real results, e.g. `Photosynthesis - Wikipedia -> https://en.wikipedia.org/wiki/Photosynthesis` with content.
Fix
Accept `<span>` or `<div>` for the title element, and update the `tests/web-search/brave.test.ts` fixtures to the current markup (keeping one legacy `<span>` case for back-compat).
Test plan
`npx vitest run tests/web-search/brave.test.ts` (2 pass); `tsc`, `prettier`, and `eslint` clean.
This unblocks #642 (keyless Brave as a server default), whose review asked to confirm that keyless mode returns results end-to-end.
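For the fixture side of the fix, a table-driven check along these lines covers both markups. The fixture strings, the `stripTags` helper (a crude stand-in for whatever the app's `stripHtml` does), and the title pattern are illustrative assumptions, not the contents of `tests/web-search/brave.test.ts`.

```typescript
// Illustrative fixture table; not the contents of tests/web-search/brave.test.ts.
const SNIPPET_TITLE =
  /<(?:span|div)[^>]*class="[^"]*search-snippet-title[^"]*"[^>]*>([\s\S]*?)<\/(?:span|div)>/i;

// Crude stand-in for the app's stripHtml: drop tags, collapse whitespace.
function stripTags(s: string): string {
  return s.replace(/<[^>]*>/g, "").replace(/\s+/g, " ").trim();
}

function extractTitle(snippet: string): string | null {
  const m = SNIPPET_TITLE.exec(snippet);
  return m ? stripTags(m[1]) : null;
}

interface TitleFixture {
  label: string;
  html: string;
  expected: string | null;
}

const titleFixtures: TitleFixture[] = [
  {
    label: "current <div> markup",
    html: '<div class="title search-snippet-title line-clamp-1" title="Photosynthesis">Photosynthesis</div>',
    expected: "Photosynthesis",
  },
  {
    label: "legacy <span> markup (back-compat)",
    html: '<span class="search-snippet-title">Photosynthesis <strong>basics</strong></span>',
    expected: "Photosynthesis basics",
  },
  {
    label: "no title element",
    html: '<div class="snippet-description">No title here</div>',
    expected: null,
  },
];

for (const f of titleFixtures) {
  const got = extractTitle(f.html);
  console.log(`${got === f.expected ? "pass" : "FAIL"}: ${f.label}`);
}
```

The legacy `<span>` row guards back-compat; the `<div>` rows stay meaningful only as long as they track the live markup, which is exactly the drift this PR corrected.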