feat(search): index prose content for BM25 full-text search #617

Open
ShauryaaSharma wants to merge 1 commit into DeusData:main from ShauryaaSharma:feat/fts-prose-content

ShauryaaSharma commented Jun 24, 2026

What & why

search_graph's BM25 search matched only node names and headings, so it was blind
to the prose that documentation- and config-heavy repos carry. Markdown Section
nodes exposed only their heading, and YAML/JSON Module nodes only their file name.
Neither the section body nor the description value was ever indexed, and
Section/Module nodes were excluded from BM25 results entirely. This PR indexes that
prose so the content becomes searchable.
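
For illustration only, a minimal C sketch of what "the section body beneath a heading" can mean: the lines after an ATX heading, up to the next heading. The names heading_level() and section_body() are hypothetical, not the internal/cbm API. The sketch assumes the body stops at the next heading of any level (on the assumption that subsections get their own Section nodes), and it ignores setext headings, indented headings, and `#` lines inside fenced code blocks.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns the ATX heading level (1-6) of the line [p, eol), or 0 if the line
   is not a heading. Simplified: no leading indentation, no setext headings. */
static int heading_level(const char *p, const char *eol) {
    int level = 0;
    while (p < eol && *p == '#' && level < 7) {
        level++;
        p++;
    }
    if (level == 0 || level > 6) return 0;
    /* "#tag" is paragraph text: CommonMark requires a space or tab after the
       '#' run, or the end of the line (an empty heading). */
    if (p < eol && *p != ' ' && *p != '\t') return 0;
    return level;
}

/* Hypothetical helper: copies into out the body beneath the heading line that
   starts at `heading` (a pointer into doc), i.e. every line after it up to the
   next heading of any level or the end of doc. Returns the number of bytes
   copied, truncated to cap - 1 so out stays NUL-terminated. */
size_t section_body(const char *doc, const char *heading, char *out, size_t cap) {
    assert(out != NULL && cap > 0);
    const char *end = doc + strlen(doc);
    const char *nl = strchr(heading, '\n');
    const char *body = nl ? nl + 1 : end; /* heading on the last line: empty body */
    const char *p = body;
    while (p < end) {
        const char *eol = memchr(p, '\n', (size_t)(end - p));
        const char *line_end = eol ? eol : end;
        if (heading_level(p, line_end) > 0) break; /* the next section starts here */
        p = eol ? eol + 1 : end;
    }
    size_t n = (size_t)(p - body);
    if (n > cap - 1) n = cap - 1;
    memcpy(out, body, n);
    out[n] = '\0';
    return n;
}
```

With `doc = "# Intro\nHello world.\n#hashtag stays\n## Details\nInner text.\n"`, calling `section_body(doc, doc, buf, sizeof buf)` yields `"Hello world.\n#hashtag stays\n"`: the `#hashtag` line does not end the section, while `## Details` does.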

Closes #518
Closes #519

Changes

Testing

Added 7 extraction cases and 3 store FTS cases. Verified end to end: bodies are
extracted → indexed into nodes_fts.body → returned by BM25; the json_valid() guard
tolerates malformed rows; legacy FTS tables are upgraded on rebuild.
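
A hedged sketch of the statements a rebuild like cbm_store_fts_rebuild() could issue, kept as C string constants so they can be run in order (for example with sqlite3_exec() inside one transaction). The `nodes` table, its JSON `properties` column, the four legacy columns (`name`, `qualified_name`, `label`, `file_path`), and the `$.docstring` path are assumptions for illustration, not the project's actual schema. Dropping and recreating is needed because SQLite does not allow ALTER TABLE ... ADD COLUMN on a virtual table such as an FTS5 table.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical rebuild statements, run in order inside one transaction.
   All schema names here are illustrative assumptions. */
const char *const FTS_REBUILD_SQL[] = {
    /* Replace any legacy 4-column table: FTS5 virtual tables cannot gain a
       column through ALTER TABLE ... ADD COLUMN. */
    "DROP TABLE IF EXISTS nodes_fts;",
    "CREATE VIRTUAL TABLE nodes_fts USING fts5("
    "name, qualified_name, label, file_path, body);",
    /* Backfill body from each node's docstring. json_extract() raises an
       error on malformed JSON, so json_valid() selects an empty body for
       such rows instead of aborting the whole backfill. */
    "INSERT INTO nodes_fts(rowid, name, qualified_name, label, file_path, body) "
    "SELECT id, name, qualified_name, label, file_path, "
    "CASE WHEN json_valid(properties) "
    "THEN COALESCE(json_extract(properties, '$.docstring'), '') "
    "ELSE '' END "
    "FROM nodes;",
};

#define FTS_REBUILD_SQL_COUNT (sizeof FTS_REBUILD_SQL / sizeof FTS_REBUILD_SQL[0])

/* Returns the i-th statement (0-based); i must be below FTS_REBUILD_SQL_COUNT. */
const char *fts_rebuild_stmt(size_t i) {
    assert(i < FTS_REBUILD_SQL_COUNT);
    return FTS_REBUILD_SQL[i];
}
```

The guard sits in a CASE rather than a WHERE clause in this sketch: SQLite evaluates only the branch CASE selects, so a node with malformed properties stays in the index with an empty body, whereas filtering it out would make the node unsearchable by name as well.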

Notes

Backward compatible (additive column; legacy DBs are upgraded on the next index).
No MCP tool changes, no new deps, and no new system()/popen()/network calls. #518
and #519 share the FTS body infrastructure (#519 can't work without it), so they
are in one PR; happy to split if preferred.

Section nodes (markdown) and Module nodes (YAML/JSON) previously exposed
only their heading/name to BM25, so search_graph could not match the prose
body or a config description. Index that text so content is searchable.
- store: add a `body` column to the nodes_fts FTS5 table; new
 cbm_store_fts_rebuild() drops+recreates the table (upgrading legacy
 4-column databases) and backfills `body` from each node's docstring,
 guarded by json_valid() against malformed-JSON rows
- pipeline: both FTS backfill sites now call cbm_store_fts_rebuild()
- mcp: stop excluding Section/Module from BM25 results (they rank below
 code symbols, so existing result ordering is preserved)
- internal/cbm: capture the markdown section body beneath each heading
 (DeusData#518) and promote top-level description/summary/purpose values onto
 the file's Module node (DeusData#519), reusing the existing docstring property
- tests: 7 extraction cases + 3 store FTS cases
Closes DeusData#518
Closes DeusData#519
Signed-off-by: ShauryaaSharma <shauryasofficial27@gmail.com>
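
As a rough illustration of the second internal/cbm change, a minimal C sketch that promotes a top-level description, summary, or purpose value from a YAML document to a Module docstring. The helper names, the priority order (description, then summary, then purpose), and the restriction to single-line scalars are assumptions; JSON input would go through a real parser rather than this line scan.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Keys promoted onto the Module node, in an assumed priority order. */
static const char *const PROMOTED_KEYS[] = {"description", "summary", "purpose"};

/* Looks for a column-0 "key: value" line in yaml and copies the value,
   trimmed and with one pair of matching quotes removed, into out.
   Only single-line scalars are handled; block scalars (| or >) and empty
   values are skipped. Returns 1 if a value was copied, 0 otherwise. */
static int top_level_value(const char *yaml, const char *key, char *out, size_t cap) {
    size_t klen = strlen(key);
    const char *line = yaml;
    while (*line) {
        const char *eol = strchr(line, '\n');
        const char *line_end = eol ? eol : line + strlen(line);
        /* Indented keys belong to nested maps, so require column 0. */
        if (strncmp(line, key, klen) == 0 && line[klen] == ':') {
            const char *v = line + klen + 1;
            const char *e = line_end;
            while (v < e && (*v == ' ' || *v == '\t')) v++;
            while (e > v && (e[-1] == ' ' || e[-1] == '\t' || e[-1] == '\r')) e--;
            int block = (v < e && (*v == '|' || *v == '>'));
            if (!block && e - v >= 2 && (*v == '"' || *v == '\'') && e[-1] == *v) {
                v++;
                e--;
            }
            if (!block && e > v) {
                size_t n = (size_t)(e - v);
                if (n > cap - 1) n = cap - 1;
                memcpy(out, v, n);
                out[n] = '\0';
                return 1;
            }
        }
        line = eol ? eol + 1 : line_end;
    }
    return 0;
}

/* Returns out holding the first promoted key's value, or NULL if none is present. */
const char *module_docstring(const char *yaml, char *out, size_t cap) {
    assert(yaml != NULL && out != NULL && cap > 0);
    for (size_t i = 0; i < sizeof PROMOTED_KEYS / sizeof PROMOTED_KEYS[0]; i++) {
        if (top_level_value(yaml, PROMOTED_KEYS[i], out, cap)) return out;
    }
    out[0] = '\0';
    return NULL;
}
```

Indentation is what makes a key top-level in this sketch: a `  description:` nested under `config:` is skipped, while a column-0 `description:` is taken. A block scalar such as `description: |` is treated as absent, so a later `purpose:` line can still supply the docstring.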

Successfully merging this pull request may close these issues:
- META.yaml/frontmatter description values not indexed for BM25 search
- Section nodes don't index body text — BM25 can't search markdown content
