[BUG] NER produces 0 links for gbrain-base-v2 — missing inference.regex hard check #2162
Description
Problem
The gbrain extract all --ner --source db command returns 0 results for gbrain-base-v2 because the pack's link_types lack inference.regex fields.
In src/core/extract-ner.ts (line ~110), the NER path has a hard check:
const hasRegex = pack.manifest.link_types.some(
  (lt) => lt.inference && typeof lt.inference === 'object' && 'regex' in lt.inference,
);
if (!hasRegex) return { pages: 0, created: 0, pack_unavailable: true };
This means NER is completely disabled for gbrain-base-v2 users, because the bundled pack doesn't include any inference blocks.
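To make the failure mode concrete, here is a minimal sketch of the same predicate run against two packs. The `LinkType` and `Pack` shapes, the `hasInferenceRegex` helper name, and the sample regex are assumptions for illustration only; the real types in gbrain may differ.

```typescript
// Hypothetical, simplified shapes; not the actual gbrain types.
interface LinkType {
  name: string;
  inference?: { regex?: string } | null;
}

interface Pack {
  manifest: { link_types: LinkType[] };
}

// Equivalent to the hard check quoted above, wrapped in a helper (name is made up).
function hasInferenceRegex(pack: Pack): boolean {
  return pack.manifest.link_types.some(
    (lt) => !!lt.inference && typeof lt.inference === 'object' && 'regex' in lt.inference,
  );
}

// A pack shaped like the report describes gbrain-base-v2: link types without inference blocks.
const packWithoutInference: Pack = {
  manifest: { link_types: [{ name: 'works_at' }, { name: 'founded' }] },
};

// A pack where one link type carries an inference.regex (pattern is illustrative).
const packWithRegex: Pack = {
  manifest: {
    link_types: [
      { name: 'works_at', inference: { regex: '\\bworks at\\b' } },
      { name: 'founded' },
    ],
  },
};

const blocked = !hasInferenceRegex(packWithoutInference); // NER would return early
const allowed = hasInferenceRegex(packWithRegex); // NER would proceed
```

With these assumed shapes, a single link type carrying `inference.regex` is enough to pass the check, while a pack with none of them hits the early return regardless of how many link types it defines.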
Context
- Brain: 223 pages, 28 entity pages (person/company)
- Current link coverage: 14% (35 links from frontmatter wikilinks only)
- Target: 70%
- Timeline: Fixed (0 → 147 after format correction)
- Embedding: ✅ 100%
- Overall health: 65/100 → 50/100 brain score
The existing inferLinkType function in link-extraction.ts already has production-quality regex matchers for founded, invested_in, advises, works_at, and mentions — but NER bypasses them entirely when the pack lacks inference.regex.
Suggested Fix
Option A (preferred): Let NER fall through to the legacy inferLinkType when no pack regex is found, instead of returning 0. Update src/core/extract-ner.ts to call the legacy matcher as a fallback.
Option B: Add inference.regex blocks to gbrain-base-v2.yaml for the 4 typed verbs (works_at, founded, invested_in, advises) that already have production matchers in link-extraction.ts.
Option C: Allow home-config packs (~/.gbrain/schema-packs/) to be discovered and activated, enabling users to add their own regex patterns.
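A rough sketch of what Option A could look like, under stated assumptions: `legacyInferLinkType` is a stand-in for the existing inferLinkType (its real signature and patterns are not shown in this issue), and `selectMatcher`, `LinkType`, and `Matcher` are hypothetical names. The point is only the control flow: use pack regexes when present, otherwise fall back instead of returning 0.

```typescript
// Hypothetical shape; the real gbrain type may differ.
interface LinkType {
  name: string;
  inference?: { regex?: string } | null;
}

type Matcher = (sentence: string) => string | null;

// Stand-in for the legacy inferLinkType. These patterns are simplified
// placeholders, not the production matchers in link-extraction.ts.
function legacyInferLinkType(sentence: string): string | null {
  if (/\bfounded\b/i.test(sentence)) return 'founded';
  if (/\binvested in\b/i.test(sentence)) return 'invested_in';
  if (/\badvises\b/i.test(sentence)) return 'advises';
  if (/\bworks at\b/i.test(sentence)) return 'works_at';
  return null;
}

// Pick the pack-driven matcher when any link type defines inference.regex;
// otherwise fall through to the legacy matcher rather than disabling NER.
function selectMatcher(linkTypes: LinkType[]): Matcher {
  const compiled: { name: string; re: RegExp }[] = [];
  for (const lt of linkTypes) {
    const pattern = lt.inference?.regex;
    if (typeof pattern === 'string') {
      compiled.push({ name: lt.name, re: new RegExp(pattern, 'i') });
    }
  }
  if (compiled.length === 0) return legacyInferLinkType;
  return (sentence) => {
    for (const c of compiled) {
      if (c.re.test(sentence)) return c.name;
    }
    return null;
  };
}

// No inference blocks: the legacy matcher is used.
const fallbackResult = selectMatcher([{ name: 'works_at' }])('Alice works at Acme');

// A pack regex is present: the pack-driven matcher is used.
const packResult = selectMatcher([
  { name: 'employs', inference: { regex: '\\bemploys\\b' } },
])('Acme employs Alice');
```

Packs that already ship `inference.regex` keep their current behavior in this sketch, so the fallback only changes the outcome for packs that currently get 0 links.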
Impact
This single check blocks the entire NER pipeline for all gbrain-base-v2 users, preventing automated entity link extraction from markdown body content.