[BUG] NER produces 0 links for gbrain-base-v2 — missing inference.regex hard check #2162

Open

Description

Problem

The gbrain extract all --ner --source db command returns 0 results for gbrain-base-v2 because the pack's link_types lack inference.regex fields.

In src/core/extract-ner.ts (line ~110), the NER path has a hard check:

const hasRegex = pack.manifest.link_types.some(
  (lt) => lt.inference && typeof lt.inference === 'object' && 'regex' in lt.inference,
);
if (!hasRegex) return { pages: 0, created: 0, pack_unavailable: true };

This means NER is completely disabled for gbrain-base-v2 users, because the bundled pack doesn't include any inference blocks.
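To make the failure mode concrete, here is a minimal, self-contained sketch of the same predicate run against two manifests. The `LinkType` and `Manifest` shapes and the `hasPackRegex` name are simplified stand-ins assumed for illustration; the real types in src/core may differ.

```typescript
// Hypothetical, simplified shapes; not the actual gbrain types.
interface LinkType {
  name: string;
  inference?: unknown;
}
interface Manifest {
  link_types: LinkType[];
}

// Same logic as the hard check: true only if at least one link type
// carries an object-valued `inference` with a `regex` key.
function hasPackRegex(manifest: Manifest): boolean {
  return manifest.link_types.some((lt) => {
    const inf = lt.inference;
    return typeof inf === 'object' && inf !== null && 'regex' in inf;
  });
}

// A pack shaped like the one described here: link types, no inference blocks.
const withoutInference: Manifest = {
  link_types: [{ name: 'works_at' }, { name: 'founded' }],
};

// A pack with one illustrative (made-up) regex.
const withRegex: Manifest = {
  link_types: [{ name: 'works_at', inference: { regex: '\\bworks at\\b' } }],
};

console.log(hasPackRegex(withoutInference)); // false -> NER returns pack_unavailable
console.log(hasPackRegex(withRegex)); // true -> NER proceeds
```

Any pack whose link types omit `inference` entirely takes the first branch, which matches the 0-result behavior reported above.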

Context

  • Brain: 223 pages, 28 entity pages (person/company)
  • Current link coverage: 14% (35 links from frontmatter wikilinks only)
  • Target: 70%
  • Timeline: Fixed (0 → 147 after format correction)
  • Embedding: ✅ 100%
  • Overall health: 65/100 → 50/100 brain score

The existing inferLinkType function in link-extraction.ts already has production-quality regex matchers for founded, invested_in, advises, works_at, and mentions, but NER bypasses them entirely when the pack lacks inference.regex.

Suggested Fix

Option A (preferred): Let NER fall through to the legacy inferLinkType when no pack regex is found, instead of returning 0. Update src/core/extract-ner.ts to call the legacy matcher as a fallback.

Option B: Add inference.regex blocks to gbrain-base-v2.yaml for the four typed verbs (works_at, founded, invested_in, advises) that already have production matchers in link-extraction.ts.

Option C: Allow home-config packs (~/.gbrain/schema-packs/) to be discovered and activated, enabling users to add their own regex patterns.
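A rough sketch of what Option A could look like, under stated assumptions: `selectMatcher`, `packRegexes`, the `Matcher` type, and `legacyStandIn` are hypothetical names, and the stand-in matcher is a toy, not the real inferLinkType (whose signature and patterns are not shown in this issue). The point is only the control flow: when the pack yields no regex rules, fall through to the legacy matcher instead of returning early.

```typescript
// Hypothetical, simplified shapes; not the actual gbrain types.
interface LinkType {
  name: string;
  inference?: unknown;
}
interface Manifest {
  link_types: LinkType[];
}
type Matcher = (sentence: string) => string | null;

// Collect compiled regex rules from link types that declare inference.regex.
function packRegexes(manifest: Manifest): Array<{ type: string; re: RegExp }> {
  const out: Array<{ type: string; re: RegExp }> = [];
  for (const lt of manifest.link_types) {
    const inf = lt.inference;
    if (typeof inf === 'object' && inf !== null && 'regex' in inf) {
      const pattern = (inf as { regex: unknown }).regex;
      if (typeof pattern === 'string') {
        out.push({ type: lt.name, re: new RegExp(pattern, 'i') });
      }
    }
  }
  return out;
}

// Option A: use pack rules when present, otherwise fall back to the legacy matcher.
function selectMatcher(manifest: Manifest, legacy: Matcher): Matcher {
  const rules = packRegexes(manifest);
  if (rules.length === 0) return legacy; // previously: return pack_unavailable
  return (sentence) => rules.find((r) => r.re.test(sentence))?.type ?? null;
}

// Toy stand-in for the legacy matcher (assumption, for illustration only).
const legacyStandIn: Matcher = (s) =>
  /\bworks at\b/i.test(s) ? 'works_at' : /\bfounded\b/i.test(s) ? 'founded' : 'mentions';

const basePack: Manifest = { link_types: [{ name: 'works_at' }, { name: 'founded' }] };
const match = selectMatcher(basePack, legacyStandIn);
console.log(match('Alice works at Acme')); // works_at
```

With this shape, packs that do define inference.regex keep their current behavior, and packs without it no longer short-circuit the NER path.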

Impact

This single check blocks the entire NER pipeline for all gbrain-base-v2 users, preventing automated entity link extraction from the markdown body content.
