task: Parsing & extraction quality — language/format coverage gaps #592
Open
Description
Scope
Umbrella tracker for graph extraction quality — language/format coverage gaps, missing node/edge kinds, and extraction false positives. These are not crashes (see the stability/performance umbrellas) but cases where indexing succeeds yet the resulting graph is shallow, mistyped, or wrong.
Sub-issues
Language coverage / LSP depth
- Java: @Annotation, signatures, and all AST properties missing from graph nodes #382 — Java: @Annotation, signatures, and AST properties missing from graph nodes
- Rust LSP #405 — Rust LSP (hybrid-LSP tier resolution)
- Feature Request: Add Hybrid LSP support for Julia #535 — Hybrid LSP support for Julia
- Enhancement: index inner declarations of factory/setup callbacks — Vue Pinia setup stores, composables & React hooks (the Vue/React twin of the tracked Zustand gap) #415 — Index inner declarations of factory/setup callbacks (Vue Pinia setup stores, composables, React hooks)
Format / IaC / config extraction
- feat: GitHub Actions semantic extraction — workflows, jobs, steps, uses/needs edges #450 — GitHub Actions semantic extraction (workflows, jobs, steps, uses/needs edges)
- feat(python): class-field nodes, type-annotation edges, and enum members #451 — Python class-field nodes, type-annotation edges, enum members
- feat: Terraform module composition edges + block-type labels (Resource/DataSource/Module/Variable/Output) #452 — Terraform module composition edges + block-type labels
- feat(helm): nested .Values linkage, template→kind typing, image/env + hook/RBAC edges #454 — Helm nested .Values linkage, template→kind typing, image/env + hook/RBAC edges
Correctness / false positives
- cfg-gated twin functions collapse into one node; get_code_snippet returns the inactive branch's body #495 — cfg-gated twin functions collapse into one node; get_code_snippet returns inactive branch
- Route nodes created from URL strings in config / non-source files #521 — Route nodes created from URL strings in config / non-source files
Content / docs indexing (BM25)
- Index documents as well #490 — Index documents as well
- Section nodes don't index body text — BM25 can't search markdown content #518 — Section nodes don't index body text — BM25 can't search markdown content
- META.yaml/frontmatter description values not indexed for BM25 search #519 — META.yaml / frontmatter description values not indexed for BM25
Tracked via open PR (not part of this task's open work)
- Cross-file CALLS edges resolve to Module node instead of caller Function node (v0.7.0) #438 / C++ out-of-line method definitions: CALLS edge source falls back to Module (file-level) instead of the enclosing Method #554 — cross-file & C++ out-of-line CALLS edges resolve to Module instead of the enclosing function — PR fix(extract): attribute C/C++ CALLS edges to the enclosing function #463
- Add Perl LSP-tier semantic resolution (perl_lsp.c) #459 — Perl LSP-tier semantic resolution — PR feat(perl-lsp): Perl LSP-tier semantic resolution (Closes #459) #461
- Feature request: ObjectScript (InterSystems IRIS) language support #462 — ObjectScript (InterSystems IRIS) language support — PR feat(objectscript): InterSystems IRIS ObjectScript language support #467 / chore(vendor): vendor the InterSystems ObjectScript tree-sitter grammars #590
- Add cross-repo Maven library dependency links #440 — cross-repo Maven library dependency links — PR feat(cross-repo): link Maven library dependencies #442
- SQL DDL is under-modeled: CREATE TABLE/VIEW become generic Variable nodes with no lineage #574 — SQL DDL Table/View nodes + FROM/JOIN lineage — PR feat(sql): first-class Table/View nodes and FROM/JOIN lineage edges #582
- Extract dbt lineage and macros from raw .sql models (no compiled manifest) #575 — dbt lineage/macros from raw .sql — PR feat(dbt): extract dbt Jinja lineage and macros from raw .sql models #584
- Ingest dbt manifest.json into Model/Source nodes + DEPENDS_ON lineage #576 — dbt manifest.json ingest + DEPENDS_ON — PR feat(dbt): add ingest_dbt_manifest tool for Model/Source nodes and DEPENDS_ON lineage #583
Acceptance
Per item: the missing nodes/edges are emitted (or the false positive is suppressed), with a reproduce-first test on a public fixture, or the item is closed with rationale (out of scope / by design).
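One way to read the acceptance rule is as a two-sided check that fails before the fix and passes after it. The following is a minimal sketch, assuming a hypothetical in-memory graph made of (kind, name) node tuples and (source, edge kind, target) edge tuples. The Graph, Expectation, and check names, the DEFINES edge kind, and the fixture values are all illustrative assumptions and are not taken from the project.

```python
from dataclasses import dataclass, field


# Hypothetical, simplified graph shape used only for this illustration.
@dataclass
class Graph:
    nodes: set = field(default_factory=set)  # {(kind, name)}
    edges: set = field(default_factory=set)  # {(source, edge_kind, target)}


# What a sub-issue expects from the fixture once it is resolved.
@dataclass
class Expectation:
    required_nodes: set = field(default_factory=set)
    required_edges: set = field(default_factory=set)
    forbidden_nodes: set = field(default_factory=set)  # known false positives


def check(graph, exp):
    """Return a list of failure messages; an empty list means the item passes."""
    failures = []
    for node in sorted(exp.required_nodes - graph.nodes):
        failures.append(f"missing node: {node}")
    for edge in sorted(exp.required_edges - graph.edges):
        failures.append(f"missing edge: {edge}")
    for node in sorted(exp.forbidden_nodes & graph.nodes):
        failures.append(f"false positive node: {node}")
    return failures


# Reproduce first: an assumed graph as it might look before a fix ...
before = Graph(
    nodes={("Module", "app.py"), ("Route", "https://example.com/health")},
)
# ... and the assumed graph after the fix.
after = Graph(
    nodes={("Module", "app.py"), ("Function", "app.handler")},
    edges={("app.py", "DEFINES", "app.handler")},
)

exp = Expectation(
    required_nodes={("Function", "app.handler")},
    required_edges={("app.py", "DEFINES", "app.handler")},
    forbidden_nodes={("Route", "https://example.com/health")},
)

for message in check(before, exp):
    print(message)
```

In this sketch the same expectation covers both halves of the rule: required nodes and edges model a coverage gap, and forbidden nodes model a false positive to suppress. A real test would build the graph by indexing the public fixture rather than constructing it by hand.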
Why one task
These share the extraction pipeline (tree-sitter queries + hybrid-LSP resolvers + node/edge emission). Triaging them together keeps language/format coverage coherent instead of scattered one-off queries.