Releases: wolverin0/memorymaster

v3.19.0 — Phase 0 hardening (H1+H2+H3+H4)

17 May 19:30
@wolverin0 wolverin0
39ac4ff
This commit was created on GitHub.com and signed with GitHub’s verified signature.
GPG key ID: B5690EEEBB952194

Highlights

Phase 0 hardening release. Closes all four security/ops gaps from the GPT-5.4 review against docs/ROADMAP.md Phase 0. All four mechanisms are opt-in and disabled by default, so there are zero breaking changes for callers that don't set the new env vars.

  • H1 (#113) — per-cycle LLM budget caps with reason-coded hard stops + per-provider circuit breaker
  • H2 (#114) — dashboard HTTP auth (viewer/operator roles) + CSRF + bind-safety refusal
  • H3 (#115) — webhook HMAC-SHA-256 signing + timestamp + 5-min replay window
  • H4 (#116) — MCP db/workspace path allowlist + admin-mode bypass

Env vars reference

| Env var | Default | Purpose |
|---|---|---|
| MEMORYMASTER_MAX_LLM_CALLS_PER_CYCLE | 0 (unlimited) | H1 cycle call cap |
| MEMORYMASTER_MAX_TOKENS_PER_CYCLE | 0 (unlimited) | H1 cycle token cap |
| MEMORYMASTER_MAX_PROVIDER_FAILURES_PER_CYCLE | 0 (unlimited) | H1 per-provider breaker |
| MEMORYMASTER_DASHBOARD_TOKEN_VIEWER | unset (legacy) | H2 read-only bearer |
| MEMORYMASTER_DASHBOARD_TOKEN_OPERATOR | unset (legacy) | H2 mutating bearer |
| MEMORYMASTER_DASHBOARD_UNSAFE_BIND | unset (refuse) | H2 non-loopback escape |
| MEMORYMASTER_WEBHOOK_SECRET | unset (no sig) | H3 HMAC signing key |
| MEMORYMASTER_MCP_DB_ALLOWLIST | unset (allow all) | H4 DB path allowlist |
| MEMORYMASTER_MCP_WORKSPACE_ALLOWLIST | unset (allow all) | H4 workspace allowlist |
| MEMORYMASTER_MCP_ADMIN_MODE | unset (enforce) | H4 allowlist bypass |
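The "0 (unlimited)" convention for the three H1 variables can be sketched as follows. Only the env var names come from the table; the CycleBudget class, its methods, and the reason-code strings are hypothetical and are not MemoryMaster's actual API.

```python
import os
from dataclasses import dataclass, field
from typing import Mapping, Optional


def _cap(env: Mapping[str, str], name: str) -> int:
    """Read a non-negative integer cap; unset, empty, or 0 means unlimited."""
    return max(0, int(env.get(name) or "0"))


@dataclass
class CycleBudget:  # hypothetical helper, for illustration only
    max_calls: int = 0
    max_tokens: int = 0
    max_provider_failures: int = 0
    calls: int = 0
    tokens: int = 0
    failures: dict = field(default_factory=dict)

    @classmethod
    def from_env(cls, env: Optional[Mapping[str, str]] = None) -> "CycleBudget":
        env = os.environ if env is None else env
        return cls(
            max_calls=_cap(env, "MEMORYMASTER_MAX_LLM_CALLS_PER_CYCLE"),
            max_tokens=_cap(env, "MEMORYMASTER_MAX_TOKENS_PER_CYCLE"),
            max_provider_failures=_cap(
                env, "MEMORYMASTER_MAX_PROVIDER_FAILURES_PER_CYCLE"),
        )

    def stop_reason(self, provider: str) -> Optional[str]:
        """Return an (assumed) reason code if the next call must be refused."""
        if self.max_calls and self.calls >= self.max_calls:
            return "max_llm_calls"
        if self.max_tokens and self.tokens >= self.max_tokens:
            return "max_tokens"
        limit = self.max_provider_failures
        if limit and self.failures.get(provider, 0) >= limit:
            return "provider_circuit_open"  # breaker trips per provider
        return None

    def record(self, provider: str, tokens: int, ok: bool = True) -> None:
        """Account for one finished LLM call within the current cycle."""
        self.calls += 1
        self.tokens += tokens
        if not ok:
            self.failures[provider] = self.failures.get(provider, 0) + 1
```

Because a cap of 0 is falsy, every check short-circuits when the variable is unset, which matches the stated goal of no behavior change for callers that do not opt in.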

Tests

63 new tests, zero regressions on pre-existing suites.

  • test_llm_budget.py — 8 tests
  • test_dashboard_auth.py — 25 tests (19 unit + 6 end-to-end HTTP)
  • test_webhook_hmac.py — 13 tests
  • test_mcp_path_policy.py — 17 tests (12 unit + 5 chokepoint integration)

What's next

  • v3.20.0 — Phase 1 storage discipline (versioned migrations + SQLite/Postgres parity gate)
  • A1 full LongMemEval-S QA-accuracy publication run — mechanism shipped in v3.18.0 (#109), now safer with H1 budget caps in place

v3.18.0 - LongMemEval-S R@5 0.972, claude_cli judge unlocked

17 May 18:01
@wolverin0 wolverin0
f9e7b82

Highlights

Retrieval R@5 lifted 0.966 → 0.972 (+0.006) with per-question-type weight profiles (PR #110). Single-session-preference bucket alone: 0.80 → 0.90 (+0.10), every other bucket unchanged. First non-NULL retrieval improvement since v3.15.0.

claude_cli judge provider (PR #109) unblocks the full LongMemEval-S QA-accuracy bench for OAuth-only environments — no API keys required.

ROADMAP added (docs/ROADMAP.md) — pivots away from benchmark grinding toward differentiator surfaces (governance UI, MCP observability, postgres parity).

What's in this release

Added

  • Per-question-type retrieval weight profiles: MEMORYMASTER_RETRIEVAL_PROFILE_<TYPE>=lex,conf,fresh,vec env-var family. Opt-in, isolated — no behavior change when unset.
  • claude_cli judge provider in tests/bench_longmemeval.py:JudgeClient routing through Claude Code OAuth.

Improved

  • Single-session-preference bucket R@5: 0.8000 → 0.9000 with the validated SINGLE_SESSION_PREFERENCE=0.10,0.10,0.10,0.70 profile.
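As an illustration of the MEMORYMASTER_RETRIEVAL_PROFILE_<TYPE>=lex,conf,fresh,vec family, the sketch below parses one profile and applies it as a weighted sum. The env var prefix and the four-field order come from these notes; the load_profile and score helpers, the hyphen-to-underscore mapping of the question type, and the weighted-sum scoring are assumptions rather than the project's actual code.

```python
import os
from typing import Mapping, NamedTuple, Optional

PREFIX = "MEMORYMASTER_RETRIEVAL_PROFILE_"


class Weights(NamedTuple):
    lex: float    # lexical match
    conf: float   # claim confidence
    fresh: float  # freshness
    vec: float    # vector similarity


def load_profile(question_type: str,
                 env: Optional[Mapping[str, str]] = None) -> Optional[Weights]:
    """Return the per-type profile, or None when unset (no behavior change)."""
    env = os.environ if env is None else env
    # Assumed mapping: "single-session-preference" -> SINGLE_SESSION_PREFERENCE
    key = PREFIX + question_type.upper().replace("-", "_")
    raw = env.get(key)
    if not raw:
        return None
    parts = [float(p) for p in raw.split(",")]
    if len(parts) != 4:
        raise ValueError(f"{key} must have 4 comma-separated weights")
    return Weights(*parts)


def score(signals: Weights, weights: Weights) -> float:
    """Assumed scoring rule: weighted sum of the four signals."""
    return sum(s * w for s, w in zip(signals, weights))
```

With the validated 0.10,0.10,0.10,0.70 profile, a candidate's vector similarity dominates its score for that question type, while other question types keep whatever defaults apply when their variable is unset.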

Notes

  • docs/longmemeval-results.md per-bucket table is stale (will refresh on next publication pass)
  • A1 full 500q overnight publication run deferred

PRs

  • #109 - feat(bench): claude_cli judge provider (OAuth, no API key required)
  • #110 - feat(retrieval): per-question-type weight profiles (S3, +0.006 R@5)
  • #111 - chore(release): v3.18.0

Competitive position

LongMemEval-S R@5 = 0.972 leads the small published set (MemPalace 0.966, agentmemory 0.952).

v3.17.1 - steward auto-ingest hook for daydream

16 May 04:12
@wolverin0 wolverin0
552f9e4

Opt-in env-gated hook (MEMORYMASTER_DAYDREAM_INGEST_DIR=/Daydreams) so the existing 6h steward cron also auto-ingests daydream insights. Default OFF. Error-isolated — never breaks the cycle. 4 safety tests pass. See CHANGELOG.md.

v3.17.0 - daydream insights ingest pipeline

16 May 03:28
@wolverin0 wolverin0
718437f

Closes the loop: vault -> daydream -> MemoryMaster candidate claims -> steward validation -> wiki-absorb -> vault. New CLI: python -m memorymaster --db <db> ingest-daydream <vault>/Daydreams. See CHANGELOG.md and docs/daydream-integration.md.

v3.16.0 - S1 architectural unblock + S2 honest-null

14 May 23:51
@wolverin0 wolverin0
d22dd53

v3.16.0 ships the retrieval-weights architectural unblock (S1, KEEP) and a documented honest-null on RRF-as-tiebreaker (S2, NULL). R@5 unchanged at 0.966, still leading agentmemory. See CHANGELOG.md.

v3.15.1 - README benchmark chart + v3.16 roadmap

14 May 21:00
@wolverin0 wolverin0
161246a

Docs-only release on top of v3.15.0. Adds:

  • README Benchmarks section with inline SVG chart comparing v3.14, v3.15, and agentmemory
  • docs/benchmark-longmemeval.svg (chart asset)
  • docs/v316-roadmap.md with ranked next-step levers (S1-S3 high-leverage, A1-A3 worth running, B1-B3 deferred)

R@5 = 0.966 from v3.15.0 stands. Production retrieval code unchanged.

v3.15.0 - LongMemEval-S R@5 0.894 -> 0.966

14 May 20:43
@wolverin0 wolverin0
4e0c899

v3.15.0 ships the E01 bench-harness fix (PR #94) that lifted R@5 by +0.072 to 0.966. 5 follow-up experiments documented as honest-null/harm. MemoryMaster now leads agentmemory R@5 + MRR with retrieval-only. See CHANGELOG.md for full breakdown.

v3.14.0 — 29-PR auto-harness session

12 May 21:12
@wolverin0 wolverin0
5b8531d

v3.14.0 release notes

29 PRs merged this session (#61-#89):

D-hypothesis bug fixes (from PR #62 scenario explorer)

  • #65 compactor writes artifact before flipping status to archived
  • #68 dedup treats object_value mismatch as conflict, not duplicate
  • #69 compact_summaries redacts claim text via sensitivity filter before LLM egress

Features

  • #71 dashboard /healthz + /readyz operational endpoints
  • #73 sensitivity filter broadened (bearer, JWT, DB URIs, AWS keys, GitHub PATs)
  • #77 vault_linter detects orphan wiki articles
  • #83 observability prometheus-style counters
  • #84 CLI --dry-run flag for compact / dedup / decay
  • #85 snapshot round-trip parity tests
  • #87 mcp ingest_claim per-source-agent rate limit
  • #88 federated_query cross-tenant sensitivity safety

Tests

  • #70 dream-bridge sensitivity filter regression suite
  • #72 lifecycle supersede ↔ replaced_by symmetry invariant
  • #75 wiki-absorb idempotency
  • #78 mcp filter-bypass attempt suite (5 categories)
  • #79 concurrent supersession race safety
  • #80 db_merge deterministic conflict resolution by updated_at
  • #82 decay respects pinned=true

Docs / Audits

  • #61 STRIDE + OWASP audit (15 findings)
  • #62 lifecycle edge-case scenario explorer (24 scenarios across 12 dimensions)
  • #63 Windows + git hooks + codex sandbox troubleshooting playbook
  • #64 wiki frontmatter compliance audit
  • #66 ADR for wiki auto-promote on N validations
  • #67 cross-project federation contract
  • #74 architecture refresh (module map + data flow)
  • #76 SQLite ↔ Postgres schema parity audit
  • #81 MCP tool compliance audit (citation / source_agent / sensitivity)
  • #86 CLI cookbook (90 subcommands documented)

Quality

  • #89 ruff lint clean (46 → 11 errors; remaining 11 are intentional E402)

Generated via auto-harness pattern: 7 dispatch waves of 2-5 codex exec processes in parallel git worktrees, ~4 hours wall-clock orchestration time.

v3.13.0 — Atlas Inbox V1 + audit cleanup + wiki UX + Windows steward fix

10 May 22:06
@wolverin0 wolverin0

24 commits, 65 files, +9,862 lines since v3.12.0. Headline: Atlas Inbox V1 ships as a versioned backend contract for downstream consumers (LifeAgent, etc.). Plus the full F-audit cleanup batch, real provider adapters, wiki UX upgrades, and a critical Windows fix.

Atlas Inbox V1 — WhatsApp → claims/actions → Super Productivity (new headline feature)

End-to-end backend slice for ingesting external sources, extracting candidate claims and reviewable action proposals, and exporting approved tasks. Backend contract only — UI lives in consuming projects.

  • Storage (SQLite + Postgres parity): external_sources, source_items, evidence_items, action_proposals, media_retry_queue tables; sensitivity column on source/evidence with allowed values {none, low, medium, high, redacted}
  • 17 new CLI subcommands: import-whatsapp, extract-atlas-claims, propose-actions, action-proposals, resolve-action-proposal, edit-action-proposal, label-source-item, label-evidence-item, enqueue-media-retry, process-media-retry-queue, record-media-retry-outcome, list-media-retries, transcribe-source-item, ocr-source-item, export-actions, atlas-version + init-db brought into the contract
  • 3 new dashboard endpoints: GET /api/action-proposals, POST /api/action-proposals/status, GET /api/atlas/version
  • Versioned contract at v1.5.1 — every Atlas envelope carries meta.atlas_contract_version and meta.atlas_subcommand. Semver enforced; consumers must refuse on major mismatch. Spec: docs/atlas-api-contract-v1.md. Machine-pinned by 41 contract tests.
  • Real provider adapters behind existing Protocols: OpenAIWhisperTranscriptionProvider (stdlib-only urllib + multipart, uses OPENAI_API_KEY + OPENAI_BASE_URL) and TesseractOcrProvider (lazy-import optional dep). Mock providers remain default.
  • Media retry queue for connectors that fetch external URLs (e.g. wacli media). Consumer owns the fetch; MemoryMaster owns durable state. Atomic claim via FOR UPDATE SKIP LOCKED on Postgres.
  • Sensitivity-on-reimport preservation: operator-set labels survive import-whatsapp re-runs.

PRs: #20 (vertical slice) + #27 (contract chain v1.0.0 → v1.5.1 collapsed merge of 7 commits).

Wiki UX (PR #28)

Two patterns borrowed from shannhk/llm-wiki, picked for signal-to-noise:

  • explored: true|false frontmatter on every wiki article. Operator-set human-review marker, distinct from confidence. Default false on new articles; preserved on re-absorb when an operator flipped to true (operator decisions are sticky).
  • Inline > [!contradiction] Obsidian callouts rendered at the top of each article body during wiki-absorb. Detection shared with vault_linter._detect_contradictions so wiki and lint agree.
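The "operator decisions are sticky" rule for the explored marker can be sketched as a small frontmatter merge. The merge_explored function and the dict representation of frontmatter are hypothetical; only the default-false and preserve-true behavior comes from the description above.

```python
from typing import Optional


def merge_explored(existing: Optional[dict], regenerated: dict) -> dict:
    """Merge frontmatter on re-absorb, keeping an operator-set explored: true.

    existing    -- frontmatter already in the vault, or None for a new article
    regenerated -- frontmatter produced by the current absorb run
    """
    merged = dict(regenerated)
    # New articles default to False; True survives only if an operator set it.
    merged["explored"] = bool(existing and existing.get("explored") is True)
    return merged
```

Everything else in the regenerated frontmatter wins, so the absorb pass can still refresh fields such as confidence without overwriting the human-review marker.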

Skipped: mandatory "Counter-arguments" / "Data gaps" sections — lint-vault already finds these globally.

Audit cleanup batch — 10 F-fixes

The full overnight-audit findings shipped as atomic PRs #10-#19:

  • F-1 (#11) llm_steward.py: scope filter uses source claim's scope, not literal "project"
  • F-2 (#10) auto-ingest hook delegates to canonical redact_text filter (no local regex copy)
  • F-3 (#19) verbatim_store.search_verbatim: hybrid-merge dedup keys on row id, not content[:100] (prevented 25,894-row collisions on templated content)
  • F-4 (#15) Sensitivity filter covers role + source_agent + content jointly
  • F-5 (#12) observe() default scope auto-derives from cwd via scope_from_cwd, not literal "project"
  • F-6 (#16) precompact reads autosave marker via auto-ingest's sanitize form (no double-fire)
  • F-7 (#14) Dead memorymaster-observe.py removed
  • F-8 (#13) llm_steward shadow mode treats would_archive as terminal (no double-count)
  • F-9 (#17) compactor distinguishes scope=None from scope='project' (no false aliasing)
  • F-10 (#18) decay records event when claim has future updated_at (no silent skip)
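The F-3 fix is easiest to see with a toy merge. The sketch below is not the code in verbatim_store.search_verbatim; the row shape (dicts with "id" and "content") and the helper names are assumptions used only to show why a content[:100] key collapses distinct rows that share a template.

```python
from typing import Callable, Hashable, Iterable, List


def merge_dedup(lexical: Iterable[dict], vector: Iterable[dict],
                key: Callable[[dict], Hashable]) -> List[dict]:
    """Merge two result lists in order, keeping the first row per key."""
    seen = set()
    merged = []
    for row in list(lexical) + list(vector):
        k = key(row)
        if k in seen:
            continue
        seen.add(k)
        merged.append(row)
    return merged


def by_row_id(row: dict) -> Hashable:
    return row["id"]  # the fixed behavior: one entry per stored row


def by_content_prefix(row: dict) -> Hashable:
    return row["content"][:100]  # the old behavior: collides on templates


# Two distinct rows whose first 100 characters are identical boilerplate.
template = "x" * 120
row_a = {"id": 1, "content": template + " A"}
row_b = {"id": 2, "content": template + " B"}

kept_by_id = merge_dedup([row_a], [row_b, row_a], by_row_id)
kept_by_prefix = merge_dedup([row_a], [row_b, row_a], by_content_prefix)
```

Keyed on row id, the merge removes only the true duplicate of row_a and keeps both rows. Keyed on the content prefix, row_b is dropped even though it is a different record.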

v3.13.x precision: Jaccard dedupe lifecycle

The dedupe mechanism that's now active in production (PRs #1, #4, #5, #6, #7, #8):

  • PR #1 Pre-steward Jaccard dedupe — skip LLM on near-duplicate candidates
  • PR #4 Record would-archive pair details in shadow mode (audit trail before going active)
  • PR #5 v3.13.1: wire dedupe into MemoryService.run_cycle (the actual cron path; was only in standalone CLI)
  • PR #6 v3.13.2: match dedupe scan size to validator (200, not policy_limit)
  • PR #7 scripts/audit_dedupe_precision.py for ongoing precision audits
  • PR #8 Repair broken dedup query (was scanning FTS5 for hash)

Steward hooks promoted dedupe shadow mode → active on 2026-05-01 after 5/5 spot-check precision on real data.
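For readers unfamiliar with Jaccard dedupe, a minimal sketch of the idea follows. The tokenizer, the 0.9 threshold, and the function names are assumptions; these notes do not state the tokenization or threshold MemoryMaster actually uses.

```python
import re
from typing import Iterable, Set


def tokens(text: str) -> Set[str]:
    """Lowercased word tokens (assumed tokenization)."""
    return set(re.findall(r"\w+", text.lower()))


def jaccard(a: str, b: str) -> float:
    """|A ∩ B| / |A ∪ B| over token sets; two empty texts count as identical."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def is_near_duplicate(candidate: str, existing: Iterable[str],
                      threshold: float = 0.9) -> bool:
    """True when the candidate is close enough to skip the LLM call."""
    return any(jaccard(candidate, e) >= threshold for e in existing)
```

Running this check before the steward means a near-duplicate candidate never reaches the LLM, which is the cost saving PR #1 describes.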

Windows steward fix (PR #30)

Critical for headless scheduled-task contexts:

  • _call_claude_cli now passes subprocess.CREATE_NO_WINDOW on Windows
  • Without this, when a parent process has no console (pythonw.exe, services), Windows creates a NEW console for every subprocess.run child — producing one popup window per claim during steward LLM-judgment cycles
  • No-op on POSIX (creationflags is Windows-specific)
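The pattern described above can be sketched as below. This is not the body of _call_claude_cli; the helper names are hypothetical, and only the use of subprocess.CREATE_NO_WINDOW on Windows comes from these notes.

```python
import subprocess
import sys
from typing import Sequence


def no_window_flags() -> int:
    """Return CREATE_NO_WINDOW on Windows and 0 (no flags) elsewhere."""
    if sys.platform == "win32":
        # The constant only exists in the Windows build of the subprocess module.
        return subprocess.CREATE_NO_WINDOW
    return 0


def run_quietly(args: Sequence[str],
                timeout: float = 60.0) -> subprocess.CompletedProcess:
    """Run a child process without letting Windows allocate a new console."""
    return subprocess.run(
        args,
        capture_output=True,
        text=True,
        timeout=timeout,
        creationflags=no_window_flags(),  # 0 is accepted on POSIX
    )
```

Guarding on sys.platform matters because passing a non-zero creationflags on POSIX raises ValueError, and referencing the constant there raises AttributeError.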

Codex review cleanup (PR #29)

  • *.db.bak* and *.stackdump added to .gitignore
  • extract_entities.py: hardcoded G:/_OneDrive/... paths replaced with argparse + repo-relative defaults
  • extract_l2_refined.py: regex \n / \v (control chars) corrected to \b (word boundaries); the old patterns were silently dropping agent coordination and vector search matches

Other

  • PR #28 wiki UX as above
  • PR #9 precompact bypasses block when auto-ingest fired recently (avoids race)
  • PR #2 lazy-import numpy so package imports without [ml] extra
  • PR #3 enable v3.13 dedupe in shadow mode by default (now superseded: active mode since 2026-05-01)

Numbers

| Metric | v3.12.0 | v3.13.0 |
|---|---|---|
| Tests | 1,029 | 1,953 |
| MCP tools | 22 | 24 |
| CLI subcommands | 64 | 86 |

Upgrade

pip install --upgrade memorymaster
memorymaster --db memorymaster.db init-db # idempotent — adds new tables/columns

Existing Atlas DBs are forward-migrated automatically (3-phase _ensure_atlas_source_schema: tables → ALTER ADD COLUMN → indexes). LifeAgent and other Atlas consumers should pin to meta.atlas_contract_version >= 1.5.1.
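On the consumer side, the pin above and the contract's "refuse on major mismatch" rule could be checked roughly as follows. The meta.atlas_contract_version field comes from these notes; the check_envelope helper, the exception type, and the strict three-part version parsing are assumptions.

```python
from typing import Tuple

MIN_CONTRACT = (1, 5, 1)  # the pin recommended for Atlas consumers


def parse_semver(version: str) -> Tuple[int, int, int]:
    """Parse "MAJOR.MINOR.PATCH"; pre-release suffixes are not handled here."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def check_envelope(envelope: dict,
                   minimum: Tuple[int, int, int] = MIN_CONTRACT) -> None:
    """Raise RuntimeError when the producer's contract version is unusable."""
    version = parse_semver(envelope["meta"]["atlas_contract_version"])
    if version[0] != minimum[0]:
        raise RuntimeError(f"major mismatch: got {version}, need {minimum[0]}.x")
    if version < minimum:  # tuples compare element by element
        raise RuntimeError(f"contract too old: got {version}, need >= {minimum}")
```

A consumer would call check_envelope on every response before reading the payload, so a future 2.x producer fails loudly instead of being parsed with 1.x assumptions.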

🤖 Generated with Claude Code

v3.12.0 — Wider GT (top-50) confirms feature null on lexical corpus

27 Apr 22:25
@wolverin0 wolverin0

Definitive end of the recall-feature track. v3.10 hypothesised the labeled GT was too narrow to detect lift from new candidates. v3.12 tested it: re-labeled 953 prompts against the top-50 candidates per prompt (was top-15).

Result: baseline jumps 0.104 → 0.470 (+0.366), confirming the GT-coverage bottleneck was real. But every v3.9-v3.11 feature is STILL NEGATIVE on the wider GT. F1 −0.001, F6 boost-only −0.005 to −0.013, F5+F8 −0.033 to −0.058. The features don't contribute signal at top-5 on this lexically-clean corpus, regardless of label coverage.

Defaults stay at 0.0 / OFF across F1/F5/F6/F8. Machinery + new wider GT ship as infrastructure for future investigations on different corpora.

Top-15 vs top-50 GT comparison

| Metric | top-15 GT (v3.10/3.11) | top-50 GT (v3.12) |
|---|---|---|
| Non-empty prompts | 248 (26%) | 646 (67.8%) |
| Total label IDs | ~750 | 2,734 |
| Baseline precision@5 | 0.104 | 0.470 |
| Baseline MAP@5 | 0.184 | 0.568 |
| Baseline hit@5 | 0.235 | 0.667 |
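For reference, precision@5 and hit@5 in the table can be computed per prompt as sketched below, using their standard definitions. The function names are hypothetical, and whether the project averages over all 953 prompts or only the non-empty ones is not stated here, so the averaging helper leaves that choice to the caller.

```python
from typing import Iterable, List, Sequence, Set


def precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int = 5) -> float:
    """Fraction of the top-k retrieved IDs that are labelled relevant."""
    return sum(1 for item in ranked[:k] if item in relevant) / k


def hit_at_k(ranked: Sequence[str], relevant: Set[str], k: int = 5) -> float:
    """1.0 if any labelled-relevant ID appears in the top k, else 0.0."""
    return 1.0 if any(item in relevant for item in ranked[:k]) else 0.0


def mean(values: Iterable[float]) -> float:
    """Average per-prompt scores; the caller decides which prompts to include."""
    collected: List[float] = list(values)
    return sum(collected) / len(collected) if collected else 0.0
```

This also shows why a narrow ground truth depresses the baseline: a correct claim that was never labelled counts as a miss in both metrics, so widening the label pool from top-15 to top-50 raises the scores without any change to retrieval.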

Sweep results vs baseline 0.470

| Config | precision@5 | Δ |
|---|---|---|
| F1 W=0.1 | 0.469 | -0.001 |
| F1 W=1.0 | 0.467 | -0.003 |
| F6 boost-only W=0.1 | 0.465 | -0.005 |
| F6 boost-only W=1.0 | 0.457 | -0.013 |
| F5+F8 W=0.1 | 0.437 | -0.033 |
| F5+F8 W=1.0 | 0.412 | -0.058 |
| Combined v3.11 best knobs | 0.421 | -0.049 |

Confirmed

  • The GT-coverage hypothesis was REAL: top-15 GT depressed baseline because most labelled-correct claims were below rank 15.
  • The recall-feature hypothesis is REFUTED: features don't help even when labels capture them.

Added

  • artifacts/real-prompts-1000-top50.jsonl + -labels.json: 953 prompts, 646 non-empty (67.8%), 2,734 IDs. Wider GT for future evals.
  • artifacts/label-batches-top50/: raw chunk in/outs.
  • artifacts/recall-measurement-top50-2026-04-27.md: full sweep + v3.13+ research directions.

v3.13+ research moves AWAY from re-ranking

  1. Real-world recall capture — instrument hook for query+clicked-IDs corpus.
  2. Vector recall — W_VECTOR=0.0 today (no Qdrant). Different signal source.
  3. Compaction/dedup — reduce noise rather than re-rank.

Pure additive — no breaking changes, no schema changes.

PyPI: https://pypi.org/project/memorymaster/3.12.0/
