Releases: wolverin0/memorymaster

v3.19.0 — Phase 0 hardening (H1+H2+H3+H4)

17 May 19:30

@wolverin0 wolverin0

v3.19.0

39ac4ff

This commit was created on GitHub.com and signed with GitHub’s verified signature.

GPG key ID: B5690EEEBB952194

Verified

Highlights

Phase 0 hardening release. Closes all four security/ops gaps from the GPT-5.4 review against docs/ROADMAP.md Phase 0. All four mechanisms ship as opt-in, so callers that don't set the new env vars see zero breaking changes.

H1 (#113) — per-cycle LLM budget caps with reason-coded hard stops + per-provider circuit breaker
H2 (#114) — dashboard HTTP auth (viewer/operator roles) + CSRF + bind-safety refusal
H3 (#115) — webhook HMAC-SHA-256 signing + timestamp + 5-min replay window
H4 (#116) — MCP db/workspace path allowlist + admin-mode bypass
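As a rough illustration of the H3 idea (HMAC-SHA-256 signature, timestamp, 5-minute replay window), here is a minimal sketch. The message layout `<timestamp>.<body>` and the function names `sign` and `verify` are assumptions for illustration; the release notes do not specify how MemoryMaster builds the signed message or which headers carry it.

```python
import hashlib
import hmac
import time
from typing import Optional

REPLAY_WINDOW_SECONDS = 300  # 5-minute window, as described for H3


def sign(secret: bytes, body: bytes, timestamp: int) -> str:
    """Return a hex HMAC-SHA-256 over "<timestamp>.<body>" (assumed layout)."""
    message = str(timestamp).encode("ascii") + b"." + body
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def verify(secret: bytes, body: bytes, timestamp: int, signature: str,
           now: Optional[float] = None) -> bool:
    """Reject timestamps outside the window, then compare in constant time."""
    current = time.time() if now is None else now
    if abs(current - timestamp) > REPLAY_WINDOW_SECONDS:
        return False
    expected = sign(secret, body, timestamp)
    return hmac.compare_digest(expected, signature)
```

Binding the timestamp into the signed message is what makes the replay window meaningful: an attacker cannot refresh the timestamp on a captured request without invalidating the signature.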

Env vars reference

Env var	Default	Purpose
`MEMORYMASTER_MAX_LLM_CALLS_PER_CYCLE`	0 (unlimited)	H1 cycle call cap
`MEMORYMASTER_MAX_TOKENS_PER_CYCLE`	0 (unlimited)	H1 cycle token cap
`MEMORYMASTER_MAX_PROVIDER_FAILURES_PER_CYCLE`	0 (unlimited)	H1 per-provider breaker
`MEMORYMASTER_DASHBOARD_TOKEN_VIEWER`	unset (legacy)	H2 read-only bearer
`MEMORYMASTER_DASHBOARD_TOKEN_OPERATOR`	unset (legacy)	H2 mutating bearer
`MEMORYMASTER_DASHBOARD_UNSAFE_BIND`	unset (refuse)	H2 non-loopback escape
`MEMORYMASTER_WEBHOOK_SECRET`	unset (no sig)	H3 HMAC signing key
`MEMORYMASTER_MCP_DB_ALLOWLIST`	unset (allow all)	H4 DB path allowlist
`MEMORYMASTER_MCP_WORKSPACE_ALLOWLIST`	unset (allow all)	H4 workspace allowlist
`MEMORYMASTER_MCP_ADMIN_MODE`	unset (enforce)	H4 allowlist bypass

Tests

63 new tests, zero regressions on pre-existing suites.

test_llm_budget.py — 8 tests
test_dashboard_auth.py — 25 tests (19 unit + 6 end-to-end HTTP)
test_webhook_hmac.py — 13 tests
test_mcp_path_policy.py — 17 tests (12 unit + 5 chokepoint integration)
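For readers unfamiliar with the kind of policy test_mcp_path_policy.py exercises, the following is a minimal sketch of an H4-style path allowlist check. The `os.pathsep`-separated list format and the helper names are assumptions; the release notes only state that an unset allowlist allows all paths and that admin mode bypasses enforcement.

```python
import os
from pathlib import Path


def parse_allowlist(raw):
    """Split an os.pathsep-separated list into resolved roots (assumed format)."""
    return [Path(p).resolve() for p in raw.split(os.pathsep) if p.strip()]


def is_allowed(candidate, allowlist, admin_mode=False):
    """Allow everything when no allowlist is set or admin mode is on; otherwise
    require the resolved path to sit at or under one of the allowed roots."""
    if admin_mode or not allowlist:
        return True
    resolved = Path(candidate).resolve()
    return any(resolved == root or root in resolved.parents for root in allowlist)
```

Resolving the candidate before comparing is the important step: it collapses `..` segments so that a path cannot escape an allowed root by traversal.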

What's next

v3.20.0 — Phase 1 storage discipline (versioned migrations + SQLite/Postgres parity gate)
A1 full LongMemEval-S QA-accuracy publication run — mechanism shipped in v3.18.0 (#109), now safer with H1 budget caps in place

v3.18.0 - LongMemEval-S R@5 0.972, claude_cli judge unlocked

17 May 18:01

@wolverin0 wolverin0

v3.18.0

f9e7b82

Highlights

Retrieval R@5 lifted 0.966 → 0.972 (+0.006) with per-question-type weight profiles (PR #110). Single-session-preference bucket alone: 0.80 → 0.90 (+0.10), every other bucket unchanged. First non-NULL retrieval improvement since v3.15.0.

claude_cli judge provider (PR #109) unblocks the full LongMemEval-S QA-accuracy bench for OAuth-only environments — no API keys required.

ROADMAP added (docs/ROADMAP.md) — pivots away from benchmark grinding toward differentiator surfaces (governance UI, MCP observability, postgres parity).

What's in this release

Added

Per-question-type retrieval weight profiles: MEMORYMASTER_RETRIEVAL_PROFILE_<TYPE>=lex,conf,fresh,vec env-var family. Opt-in, isolated — no behavior change when unset.
claude_cli judge provider in tests/bench_longmemeval.py:JudgeClient routing through Claude Code OAuth.

Improved

Single-session-preference bucket R@5: 0.8000 → 0.9000 with the validated SINGLE_SESSION_PREFERENCE=0.10,0.10,0.10,0.70 profile.
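A profile such as `0.10,0.10,0.10,0.70` lists four weights in the order lex, conf, fresh, vec. The sketch below shows how such a value might be parsed and applied. The helper names, the upper-casing of the question type, and the weighted-sum scoring are assumptions for illustration, not the project's actual retrieval code.

```python
import os

WEIGHT_NAMES = ("lex", "conf", "fresh", "vec")
PREFIX = "MEMORYMASTER_RETRIEVAL_PROFILE_"


def load_profile(question_type, environ=None):
    """Return the four weights for a question type, or None when unset."""
    env = os.environ if environ is None else environ
    key = PREFIX + question_type.upper().replace("-", "_")
    raw = env.get(key)
    if not raw:
        return None  # unset: caller keeps its default weights
    values = [float(part) for part in raw.split(",")]
    if len(values) != len(WEIGHT_NAMES):
        raise ValueError(f"{key}: expected {len(WEIGHT_NAMES)} weights, got {len(values)}")
    return dict(zip(WEIGHT_NAMES, values))


def score(signals, weights):
    """Weighted sum of per-signal scores (a common form, assumed here)."""
    return sum(weights[name] * signals.get(name, 0.0) for name in WEIGHT_NAMES)
```

With the validated profile, the vector signal carries 70% of the weight for single-session-preference questions, while every other question type falls back to the defaults.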

Notes

docs/longmemeval-results.md per-bucket table is stale (will refresh on next publication pass)
A1 full 500q overnight publication run deferred

PRs

#109 - feat(bench): claude_cli judge provider (OAuth, no API key required)
#110 - feat(retrieval): per-question-type weight profiles (S3, +0.006 R@5)
#111 - chore(release): v3.18.0

Competitive position

LongMemEval-S R@5 = 0.972 leads the small published set (MemPalace 0.966, agentmemory 0.952).

v3.17.1 - steward auto-ingest hook for daydream

16 May 04:12

@wolverin0 wolverin0

v3.17.1

552f9e4

Opt-in env-gated hook (MEMORYMASTER_DAYDREAM_INGEST_DIR=/Daydreams) so the existing 6h steward cron also auto-ingests daydream insights. Default OFF. Error-isolated — never breaks the cycle. 4 safety tests pass. See CHANGELOG.md.
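The "default OFF, error-isolated" pattern described here can be sketched as follows. The function name `maybe_ingest_daydreams` and the `ingest` callback are hypothetical; only the env var name comes from the release notes.

```python
import logging
import os

log = logging.getLogger("steward")


def maybe_ingest_daydreams(ingest, environ=None):
    """Run ingest(directory) only when the env var is set; never raise."""
    env = os.environ if environ is None else environ
    directory = env.get("MEMORYMASTER_DAYDREAM_INGEST_DIR")
    if not directory:
        return False  # default OFF: nothing happens when the variable is unset
    try:
        ingest(directory)
        return True
    except Exception:
        # Error isolation: log and carry on so the steward cycle is not broken.
        log.exception("daydream ingest failed; continuing the cycle")
        return False
```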

v3.17.0 - daydream insights ingest pipeline

16 May 03:28

@wolverin0 wolverin0

v3.17.0

718437f

Closes the loop: vault -> daydream -> MemoryMaster candidate claims -> steward validation -> wiki-absorb -> vault. New CLI: python -m memorymaster --db <db> ingest-daydream <vault>/Daydreams. See CHANGELOG.md and docs/daydream-integration.md.

v3.16.0 - S1 architectural unblock + S2 honest-null

14 May 23:51

@wolverin0 wolverin0

v3.16.0

d22dd53

v3.16.0 ships the retrieval-weights architectural unblock (S1, KEEP) and a documented honest-null on RRF-as-tiebreaker (S2, NULL). R@5 unchanged at 0.966, still leading agentmemory. See CHANGELOG.md.

v3.15.1 - README benchmark chart + v3.16 roadmap

14 May 21:00

@wolverin0 wolverin0

v3.15.1

161246a

Docs-only release on top of v3.15.0. Adds:

README Benchmarks section with inline SVG chart comparing v3.14, v3.15, and agentmemory
docs/benchmark-longmemeval.svg (chart asset)
docs/v316-roadmap.md with ranked next-step levers (S1-S3 high-leverage, A1-A3 worth running, B1-B3 deferred)

R@5 = 0.966 from v3.15.0 stands. Production retrieval code unchanged.

v3.15.0 - LongMemEval-S R@5 0.894 -> 0.966

14 May 20:43

@wolverin0 wolverin0

v3.15.0

4e0c899

v3.15.0 ships the E01 bench-harness fix (PR #94) that lifted R@5 by +0.072 to 0.966. 5 follow-up experiments documented as honest-null/harm. MemoryMaster now leads agentmemory R@5 + MRR with retrieval-only. See CHANGELOG.md for full breakdown.

v3.14.0 — 29-PR auto-harness session

12 May 21:12

@wolverin0 wolverin0

v3.14.0

5b8531d

v3.14.0 release notes

29 PRs merged this session (#61-#89):

D-hypothesis bug fixes (from PR #62 scenario explorer)

#65 compactor writes artifact before flipping status to archived
#68 dedup treats object_value mismatch as conflict, not duplicate
#69 compact_summaries redacts claim text via sensitivity filter before LLM egress

Features

#71 dashboard /healthz + /readyz operational endpoints
#73 sensitivity filter broadened (bearer, JWT, DB URIs, AWS keys, GitHub PATs)
#77 vault_linter detects orphan wiki articles
#83 observability prometheus-style counters
#84 CLI --dry-run flag for compact / dedup / decay
#85 snapshot round-trip parity tests
#87 mcp ingest_claim per-source-agent rate limit
#88 federated_query cross-tenant sensitivity safety

Tests

#70 dream-bridge sensitivity filter regression suite
#72 lifecycle supersede ↔ replaced_by symmetry invariant
#75 wiki-absorb idempotency
#78 mcp filter-bypass attempt suite (5 categories)
#79 concurrent supersession race safety
#80 db_merge deterministic conflict resolution by updated_at
#82 decay respects pinned=true

Docs / Audits

#61 STRIDE + OWASP audit (15 findings)
#62 lifecycle edge-case scenario explorer (24 scenarios across 12 dimensions)
#63 Windows + git hooks + codex sandbox troubleshooting playbook
#64 wiki frontmatter compliance audit
#66 ADR for wiki auto-promote on N validations
#67 cross-project federation contract
#74 architecture refresh (module map + data flow)
#76 SQLite ↔ Postgres schema parity audit
#81 MCP tool compliance audit (citation / source_agent / sensitivity)
#86 CLI cookbook (90 subcommands documented)

Quality

#89 ruff lint clean (46 → 11 errors; remaining 11 are intentional E402)

Generated via auto-harness pattern: 7 dispatch waves of 2-5 codex exec processes in parallel git worktrees, ~4 hours wall-clock orchestration time.

v3.13.0 — Atlas Inbox V1 + audit cleanup + wiki UX + Windows steward fix

10 May 22:06

@wolverin0 wolverin0

v3.13.0

2402f6d

24 commits, 65 files, +9,862 lines since v3.12.0. Headline: Atlas Inbox V1 ships as a versioned backend contract for downstream consumers (LifeAgent, etc.). Plus the full F-audit cleanup batch, real provider adapters, wiki UX upgrades, and a critical Windows fix.

Atlas Inbox V1 — WhatsApp → claims/actions → Super Productivity (new headline feature)

End-to-end backend slice for ingesting external sources, extracting candidate claims and reviewable action proposals, and exporting approved tasks. Backend contract only — UI lives in consuming projects.

Storage (SQLite + Postgres parity): external_sources, source_items, evidence_items, action_proposals, media_retry_queue tables; sensitivity column on source/evidence with allowed values {none, low, medium, high, redacted}
17 new CLI subcommands: import-whatsapp, extract-atlas-claims, propose-actions, action-proposals, resolve-action-proposal, edit-action-proposal, label-source-item, label-evidence-item, enqueue-media-retry, process-media-retry-queue, record-media-retry-outcome, list-media-retries, transcribe-source-item, ocr-source-item, export-actions, atlas-version + init-db brought into the contract
3 new dashboard endpoints: GET /api/action-proposals, POST /api/action-proposals/status, GET /api/atlas/version
Versioned contract at v1.5.1 — every Atlas envelope carries meta.atlas_contract_version and meta.atlas_subcommand. Semver enforced; consumers must refuse on major mismatch. Spec: docs/atlas-api-contract-v1.md. Machine-pinned by 41 contract tests.
Real provider adapters behind existing Protocols: OpenAIWhisperTranscriptionProvider (stdlib-only urllib + multipart, uses OPENAI_API_KEY + OPENAI_BASE_URL) and TesseractOcrProvider (lazy-import optional dep). Mock providers remain default.
Media retry queue for connectors that fetch external URLs (e.g. wacli media). Consumer owns the fetch; MemoryMaster owns durable state. Atomic claim via FOR UPDATE SKIP LOCKED on Postgres.
Sensitivity-on-reimport preservation: operator-set labels survive import-whatsapp re-runs.

PRs: #20 (vertical slice) + #27 (contract chain v1.0.0 → v1.5.1 collapsed merge of 7 commits).

Wiki UX (PR #28)

Two patterns borrowed from shannhk/llm-wiki, picked for signal-to-noise:

explored: true|false frontmatter on every wiki article. Operator-set human-review marker, distinct from confidence. Default false on new articles; preserved on re-absorb when an operator flipped to true (operator decisions are sticky).
Inline > [!contradiction] Obsidian callouts rendered at the top of each article body during wiki-absorb. Detection shared with vault_linter._detect_contradictions so wiki and lint agree.

Skipped: mandatory "Counter-arguments" / "Data gaps" sections — lint-vault already finds these globally.

Audit cleanup batch — 10 F-fixes

The full overnight-audit findings shipped as atomic PRs #10–#19:

F-1 (#11) llm_steward.py: scope filter uses source claim's scope, not literal "project"
F-2 (#10) auto-ingest hook delegates to canonical redact_text filter (no local regex copy)
F-3 (#19) verbatim_store.search_verbatim: hybrid-merge dedup keys on row id, not content[:100] (prevented 25,894-row collisions on templated content)
F-4 (#15) Sensitivity filter covers role + source_agent + content jointly
F-5 (#12) observe() default scope auto-derives from cwd via scope_from_cwd, not literal "project"
F-6 (#16) precompact reads autosave marker via auto-ingest's sanitize form (no double-fire)
F-7 (#14) Dead memorymaster-observe.py removed
F-8 (#13) llm_steward shadow mode treats would_archive as terminal (no double-count)
F-9 (#17) compactor distinguishes scope=None from scope='project' (no false aliasing)
F-10 (#18) decay records event when claim has future updated_at (no silent skip)
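The F-3 fix is easy to see in miniature. The sketch below is a toy merge, not the actual `search_verbatim` code: `merge_hits` and the sample rows are invented for illustration. It shows why keying on a content prefix collapses distinct rows that share templated text, while keying on the row id does not.

```python
def merge_hits(lexical, vector, key):
    """Merge two hit lists, keeping the first row seen for each key."""
    merged = {}
    for row in list(lexical) + list(vector):
        merged.setdefault(key(row), row)
    return list(merged.values())


# Two distinct rows whose first 100 characters are identical template text.
rows = [
    {"id": 1, "content": "T" * 100 + " alpha"},
    {"id": 2, "content": "T" * 100 + " beta"},
]

by_prefix = merge_hits(rows, rows, key=lambda r: r["content"][:100])
by_id = merge_hits(rows, rows, key=lambda r: r["id"])

print(len(by_prefix))  # 1: the second row is wrongly treated as a duplicate
print(len(by_id))      # 2: both rows survive
```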

v3.13.x precision: Jaccard dedupe lifecycle

The dedupe mechanism that's now active in production (PRs #1, #4, #5, #6, #7, #8):

PR #1 Pre-steward Jaccard dedupe — skip LLM on near-duplicate candidates
PR #4 Record would-archive pair details in shadow mode (audit trail before going active)
PR #5 v3.13.1: wire dedupe into MemoryService.run_cycle (the actual cron path; was only in standalone CLI)
PR #6 v3.13.2: match dedupe scan size to validator (200, not policy_limit)
PR #7 scripts/audit_dedupe_precision.py for ongoing precision audits
PR #8 Repair broken dedup query (was scanning FTS5 for hash)

Steward hooks promoted dedupe shadow mode → active on 2026-05-01 after 5/5 spot-check precision on real data.
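For context, Jaccard similarity is the size of the intersection of two sets divided by the size of their union. A minimal sketch of a pre-steward near-duplicate check follows. Whitespace tokenization and the 0.9 threshold are assumptions; the release notes do not state how MemoryMaster tokenizes claims or which threshold it uses.

```python
def jaccard(a, b):
    """Jaccard similarity over lower-cased whitespace tokens."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa and not sb:
        return 1.0  # two empty texts are treated as identical
    return len(sa & sb) / len(sa | sb)


def is_near_duplicate(candidate, existing, threshold=0.9):
    """True when the candidate is close enough to any existing text
    that the LLM call could be skipped (threshold is hypothetical)."""
    return any(jaccard(candidate, text) >= threshold for text in existing)
```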

Windows steward fix (PR #30)

Critical for headless scheduled-task contexts:

_call_claude_cli now passes subprocess.CREATE_NO_WINDOW on Windows
Without this, when a parent process has no console (pythonw.exe, services), Windows creates a NEW console for every subprocess.run child — producing one popup window per claim during steward LLM-judgment cycles
No-op on POSIX (creationflags is Windows-specific)
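A minimal sketch of the pattern described above, assuming a generic helper named `run_headless` (hypothetical; the real change lives in `_call_claude_cli`):

```python
import subprocess
import sys

# CREATE_NO_WINDOW exists only on Windows; 0 means "no special flags" elsewhere.
_NO_WINDOW = subprocess.CREATE_NO_WINDOW if sys.platform == "win32" else 0


def run_headless(cmd):
    """Run a child process without letting Windows allocate a new console."""
    return subprocess.run(
        cmd,
        capture_output=True,
        text=True,
        creationflags=_NO_WINDOW,
    )
```

Guarding on `sys.platform` matters because the `subprocess.CREATE_NO_WINDOW` constant is not defined on POSIX, and a non-zero `creationflags` raises there.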

Codex review cleanup (PR #29)

*.db.bak* and *.stackdump added to .gitignore
extract_entities.py: hardcoded G:/_OneDrive/... paths replaced with argparse + repo-relative defaults
extract_l2_refined.py: regex \n / \v (control chars) corrected to \b (word boundaries); the old patterns were silently dropping agent coordination and vector search matches
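The class of bug fixed in extract_l2_refined.py can be reproduced in a few lines. The patterns below are illustrative only, since the script's actual regexes are not shown in these notes. In a non-raw Python string, `"\v"` is a vertical-tab character, so a pattern built with it can only match text that contains that control character.

```python
import re

text = "uses vector search for recall"

# "\v" is a literal vertical tab here, so this never matches ordinary prose.
broken = re.search("\vvector search\v", text)

# r"\b" is a zero-width word boundary, which is what was intended.
fixed = re.search(r"\bvector search\b", text)

print(broken)          # None
print(fixed.group(0))  # vector search
```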

Other

PR #28 wiki UX as above
PR #9 precompact bypasses block when auto-ingest fired recently (avoids race)
PR #2 lazy-import numpy so package imports without [ml] extra
PR #3 enable v3.13 dedupe in shadow mode by default (now superseded; active mode since 2026-05-01)

Numbers

Metric	v3.12.0	v3.13.0
Tests	1,029	1,953
MCP tools	22	24
CLI subcommands	64	86

Upgrade

pip install --upgrade memorymaster
memorymaster --db memorymaster.db init-db # idempotent — adds new tables/columns

Existing Atlas DBs are forward-migrated automatically (3-phase _ensure_atlas_source_schema: tables → ALTER ADD COLUMN → indexes). LifeAgent and other Atlas consumers should pin to meta.atlas_contract_version >= 1.5.1.
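On the consumer side, the "refuse on major mismatch, require at least 1.5.1" rule could look roughly like this. The `check_contract` helper is hypothetical and the envelope shape is inferred from the `meta.atlas_contract_version` field named above; the authoritative rules are in docs/atlas-api-contract-v1.md.

```python
def parse_semver(version):
    """Parse "MAJOR.MINOR.PATCH" into a tuple of ints (no pre-release tags)."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def check_contract(envelope, minimum="1.5.1"):
    """Raise if the envelope's contract version is unusable for this consumer."""
    got = parse_semver(envelope["meta"]["atlas_contract_version"])
    want = parse_semver(minimum)
    if got[0] != want[0]:
        raise RuntimeError(f"major version mismatch: got {got[0]}, expected {want[0]}")
    if got < want:
        raise RuntimeError(f"contract too old: got {got}, need at least {want}")
    return got
```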

🤖 Generated with Claude Code

v3.12.0 — Wider GT (top-50) confirms feature null on lexical corpus

27 Apr 22:25

@wolverin0 wolverin0

v3.12.0

e75fea8

Definitive end of the recall-feature track. v3.10 hypothesised the labeled GT was too narrow to detect lift from new candidates. v3.12 tested it: re-labeled 953 prompts against the top-50 candidates per prompt (was top-15).

Result: baseline jumps 0.104 → 0.470 (+0.366), confirming the GT-coverage bottleneck was real. But every v3.9-v3.11 feature is STILL NEGATIVE on the wider GT. F1 −0.001, F6 boost-only −0.005 to −0.013, F5+F8 −0.033 to −0.058. The features don't contribute signal at top-5 on this lexically-clean corpus, regardless of label coverage.

Defaults stay at 0.0 / OFF across F1/F5/F6/F8. Machinery + new wider GT ship as infrastructure for future investigations on different corpora.

Top-15 vs top-50 GT comparison

Metric	top-15 GT (v3.10/3.11)	top-50 GT (v3.12)
Non-empty prompts	248 (26%)	646 (67.8%)
Total label IDs	~750	2,734
Baseline precision@5	0.104	0.470
Baseline MAP@5	0.184	0.568
Baseline hit@5	0.235	0.667
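For readers unfamiliar with the three metrics in this table, here is a sketch of how they are commonly computed for a single prompt; the reported figures would then be averages over prompts. The release notes do not say which exact variant was used (for example, how average precision is normalized), so treat these definitions as assumptions.

```python
def precision_at_k(ranked, relevant, k=5):
    """Fraction of the top-k results that are labelled relevant."""
    return sum(1 for item in ranked[:k] if item in relevant) / k


def hit_at_k(ranked, relevant, k=5):
    """1.0 if at least one relevant item appears in the top k, else 0.0."""
    return 1.0 if any(item in relevant for item in ranked[:k]) else 0.0


def average_precision_at_k(ranked, relevant, k=5):
    """Mean of precision values at each relevant rank, normalized by
    min(number of relevant items, k). MAP@k averages this over prompts."""
    hits, total = 0, 0.0
    for rank, item in enumerate(ranked[:k], start=1):
        if item in relevant:
            hits += 1
            total += hits / rank
    denom = min(len(relevant), k)
    return total / denom if denom else 0.0
```

These definitions also show why a wider ground truth raises the baseline: labelling more of the retrieved candidates as relevant increases every one of these values without any change to the ranker.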

Sweep results vs baseline 0.470

Config	precision@5	Δ
F1 W=0.1	0.469	-0.001
F1 W=1.0	0.467	-0.003
F6 boost-only W=0.1	0.465	-0.005
F6 boost-only W=1.0	0.457	-0.013
F5+F8 W=0.1	0.437	-0.033
F5+F8 W=1.0	0.412	-0.058
Combined v3.11 best knobs	0.421	-0.049

Confirmed

The GT-coverage hypothesis was REAL: top-15 GT depressed baseline because most labelled-correct claims were below rank 15.
The recall-feature hypothesis is REFUTED: features don't help even when labels capture them.

Added

artifacts/real-prompts-1000-top50.jsonl + -labels.json: 953 prompts, 646 non-empty (67.8%), 2,734 IDs. Wider GT for future evals.
artifacts/label-batches-top50/: raw chunk in/outs.
artifacts/recall-measurement-top50-2026-04-27.md: full sweep + v3.13+ research directions.

v3.13+ research moves AWAY from re-ranking

Real-world recall capture — instrument hook for query+clicked-IDs corpus.
Vector recall — W_VECTOR=0.0 today (no Qdrant). Different signal source.
Compaction/dedup — reduce noise rather than re-rank.

Pure additive — no breaking changes, no schema changes.

PyPI: https://pypi.org/project/memorymaster/3.12.0/
