Releases: hypertopos/hypertopos-py

0.8.1

01 Jun 13:23

@karol-kedzia karol-kedzia

0.8.1

7d7e7f8

This commit was signed with the committer’s verified signature.

karol-kedzia Karol Kędzia

GPG key ID: 0A41132BA740580E

Verified

0.8.1 Latest

hypertopos is a Python library that transforms relational data into a geometric space where structure, anomalies and change become directly observable — without training machine learning models. Every entity gets a coordinate derived from its typed relationships, calibrated against the population. Distance from center reveals anomalies. Proximity reveals similarity. Movement over time reveals drift. The library ships with a CLI for building spheres from YAML configuration, a Python API for programmatic access, and an MCP server (hypertopos-mcp) that exposes the geometry as tools for AI agents. Research-stage project — benchmarks, methodology and reproduction scripts are included.

hypertopos 0.8.1

This release hardens the incremental-ingest and anomaly-geometry paths. Several statistics that were correct on a full build but subtly wrong when entities were added incrementally are now consistent, and a class of silently-inconsistent configurations is refused at build time with a clear error instead of producing a misleading sphere. It also makes one navigation primitive's optional significance filter behave correctly, teaches runtime chain discovery to recognise round-trip cycles, and speeds up the per-append recompute.

Highlights

Population-relative conformal_p on incremental ingest. hypertopos sphere ingest and GDSBuilder.incremental_update now write a whole-population conformal p-value for ingested entities, matching a full build. Previously the value was computed only against the ingest batch and with inverted polarity, which broke the conformal_p <= alpha stability gates that assess_anomaly_certainty and composite_risk rely on.
Complete {line, key} polygon filtering. event_polygons_for and the get_event_polygons tool now return every polygon where the filtered key occupies any of a line's relation slots. When a line was referenced by more than one relation (an account that is both sender and receiver of a transaction), the filter previously matched only the first slot and silently dropped the rest.
π7_attract_hub significance filter fixed. Its optional fdr_alpha filter now keeps the most-connected hubs — matching π7_attract_hub_and_stats — instead of inverting the ranking and returning the least-connected hubs, or nothing, at a small alpha.
discover_chains detects cycles. Round-trip chains (A→...→A) are now surfaced with is_cyclic = true; they were previously dropped by the no-revisit guard, so the cyclic count was structurally always zero.
Silent-corruption guards on the build / ingest path. incremental_update now raises a clear error for patterns calibrated per group, per cluster (GMM), or with an FDR hierarchy (rebuild instead), and for points tables that omit a declared edge-dim-aggregation or event-dimension column. GDSBuilder.build refuses a pattern that declares both tracked_properties and an edge-dimension block. Each of these previously produced a silently inconsistent sphere.
Parameter-aware topological anomaly cache. find_topological_anomalies now keys its cache by its scoring parameters, so a re-run with a different k_neighbors / pca_dim / sample_size re-scores instead of returning the previous result.
Faster incremental ingest via a single-pass population-statistic recompute, plus two ergonomic API changes (edge_potential opt-in ranking, a simulate_edge_removal truncation envelope). Data files whose name contains a period before the extension (e.g. q1.2024.parquet) are no longer rejected as an unsupported format.
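
The difference between a batch-only and a population-relative conformal p-value can be sketched as follows. This is a minimal illustration using the textbook conformal formula, not the library's implementation: the function name, the sample scores, and the convention that a small p-value marks an anomaly (suggested by the conformal_p <= alpha gates above) are all assumptions.

```python
from bisect import bisect_left


def conformal_p_values(reference_scores, query_scores):
    """Textbook conformal p-value: (1 + #{reference >= s}) / (n + 1).

    A small p-value means few reference entities score as high as the
    query, i.e. the query looks anomalous against that reference set.
    """
    ordered = sorted(reference_scores)
    n = len(ordered)
    p_values = []
    for s in query_scores:
        # Count reference scores that are at least as large as s.
        n_at_least = n - bisect_left(ordered, s)
        p_values.append((1 + n_at_least) / (n + 1))
    return p_values


# Hypothetical anomaly scores: nine existing entities, two newly ingested.
population = [0.5, 0.7, 0.9, 1.1, 1.3, 1.6, 2.0, 2.4, 3.1]
batch = [3.0, 3.2]

# Calibrated against the whole population, both new entities look unusual.
whole_population = conformal_p_values(population, batch)

# Calibrated against the batch alone, the same entities look ordinary.
batch_only = conformal_p_values(batch, batch)
```

With these made-up numbers, the population-relative values fall at or below 0.2 while the batch-only values sit at 2/3 and above, which is the kind of mismatch that would defeat an alpha gate.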

See CHANGELOG.md for the full list of changes.

0.8.0

30 May 08:53

@karol-kedzia karol-kedzia

0.8.0

84c4cc8

0.8.0

hypertopos-py 0.8.0 — incremental ingest, storage upgrade, and cloud-ops tooling

0.8.0 turns a built sphere into a living, operable artifact. The headline is production-ready incremental ingest: you can add, modify, and delete entities on an existing sphere without a full rebuild, and newly added entities are immediately visible to anomaly search — across every pattern kind, including patterns whose dimensions are relations, event dimensions, edge-dimension aggregations, and tracked properties. The release pairs that with a storage upgrade to the Lance 7.x line (native per-tag calibration-epoch timestamps and faster batched point reads), and a cloud-ops command-line surface — sphere health, sphere validate, and sphere diff — that lets a CI pipeline gate a deployment on a sphere's geometric health, structural integrity, and calibration drift versus the previous build. Incremental ingest works on the existing sphere format, so there is no migration and no breaking change.

Highlights

Production-ready incremental ingest — GDSBuilder.incremental_update adds, modifies, and deletes entities on an existing sphere without a full rebuild; newly added entities are immediately visible to anomaly search. Patterns whose dimensions are relations, event dimensions, edge-dimension aggregations, and tracked properties are now fully supported (previously such patterns raised a width-mismatch error). Pass reindex=True to fold the appended rows into the ANN vector index immediately. For batched ingestion of many small appends, pass recompute_ranks=False to defer the population percentile recompute, then call GDSBuilder.finalize_incremental once at the end of the session.
hypertopos sphere ingest <sphere> --points <file> — the incremental-ingest CLI: append a new-/changed-entities table (Arrow IPC, Parquet, or CSV) to one pattern's geometry without a full rebuild. --pattern selects the target pattern (optional when the sphere has a single one), --reindex forces an immediate ANN index rebuild, --finalize recomputes the global rank percentile and rebuilds the index once at the end of a batched session, and --json emits a machine-readable summary (added, modified, deleted, population_size, drift_pct, geometry versions before/after, reindexed, finalized) for pipeline consumption.
Lance 7.x storage upgrade — the pylance dependency floor moves to the 7.x line. Per-tag calibration-epoch timestamps now resolve directly from the dataset tag listing (a single lookup) instead of scanning version history, with a read-side fallback for spheres written by older tooling. Batched point reads issue a single indexed lookup over the whole key set on a reused dataset handle, so repeated lookups of the same entities are substantially faster. Existing spheres open transparently.
Faster builds for small patterns — the builder skips the trajectory ANN index when a pattern has too few rows to train it. attract_trajectory (π10) and find_drifting_similar still return correct nearest-trajectory results via an exact full-scan fallback when no index is present.
hypertopos sphere health <path> — composes the population summary with the geometric health checks into a single status (ok / warning / critical). --exit-code-on-critical exits 2 on a critical status so a CI shell gate can fail a deploy; --json emits a {status, sphere_path, overview, alerts} document.
hypertopos sphere validate <path> [--strict] [--json] — structural integrity check over a built sphere: sphere.json parses, each materialized line has a points directory, each pattern has a geometry directory. --strict promotes calibration-health and dimension-quality warnings to errors, exiting 1 when the sphere is invalid.
hypertopos sphere diff <old> <new> [--json] — pre-deploy diff reporting the pattern-inventory delta (added / removed / common) and per-shared-pattern calibration drift between two built spheres. Patterns whose calibration schema differs are marked not_comparable.
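
A CI gate built on sphere health might look like the sketch below. The command, the --json and --exit-code-on-critical flags, exit code 2, and the status values come from the notes above. Everything else is an assumption: that the two flags can be combined, that the JSON document is written to stdout, and that a warning status should not block a deploy. The helper names are hypothetical.

```python
import json
import subprocess

EXIT_CRITICAL = 2  # exit code described above for --exit-code-on-critical


def should_block_deploy(returncode, stdout_text):
    """Decide (block, reason) from one `sphere health --json` run."""
    if returncode == EXIT_CRITICAL:
        return True, "health exited 2 (critical)"
    if returncode != 0:
        return True, f"health exited {returncode}"
    try:
        doc = json.loads(stdout_text)
    except json.JSONDecodeError:
        return True, "health output was not valid JSON"
    status = doc.get("status") if isinstance(doc, dict) else None
    if status == "critical":
        return True, "status is critical"
    if status in ("ok", "warning"):
        # Assumption: warnings are reported but do not fail the gate.
        return False, f"status is {status}"
    return True, f"unexpected status: {status!r}"


def run_health_gate(sphere_path):
    """Run the CLI and interpret the result (assumes JSON on stdout)."""
    proc = subprocess.run(
        ["hypertopos", "sphere", "health", sphere_path,
         "--json", "--exit-code-on-critical"],
        capture_output=True,
        text=True,
    )
    return should_block_deploy(proc.returncode, proc.stdout)
```

Keeping the decision logic separate from the subprocess call lets a pipeline unit-test the gate policy without a built sphere on disk.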

Breaking changes

None. Incremental ingest and the new CLI commands are additive, and they operate on the existing sphere format — no migration is required.

See CHANGELOG.md for the full list of changes.

0.7.3

27 May 16:29

@karol-kedzia karol-kedzia

0.7.3

16c5368

0.7.3

hypertopos-py 0.7.3 — detector-composition continuation + investigation orchestrator surface

0.7.3 extends the 0.7.x theme — detector composition and counterfactual primitives — with eight clusters that close gaps in detection-quality signals, chain reliability, and the investigation orchestrator surface. The release adds per-dimension AUROC and intrinsic/extrinsic delta decomposition on label-aware patterns, a chain-level reliability rollup that propagates per-member signed confidence into the chain investigation summary, a pattern-level signed_tail_concentration warning on the Fisher LDA-projected delta distribution, opt-in sampling and threshold-boundary stratification for find_anomalies, a deterministic 4-category DTW trajectory classifier, Louvain community_id as a graph-feature primitive (with pure-numpy fallbacks for PageRank and connected components), auto-discovery on find_calibration_influencers with a temporal impact cache, a near_data_boundary flag on regime-change detections, plus three new agent-facing ergonomics on passive_scan, find_similar_entities, and dive_solid. No sphere-format bump; no breaking changes.

Highlights

audit_pattern_dims auroc_per_dim per-row + intrinsic_displacement_mean / extrinsic_displacement_mean pattern-level fields — closed-form Gaussian-approximation AUROC Phi((mu_pos - mu_neg) / sqrt(sigma_pos^2 + sigma_neg^2)) per dimension; pattern-level intrinsic / extrinsic decomposition means along and orthogonal to the Fisher LDA direction. Populated only when label-aware calibration fires; legacy spheres receive null.
signed_tail_concentration dim-quality warning — fires when |p99| / max(|p50|, 1e-9) > 50 on the Fisher LDA-projected delta distribution. Indicates the label-discriminating axis is being driven by a tiny outlier subgroup rather than a broad class split. Suppressed when Pattern.label_aware_n_pos < 30 (positive-class undersampled).
chain_signed_confidence_rollup + chain_full_loop_summary reliability fields — chain-level reliability triage derived from per-member signed_confidence ranking. Reports chain_mean_signed_confidence, chain_n_low_confidence_members, chain_n_single_dim_driven_members, and chain_confidence_verdict ∈ {high, medium, low, label-aware-unavailable}. Verdict "low" subtracts 10 from the overall investigation score in the orchestrator.
find_anomalies(..., sample_size, boundary_aware) — opt-in sampling for the in-process scan path. boundary_aware=True allocates half the budget around [0.8 * theta_norm, 1.2 * theta_norm] to surface threshold-boundary cases for calibration audits; the other half is drawn from the rest of the population. Defaults preserve full-scan behaviour.
engine.topology.classify_trajectory + Navigator.classify_trajectory — deterministic 4-category trajectory classifier (outlier / lagging / leading / typical) over DTW distance against the population-median trajectory combined with a first-derivative slope comparison. Per-entity and per-population variants.
engine.topology.local_trajectory_shape — pure-function classifier for an entity's own delta_norm time-series (arch / V / linear / flat). No population reference required. Used by dive_solid to surface trajectory shape on the entity's temporal slices when ≥3 slices exist.
sphere_overview ergonomics — top-level cross_pattern_discrepancy block lists pairwise Jaccard overlap of anomalous primary_keys across cover-overlapping patterns; per-entity aggregate rows gain anomaly_rate. The response is now an object with patterns (the prior per-pattern list, preserved verbatim) and the new cross-pattern block. Clients indexing the response as a list must read response["patterns"] instead.
graph_features.features: community_id — Louvain modularity-optimizing community detection on the anchor-pattern graph. igraph.Graph.community_multilevel when igraph is installed, networkx.algorithms.community.louvain_communities otherwise; gracefully emits an empty mapping when neither backend is available. Community IDs renumbered so the largest community is 0.
Pure-numpy pagerank and connected_component graph features — the graph_features.features: entries for these two now route through pure-NumPy helpers (damped PageRank, Union-Find connected components). No external graph library required; igraph remains optional for the other features.
find_calibration_influencers auto-discovery + temporal impact cache — auto_discover=True runs k-medoids over the population to surface the top-K influencer candidates without an explicit primary_key list. A new build-time cache persists per-epoch μ-impact and Δ-norm-impact history for known influencers.
calibration_influencer_history(primary_key, pattern_id) — agent-facing read of the temporal impact cache. Returns a chronological list of {epoch, calibrated_at, mu_impact, delta_norm_impact} entries.
π12_attract_regime_change near_data_boundary flag — every detected changepoint now carries near_data_boundary: bool. Fires True when the changepoint timestamp index falls in the first two or last two temporal buckets of the filtered range — agents can suppress these as low-confidence detections sitting on insufficient one-sided context.
Navigator.find_similar_entities(..., with_neighbor_anomaly=False) — opt-in flag populates SimilarityResult.is_anomaly_map with stored anomaly flags for the returned top-N neighbours. First lookup per (pattern, version) per session pays a one-time full-table scan; subsequent calls are O(k) dict reads, so warm-path callers see no measurable overhead.
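
Two of the formulas quoted in the highlights above are small enough to write out. The sketch below evaluates the Gaussian-approximation AUROC and the signed_tail_concentration ratio test exactly as stated there. The function names and the handling of a zero denominator are assumptions, and the label_aware_n_pos < 30 suppression is left out.

```python
import math


def gaussian_auroc(mu_pos, sigma_pos, mu_neg, sigma_neg):
    """Phi((mu_pos - mu_neg) / sqrt(sigma_pos^2 + sigma_neg^2))."""
    denom = math.sqrt(sigma_pos ** 2 + sigma_neg ** 2)
    if denom == 0.0:
        return None  # assumption: undefined when both classes are constant
    z = (mu_pos - mu_neg) / denom
    # Standard normal CDF expressed through the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def signed_tail_concentration(p50, p99, ratio_threshold=50.0):
    """True when |p99| / max(|p50|, 1e-9) exceeds the threshold."""
    return abs(p99) / max(abs(p50), 1e-9) > ratio_threshold


# Identical classes give chance-level AUROC; a one-unit mean gap with
# unit variances gives roughly 0.76.
no_separation = gaussian_auroc(0.0, 1.0, 0.0, 1.0)
unit_gap = gaussian_auroc(1.0, 1.0, 0.0, 1.0)
```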

Breaking changes

sphere_overview MCP response shape change: it is now an object with patterns (the prior per-pattern list, byte-identical) and a top-level cross_pattern_discrepancy block. Clients that previously indexed the top-level response as a list must read response["patterns"] instead. All other 0.7.3 surfaces are additive.
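
Clients that must work against both the old and the new sphere_overview response shape can normalise it in one place. This is a minimal sketch: the helper names are hypothetical, and only the patterns and cross_pattern_discrepancy keys are taken from the notes above.

```python
def overview_patterns(response):
    """Return the per-pattern list from either response shape."""
    if isinstance(response, list):
        # Pre-0.7.3 shape: the response itself is the per-pattern list.
        return response
    # 0.7.3 shape: the list lives under the "patterns" key.
    return response["patterns"]


def overview_discrepancy(response):
    """Return the cross-pattern block, or None on the older shape."""
    if isinstance(response, dict):
        return response.get("cross_pattern_discrepancy")
    return None
```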

Infrastructure

engine.graph module — pure-numpy compute_pagerank, compute_louvain_community, compute_connected_component helpers, reusable outside the builder path.

See CHANGELOG.md for the full list of changes.

0.7.2

21 May 11:51

@karol-kedzia karol-kedzia

0.7.2

a3ae942

0.7.2

hypertopos-py 0.7.2 — composition orchestrators + signed-confidence ranking

0.7.2 turns the label-aware calibration foundation from 0.7.1 into operational primitives that agents can compose. The release ships the strategic-gap-analysis §5 0.7.2-row headline features: signed-confidence ranking that fuses three already-shipped signals (delta_norm_signed + Fisher LDA direction + reliability_flags) into one confidence-weighted score, and chain_full_loop_summary — the chain-side mirror of investigate_entity that collapses six manual primitives into one MCP call. Two new label-aware dim auditors land: audit_label_alignment reports Fisher LDA direction discrimination as AUROC against labels with the top-N most label-discriminating dims, and a new kind_mismatch warning class on sphere_overview.dim_quality_warnings flags gaussian-declared dims whose Fisher direction is near-zero while raw class moments separate — surfacing dims whose variance is captured by another dim's axis. A counterfactual frozen-population trajectory primitive on dive_solid answers "is this entity's apparent shift a real change or just population drift around a stationary entity?" — a stationary entity yields delta_norm_frozen_pop = 0 across all slices. Chain anomaly signal gains edge_potentials per consecutive-pair hop — Euclidean distance between endpoint delta vectors that surfaces high-distance behavioural jumps inside multi-hop chains. Infrastructure: a new changelog.d/ fragment convention at repo root eliminates the recurring [Unreleased] merge-conflict class on every sibling PR.

Highlights

find_anomalies(rank_by="signed_confidence") — confidence-weighted anomaly ranking: score = delta_norm_signed × |lda_alignment| × (1 − reliability_penalty) where reliability_penalty = 0.5 × single_dim_driven + 0.5 × low_confidence_bucket. Survivors carry signed_confidence_score, lda_alignment, reliability_penalty per polygon. Fail-fast on patterns without label_aware_calibration (no silent fallback to delta_norm).
chain_full_loop_summary(chain_id, chain_pattern_id, anchor_pattern_id, ...) — chain-side investigation orchestrator. Mirror of investigate_entity. Composes seven chain primitives into one call with per-step {ok, error} envelopes. Top-level summary reports investigation_strength ∈ {strong, moderate, weak}, recommended_action, derived score, one-sentence rationale.
GDSNavigator.audit_label_alignment(pattern_id, *, top_n=10) — Fisher LDA direction AUROC + top-N most label-discriminating dims by |direction_component|. Sibling to audit_pattern_dims. Fallback shape with auroc: null on patterns without label-aware calibration.
kind_mismatch warning on sphere_overview.dim_quality_warnings — fires when a kind='gaussian' dim shows |direction_component| < 0.05 AND cohens_d_pos_neg >= 0.3. The dim's variance is captured by another dim's Fisher axis — re-declaring kind or splitting recommended. Suppressed for dims already flagged negative_space.
recommended_action="kind_mismatch_review" — new categorical value on audit_pattern_dims decision tree, highest priority (preempts keep / split / drop_low_separation / investigate_drift).
dive_solid(..., counterfactual_frozen_population=True) — each SolidSlice carries delta_norm_frozen_pop reporting per-slice L2 norm against the FIRST slice's raw shape. Answers the stationary-entity-vs-population-drift question. Default False preserves return shape.
Chain.to_dict() edge_potentials — per-consecutive-pair-hop Euclidean distance between endpoint delta vectors against a caller-supplied anchor pattern. null on missing polygons or non-finite distances. Length len(keys) − 1; empty for single-entity chains.
engine.counterfactual.recompute_delta_norm_against_frozen(shape, mu_frozen, sigma) — pure-NumPy helper for the frozen-trajectory path.
SolidSlice.delta_norm_frozen_pop — new optional model field, populated by the frozen-trajectory path.
changelog.d/ fragments convention — per-PR changelog.d/<pkg>/<id>.<type>.md files at repo root; aggregator scripts/release/aggregate_changelog.py <version> folds at tag time. Eliminates the recurring [Unreleased] merge-conflict class. Outside packages/; mirrors never see fragments. See changelog.d/README.md.
docs/refutations.md — five tested-and-closed hypotheses (σ-estimator swap × 2, lazy-chain sampled calibration, Bregman-based is_anomaly flag, cycle-compression methodology). Each entry: hypothesis, dataset, numeric verdict, root cause, closure rule.
docs/patent-implementation-map.md — nine architectural-claim clusters mapped to their shipped surfaces. Descriptive cluster names only; no internal tracker IDs.
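
The signed-confidence score from the first highlight is a product of three terms and can be reproduced in a few lines. The sketch below treats the two reliability flags as booleans counted as 0 or 1, which is an assumption. The function names and sample rows are hypothetical.

```python
def reliability_penalty(single_dim_driven, low_confidence_bucket):
    """0.5 per raised reliability flag, so the penalty is 0, 0.5 or 1."""
    return 0.5 * int(single_dim_driven) + 0.5 * int(low_confidence_bucket)


def signed_confidence_score(delta_norm_signed, lda_alignment,
                            single_dim_driven, low_confidence_bucket):
    """delta_norm_signed * |lda_alignment| * (1 - reliability_penalty)."""
    penalty = reliability_penalty(single_dim_driven, low_confidence_bucket)
    return delta_norm_signed * abs(lda_alignment) * (1.0 - penalty)


# Hypothetical polygons: (key, delta_norm_signed, lda_alignment, flags...).
rows = [
    ("e1", 4.0, -0.5, False, False),
    ("e2", 6.0, 0.5, True, False),
    ("e3", 9.0, 0.9, True, True),
]
ranked = sorted(rows, key=lambda r: signed_confidence_score(*r[1:]), reverse=True)
```

Under this reading, an entity with both flags raised scores zero regardless of its signed norm, so e3 drops to the bottom of the ranking despite having the largest delta_norm_signed.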

Builder + label-aware activation

AML HI-small + LI-small spheres ship with label_audit: declared on their tx_pattern — is_laundering column on transactions activates the M1 builder hook from 0.7.1 (#511) on rebuild. Sphere format stamps 3.1 when the block is declared. audit_pattern_dims returns the full per-class field path (mu_pos / sigma_pos / mu_neg / sigma_neg / cohens_d_pos_neg / direction_component) instead of the fallback shape.

Breaking changes

None. delta_norm_frozen_pop, edge_potentials, kind_mismatch warning, recommended_action: "kind_mismatch_review", rank_by="signed_confidence" — all additive surfaces. Chain.to_dict() gains a new optional kwarg delta_by_key with None default preserving existing behaviour.
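
The edge_potentials list can be pictured as one Euclidean distance per consecutive pair of chain keys. The sketch below is an illustration rather than the Chain.to_dict() implementation: it models delta_by_key as a plain dict of delta vectors and treats a missing key as a missing polygon. The length-mismatch guard is an extra assumption.

```python
import math


def edge_potentials(keys, delta_by_key):
    """Distance between consecutive endpoint delta vectors, or None."""
    potentials = []
    for left, right in zip(keys, keys[1:]):
        d_left = delta_by_key.get(left)
        d_right = delta_by_key.get(right)
        if d_left is None or d_right is None or len(d_left) != len(d_right):
            potentials.append(None)  # missing polygon for one endpoint
            continue
        dist = math.dist(d_left, d_right)
        # Non-finite distances are reported as None, mirroring the null above.
        potentials.append(dist if math.isfinite(dist) else None)
    return potentials


# Hypothetical three-hop chain where the last entity has no polygon.
chain_keys = ["A", "B", "C"]
deltas = {"A": [0.0, 0.0], "B": [3.0, 4.0]}
hops = edge_potentials(chain_keys, deltas)
```

The result has len(keys) − 1 entries, and a single-entity chain yields an empty list, matching the description in the highlights.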

Infrastructure

README badge for CI test workflow on the public mirror.

See CHANGELOG.md for the full list of changes.

0.7.1

21 May 05:57

@karol-kedzia karol-kedzia

0.7.1

603878f

0.7.1

hypertopos-py 0.7.1 — label-aware calibration close-out + cycle hygiene

0.7.1 closes the label-aware calibration arc opened in 0.7.0 (sphere format 3.0 + saturation of investigator-side primitives) and lands the cycle hygiene work that the 0.7.0 retrospective flagged. The major moves are: engine.calibration_label_aware now runs end-to-end when a sphere.yaml declares a label_audit: block and the --label-aware-calibration build flag is set — per-dim class-conditioned moments + Fisher LDA direction land on Pattern.label_aware_calibration and the delta_norm_signed Lance column populates with the signed projection. The audit_pattern_dims MCP tool returns the full per-dim path (mu / sigma / class-conditioned moments / Cohen's d / direction component) on rebuilt spheres. Two new build-time diagnostics — per-dim normality (Shapiro-Wilk / KS) and per-pattern heteroscedasticity (Brown-Forsythe Levene) — surface as new dim_quality_warnings types (non_normal_dim, heteroscedasticity) on sphere_overview and tell agents which dims are statistically trustworthy z-scores and which groupings violate the global-θ assumption. Cycle hygiene: a GitHub Actions test workflow, a tool-count consistency test, a conftest restore-on-MagicMock fixture, Pattern.dim_index() now covers aggregated edge dims, dim_percentiles covers them too, and the event-pattern dim_labels undercount that the 0.7.0 retrospective flagged is fully closed. One additive: python -m hypertopos_mcp.main --transport http --port 8080 runs the MCP server over the streamable-HTTP transport (transport-only flag; no auth / TLS / multi-tenant — those land in 0.8).

Highlights

Label-aware calibration end-to-end — sphere.yaml accepts a top-level label_audit: block (label_column, label_positive_value, patterns: [...]); builder runs calibrate_label_aware per listed pattern when the --label-aware-calibration flag is set; Pattern.label_aware_calibration persists per-dim {mu_pos, sigma_pos, mu_neg, sigma_neg, direction}. Sphere format minor 3.1 is stamped only when the block is declared; legacy 3.0 spheres load unchanged.
delta_norm_signed Lance column — nullable float32 per-polygon signed projection of the delta vector onto the label-aware Fisher LDA direction. Positive values mean the polygon is pushed toward the positive-labelled centroid; negative toward the other class. All-null on patterns without label-aware calibration.
audit_pattern_dims MCP tool — per-dim calibration audit reporting raw mu / sigma alongside class-conditioned moments + Cohen's d + Fisher direction component + categorical recommended_action ∈ {keep, split, drop_low_separation, investigate_drift, kind_mismatch_review}. Full-field path activates on label_audit:-equipped rebuilt spheres; otherwise falls back to raw stats only.
Per-dim normality diagnostic — engine.dim_audit.normality_test_per_dim runs Shapiro-Wilk (N ≤ 5000) / Kolmogorov-Smirnov (larger N) per gaussian-declared dim at build time. Fires non_normal_dim warning on sphere_overview.dim_quality_warnings when p < 0.01 — recommends log1p / sqrt / rank transform or kind re-declaration. Bernoulli and poisson dims silently skipped. Suppressed when negative_space already fires on the same dim.
Per-pattern heteroscedasticity diagnostic — engine.diagnostics.levene_test_per_group runs Brown-Forsythe (median-centred) Levene test at build time on patterns carrying group_by_property. Fires heteroscedasticity warning on sphere_overview.dim_quality_warnings when p < 0.01 — confirms the global θ / pooled-σ / global-percentile assumptions are statistically violated for the grouping.
Pattern.dim_index() covers aggregated edge dims — every {source_dim}_{aggregate} label declared via edge_dim_aggregations now resolves through dim_index. Dissolves the local dim_labels.index() fallback shipped in 0.7.0 PR #487's simulate_dimension_change.
dim_percentiles covers aggregated edge dims — same six-key entry (min / p25 / p50 / p75 / p99 / max) that the cache already provided for event_dims and prop_cols. Consumers (sphere_overview profiling, dominant_dim_mass / negative_space auditors, find_anomalies percentile path) see aggregated edge dims uniformly.
Conformance YAML loader — sphere.yaml patterns now accept a conformance_rules: block (AST predicate language: and / or / not over == / != / < / <= / > / >= / in, severity ∈ {low, medium, high, critical}). Builder validates against entity-line points table, attaches to Pattern.conformance_rules, and writes violations to the existing sidecar. Closes the runtime-evaluator + YAML-loader gap that was deferred in 0.7.0.
Event-pattern dim_labels undercount fix — Pattern.delta_dim() now matches the stored mu/sigma/theta width for event patterns by surfacing edge-derived dim names (pair_edge_count, position_in_chain, time_since_pair_last_edge, pair_amount_zscore, find_motif_structuring) via Pattern.edge_dim_names. Removes the anomaly_summary band-aid and tightens every zip(pattern.dim_labels, ...) callsite. Rebuild required to surface labels on existing spheres.
HTTP MCP transport flag — python -m hypertopos_mcp.main --transport http --port 8080 runs the MCP server over the streamable-HTTP transport (single-client localhost). Default --transport stdio unchanged. Auth, TLS, session-keyed _state for true multi-tenant: 0.8 candidates.
CI workflow — .github/workflows/test.yml runs pytest on Python 3.12 / Ubuntu on every PR and push to main. Hardens conftest against MagicMock leak through pytest.raises; locks tool-count consistency across mcp-spec.md, mcp-lifecycle.svg, README, and runtime registry.
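
A predicate language of the kind described for conformance_rules (and / or / not over comparison operators and in) can be evaluated without eval() by walking a small tree. The sketch below is row-wise and purely illustrative: the node layout with op, args, arg, column, and value keys is hypothetical and is not the sphere.yaml schema, which these notes do not show.

```python
import operator

# Comparison operators named in the conformance_rules description.
COMPARATORS = {
    "==": operator.eq,
    "!=": operator.ne,
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
    "in": lambda value, options: value in options,
}


def evaluate(node, row):
    """Evaluate one predicate node against a row (a plain dict)."""
    op = node["op"]
    if op == "and":
        return all(evaluate(child, row) for child in node["args"])
    if op == "or":
        return any(evaluate(child, row) for child in node["args"])
    if op == "not":
        return not evaluate(node["arg"], row)
    if op in COMPARATORS:
        return COMPARATORS[op](row[node["column"]], node["value"])
    raise ValueError(f"unsupported operator: {op!r}")


# Hypothetical rule: large amount AND country not in an allow-list.
rule = {
    "op": "and",
    "args": [
        {"op": ">", "column": "amount", "value": 10000},
        {"op": "not", "arg": {"op": "in", "column": "country", "value": ["PL", "DE"]}},
    ],
}
matches = evaluate(rule, {"amount": 15000, "country": "XX"})
```

Restricting evaluation to a fixed operator table is what keeps such a language safe to load from configuration, since an unknown operator raises instead of executing.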

Breaking changes

None. delta_norm_signed is a new nullable column; label_audit: is a new optional top-level YAML block; format-3.0 spheres continue to load unchanged.
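
The signed projection stored in delta_norm_signed can be read as the component of a delta vector along the Fisher LDA direction. A minimal sketch follows, assuming the direction is normalised to unit length before projecting (the notes do not say whether the library does this) and using None to stand in for the null value on patterns without label-aware calibration. The function name is hypothetical.

```python
import math


def signed_projection(delta, direction):
    """Signed length of delta along direction, or None if unavailable."""
    if direction is None:
        return None  # no label-aware calibration for this pattern
    norm = math.sqrt(sum(c * c for c in direction))
    if norm == 0.0:
        return None  # a zero direction carries no orientation
    dot = sum(d * c for d, c in zip(delta, direction))
    return dot / norm


# Positive: pushed toward the positive-labelled centroid.
toward_positive = signed_projection([3.0, 4.0], [1.0, 0.0])
# Negative: pushed toward the other class.
toward_negative = signed_projection([3.0, 4.0], [-2.0, 0.0])
```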

Dependency completeness fix

Three production modules (engine.fdr, engine.density_gaps, navigation.navigator.cluster_bridges) import scipy / pandas which had been arriving transitively via numpy / pylance on local dev machines but were not declared in pyproject.toml. Clean Ubuntu CI install on the new workflow surfaced this; deps added as production dependencies. scikit-learn added to the [topology] optional extra alongside ripser / persim. pytest-asyncio added to [dev].

See CHANGELOG.md for the full list of changes.

0.7.0

20 May 08:42

@karol-kedzia karol-kedzia

0.7.0

a62a64a

0.7.0

hypertopos-py 0.7.0 — sphere-format-3.0 + investigation surface saturation

0.7.0 modernises the sphere storage layer onto Lance 6.0 with native MVCC and bumps the on-disk sphere format from 2.4 to 3.0; geometry is now stored as a single Lance dataset per pattern with calibration epochs as tagged versions instead of one directory per epoch. Layered on top of the new storage substrate, the release saturates the investigation surface across both the entity and chain sides. An investigate_entity orchestrator mirrors the investigate_chain orchestrator shipped previously. A counterfactual suite (per-edge, joint-edge, per-counterparty, per-dimension overrides) gives AML and fraud investigators the four lenses they need to attribute anomaly to coordinated structure. A multi-hypothesis explainer surfaces K disjoint dim sets when an anomaly has more than one plausible cause. Reliability triage flags ride along on five primitive outputs, so single-dim-driven and low-confidence-bucket entities are auditable without scrolling the calibration log. Two chain extensions diagnose coordinated witness sets and chain-level drift trajectories. The FDR engine grows multi-resolution spatial × temporal hierarchies and a per-dim axis, so a single dominant dimension cannot inflate the global threshold for unrelated dimensions. A persistent-homology stack ranks entities by local H_1 cycle persistence with a sidecar Lance cache. A behavioural-vs-graph cross-tabulation flags entities transacting outside their peer group. Two sphere-validation auditors (dominant_dim_mass, negative_space) fire from sphere_overview on the cached pattern state. A declarative compliance rules family lets a YAML predicate language attach to a pattern and surface rule-break entities as a sidecar Lance dataset.

Highlights

Sphere format 3.0 + Lance 6.0 modernization (breaking) — native MVCC for sphere geometry, calibration epochs tagged as Lance dataset versions, Lance dependency bumped 4.0.0 → 6.0.0 with auto-applied wins on every cold path (SIMD distance kernels, zonemap segment improvements, manifest interning, eager I/O scheduling, transaction-file cleanup on failed builder commits, FTS prewarm position-codec preservation).
Entity-side investigation orchestrator — investigate_entity mirrors investigate_chain from the previous release; chains polygon shape lookup, explain_anomaly, find_witness_cohort, find_chains_for_entity, trace_root_cause, find_graph_geometry_tension, and an opt-in simulate_edge_removal block behind one safe-call envelope so a partial failure surfaces in steps_status instead of aborting the whole investigation.
Counterfactual suite — simulate_edge_removal (per-edge with per-source-dim ECDF significance), simulate_counterparty_removal (per-partner rollup for the high-degree-entity flat-ranking gap), select_minimal_joint_edge_removal (greedy joint counterfactual that reveals coordinated edge groups single-edge counterfactuals cannot detect), and simulate_dimension_change (raw shape-vector dim override → delta_norm recompute, companion to simulate_edge_removal for non-edge dimensions).
Multi-hypothesis explainer — find_diverse_explanations returns K strict-disjoint hypotheses for why an entity is anomalous via greedy selection over per-dim Bregman contributions, with graceful degradation (degraded_reason="insufficient_diverse_mass") when an entity is genuinely single-dim driven.
Reliability triage — reliability_flags field on five primitive surfaces (π5_attract_anomaly, explain_anomaly, composite_risk, combine_anomaly_pvalues, investigate_entity); flags single_dim_driven (one dim contributes more than 70% of total anomaly attribution) and low_confidence_bucket (bootstrap-derived anomaly_confidence below 0.5).
Chain extensions — chain_witness_intersection (coordinated-witness diagnosis — coordinated=True when chain members share an anomaly mechanism) and chain_drift_trajectory (per-member time-bucketed regime with a chain-level rollup, spots chains jointly drifting toward anomaly before any single hop crosses the threshold).
FDR upgrades — multi-resolution spatial × temporal hierarchies via Pattern.fdr_hierarchy / Pattern.fdr_temporal_hierarchy YAML schema with fdr_resolution / fdr_temporal_resolution parameters on find_anomalies; per-dim FDR axis (fdr_axis: "entity" | "per_dim" | "both") with rank_by="min_q_per_dim" re-ranking so a rare-but-real single-dim signal surfaces ahead of joint-norm winners; builder.temporal_bucket materialises the per-anchor centroid-timestamp slice dimension when a temporal hierarchy declares one not already present in geometry.
Persistent-homology stack — find_topological_anomalies per-entity local k-NN Vietoris–Rips H_1 cycle-persistence ranking with sidecar Lance cache (force=True to recompute); trajectory companion find_topological_trajectory_anomalies distinguishes closed-loop trajectories from monotonic drift and stationary clusters. Optional dependency: pip install hypertopos[topology].
Detector composition — Wilson harmonic-mean p-value (HMP) replaces Fisher's method for cross-pattern composition in composite_risk / composite_risk_batch; combine_anomaly_pvalues calibrates five orthogonal detectors (delta_norm, neighbor_contamination, segment_shift, trajectory_continuous, density_gap) to per-entity p-values via the new engine.p_value_calibration adapter family and combines them via HMP; classify_detector_consensus surfaces the categorical pattern of agreement (mixed_signal / anomalous_consensus / single_detector_signal / normal_consensus / insufficient_data) instead of collapsing to one scalar.
Graph-geometry primitives — find_graph_geometry_tension cross-tabulates behavioural k-NN against graph adjacency, surfacing hidden_cluster (similar but disconnected) and suspicious_links (connected but dissimilar) cells that scalar anomaly detectors cannot separate; edge_curvature_frc materialises combinatorial Forman-Ricci curvature as an opt-in build-time edge dimension; engine.dim_audit.fit_lda_direction closes the dim-audit primitive set with a Fisher LDA direction fit over labeled delta vectors.
Sphere-validation cheap-tier — dominant_dim_mass (pattern-level single-dim dominance flag — fires when one dim accounts for ≥70% of population p99-tail variance) and negative_space (gaussian-declared zero-centered dim flag — fires on gaussian dims whose empirical p50==0) extend sphere_overview.dim_quality_warnings. Both compute from cached pattern state — sub-millisecond, no storage scan, no new sphere format field.
Declarative compliance rules — Pattern.conformance_rules attaches an AST predicate language (no eval(), intentionally minimal) to a pattern; the builder evaluates rules vectorized via PyArrow expressions at build time and persists violations to a sidecar Lance dataset; find_conformance_violations reads it back with filter pushdown.
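The harmonic-mean composition named in the detector-composition item can be illustrated with a minimal sketch. It computes only the raw weighted harmonic mean of a set of p-values. The function name `harmonic_mean_p` and the equal-weight default are assumptions made for illustration, not the library's API, and the calibration step that would turn the raw statistic into an exact combined p-value is deliberately left out.

```python
from typing import Optional, Sequence


def harmonic_mean_p(pvalues: Sequence[float],
                    weights: Optional[Sequence[float]] = None) -> float:
    """Raw weighted harmonic mean of p-values (hypothetical helper).

    Small p-values dominate the result, so one strong detector is not
    averaged away by several uninformative ones.
    """
    if not pvalues:
        raise ValueError("need at least one p-value")
    if weights is None:
        # Assumption: equal weights when the caller gives none.
        weights = [1.0 / len(pvalues)] * len(pvalues)
    if len(weights) != len(pvalues):
        raise ValueError("weights and pvalues must have the same length")
    if any(p <= 0.0 or p > 1.0 for p in pvalues):
        raise ValueError("p-values must lie in (0, 1]")
    total_weight = sum(weights)
    return total_weight / sum(w / p for w, p in zip(weights, pvalues))


# One strong signal among two weak ones stays visible in the combination.
combined = harmonic_mean_p([0.01, 0.5, 0.5])
```

The combined value here is 3/104, far closer to the smallest input than an arithmetic mean would be.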

Fixes and hardening

Hub-scoring shape mismatch — π7_attract_hub_and_stats and hub_score_history now slice the shape matrix to len(pattern.edge_max) before the per-relation multiply on anchor patterns whose delta_dim exceeds the relation count (the conformance / motif dim tail at the end of pattern.delta). find_hubs and hub_history are now callable on these patterns without line_id_filter as a workaround.
Counterfactual hub safety — simulate_edge_removal gains max_edges_loaded (default 2000) that truncates the candidate edge list before the sidecar IN-clause and the engine evaluation; select_minimal_joint_edge_removal gains max_candidates (default 500) capping the greedy search input. Both surface n_candidates_seen / n_candidates_used / candidates_truncated so the agent sees when a hub entity's adjacency was capped. Per-call latency on entities with tens of thousands of edges drops from indefinite to seconds.
Motif anchor gate — find_high_potential_motifs and score_motif reject an event pattern_id early with a GDSNavigationError that names the anchor companion (best-effort: silently skipped when no anchor companion is configured, e.g. legacy event-only spheres). The previous behaviour silently returned an empty result after burning the full enumeration cost: the geometry's primary_key carries event keys but the adjacency index is keyed on entities, so no seed ever passed the active-seed gate.
Detector-consensus tiebreaker — classify_detector_consensus uses delta_norm (pulled from the reliability flags attached by combine_anomaly_pvalues) as a deterministic final tiebreaker when per-detector p-value vectors saturate at the float floor and the harmonic-mean composition collapses. The previous behaviour returned an order determined by floating-point sort tie-break.
Cross-pattern key-confusion error — edge_potential (backing score_edge / score_motif / anomalous_edges) names the pattern type and the expected key shape in its "entity not found" error and points the agent at search_entities for discovery.
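The candidate cap described under counterfactual hub safety can be pictured with a short sketch. The helper `cap_candidates` is hypothetical, and keeping the first entries in input order is an assumption, since the notes do not say how the list is ordered before truncation. Only the three reporting field names and the default of 2000 come from the release text.

```python
from typing import Any, Dict, Sequence


def cap_candidates(edges: Sequence[Any],
                   max_edges_loaded: int = 2000) -> Dict[str, Any]:
    """Truncate a candidate edge list and report what was dropped."""
    if max_edges_loaded < 0:
        raise ValueError("max_edges_loaded must be non-negative")
    # Assumption: the first max_edges_loaded entries are the ones kept.
    used = list(edges[:max_edges_loaded])
    return {
        "edges": used,
        "n_candidates_seen": len(edges),
        "n_candidates_used": len(used),
        "candidates_truncated": len(used) < len(edges),
    }


hub_report = cap_candidates(list(range(5000)))
small_report = cap_candidates(list(range(10)))
```

Returning the counts alongside the truncated list lets a caller tell a genuinely small adjacency apart from a capped hub.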

Commit-prefix note

Several commits in this cycle were authore...


0.6.7

10 May 10:32

@karol-kedzia karol-kedzia

0.6.7

5d1ff53


0.6.7

hypertopos-py 0.6.7 — close the investigation→SAR pipeline

This release closes the chain investigation→SAR pipeline opened in 0.6.4-0.6.6. The chain-coherent investigative loop primitives shipped earlier (find_chains_with_coherent_anomaly, anomaly_propagation_in_chain, classify_chain_typology, extend_chain) get a triage layer in front and an orchestrator + SAR narrative composer behind, so investigators can go from "open the sphere" to "paste a 3-5 paragraph SAR draft into the regulator filing template" in three MCP calls instead of nine. A new per-dim runtime weight on find_anomalies wires the stratified correlation-gate verdicts shipped in 0.6.6 into the ranked output, so NOISE-classified dims can be silenced or HEAVY-TAIL dims down-weighted at query time. sphere_overview gains a dim_quality_warnings block surfacing two silent build-time failure modes (dead and sparse dims) that previously broke z-score / delta_norm semantics with no agent-visible signal. A new cookbook documents how chains discovered outside hypertopos (SAR typology engines, ERP supply-chain workflows, EHR clinical pathways) ingest as anchor lines and unlock the chain-coherent loop on externally-curated chains.

Highlights

GDSNavigator.π5_attract_anomaly(..., dimension_weights=None) — optional {dim_name: float} mapping that scales each dim's contribution to the rank score before computing delta_norm. Default None leaves behaviour unchanged. Missing dims default to weight 1.0; explicit 0.0 silences a dim. Requires metric in ('L2', 'Linf') — Bregman divergence is precomputed per-row and cannot be reweighted post-hoc. Connects stratified correlation-gate verdicts to runtime ranking.
GDSNavigator.chain_investigation_summary(chain_pattern_id, anchor_pattern_id, *, min_hops=2, max_runs=10000) — pre-investigation triage diagnostic for a chain pattern. Aggregates one find_chains_with_coherent_anomaly sweep + a chain-pattern geometry scan into population-level metrics (coherent_run_rate, cross_pattern_overlap, top_dims_in_coherent_runs, recommended_min_hops). Cost is one coherent-anomaly sweep — what the agent would pay anyway as the first triage step, with the aggregates surfaced for free.
GDSNavigator.investigate_chain(chain_id, pattern_id, *, anchor_pattern_id, extension_max_results=20) — one-shot orchestrator that runs the full R9 investigative loop on a single chain (trace + typology + shape-anomaly lookup + extension forward + extension backward) and aggregates the per-step outputs into a single SAR-ready report. Each per-step output is wrapped in {ok, data | error} so a partial failure does not abort the whole report. Summary derives investigation_strength (strong / moderate / weak) and recommended_action (escalate to SAR / continue investigation / false-positive candidate) with a single-paragraph rationale.
GDSNavigator.generate_sar_rationale(chain_id, pattern_id, *, anchor_pattern_id, evidence=None, regulatory_template="FinCEN SAR") — template-based composition (no LLM call) of a 3-5 paragraph SAR-ready narrative from R9 evidence. Returns sar_narrative (paragraph-separated string), evidence_anchors (structured pointers per claim — investigator can audit every line against source data), and confidence (high / moderate / low). Honesty discipline: language is "evidence indicates" / "the per-hop trace shows" — never "confirms" — and the narrative is positioned as a starting draft for the investigator, not a final verdict. An untriaged guard surfaces "Treat this chain as untriaged, not as cleared" when all R9 surfaces fail, preventing the worst silent-error class in a SAR context.
sphere_overview per-pattern entry gains an optional dim_quality_warnings[] block surfacing two build-time failure modes: dead_dim (sigma_diag < 1e-10 — z-score undefined, the dim contributes nothing meaningful and silently dilutes other dims' signal) and sparse_dim (p50 == 0 AND p99 > 0 — gaussian z-score assumption is wrong; Bregman divergence with poisson / bernoulli kind tag is the correct distance). Each warning carries type, dim_label, reason, and concrete advice. Computed sub-millisecond from cached pattern state, no storage scan.
New documentation page external-chains-as-anchor-line.md covering the workflow for ingesting chains discovered outside hypertopos. Documents the chain anchor line schema convention — required key + chain features, optional chain_keys column (comma-joined member primary_keys in chain order) that unlocks the chain-coherent investigative loop on externally-curated chains. Walks through declaration in sphere.yaml and lists every standard anchor primitive and R9 family primitive that the convention column unlocks.
Sphere format stays at 2.4. Existing 2.4 spheres remain readable; new fields land after the next pattern rebuild where applicable.
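The two dim_quality_warnings rules above reduce to simple per-dimension checks. The sketch below is a standalone approximation: it recomputes a population standard deviation and nearest-rank percentiles from raw values, whereas the release states the real check reads cached pattern state. The function names and the advice strings are hypothetical.

```python
import statistics
from typing import Dict, List, Sequence


def percentile(sorted_vals: Sequence[float], q: int) -> float:
    """Nearest-rank percentile for an integer q in 1..100."""
    rank = max(1, (q * len(sorted_vals) + 99) // 100)  # integer ceiling
    return sorted_vals[rank - 1]


def dim_quality_warnings(columns: Dict[str, Sequence[float]]) -> List[Dict[str, str]]:
    """Flag dead and sparse dims using the thresholds quoted in the notes."""
    warnings: List[Dict[str, str]] = []
    for label, values in columns.items():
        if not values:
            continue
        vals = sorted(values)
        # Assumption: population std stands in for the pattern's sigma_diag.
        if statistics.pstdev(vals) < 1e-10:
            warnings.append({
                "type": "dead_dim",
                "dim_label": label,
                "reason": "sigma below 1e-10, z-score undefined",
                "advice": "drop or re-derive this dimension",
            })
        elif percentile(vals, 50) == 0 and percentile(vals, 99) > 0:
            warnings.append({
                "type": "sparse_dim",
                "dim_label": label,
                "reason": "p50 == 0 and p99 > 0",
                "advice": "consider a poisson or bernoulli kind tag",
            })
    return warnings


found = dim_quality_warnings({
    "const": [3.0] * 100,
    "sparse": [0.0] * 90 + [5.0] * 10,
    "dense": list(range(1, 101)),
})
```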

See CHANGELOG.md for the full list of changes.


0.6.6

09 May 18:40

@karol-kedzia karol-kedzia

0.6.6

4387579


0.6.6

hypertopos-py 0.6.6 — calibration sensitivity surface

This release adds a calibration-quality diagnostic that lets investigators inspect threshold sensitivity at a glance — answering "is the chosen anomaly_percentile (typically p95) sitting in a smooth region of the population's delta_norm distribution, or perched near a heavy-tail jump where one percentile step would change the threshold by 50 % or more?". The diagnostic is computed at build time at zero new I/O cost (it glues onto the population sort the builder already runs for delta_rank_pct and conformal p-values), surfaces through a new navigator method and MCP tool plus a compact summary on every sphere_overview entry, and is documented in the gds-monitor skill's monitoring playbook.

The new CalibrationFit.theta_sensitivity field carries a per-percentile sweep at p90 .. p99, reporting theta_mean (the threshold value at that percentile), anomaly_count_mean (the number of entities at or above it), and anomaly_rate per percentile. From this shape the navigator derives a stable band, the longest contiguous range of percentiles where the adjacent-pair theta_mean ratio stays below 1.30 (so recalibration anywhere in the band shifts the threshold by less than 30 % per step), and a cliff list, the percentile boundaries where the ratio is 1.50 or higher, signalling a heavy-tail region of the underlying distribution. Cliff and stable-band derivation use theta_mean ratios rather than anomaly_count_mean ratios, because count ratios at percentile boundaries are mechanically determined by (100-i) / (100-(i+1)) and identical across all distributions; theta ratios reflect the actual distributional shape and differentiate light-tail from heavy-tail patterns.
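The band and cliff derivation can be sketched directly from the two ratio thresholds. This is an illustration under stated assumptions, not the navigator's code: the function names are hypothetical, band length is measured in percentile steps, ties go to the earliest band, and the theta values in the example are invented.

```python
from typing import Dict, List, Tuple


def band_and_cliffs(theta_mean: Dict[int, float],
                    stable_ratio: float = 1.30,
                    cliff_ratio: float = 1.50
                    ) -> Tuple[Tuple[int, int], List[Tuple[int, int]]]:
    """Derive (stable_band, cliffs) from a percentile -> theta_mean sweep."""
    pcts = sorted(theta_mean)
    if len(pcts) < 2:
        raise ValueError("need at least two percentiles")
    # Ratio of each threshold to the one a percentile step below it.
    ratios = [(lo, hi, theta_mean[hi] / theta_mean[lo])
              for lo, hi in zip(pcts, pcts[1:])]
    cliffs = [(lo, hi) for lo, hi, r in ratios if r >= cliff_ratio]
    best = (pcts[0], pcts[0])
    run_start = pcts[0]
    for lo, hi, r in ratios:
        if r < stable_ratio:
            if hi - run_start > best[1] - best[0]:
                best = (run_start, hi)
        else:
            run_start = hi  # an unstable step starts a new candidate band
    return best, cliffs


def count_ratio(i: int) -> float:
    """Count ratio between percentile i and i+1, fixed for any distribution."""
    return (100 - i) / (100 - (i + 1))


# Invented sweep with a jump at p96 -> p97 and another at p98 -> p99.
sweep = {90: 1.0, 91: 1.1, 92: 1.2, 93: 1.3, 94: 1.4,
         95: 1.5, 96: 1.6, 97: 2.6, 98: 2.8, 99: 5.0}
band, cliff_list = band_and_cliffs(sweep)
```

For this sweep the stable band runs from p90 to p96 and both jumps are reported as cliffs, while `count_ratio(90)` is 10/9 whatever the underlying data looks like.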

GDSNavigator.theta_sensitivity(pattern_id, version=None) returns a ThetaSensitivityReport with the per-percentile sweep plus the derived band and cliff list; the sphere_overview per-pattern entry gains an optional theta_sensitivity_summary block carrying stable_band_from, stable_band_to, stable_band_length, n_cliffs, and theta_at_p95. Calibration epochs from older builds that lack the underlying field continue to render sphere_overview entries as before — the block is silently skipped, with graceful degradation rather than a breaking change.

Highlights

New CalibrationFit.theta_sensitivity field — per-percentile sweep at p90 .. p99 populated automatically at build time on every pattern; reads back via read_calibration_fit. Calibration epochs from prior builds carry None and the field surfaces on the next pattern rebuild.
New GDSNavigator.theta_sensitivity(pattern_id, version=None) method returns a ThetaSensitivityReport with per-percentile sweep, derived stable_band (from, to, length), and cliffs[] (boundary pairs whose theta ratio is at or above 1.50). Companion to the existing compare_calibrations and decompose_drift calibration-history surfaces.
sphere_overview() per-pattern entry gains an optional theta_sensitivity_summary block (stable_band_from, stable_band_to, stable_band_length, n_cliffs, theta_at_p95) so an agent reading the population overview sees threshold-sensitivity triage without a per-pattern drilldown. Entries from older builds without the field render as before — block silently skipped.
Build-time cost: zero new I/O. The diagnostic glues onto the existing population sort already computed for delta_rank_pct and conformal p-values; per-pattern derivation is O(P) with P = 10 percentiles.
Sphere format stays at 2.4. Existing 2.4 spheres remain readable; the new field lands after the next pattern rebuild.

See CHANGELOG.md for the full list of changes.


0.6.5

08 May 14:59

@karol-kedzia karol-kedzia

0.6.5

dfe1aa4


0.6.5

hypertopos-py 0.6.5 — sharpen chain generation

This release sharpens chain anchor pattern generation along two axes that are independent of chain shape and complementary to the chain-coherent investigative loop shipped in 0.6.4. Two new derived columns surface textbook AML signals on every chain in the pattern, and a strict-prefix subsumption pass eliminates near-duplicate chain rows that previously occupied multiple slots in the points table for what was effectively one investigative finding.

The chain anchor pattern's per-chain feature surface gains cross_bank_count (the number of distinct banks the chain transits across all from_bank / to_bank pairs — a textbook jurisdictional layering indicator) and amount_monotone_decreasing (a boolean, true when amounts strictly decrease at every hop — a textbook structuring pattern). Both flow automatically into the chain anchor pattern's polygon delta and surface in find_anomalies over the chain pattern, in the chain-coherent loop tools, and in classify_chain_typology dominant_top_dim. The features are populated when the source event line declares from_bank / to_bank columns (or source_bank / destination_bank); chains built without bank columns carry cross_bank_count = 0 for backward compatibility. Effect is gated on next chain pattern rebuild — existing spheres carry the prior feature set until rebuild.
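A rough sketch of how the two derived columns could be computed from a chain's hops follows. The hop representation (a list of dicts) and the function signatures are assumptions for illustration. The `is_cyclic` gate follows the release's description of the column, and returning false for a single-hop chain is an assumption.

```python
from typing import Dict, Sequence


def cross_bank_count(hops: Sequence[Dict[str, object]]) -> int:
    """Count distinct banks across every from_bank / to_bank value."""
    banks = set()
    for hop in hops:
        for column in ("from_bank", "to_bank"):
            bank = hop.get(column)
            if bank is not None:
                banks.add(bank)
    # Chains without bank columns yield 0, matching the stated default.
    return len(banks)


def amount_monotone_decreasing(amounts: Sequence[float],
                               is_cyclic: bool = False) -> bool:
    """True when amounts strictly decrease at every hop of an acyclic chain."""
    if is_cyclic or len(amounts) < 2:
        return False
    return all(a > b for a, b in zip(amounts, amounts[1:]))


hops = [
    {"from_bank": "A", "to_bank": "B", "amount": 100.0},
    {"from_bank": "B", "to_bank": "C", "amount": 90.0},
]
n_banks = cross_bank_count(hops)
decreasing = amount_monotone_decreasing([h["amount"] for h in hops])
```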

extract_chains post-merge dedup gains a strict ordered prefix subsumption pass: chains whose entity sequence is a strict prefix of another chain's entity sequence are dropped, since the longer chain investigates every entity the shorter one does plus more, which makes the shorter prefix chain investigatively redundant. Effect is gated on next chain pattern rebuild — existing spheres carry the prior chain count until rebuild. Chain-coherent loop tools see slightly fewer redundant chain entries on rebuilt spheres.
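The strict-prefix rule can be shown in a few lines. This is a minimal sketch, not the extract_chains implementation: the function name is hypothetical, and collecting every strict prefix into a set is chosen for clarity rather than memory efficiency.

```python
from typing import List, Sequence, Set, Tuple


def drop_strict_prefixes(chains: Sequence[Sequence[str]]) -> List[Tuple[str, ...]]:
    """Drop chains whose entity sequence is a strict prefix of another chain."""
    seqs = [tuple(chain) for chain in chains]
    strict_prefixes: Set[Tuple[str, ...]] = set()
    for seq in seqs:
        # Every proper leading slice of a chain is a strict prefix of it.
        for end in range(1, len(seq)):
            strict_prefixes.add(seq[:end])
    return [seq for seq in seqs if seq not in strict_prefixes]


kept = drop_strict_prefixes([
    ["A", "B"],
    ["A", "B", "C"],
    ["B", "C"],
    ["A", "B", "C", "D"],
])
```

Here `["A", "B"]` and `["A", "B", "C"]` are dropped in favour of the four-entity chain, while `["B", "C"]` survives because it is a sub-sequence but not a prefix.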

Highlights

New cross_bank_count per-chain derived column on chain anchor patterns — counts distinct banks across the chain's from_bank / to_bank pairs. Auto-populated when the event line declares from_bank / to_bank (or source_bank / destination_bank); chains without bank columns carry 0.
New amount_monotone_decreasing per-chain derived column — boolean, true when amounts strictly decrease at every hop. Gated on not is_cyclic so cyclic chains keep false regardless of amount sequence.
Both new chain features surface automatically in find_anomalies(<chain_pattern>), in the chain-coherent investigative loop tools (find_chains_with_coherent_anomaly, anomaly_propagation_in_chain, classify_chain_typology, extend_chain), and through the chain anchor pattern's polygon delta as a possible dominant_top_dim.
extract_chains post-merge dedup adds a strict-prefix subsumption pass — chains whose entity sequence is a strict ordered prefix of another chain's entity sequence are dropped. Reduces near-duplicate rows in the chain pattern points table.
Sphere format stays at 2.4. Existing 2.4 spheres remain readable; the new chain features and the prefix-subsumption pass both take effect on the next chain pattern rebuild.

See CHANGELOG.md for the full list of changes.


0.6.4

07 May 18:46

@karol-kedzia karol-kedzia

0.6.4

119356a


0.6.4

hypertopos-py 0.6.4 — chain-coherent investigative loop

This release closes the chain-coherent family with four new navigator primitives that compose into a complete investigative workflow over chain anchor patterns: detect, trace, label, extend. The release also includes a critical fix to parallel chain extraction that produced colliding chain_id values across worker processes, plus a routing optimisation in find_motif_by_hops that removes a per-call edge-table read path which was structurally slower than the cached global adjacency at every measured seed count.

The investigative loop runs as: find_chains_with_coherent_anomaly sweeps the population and flags chains where consecutive entity-anchor positions are individually anomalous on the same dominant delta dimension; anomaly_propagation_in_chain drills into a single flagged chain and returns its hop-by-hop anomaly trace; classify_chain_typology labels the chain along five operational axes (run shape, peak position, position-in-chain, extension signals, dominant top dim) so investigators can triage without re-reading hops; extend_chain follows the boundary entity into other chains in the same pattern and returns candidate extension entities ranked by their own anchor anomaly status. Together these distinguish "chain composition" anomalies (coherent cascade through structurally-similar entities) from "chain shape" anomalies (the chain itself unusual on hop_count / amount_decay / time_span); the two are orthogonal detectors.
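The detection step, finding runs of consecutive anomalous positions that share a dominant dimension, can be sketched on a single chain. The per-hop dict layout and the function name are assumptions for illustration, and the real primitive sweeps the whole population and returns more fields than the three shown here.

```python
from typing import Dict, List, Sequence


def coherent_runs(hops: Sequence[Dict[str, object]],
                  min_hops: int = 3) -> List[Dict[str, object]]:
    """Find maximal runs of anomalous hops sharing one top_dim."""
    runs: List[Dict[str, object]] = []
    start, n = 0, len(hops)
    while start < n:
        if not hops[start]["is_anomaly"]:
            start += 1
            continue
        end = start + 1
        # Extend while the next hop is anomalous on the same dominant dim.
        while (end < n and hops[end]["is_anomaly"]
               and hops[end]["top_dim"] == hops[start]["top_dim"]):
            end += 1
        if end - start >= min_hops:
            runs.append({"run_start_idx": start,
                         "run_length": end - start,
                         "top_dim": hops[start]["top_dim"]})
        start = end
    return runs


trace = [
    {"is_anomaly": True, "top_dim": "a"},
    {"is_anomaly": True, "top_dim": "a"},
    {"is_anomaly": True, "top_dim": "a"},
    {"is_anomaly": True, "top_dim": "b"},
    {"is_anomaly": False, "top_dim": None},
    {"is_anomaly": True, "top_dim": "b"},
    {"is_anomaly": True, "top_dim": "b"},
]
strict = coherent_runs(trace, min_hops=3)
loose = coherent_runs(trace, min_hops=2)
```

With `min_hops=3` only the opening run on dimension `a` qualifies; lowering it to 2 also admits the trailing pair on `b`, while the isolated `b` hop at index 3 is never reported.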

Highlights

New find_chains_with_coherent_anomaly(pattern_id, *, anchor_pattern_id, min_hops=3, max_results=100) — population sweep that surfaces chains where >=min_hops strictly consecutive entity-anchor positions are individually flagged is_anomaly=True AND share the same dominant delta dimension. Returns ranked runs with chain_id, run_start_idx, run_length, top_dim, run_keys, max_delta_norm, plus diagnostics. Pure query-side, additive within sphere format 2.4.
New anomaly_propagation_in_chain(chain_id, pattern_id, *, anchor_pattern_id) — per-chain inspector returning the hop-by-hop trace with is_anomaly, delta_norm, dominant top_dim, and delta_rank_pct per hop, plus summary stats (n_hops, n_anomalous, max_run_length_same_top_dim, dominant_top_dim).
New classify_chain_typology(chain_id, pattern_id, *, anchor_pattern_id) — five-axis operational tag: shape (monotone-rising / monotone-falling / peak-in-middle / peak-at-start / peak-at-end / flat / single-hop / no-anomalous-run), peak_position, position_in_chain (leading / transit / terminal / full-chain), extension_signals (forward / backward booleans), pre_run_rank_bucket, breakpoint_rank_bucket, and the chain-wide dominant_top_dim.
New extend_chain(chain_id, pattern_id, *, anchor_pattern_id, direction="forward"|"backward", max_results=20) — boundary-extension primitive that uses the chain reverse index to find entities that follow (or precede) the boundary key in OTHER chains in the same pattern, ranked by their own anchor anomaly status. Returns candidates with is_anomaly, delta_norm, delta_rank_pct, source chain ids, and a summary.
find_motif_by_hops gains an anomaly_seed_filter: bool = False parameter. When True, the BFS starting frontier is intersected with the anomaly subset of the resolved anchor companion pattern. Replaces the implicit "all keys" frontier when seed_keys=None, intersects with the explicit list otherwise. Result dict gains seed_filter_summary ({requested, anomaly, filtered}) so callers can verify the prune.
find_motif_by_hops no longer routes seeded queries through a per-call scoped edge-table read path. Scoped routing was uniformly slower than the cached global AdjacencyIndex at every measured seed count; the BFS enumerator already filtered the starting frontier by seed_keys, so the scoped branch added cost without benefit. After this change, all queries (seeded and unseeded) use the cached global adjacency.
Parallel chain extraction (extract_chains via ProcessPoolExecutor) emitted chain_id values that collided across workers — each worker built its chain list locally and used len(chains) as the id counter, so the merged output assigned the same CHAIN-XXXXXX value to up to N_workers distinct chains. Post-merge dedup now reassigns chain_id from the global merged index, restoring primary_key uniqueness in the chain pattern points table. Spheres built before this fix carry colliding chain_ids; rebuild the chain pattern (or the full sphere) to restore unique ids. A defensive raise in anomaly_propagation_in_chain (and inherited by the typology classifier and extension primitive) surfaces the ambiguity on legacy spheres so investigations don't silently return wrong-chain hops. A build-time safety net asserts post-dedup uniqueness so any future regression in the merge logic fails loudly.
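The chain_id collision and its fix can be reproduced in miniature without any process pool. Everything in this sketch is illustrative: the function names are hypothetical, the workers are simulated by plain function calls, and the six-digit zero-padded id is an assumption read off the CHAIN-XXXXXX pattern.

```python
from typing import Dict, List, Sequence


def worker_extract(sequences: Sequence[Sequence[str]]) -> List[Dict[str, object]]:
    """Simulated worker: ids come from the worker-local list length."""
    chains: List[Dict[str, object]] = []
    for seq in sequences:
        chains.append({"chain_id": f"CHAIN-{len(chains):06d}",
                       "keys": tuple(seq)})
    return chains


def merge_and_reassign(worker_outputs: Sequence[List[Dict[str, object]]]
                       ) -> List[Dict[str, object]]:
    """Merge worker outputs and reassign ids from the global merged index."""
    merged = [chain for output in worker_outputs for chain in output]
    fixed = [dict(chain, chain_id=f"CHAIN-{idx:06d}")
             for idx, chain in enumerate(merged)]
    ids = [chain["chain_id"] for chain in fixed]
    # Safety net: fail loudly if uniqueness is ever lost again.
    if len(set(ids)) != len(ids):
        raise ValueError("duplicate chain_id after merge")
    return fixed


worker_a = worker_extract([["A", "B"], ["B", "C"]])
worker_b = worker_extract([["X", "Y"]])
collided = worker_a[0]["chain_id"] == worker_b[0]["chain_id"]
merged_chains = merge_and_reassign([worker_a, worker_b])
```

Both simulated workers start counting at zero, so their first chains share an id until the merge step renumbers them.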

See CHANGELOG.md for the full list of changes.

