Introduce run.yml policy and run-scoped experiment logging for Grid benchmarks #625

Open

tlwillke wants to merge 6 commits into main from bench-logging

@tlwillke tlwillke commented Feb 13, 2026

Summary

This PR introduces YAML-driven benchmark compute + reporting selection, and adds run-scoped logging artifacts (sys_info.json, dataset_info.csv, experiments.csv) so Grid benchmark runs are reproducible and easier to analyze offline. Reporting/benchmark policy is now run-level (via yaml-configs/run.yml), while dataset YAML files are simplified to dataset-tuned construction/search settings only.

Key changes

Run-level policy file: `yaml-configs/run.yml`

Added a new run-level config file that defines:
- benchmark computation policy (benchmarks: what to compute)
- console projection (console: subset + named metrics)
- experiments logging policy (logging: selection + sink type + run metadata)
Dataset YAMLs are now focused on dataset-tuned parameters only (construction/search grids). Reporting/benchmark controls are no longer expected in dataset YAMLs.

Config versioning clarity

Renamed the old vague YAML version field to onDiskIndexVersion (still validated against OnDiskGraphIndex.CURRENT_VERSION).
Introduced yamlSchemaVersion (starting at 1) to version YAML schema independently of on-disk index header format.
Added deprecated back-compat for legacy files that still use version: 6.
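
A minimal sketch of how the renamed field and its deprecated alias might be resolved. The class and method names are hypothetical, the `Map` stands in for an already-parsed YAML document, and `currentVersion` stands in for `OnDiskGraphIndex.CURRENT_VERSION`; none of this is taken from the PR's code.

```java
import java.util.Map;

/** Hypothetical helper, for illustration only; not the PR's implementation. */
final class VersionFieldsSketch {

    /**
     * Resolves the on-disk index version from a parsed YAML map.
     * Prefers onDiskIndexVersion, falls back to the deprecated "version"
     * field with a warning, and rejects values that differ from currentVersion.
     */
    static int resolveOnDiskIndexVersion(Map<String, ?> yaml, int currentVersion) {
        Object value = yaml.get("onDiskIndexVersion");
        if (value == null && yaml.containsKey("version")) {
            // Legacy spelling: accepted, but flagged as deprecated.
            System.err.println("WARNING: 'version' is deprecated; use 'onDiskIndexVersion'");
            value = yaml.get("version");
        }
        if (!(value instanceof Integer)) {
            throw new IllegalArgumentException("onDiskIndexVersion is missing or not an integer");
        }
        int version = (Integer) value;
        if (version != currentVersion) {
            throw new IllegalArgumentException(
                "onDiskIndexVersion " + version + " does not match current version " + currentVersion);
        }
        return version;
    }
}
```

Under these assumptions, a file with `version: 6` resolves to the same value as one with `onDiskIndexVersion: 6`, and the only difference is the deprecation warning.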

Compute vs display vs logging separation

Benchmarks specify what is computed; console/logging are projections only.
Added stable Metric.key identifiers for all benchmark outputs and telemetry so selection and CSV columns are robust (no header-string matching).
Enforced:
- strict subset validation for benchmark-stat selections (must be computed)
- best-effort warnings for runtime-unavailable telemetry/metrics (omitted from output/CSV, with warnings)
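
The two enforcement modes above can be sketched roughly as follows. This assumes selections are plain lists of `Metric.key` strings; the class and method names are hypothetical and not the PR's actual resolver or catalog API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical illustration of strict vs. best-effort selection checks. */
final class SelectionValidationSketch {

    /** Strict: every selected benchmark-stat key must be among the computed keys. */
    static void requireComputed(Set<String> computedKeys, List<String> selectedKeys) {
        List<String> missing = new ArrayList<>();
        for (String key : selectedKeys) {
            if (!computedKeys.contains(key)) {
                missing.add(key);
            }
        }
        if (!missing.isEmpty()) {
            throw new IllegalArgumentException("Selected but not computed: " + missing);
        }
    }

    /** Best-effort: keep available keys in selection order; warn about and omit the rest. */
    static List<String> keepAvailable(Set<String> availableKeys, List<String> selectedKeys) {
        List<String> kept = new ArrayList<>();
        for (String key : selectedKeys) {
            if (availableKeys.contains(key)) {
                kept.add(key);
            } else {
                System.err.println("WARNING: metric '" + key + "' is unavailable at runtime; omitting");
            }
        }
        return kept;
    }
}
```

The strict check fails fast at configuration time, while the best-effort filter lets a run proceed when telemetry is missing on a given host.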

Run-scoped artifacts and logging

When logging is enabled, each BenchYAML invocation creates a run directory under:
- logging/<run_id>/
The run directory contains (see examples below):
- sys_info.json — host/OS/JVM/SIMD/threading/memory + stable system_id
- dataset_info.csv — one row per dataset used in the run
- experiments.csv — flat, append-only table of fixed params + selected output-key columns
experiments.csv schema stability:
- fixed columns are defined centrally in ExperimentsSchemaV1
- output-key columns are an ordered union across all topK scenarios encountered in the run
- non-applicable / missing values are written as empty CSV fields (easy for pandas/matplotlib)
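
One way to picture the column rules above: the sketch below builds an ordered union of output keys and writes empty fields for missing values. It assumes the per-scenario key lists are known before rows are written and ignores CSV quoting; the names are hypothetical and unrelated to `ExperimentsSchemaV1`.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical illustration of output-key column handling for experiments.csv. */
final class ExperimentsCsvSketch {

    /** Ordered union: first-seen order is preserved, duplicates are dropped. */
    static List<String> orderedUnion(List<List<String>> keysPerScenario) {
        Set<String> union = new LinkedHashSet<>();
        for (List<String> keys : keysPerScenario) {
            union.addAll(keys);
        }
        return new ArrayList<>(union);
    }

    /** Renders one row; a key with no value becomes an empty CSV field. */
    static String toCsvRow(List<String> columns, Map<String, String> values) {
        List<String> cells = new ArrayList<>();
        for (String column : columns) {
            cells.add(values.getOrDefault(column, ""));
        }
        // No quoting or escaping here; a real writer would need it.
        return String.join(",", cells);
    }
}
```

With columns `a,b,c` and values for only `a` and `c`, the row has an empty middle field, which most CSV readers treat as a missing value.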

Plumbing / ergonomics

Added RunArtifacts to centralize:
- run setup + validation
- dataset registration (dataset_info.csv)
- row logging (experiments.csv)
Updated BenchYAML and HelloVectorWorld to use run.yml + RunArtifacts (no manual wiring of compute/display/logging maps).
Refactored Grid signatures to take a single RunArtifacts object instead of many reporting parameters.
Legacy callers continue to work via the legacy Grid overload + RunArtifacts.disabled().
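
The single-object signature plus a legacy overload might look roughly like the sketch below, which uses a no-op object for the disabled case. The interface, its methods, and `GridSketch` are hypothetical stand-ins; only the idea of `RunArtifacts.disabled()` comes from this PR.

```java
import java.util.List;

/** Hypothetical stand-in for RunArtifacts; the method set is assumed. */
interface RunArtifactsSketch {
    boolean enabled();
    void logRow(String csvRow);

    /** No-op instance: callers never need a null check or an "is logging on" branch. */
    static RunArtifactsSketch disabled() {
        return new RunArtifactsSketch() {
            @Override public boolean enabled() { return false; }
            @Override public void logRow(String csvRow) { /* intentionally does nothing */ }
        };
    }

    /** Collects rows in memory; stands in for a real CSV sink. */
    static RunArtifactsSketch inMemory(List<String> sink) {
        return new RunArtifactsSketch() {
            @Override public boolean enabled() { return true; }
            @Override public void logRow(String csvRow) { sink.add(csvRow); }
        };
    }
}

/** Hypothetical stand-in for Grid, reduced to the signature change. */
final class GridSketch {
    /** New-style entry point: one artifacts object instead of many reporting parameters. */
    static int runAll(List<String> configs, RunArtifactsSketch artifacts) {
        int executed = 0;
        for (String config : configs) {
            artifacts.logRow(config + ",ok"); // a no-op when artifacts are disabled
            executed++;
        }
        return executed;
    }

    /** Legacy overload: same behavior, with logging switched off. */
    static int runAll(List<String> configs) {
        return runAll(configs, RunArtifactsSketch.disabled());
    }
}
```

The benefit of this shape is that the benchmark loop is identical whether or not logging is on.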

Legacy compatibility

Legacy YAML files without yamlSchemaVersion are treated as schema 0:
- parsed leniently (unknown fields ignored)
- emit a deprecation warning
- optional extraction of legacy search.benchmarks is supported for legacy behavior when run.yml is absent
AutoBenchYAML is not migrated to run.yml yet; it continues to work with legacy configs.
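
A rough sketch of the schema-0 fallback, assuming detection works on the raw YAML text. The pattern extends the `yamlSchemaVersion` regex visible in the MultiConfig diff with a capture group for the number; the class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical illustration of legacy detection; not the PR's parser. */
final class SchemaVersionSketch {

    // Multiline match of a top-of-line "yamlSchemaVersion: <digits>" entry.
    private static final Pattern SCHEMA_VERSION =
        Pattern.compile("(?m)^\\s*yamlSchemaVersion\\s*:\\s*(\\d+)");

    /** Returns the declared schema version, or 0 (legacy) when the field is absent. */
    static int detect(String yamlText) {
        Matcher matcher = SCHEMA_VERSION.matcher(yamlText);
        if (matcher.find()) {
            return Integer.parseInt(matcher.group(1));
        }
        System.err.println("WARNING: no yamlSchemaVersion found; treating file as legacy schema 0");
        return 0;
    }
}
```

A caller could then choose lenient parsing for a result of 0 and strict parsing otherwise.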

Behavior

Logging enabled (run.yml has logging.type):
- BenchYAML writes run artifacts under logging/<run_id>/
- prints a startup message pointing to the run directory and how to delete it to reclaim space
- appends experiment rows as configurations execute
Logging disabled (no logging section or blank type in run.yml):
- run artifacts are not created
- RunArtifacts becomes a no-op via RunArtifacts.disabled()
Console output continues to work via run-level selection; dataset configs tune only construction/search parameters.

Notes / limitations

experiments.csv uses Metric.key strings as column names for benchmark outputs/telemetry; missing values are empty CSV fields.
sys_info.json SIMD reports the configured vector width via io.github.jbellis.jvector.vector_bit_size (with simd_config_present), avoiding brittle runtime probing.
Threading in sys_info.json distinguishes build executor parallelism from parallel-stream parallelism (the common ForkJoinPool), which explains why build and query values can differ.
This PR intentionally does not overhaul per-index build time attribution (e.g., cache hits have no build time by design); deeper build-timing correctness is left for a future pass.
Dataset YAML breaking changes are intentional: version → onDiskIndexVersion, plus removal of dataset-level benchmarks/console/logging fields in favor of run.yml.

Example of run.yml and yaml-config for dataset

[Images: ada002-100k yml, run yml]

Examples of artifacts (sys_info, dataset_info, experiments)

[Images: experiments csv, dataset_info csv, sys_info json]

tlwillke and others added 6 commits February 7, 2026 16:30

- 07659f6 @tlwillke: Initial commit. Created ability to specify console output. Deterministic ordering of benchmarks.
- ca93c98 @tlwillke: Decouples benchmark computation from console output by introducing stable stat and metric keys and a resolver + catalog.
- fecc385 @tlwillke: Added logging selection and selection validation.
- 22e7a4e @tlwillke: Introduces run-scoped experiment logging. Benchmark computation, console selection, and logging selection are now run-level policies. Produces system configuration and dataset artifact as well.
- c14dbfe @tlwillke: Added legacy yaml configuration handling and yamlSchemaVersion enforcement.
- 387dc2f @tlwillke: Merge latest main into branch.

@tlwillke tlwillke self-assigned this Feb 13, 2026

@tlwillke tlwillke requested a review from MarkWolters as a code owner February 13, 2026 02:26

@tlwillke tlwillke added the enhancement label Feb 13, 2026

@tlwillke tlwillke requested a review from jshook as a code owner February 13, 2026 02:26

github-actions bot commented Feb 13, 2026 (edited by tlwillke)

Before you submit for review:

Does your PR follow guidelines from CONTRIBUTIONS.md?
Did you summarize what this PR does clearly and concisely?
Did you include performance data for changes which may be performance impacting?
Did you include useful docs for any user-facing changes or features?
Did you include useful javadocs for developer oriented changes, explaining new concepts or key changes?
Did you trigger and review regression testing results against the base branch via Run Bench Main?
Did you adhere to the code formatting guidelines (TBD)?
Did you group your changes for easy review, providing meaningful descriptions for each commit?
Did you ensure that all files contain the correct copyright header?

If you did not complete any of these, then please explain below.

@tlwillke tlwillke requested a review from ashkrisk February 13, 2026 02:27

ashkrisk reviewed Feb 13, 2026

@ashkrisk ashkrisk left a comment

Just started looking through it

jvector-examples/src/main/java/io/github/jbellis/jvector/example/util/CsvLogger.java

@@ -0,0 +1,4 @@

package io.github.jbellis.jvector.example.util;

public class CsvLogger {

@ashkrisk ashkrisk Feb 13, 2026

unused?

@tlwillke tlwillke Feb 13, 2026

I believe so. I was experimenting with logging early last year. This looks to be leftover. Not part of this PR.

...mples/src/main/java/io/github/jbellis/jvector/example/benchmarks/ExecutionTimeBenchmark.java

double avgRuntimeSec = totalRuntime / queryRuns / 1e9;

return List.of(Metric.of("Avg Runtime (s)", format, avgRuntimeSec));

return List.of(Metric.of("search.execution_time.avg_runtime_s",

@ashkrisk ashkrisk Feb 13, 2026

It looks like this class is not wired up. Should we remove the file altogether? Execution time is (1 / QPS) anyway.

@tlwillke tlwillke Feb 13, 2026

Yes, it has been unused but is unrelated / orthogonal to this PR. Please open an issue.

...examples/src/main/java/io/github/jbellis/jvector/example/benchmarks/ThroughputBenchmark.java

formatAvgQps,

stdDevQps));

list.add(Metric.of("search.throughput.cv_pct",

@ashkrisk ashkrisk Feb 13, 2026

Can we change this to cv_pc_qps to make it a little more obvious what this is?

jvector-examples/src/main/java/io/github/jbellis/jvector/example/yaml/MultiConfig.java

java.util.regex.Pattern.compile("(?m)^\\s*yamlSchemaVersion\\s*:");

private int yamlSchemaVersion;

private int onDiskIndexVersion;

@ashkrisk ashkrisk Feb 13, 2026

Do we need the field onDiskIndexVersion? It has a getter and setter but these aren't referenced anywhere. It doesn't seem to affect any functionality either, including logging / reporting.

Similarly for the eponymous field in RunConfig.

@tlwillke tlwillke Feb 13, 2026

Good catch. I should have wired in the check. The legacy version does indeed check the on-disk index version against the yaml config to ensure that the parameters are revisited when the index is revised.

Labels

enhancement

2 participants

@tlwillke @ashkrisk
