
Introduce run.yml policy and run-scoped experiment logging for Grid benchmarks #625

Open: tlwillke wants to merge 6 commits into main from bench-logging


tlwillke (Collaborator) commented Feb 13, 2026

Summary

This PR introduces YAML-driven selection of which benchmarks are computed and reported, and adds run-scoped logging artifacts (sys_info.json, dataset_info.csv, experiments.csv) so that Grid benchmark runs are reproducible and easier to analyze offline. Reporting and benchmark policy now lives at the run level (in yaml-configs/run.yml), while dataset YAML files are reduced to dataset-tuned construction and search settings only.


Key changes

Run-level policy file: yaml-configs/run.yml

  • Added a new run-level config file that defines:
    • benchmark computation policy (benchmarks: what to compute)
    • console projection (console: subset + named metrics)
    • experiments logging policy (logging: selection + sink type + run metadata)
  • Dataset YAMLs are now focused on dataset-tuned parameters only (construction/search grids). Reporting/benchmark controls are no longer expected in dataset YAMLs.

Config versioning clarity

  • Renamed the ambiguous YAML field version to onDiskIndexVersion (still validated against OnDiskGraphIndex.CURRENT_VERSION).
  • Introduced yamlSchemaVersion (starting at 1) to version the YAML schema independently of the on-disk index header format.
  • Kept deprecated backward compatibility for legacy files that still use version: 6.
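A minimal sketch of how the two version fields might be checked, assuming the raw config text is available as a string. The legacy detection reuses the yamlSchemaVersion regex that appears in the review diff further down; the class SchemaVersionCheck and its methods are hypothetical, currentIndexVersion stands in for OnDiskGraphIndex.CURRENT_VERSION, and throwing on a mismatch is an assumption about what "validated" means.

```java
import java.util.regex.Pattern;

// Hypothetical helper sketching the version checks; not the PR's actual code.
final class SchemaVersionCheck {
    // Same regex as in the review diff: "yamlSchemaVersion:" at the start of any line,
    // leading whitespace allowed.
    static final Pattern SCHEMA_KEY =
            Pattern.compile("(?m)^\\s*yamlSchemaVersion\\s*:");

    /** Files without yamlSchemaVersion are treated as legacy schema 0. */
    static boolean isLegacy(String yamlText) {
        return !SCHEMA_KEY.matcher(yamlText).find();
    }

    /**
     * Checks the configured on-disk index version (taken from the deprecated
     * "version" field in legacy files) against the current index version.
     */
    static void validateIndexVersion(int configured, int currentIndexVersion, boolean legacy) {
        if (legacy) {
            System.err.println("WARNING: legacy 'version' field is deprecated; "
                    + "use yamlSchemaVersion and onDiskIndexVersion");
        }
        if (configured != currentIndexVersion) {
            throw new IllegalArgumentException("Config was tuned for on-disk index version "
                    + configured + ", but the current version is " + currentIndexVersion
                    + "; revisit the dataset parameters");
        }
    }
}
```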

Compute vs display vs logging separation

  • Benchmarks specify what is computed; console/logging are projections only.
  • Added stable Metric.key identifiers for all benchmark outputs and telemetry so selection and CSV columns are robust (no header-string matching).
  • Enforced:
    • strict subset validation for benchmark-stat selections (must be computed)
    • best-effort handling of telemetry/metrics that are unavailable at runtime (omitted from console output and CSV, with a warning)
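The two validation modes could look roughly like this. It is a sketch only: SelectionValidator, its method names, and the use of System.err for warnings are hypothetical and not taken from the PR.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of strict vs best-effort selection checks.
final class SelectionValidator {

    /** Strict: every selected benchmark stat must be among the computed ones. */
    static void requireSubset(Set<String> computed, List<String> selected) {
        List<String> missing = new ArrayList<>();
        for (String key : selected) {
            if (!computed.contains(key)) {
                missing.add(key);
            }
        }
        if (!missing.isEmpty()) {
            throw new IllegalArgumentException("Selected but not computed: " + missing);
        }
    }

    /** Best-effort: drop keys that are unavailable at runtime and warn instead of failing. */
    static List<String> keepAvailable(Set<String> available, List<String> selected) {
        List<String> kept = new ArrayList<>();
        for (String key : selected) {
            if (available.contains(key)) {
                kept.add(key);
            } else {
                System.err.println("WARNING: metric '" + key + "' unavailable at runtime; omitted");
            }
        }
        return kept;
    }
}
```

Failing fast on stats that were never computed catches config mistakes early, while telemetry that depends on the host is allowed to degrade gracefully.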

Run-scoped artifacts and logging

  • When logging is enabled, each BenchYAML invocation creates a run directory under:
    • logging/<run_id>/
  • The run directory contains (see examples below):
    • sys_info.json — host/OS/JVM/SIMD/threading/memory + stable system_id
    • dataset_info.csv — one row per dataset used in the run
    • experiments.csv — flat, append-only table of fixed params + selected output-key columns
  • experiments.csv schema stability:
    • fixed columns are defined centrally in ExperimentsSchemaV1
    • output-key columns are an ordered union across all topK scenarios encountered in the run
    • non-applicable / missing values are written as empty CSV fields (easy for pandas/matplotlib)
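A minimal sketch of the ordered-union column rule and the empty-field convention. The fixed column names used with it are placeholders, not the actual ExperimentsSchemaV1 columns; the sketch assumes every topK scenario is registered before the header is written (how the PR handles keys that first appear after rows have been appended is not covered), and it assumes values need no CSV quoting.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; not the PR's experiments.csv writer.
final class ExperimentsCsvSketch {
    private final List<String> fixedColumns;
    // Ordered union of output keys: first-seen order, duplicates across scenarios dropped.
    private final Set<String> outputKeys = new LinkedHashSet<>();

    ExperimentsCsvSketch(List<String> fixedColumns) {
        this.fixedColumns = List.copyOf(fixedColumns);
    }

    /** Registers the output keys of one topK scenario; unseen keys are appended at the end. */
    void registerKeys(List<String> keys) {
        outputKeys.addAll(keys);
    }

    String header() {
        List<String> cols = new ArrayList<>(fixedColumns);
        cols.addAll(outputKeys);
        return String.join(",", cols);
    }

    /** Missing or non-applicable values become empty CSV fields (no quoting applied). */
    String row(Map<String, String> values) {
        List<String> cells = new ArrayList<>();
        for (String col : fixedColumns) {
            cells.add(values.getOrDefault(col, ""));
        }
        for (String key : outputKeys) {
            cells.add(values.getOrDefault(key, ""));
        }
        return String.join(",", cells);
    }
}
```

Empty fields rather than placeholder strings such as "N/A" let pandas read the columns as numeric with NaN for gaps.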

Plumbing / ergonomics

  • Added RunArtifacts to centralize:
    • run setup + validation
    • dataset registration (dataset_info.csv)
    • row logging (experiments.csv)
  • Updated BenchYAML and HelloVectorWorld to use run.yml + RunArtifacts (no manual wiring of compute/display/logging maps).
  • Refactored Grid signatures to take a single RunArtifacts object instead of many reporting parameters.
  • Legacy callers continue to work via the legacy Grid overload + RunArtifacts.disabled().
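RunArtifacts.disabled() follows the null-object pattern: callers always receive an artifacts object and never branch on whether logging is on. The sketch below illustrates that idea with a hypothetical ArtifactsSink interface; its method names are invented for illustration and are not the real RunArtifacts API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical interface illustrating the null-object idea behind RunArtifacts.disabled().
interface ArtifactsSink {
    void registerDataset(String name);

    void logRow(Map<String, String> row);

    /** A sink that ignores everything, so callers never check "is logging on?". */
    static ArtifactsSink disabled() {
        return new ArtifactsSink() {
            @Override public void registerDataset(String name) { }
            @Override public void logRow(Map<String, String> row) { }
        };
    }
}

// In-memory stand-in for an enabled sink (a real one would write the CSV files).
final class RecordingSink implements ArtifactsSink {
    final List<String> datasets = new ArrayList<>();
    final List<Map<String, String>> rows = new ArrayList<>();

    @Override public void registerDataset(String name) { datasets.add(name); }
    @Override public void logRow(Map<String, String> row) { rows.add(row); }
}

final class GridSketch {
    /** Grid-like code takes a single sink instead of many reporting parameters. */
    static void runOne(String dataset, ArtifactsSink artifacts) {
        artifacts.registerDataset(dataset);
        artifacts.logRow(Map.of("dataset", dataset));
    }
}
```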

Legacy compatibility

  • Legacy YAML files without yamlSchemaVersion are treated as schema 0:
    • parsed leniently (unknown fields ignored)
    • emit a deprecation warning
    • optional extraction of legacy search.benchmarks is supported for legacy behavior when run.yml is absent
  • AutoBenchYAML is not migrated to run.yml yet; it continues to work with legacy configs.
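The precedence rule for where benchmark selections come from might be sketched as follows. BenchmarkSourceSketch is hypothetical, the map shape (benchmark name to selected stats) is an assumption, and returning an empty selection when neither source is present is also an assumption rather than documented behavior.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical precedence rule; not the PR's actual loader.
final class BenchmarkSourceSketch {
    static Map<String, List<String>> resolve(
            Optional<Map<String, List<String>>> fromRunYml,
            Optional<Map<String, List<String>>> legacySearchBenchmarks,
            int yamlSchemaVersion) {
        if (fromRunYml.isPresent()) {
            return fromRunYml.get();              // run.yml is authoritative when present
        }
        if (yamlSchemaVersion == 0 && legacySearchBenchmarks.isPresent()) {
            return legacySearchBenchmarks.get();  // legacy fallback only without run.yml
        }
        return Map.of();                          // assumption: nothing selected
    }
}
```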

Behavior

  • Logging enabled (run.yml has logging.type):
    • BenchYAML writes run artifacts under logging/<run_id>/
    • prints a startup message with the run directory path and instructions for deleting it to reclaim space
    • appends experiment rows as configurations execute
  • Logging disabled (no logging section or blank type in run.yml):
    • run artifacts are not created
    • RunArtifacts becomes a no-op via RunArtifacts.disabled()
  • Console output continues to work via run-level selection; dataset configs tune only construction/search parameters.
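A sketch of the enable/disable rule, assuming run.yml has already been parsed into nested java.util.Map objects. LoggingSwitch is hypothetical, and the run id is passed in because its format is not described in this PR text.

```java
import java.nio.file.Path;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch of the logging on/off rule described above.
final class LoggingSwitch {
    /** Logging is on only when run.yml has a logging section with a non-blank type. */
    static boolean isEnabled(Map<String, Object> runYml) {
        Object logging = runYml.get("logging");
        if (!(logging instanceof Map)) {
            return false; // no logging section
        }
        Object type = ((Map<?, ?>) logging).get("type");
        return type != null && !type.toString().isBlank();
    }

    /** Run directory layout from the PR: logging/<run_id>/. */
    static Optional<Path> runDir(Map<String, Object> runYml, String runId) {
        return isEnabled(runYml) ? Optional.of(Path.of("logging", runId)) : Optional.empty();
    }
}
```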

Notes / limitations

  • experiments.csv uses Metric.key strings as column names for benchmark outputs/telemetry; missing values are empty CSV fields.
  • In sys_info.json, the SIMD entry reports the configured vector width from io.github.jbellis.jvector.vector_bit_size (together with simd_config_present), avoiding brittle runtime probing.
  • Threading in sys_info.json distinguishes build-executor parallelism from parallel-stream parallelism (the common ForkJoinPool), which explains why build and query values can differ.
  • This PR intentionally does not overhaul per-index build time attribution (e.g., cache hits have no build time by design); deeper build-timing correctness is left for a future pass.
  • Dataset YAML breaking changes are intentional: version → onDiskIndexVersion, plus removal of dataset-level benchmarks/console/logging fields in favor of run.yml.
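For illustration, a few of the sys_info.json fields could be gathered with standard JDK calls. This assumes io.github.jbellis.jvector.vector_bit_size is read as a JVM system property; SysInfoSketch and its output field names are hypothetical and need not match the PR's JSON keys.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.ForkJoinPool;

// Hypothetical sketch of a few sys_info fields; names are illustrative only.
final class SysInfoSketch {
    static final String VECTOR_BIT_SIZE_PROP = "io.github.jbellis.jvector.vector_bit_size";

    static Map<String, Object> collect() {
        Map<String, Object> info = new LinkedHashMap<>();
        // Report the configured SIMD width rather than probing the hardware at runtime.
        String bits = System.getProperty(VECTOR_BIT_SIZE_PROP);
        info.put("simd_config_present", bits != null);
        info.put("vector_bit_size", bits); // null when the property is not set
        // Parallel streams run on the common ForkJoinPool; a dedicated build executor
        // may be sized differently, so both numbers are worth recording.
        info.put("available_processors", Runtime.getRuntime().availableProcessors());
        info.put("common_pool_parallelism", ForkJoinPool.commonPool().getParallelism());
        info.put("max_heap_bytes", Runtime.getRuntime().maxMemory());
        info.put("java_version", System.getProperty("java.version"));
        info.put("os_name", System.getProperty("os.name"));
        return info;
    }
}
```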

Example of run.yml and a dataset YAML config (ada002-100k): [screenshots: ada002-100k.yml, run.yml]

Examples of artifacts (sys_info.json, dataset_info.csv, experiments.csv): [screenshots: experiments.csv, dataset_info.csv, sys_info.json]

tlwillke self-assigned this Feb 13, 2026
tlwillke added the enhancement (New feature or request) label Feb 13, 2026
github-actions bot (Contributor) commented Feb 13, 2026 (edited by tlwillke)

Before you submit for review:

  • Does your PR follow guidelines from CONTRIBUTIONS.md?
  • Did you summarize what this PR does clearly and concisely?
  • Did you include performance data for changes which may be performance impacting?
  • Did you include useful docs for any user-facing changes or features?
  • Did you include useful javadocs for developer oriented changes, explaining new concepts or key changes?
  • Did you trigger and review regression testing results against the base branch via Run Bench Main?
  • Did you adhere to the code formatting guidelines (TBD)?
  • Did you group your changes for easy review, providing meaningful descriptions for each commit?
  • Did you ensure that all files contain the correct copyright header?

If you did not complete any of these, then please explain below.

ashkrisk (Contributor) left a review comment:

Just started looking through it.

@@ -0,0 +1,4 @@
package io.github.jbellis.jvector.example.util;

public class CsvLogger {
ashkrisk (Contributor):

Unused?

tlwillke (Collaborator, Author):

I believe so. I was experimenting with logging early last year, and this looks like a leftover. It is not part of this PR.


 double avgRuntimeSec = totalRuntime / queryRuns / 1e9;
-return List.of(Metric.of("Avg Runtime (s)", format, avgRuntimeSec));
+return List.of(Metric.of("search.execution_time.avg_runtime_s",
ashkrisk (Contributor):

It looks like this class is not wired up. Should we remove the file altogether? Execution time is 1/QPS anyway.

tlwillke (Collaborator, Author):

Yes, it has been unused, but that is orthogonal to this PR. Please open an issue.

formatAvgQps,
stdDevQps));

list.add(Metric.of("search.throughput.cv_pct",
ashkrisk (Contributor):

Can we change this to cv_pc_qps to make it a little more obvious what this is?
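For context on the naming question, here is a sketch assuming search.throughput.cv_pct is the coefficient of variation of QPS expressed in percent, derived from the average and standard deviation shown in the diff. ThroughputStats is hypothetical, and the population form of the standard deviation is an assumption.

```java
// Hypothetical sketch of a QPS coefficient of variation in percent.
final class ThroughputStats {
    static double mean(double[] xs) {
        double sum = 0;
        for (double x : xs) {
            sum += x;
        }
        return sum / xs.length;
    }

    /** Population standard deviation (an assumption; the PR may use the sample form). */
    static double stdDev(double[] xs) {
        double m = mean(xs);
        double ss = 0;
        for (double x : xs) {
            ss += (x - m) * (x - m);
        }
        return Math.sqrt(ss / xs.length);
    }

    /** Coefficient of variation in percent: 100 * stdDev / mean. */
    static double cvPct(double[] qps) {
        return 100.0 * stdDev(qps) / mean(qps);
    }
}
```

For QPS samples of 90 and 110, cvPct returns 10.0 (mean 100, population standard deviation 10).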

java.util.regex.Pattern.compile("(?m)^\\s*yamlSchemaVersion\\s*:");

private int yamlSchemaVersion;
private int onDiskIndexVersion;
ashkrisk (Contributor):

Do we need the onDiskIndexVersion field? It has a getter and setter, but neither is referenced anywhere, and it doesn't seem to affect any functionality, including logging and reporting.

The same applies to the field of the same name in RunConfig.

tlwillke (Collaborator, Author):

Good catch. I should have wired in the check. The legacy version does check the on-disk index version against the YAML config, to ensure that the parameters are revisited whenever the index format is revised.


Reviewers

ashkrisk left review comments

MarkWolters (code owner): awaiting requested review

jshook (code owner): awaiting requested review

At least 2 approving reviews are required to merge this pull request.

Labels: enhancement (New feature or request)
Projects: none yet
Milestone: none

