cerberus

Drop-in Prometheus / Loki / Tempo HTTP gateway for ClickHouse. Keep Grafana, alerting, and your CLI tooling. Swap the backend.

Warning

EXPERIMENTAL — NOT PRODUCTION-READY. Cerberus is at v1.0.0-rc.1, early and under active development. The differential harnesses run on every PR and score parity against real Prometheus / Loki / Tempo, but correctness, performance, and operational behaviour are still being shaken out, and the surface is evolving. Validate it against your own corpus before pointing anything real at it — do not stand it in for a running Prom / Loki / Tempo deployment without that evaluation — and expect breaking changes. See CHANGELOG.md for what has landed so far.


The three *QL compat badges are differential parity scores: passed / total cases where cerberus matched a reference Prometheus / Loki / Tempo on the same seeded corpus. The PromQL leg runs the third-party PromLabs / CNCF PromQL Compliance Tester (prometheus/compliance), the same tool the CNCF Prometheus Conformance Program uses, against a real prom/prometheus: 574/574 cases pass, with no allow-list. The scores are tracked, not gated: see Compatibility for exactly what the CI checks enforce.


Why cerberus?

Metrics, logs, and traces rarely share a store — the usual answer is Prometheus + Loki + Tempo, three retention policies and storage bills for what is largely the same OTLP data sliced three ways. ClickHouse is a great single store for all three signals; cerberus supplies the missing query side. Point Grafana at it as three datasources and your existing PromQL / LogQL / TraceQL keeps working, translated to ClickHouse SQL underneath.

  • No Grafana plugin. Cerberus speaks each upstream HTTP API verbatim (/api/v1/query_range, /loki/api/v1/query_range, /api/search, ...). Grafana sees three normal datasources.
  • No custom QL. PromQL, LogQL, TraceQL — exactly as your dashboards and alerts already use them.
  • No reinvented parsers. Cerberus imports prometheus/promql/parser, grafana/loki/v3/pkg/logql/syntax, and grafana/tempo/pkg/traceql directly. If upstream parses it, cerberus parses it.

Version requirements

Two axes decide whether a deployment is compatible: the ClickHouse server version cerberus queries, and the OTel schema shape the data was written in.

| Component | Minimum | Notes |
|---|---|---|
| ClickHouse | 24.8 | The supported floor: the SQL cerberus emits is correct down to it. Enabling the experimental native rate requires 25.6 (below). |
| OTel exporter schema | clickhouseexporter 0.152.0 | A schema shape, not a binary version; see below. |

ClickHouse. 24.8 is the lowest version on which cerberus's emitted SQL is correct: the 24.8 empty-input / parse-unit / filter-path quirks are all worked around unconditionally, so a query that runs on 24.8 runs on every newer server too. The differential compatibility harnesses, the source of truth for all three heads, execute on ClickHouse 25.8, so the validated SQL is also exercised above the floor. Enabling the experimental native-rate path (CERBERUS_EXPERIMENTAL_TS_GRID_RANGE, default off) raises the floor to 25.6: it lowers eligible rate(<counter>[range]) range queries to the compiled timeSeriesRateToGrid aggregate, which exists only from ClickHouse 25.6 onward. With the flag off, 24.8 is sufficient. See docs/operations.md for the runtime contract and the experimental-setting details.
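The floor rule above reduces to one comparison. A minimal Go sketch, assuming the CERBERUS_EXPERIMENTAL_TS_GRID_RANGE setting is already known as a bool and that only the major.minor prefix of the server version matters; checkServer, parseVersion, and requiredFloor are hypothetical helpers, not part of cerberus.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// version is a ClickHouse major.minor pair, e.g. 24.8.
type version struct{ major, minor int }

// parseVersion reads the leading "major.minor" of a ClickHouse version
// string such as "25.8.1.5"; further components are ignored.
func parseVersion(s string) (version, error) {
	parts := strings.Split(s, ".")
	if len(parts) < 2 {
		return version{}, fmt.Errorf("malformed version %q", s)
	}
	major, err := strconv.Atoi(parts[0])
	if err != nil {
		return version{}, err
	}
	minor, err := strconv.Atoi(parts[1])
	if err != nil {
		return version{}, err
	}
	return version{major, minor}, nil
}

func (v version) atLeast(o version) bool {
	return v.major > o.major || (v.major == o.major && v.minor >= o.minor)
}

// requiredFloor encodes the README's rule: 24.8 normally, 25.6 when the
// experimental native-rate (timeSeriesRateToGrid) path is enabled.
func requiredFloor(experimentalTSGrid bool) version {
	if experimentalTSGrid {
		return version{25, 6}
	}
	return version{24, 8}
}

// checkServer is a hypothetical preflight check against a server version.
func checkServer(serverVersion string, experimentalTSGrid bool) error {
	v, err := parseVersion(serverVersion)
	if err != nil {
		return err
	}
	floor := requiredFloor(experimentalTSGrid)
	if !v.atLeast(floor) {
		return fmt.Errorf("ClickHouse %d.%d is below the required %d.%d",
			v.major, v.minor, floor.major, floor.minor)
	}
	return nil
}

func main() {
	for _, c := range []struct {
		server string
		tsGrid bool
	}{{"24.8.14.39", false}, {"24.8.14.39", true}, {"25.8.1.5", true}} {
		fmt.Println(c.server, c.tsGrid, checkServer(c.server, c.tsGrid))
	}
}
```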

OTel schema — the shape, not the exporter. Cerberus reads the OpenTelemetry ClickHouse schema shape pinned to clickhouseexporter v0.152.0 (via the tsouza/...:cerberus-ddl fork in go.mod). What matters is the table layout — column names, types, and Map shapes — not which binary produced it. Any exporter, collector pipeline, or ingestion path that writes tables in that shape works; the exporter binary version itself is irrelevant. If your layout deviates from the exporter defaults, point cerberus at it with the CERBERUS_SCHEMA_* overrides — see docs/configuration.md.

Quick start

git clone https://github.com/tsouza/cerberus.git && cd cerberus
docker compose up --wait
open http://localhost:3000 # Grafana (auto-login as admin); cerberus on :8080

That builds cerberus, boots single-node ClickHouse, loads a deterministic OTel fixture (logs / traces / metrics), and brings up Grafana pre-provisioned with cerberus as three datasources. A fresh dashboard populates in ~30s; docker compose down -v wipes the volume.

From a published release

Cerberus is one stateless binary configured via environment variables. Pin an explicit tag — :latest only moves with stable releases:

docker pull ghcr.io/tsouza/cerberus:<tag>
docker run --rm -p 8080:8080 -e CERBERUS_CH_ADDR=clickhouse:9000 \
 ghcr.io/tsouza/cerberus:<tag>

Prebuilt binaries (linux / darwin × amd64 / arm64) are on the release page; each release ships a SLSA build provenance attestation:

gh attestation verify cerberus_*_linux_amd64.tar.gz --repo tsouza/cerberus

Cerberus is configured entirely through CERBERUS_* environment variables — see the full configuration reference. The surrounding runtime contract (lifecycle, scaling, the solver and experimental knobs in context) lives in docs/operations.md.

Architecture

Cerberus has one query pipeline, not three. Each head parses with its reference upstream parser and lowers to a shared plan IR (internal/chplan); a rule-based optimiser rewrites it; the closed typed-Frag internal/chsql emitter produces parameterised, escape-free ClickHouse SQL; and the engine streams results. The three HTTP heads plug in as thin Lang adapters over internal/engine, so the optimiser and emitter never know which head produced a plan — new optimisations cost one implementation, not three.

See docs/engine.md for the Lang contract, the request lifecycle, and the per-stage breakdown (IR algebra, optimiser rules, the typed-SQL emitter, the OTel schema). For how cerberus keeps queries fast — the compute-fan-out strategy and per-layer optimisations — see docs/performance.md.

Rate-over-range is exact by default. rate(...) range queries match reference Prometheus bit-for-bit and stay sub-second at realistic scale. For million-row queries an experimental native ClickHouse path (timeSeriesRateToGrid) trades a sub-observable last-bit rounding difference for flat memory and an order-of-magnitude speed-up — see the exactness-vs-scale tradeoff guide.

Compatibility

Each query language has a differential harness: cerberus and a reference engine answer the same corpus against the same seeded data, and the responses are diffed case-for-case — pinning observed semantics on real ClickHouse against an upstream oracle, not just emitted SQL.

The strongest leg is PromQL, which runs the third-party PromQL Compliance Tester (prometheus/compliance, the PromLabs / CNCF Prometheus Conformance Program tooling) against a real prom/prometheus, seeded identically on both sides via remote-write. 574/574 cases pass, no allow-list. LogQL diffs against a real Loki on Grafana's own pkg/logql/bench corpus — solid, but a Grafana bench corpus rather than a standardised conformance suite. TraceQL is the lighter leg: there is no third-party TraceQL conformance suite, so its corpus is cerberus-owned (author-written TXTAR), and its numerical confidence is correspondingly lower than PromQL's.

| Head | Reference + corpus | Required check | Conformance leg |
|---|---|---|---|
| PromQL | real prom/prometheus vs prometheus/compliance (PromLabs / CNCF) | compatibility/prometheus | third-party conformance suite (strongest) |
| LogQL | real Loki vs grafana/loki:pkg/logql/bench corpus | compatibility/loki | real-backend diff, Grafana bench corpus |
| TraceQL | real Tempo vs cerberus-owned TXTAR corpus | compatibility/tempo | author-written corpus (lightest) |

just compat-all # or compat-promql / compat-logql / compat-traceql

What the required checks enforce. The three compatibility/<head> checks run on every PR and fail on infrastructure breakage (stack won't boot, seed fails, report unparseable). Per-case parity drift is report-only by design (#503): it is recorded in report.json and rendered into the live compat-score.json badge, but does not turn the required check red. The one lane that hard-fails on any parity diff is compatibility/prometheus-forced-route (FAIL_ON_DIFF=1, proving the sharded solver route is byte-identical to reference Prometheus over the whole corpus) — that lane is informational, not a required check. The honest reading: the badges are a continuously re-measured conformance score, not a merge gate on numeric correctness.

No allow-lists — every diff against the reference is a real bug to fix at the source, not an exception to suppress. The full playbook (per-head drivers, local reproduction, rejection parity, the sole pinned upstream-skip-baseline contract) is in docs/compatibility.md.

Testing

Cerberus is tested in a 13-layer map spanning AST-shape pinning, plan-IR invariants, optimiser properties, emitted-SQL goldens, chDB roundtrips, function-surface parity, HTTP wire conformance, differential harnesses, Playwright UX flows, deterministic chaos / goleak, perf benchmarks + compute-fan-out guards, live-stack chaos against the k3d deployment, and an oracle-based property framework. just test runs the core lanes; see docs/test-strategy.md for the canonical layer map, the CI-gate inventory, and the gremlins rollout.

Documentation

| Doc | What's in it |
|---|---|
| docs/engine.md | The shared query pipeline, the Lang contract, and the per-stage breakdown. |
| docs/coverage.md | Per-function / per-construct support status across PromQL / LogQL / TraceQL. |
| docs/configuration.md | The full CERBERUS_* environment-variable reference, grouped by area, with types and defaults. |
| docs/operations.md | Runtime contract: lifecycle, scaling, the solver and experimental knobs in context. |
| docs/performance.md | The compute-fan-out strategy, per-layer optimisations, and how they're held against regression. |
| docs/solver.md | The sharded-pushdown solver: eligibility, slicing, execution, and the cancellation contract. |
| docs/benchmarks.md | Benchmark methodology and the recorded numbers (regenerable). |
| docs/compatibility.md | The differential-harness playbook for all three heads. |
| docs/test-strategy.md | The 13-layer test map and CI-gate inventory. |
| docs/observability.md | Self-observability across logs / metrics / traces (OTLP export). |
| docs/health.md | /readyz / /healthz probe semantics. |
| docs/upstream-forks.md | The tsouza/* parser-fork + Dependabot-watch flow. |
| docs/forbid-skip.md | The forbidden-pattern reference for the forbid-skip gate. |

Contributing

Smaller PRs (a new optimiser rule, a TXTAR fixture, a parser-dep bump) are welcome any time; open an issue or discussion before a large one. The local-dev and end-to-end commands live in CONTRIBUTING.md.

License

Apache 2.0 © Thiago Souza.
