🔭 TraceForge

Observable Microservice Lab

An experiment-first platform that measures the real cost and debugging value of observability in a containerized microservice system.

Not another microservices demo — a controlled laboratory that produces repeatable measurements, comparison tables, and charts.


✨ Why TraceForge?

Most "microservice projects" prove that an app works. TraceForge proves that a system can be deployed, observed, measured, stressed, broken on purpose, and explained with evidence.

It answers one core question with numbers, not opinions:

How much performance and resource overhead does observability add — and how much does it actually improve debugging and failure diagnosis?

🎚️ One switch, five depths	A single `OBS_MODE` variable flips the whole stack between no observability and a full OpenTelemetry pipeline.
📏 Measurement from day one	Every mode runs the same k6 load, captures Docker stats + telemetry volume, and computes overhead vs a baseline.
💥 Break it on purpose	Six injectable faults (slow payment, errors, slow DB, dead consumer, Redis down, memory pressure) for debuggability studies.
📊 Real artifacts	Repeatable CSVs, dependency-free SVG charts, and analysis reports — not screenshots.
🧱 Clean architecture	pnpm monorepo, ports-and-adapters services, shared typed packages, strict TypeScript, CI.

🔬 Research Scope

This repository is a controlled experimental artifact. It does not claim general superiority of any observability stack, database, or orchestration platform. All conclusions are limited to the implemented workload, dataset, hardware, and experimental protocol — they characterize this system, this machine, and this telemetry implementation, and should not be generalized to all microservice systems. Stating these bounds is scientific honesty, not a limitation of the method.

Scope notes:

RQ2 measures failure detection (MTTD), not full debuggability. Root-cause diagnosis requires a controlled operator study and is explicitly future work.
Single machine / single stack (Apple M4, 16 GiB, Node.js/TypeScript). A different runtime, instrumentation library, or hardware budget could shift both the magnitude and the ordering of the costs.
The numbers characterize the artifact; the methodology and qualitative ordering are the transferable contributions.

For reviewers / supervisors: docs/manuscript.md (paper draft) · docs/claims-to-evidence.md (every claim → command → data) · docs/demo.md (5-minute live walkthrough) · docs/ru/README.md (full documentation in Russian).

🏗️ Architecture

flowchart LR
 k6([🧪 k6 / Client]) --> GW[API Gateway]
 GW --> TX[Transaction Service]
 TX --> PG[(🐘 PostgreSQL)]
 TX --> RD[(⚡ Redis)]
 TX --> PAY[Payment Service]
 TX --> MQ{{🐇 RabbitMQ}}
 MQ --> WK[Worker Service]
 WK --> PG
 subgraph OBS [🔭 Observability]
 OT[OTel Collector] --> PR[Prometheus]
 OT --> LO[Loki]
 OT --> JA[Jaeger]
 PR --> GR[📊 Grafana]
 LO --> GR
 JA --> GR
 end
 GW -. OTLP .-> OT
 TX -. OTLP .-> OT
 PAY -. OTLP .-> OT
 WK -. OTLP .-> OT

The request path: POST /transactions → API Gateway → Transaction Service → PostgreSQL write → Redis cache → Payment Service → RabbitMQ publish → Worker consume → event persisted. One flow that crosses HTTP, SQL, cache, and async messaging — enough surface area to measure something real.
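The hop order in the request path can be sketched as a single use case behind ports. This is a minimal illustration, not the repository's code: the interface and function names (`FlowPorts`, `createTransaction`, `traceFlow`) and the `paid` field are hypothetical, and the in-memory adapters only record which hop was called.

```typescript
// Hypothetical ports: the README names the hops, not the services' real interfaces.
interface NewTransaction {
  userId: string;
  amount: number;
  currency: string;
  description?: string;
}

interface StoredTransaction extends NewTransaction {
  id: string;
  paid: boolean; // assumed field, used here only to carry the payment outcome
}

interface FlowPorts {
  saveToPostgres(tx: StoredTransaction): Promise<void>;
  cacheInRedis(tx: StoredTransaction): Promise<void>;
  chargePayment(tx: StoredTransaction): Promise<boolean>;
  publishToRabbit(tx: StoredTransaction): Promise<void>;
}

// Runs the hops in the order the request path lists them.
async function createTransaction(
  ports: FlowPorts,
  id: string,
  input: NewTransaction,
): Promise<StoredTransaction> {
  const pending: StoredTransaction = { ...input, id, paid: false };
  await ports.saveToPostgres(pending);
  await ports.cacheInRedis(pending);
  const paid = await ports.chargePayment(pending);
  const result: StoredTransaction = { ...pending, paid };
  await ports.publishToRabbit(result); // the Worker Service would consume this event
  return result;
}

// In-memory adapters that only record the call order.
async function traceFlow(): Promise<string[]> {
  const calls: string[] = [];
  const ports: FlowPorts = {
    saveToPostgres: async () => { calls.push("postgres"); },
    cacheInRedis: async () => { calls.push("redis"); },
    chargePayment: async () => { calls.push("payment"); return true; },
    publishToRabbit: async () => { calls.push("rabbitmq"); },
  };
  await createTransaction(ports, "tx-1", { userId: "user-1", amount: 42, currency: "USD" });
  return calls;
}
```

Keeping each hop behind a port is what lets an experiment swap a real adapter for a slow or failing one without touching the flow itself.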

Service	Package	Port	Role
🚪 API Gateway	`@traceforge/api-gateway`	`3000`	Public API, correlation/trace propagation
💳 Transaction Service	`@traceforge/transaction-service`	`3001`	Core flow, Postgres + Redis + RabbitMQ
🏦 Payment Service	`@traceforge/payment-service`	`3002`	Simulated payment + fault injection
⚙️ Worker Service	`@traceforge/worker-service`	`3003`	Consumes events, persists them

🔭 Observability Modes

Flip the entire telemetry depth with one environment variable. Every layer cleanly degrades to a no-op when disabled.

`OBS_MODE`	Metrics	Logs	Traces	Collector	Purpose
`none`	⬜	⬜	⬜	⬜	Raw baseline
`metrics`	✅	⬜	⬜	⬜	Prometheus only
`metrics_logs`	✅	✅	⬜	⬜	+ structured logs & correlation IDs (Loki)
`metrics_logs_traces`	✅	✅	✅	⬜	+ distributed tracing (Jaeger)
`otel_full`	✅	✅	✅	✅	Everything routed through the OTel Collector
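The table above maps directly onto a lookup from mode to enabled layers. The sketch below is one way to express it; the names (`ObsMode`, `ObsFlags`, `parseObsMode`) are illustrative and not taken from `@traceforge/config`, and treating an unset variable as `none` is an assumption.

```typescript
type ObsMode = "none" | "metrics" | "metrics_logs" | "metrics_logs_traces" | "otel_full";

interface ObsFlags {
  metrics: boolean;
  logs: boolean;
  traces: boolean;
  collector: boolean;
}

// One row per OBS_MODE, mirroring the table in this section.
const OBS_FLAGS: Record<ObsMode, ObsFlags> = {
  none: { metrics: false, logs: false, traces: false, collector: false },
  metrics: { metrics: true, logs: false, traces: false, collector: false },
  metrics_logs: { metrics: true, logs: true, traces: false, collector: false },
  metrics_logs_traces: { metrics: true, logs: true, traces: true, collector: false },
  otel_full: { metrics: true, logs: true, traces: true, collector: true },
};

// Assumption: an unset variable means the raw baseline.
function parseObsMode(raw: string | undefined): ObsMode {
  const value = raw ?? "none";
  if (Object.keys(OBS_FLAGS).includes(value)) {
    return value as ObsMode;
  }
  throw new Error(`Unknown OBS_MODE: ${value}`);
}

function flagsFor(raw: string | undefined): ObsFlags {
  return OBS_FLAGS[parseObsMode(raw)];
}
```

A telemetry layer can then check its own flag at startup and install a no-op implementation when the flag is off.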

🧪 What It Measures — Sample Findings

The realistic load campaign (pnpm load:run, open-model, N=10 per mode) feeds the statistics pipeline (STATS_DATASET=load pnpm stats:report) for non-parametric analysis with bootstrap CIs. The headline RQ1 result:

Mode	Median CPU % [95% CI]	CPU overhead	Median p50 (ms)	Differs from baseline?
🟢 Baseline	5.6 [5.3, 7.2]	—	1.72	—
📈 Metrics	8.3 [5.2, 13.6]	+48%	2.71	no (CIs overlap; p≈.3)
📝 Metrics + Logs	14.9 [13.7, 19.6]	+164%	4.77	yes (p<0.001, δ=1.0)
🔗 + Traces	8.9 [8.2, 13.2]	+59%	2.89	yes (p=0.003)
🛰️ Full OTel	8.5 [7.2, 9.6]	+51%	3.33	yes (p<0.001)

💡 Takeaway: metrics are essentially free (not statistically distinguishable from baseline), structured logging is the dominant cost (+164% CPU, +177% p50, with severe tail spikes), and the batched OpenTelemetry pipeline stays smooth despite carrying the most telemetry — all backed by N=10, bootstrap CIs, Kruskal–Wallis, Mann–Whitney U, and Cliff's δ.
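For readers unfamiliar with the statistics named above, here is a minimal sketch of three of them: relative overhead against a baseline, Cliff's δ, and a percentile bootstrap CI for the median. It is not the project's statistics pipeline. The function names, the seeded random generator, and the choice of the percentile method are assumptions made for illustration.

```typescript
function medianOf(values: number[]): number {
  if (values.length === 0) throw new Error("medianOf needs at least one value");
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 0 ? (sorted[mid - 1] + sorted[mid]) / 2 : sorted[mid];
}

// Relative overhead of an observed value against the baseline, in percent.
function overheadPercent(baseline: number, observed: number): number {
  return ((observed - baseline) / baseline) * 100;
}

// Cliff's delta: (pairs where treatment > baseline minus pairs where it is smaller) / all pairs.
function cliffsDelta(treatment: number[], baseline: number[]): number {
  let greater = 0;
  let less = 0;
  for (const t of treatment) {
    for (const b of baseline) {
      if (t > b) greater++;
      else if (t < b) less++;
    }
  }
  return (greater - less) / (treatment.length * baseline.length);
}

// Small seeded generator so the resampling is repeatable (an illustrative choice).
function seededRandom(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (state + 0x6d2b79f5) >>> 0;
    let t = state;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Percentile bootstrap: resample with replacement, take the 2.5th and 97.5th percentile medians.
function bootstrapMedianCI(values: number[], resamples = 2000, seed = 1): [number, number] {
  const rand = seededRandom(seed);
  const medians: number[] = [];
  for (let i = 0; i < resamples; i++) {
    const sample = values.map(() => values[Math.floor(rand() * values.length)]);
    medians.push(medianOf(sample));
  }
  medians.sort((a, b) => a - b);
  return [medians[Math.floor(0.025 * resamples)], medians[Math.ceil(0.975 * resamples) - 1]];
}
```

A δ of 1.0, as reported for Metrics + Logs, means every run in that mode exceeded every baseline run. Overheads recomputed from the rounded medians in the table may differ slightly from the reported percentages.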

📄 Read the full write-up: the journal manuscript draft is docs/manuscript.md; the engineering report is docs/final-report.md (§6.0 = primary result), with all tables and box plots in docs/statistics-load-report.md and the literature review in docs/related-work.md.

💥 Failure Injection & Debuggability

Six faults, all off by default, toggled by environment variables — paired with a manual debugging protocol that measures time-to-detect and time-to-root-cause across observability modes.

ID	Scenario	Inject with	Symptom	Best tool
F1	🐌 Slow payment	`PAYMENT_MODE=slow PAYMENT_DELAY_MS=1000`	high latency	traces
F2	🔴 Payment errors	`PAYMENT_ERROR_RATE=0.2`	error spike	metrics + logs
F3	🐢 Slow DB query	`DB_SLOW_QUERY=true`	p95 increase	traces + DB metrics
F4	🧊 Consumer stopped	`WORKER_DISABLED=true`	queue lag	metrics
F5	⚡ Redis unavailable	`REDIS_DISABLED=true`	cache miss + latency	logs + metrics
F6	🧠 Memory pressure	`MEMORY_PRESSURE_ENABLED=true MEMORY_PRESSURE_MB=256`	latency/error growth	metrics

➡️ Protocol & schema: docs/failure-injection-protocol.md · Generate the report: pnpm failure:report
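The environment variables in the fault table could be read into one typed object along these lines. This is a sketch under assumptions: the type and function names are hypothetical, the default values (including `"normal"` for `PAYMENT_MODE`) are guesses, and the real parsing lives in `@traceforge/config`.

```typescript
type FaultEnv = Record<string, string | undefined>;

interface FaultConfig {
  paymentMode: string;          // "slow" enables F1; "normal" is an assumed default
  paymentDelayMs: number;
  paymentErrorRate: number;     // F2, fraction between 0 and 1
  dbSlowQuery: boolean;         // F3
  workerDisabled: boolean;      // F4
  redisDisabled: boolean;       // F5
  memoryPressureEnabled: boolean; // F6
  memoryPressureMb: number;
}

function readNumber(env: FaultEnv, key: string, fallback: number): number {
  const raw = env[key];
  if (raw === undefined || raw === "") return fallback;
  const parsed = Number(raw);
  if (!Number.isFinite(parsed)) throw new Error(`${key} must be a number, got "${raw}"`);
  return parsed;
}

function readFlag(env: FaultEnv, key: string): boolean {
  return env[key] === "true";
}

// All faults stay off unless their variable is set, matching "off by default".
function parseFaultConfig(env: FaultEnv): FaultConfig {
  const paymentErrorRate = readNumber(env, "PAYMENT_ERROR_RATE", 0);
  if (paymentErrorRate < 0 || paymentErrorRate > 1) {
    throw new Error("PAYMENT_ERROR_RATE must be between 0 and 1");
  }
  return {
    paymentMode: env["PAYMENT_MODE"] ?? "normal",
    paymentDelayMs: readNumber(env, "PAYMENT_DELAY_MS", 0),
    paymentErrorRate,
    dbSlowQuery: readFlag(env, "DB_SLOW_QUERY"),
    workerDisabled: readFlag(env, "WORKER_DISABLED"),
    redisDisabled: readFlag(env, "REDIS_DISABLED"),
    memoryPressureEnabled: readFlag(env, "MEMORY_PRESSURE_ENABLED"),
    memoryPressureMb: readNumber(env, "MEMORY_PRESSURE_MB", 0),
  };
}
```

In a service this would typically be called once at startup with `process.env`.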

Objective detection (pnpm mttd:run): rather than a subjective timing, MTTD is measured as the time from fault onset to Prometheus alert firing. A real result — the fault is severe but invisible without metrics:

Fault	Baseline (no metrics)	Metrics (alert)
🔴 Payment errors	12% errors, undetected	pending 9s · fire 70s
🐌 Slow payment	p95 ≈ 1007 ms, undetected	pending 10s · fire 72s

🔬 A step change, not a gradient: observability converts an undetectable fault into one detected within ~one scrape interval. See docs/mttd-report.md.
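The MTTD definition above (fault onset to alert firing) reduces to a subtraction over recorded alert state changes. A minimal sketch, assuming the transitions have already been collected as timestamps; the `AlertTransition` shape and `detectionTimes` function are hypothetical and not the output format of `pnpm mttd:run`.

```typescript
// Prometheus alerts move from inactive to pending to firing; only the last two matter here.
interface AlertTransition {
  state: "pending" | "firing";
  atMs: number; // wall-clock time of the transition, in milliseconds
}

interface DetectionTimes {
  pendingAfterS: number | null; // null when the alert never reached that state
  firingAfterS: number | null;
}

function detectionTimes(faultOnsetMs: number, transitions: AlertTransition[]): DetectionTimes {
  const firstAfterOnset = (state: AlertTransition["state"]): number | null => {
    const times = transitions
      .filter((t) => t.state === state && t.atMs >= faultOnsetMs)
      .map((t) => t.atMs);
    return times.length === 0 ? null : (Math.min(...times) - faultOnsetMs) / 1000;
  };
  return {
    pendingAfterS: firstAfterOnset("pending"),
    firingAfterS: firstAfterOnset("firing"),
  };
}
```

With an onset at 0 ms and transitions at 9,000 ms (pending) and 70,000 ms (firing), this yields the 9 s and 70 s shape of the payment-errors row. In the baseline mode the transition list is empty and both values are `null`, which corresponds to "undetected".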

🗃️ Database Indexing Experiments

pnpm indexing:run seeds 1,000,000 transactions (research-grade) across 100k users and captures real EXPLAIN (ANALYZE, BUFFERS) plans across 7 index strategies × 5 query patterns (35 combinations), with bootstrap 95% CIs on the p95 query time and read-improvement vs write-penalty reported separately. A real result from this repo:

Query	Best strategy	p95: no-index → indexed	Improvement
Q1 user history	`(user_id, ...)` composite	24.5 ms → 0.11 ms	+99.6%
Q4 user + status + time	`(user_id, status, ...)`	20.0 ms → 0.04 ms	+99.8%
Q2 `status='failed'` + time	`(status, created_at)`	25.4 ms → 13.5 ms	+47%
Q3 high-value (`amount` unidx)	none helps (seq scan)	full 1M-row scan	~0%

💡 Takeaway: indexes matching the query's leading columns turn 1M-row sequential scans into ~16-row index lookups, but every index adds +19% (partial) to +322% (3-column) write latency — the read-vs-write trade-off, measured at scale with bootstrap 95% CIs. Full tables, charts, and raw plans: docs/postgres-indexing-report.md.
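The "leading columns" point can be made concrete with a simplified rule of thumb for B-tree indexes: an index helps with equality predicates on its leading columns, plus at most one range or sort column immediately after them. The sketch below encodes only that rule. It is not how the PostgreSQL planner decides, and the function names are illustrative. The improvement formula is the plain relative change used in the table.

```typescript
// Counts how many leading index columns a query can use under the simplified rule.
function usablePrefixLength(
  indexColumns: string[],
  equalityColumns: string[],
  rangeColumn?: string,
): number {
  let used = 0;
  for (const column of indexColumns) {
    if (equalityColumns.includes(column)) {
      used++; // equality on this column keeps the prefix going
    } else if (column === rangeColumn) {
      used++; // one range/sort column can follow, then the prefix ends
      break;
    } else {
      break; // gap in the prefix: later index columns do not help
    }
  }
  return used;
}

// Relative read improvement in percent, e.g. for p95 before and after indexing.
function readImprovementPercent(noIndexMs: number, indexedMs: number): number {
  return ((noIndexMs - indexedMs) / noIndexMs) * 100;
}
```

For the `(status, created_at)` index and a query filtering `status='failed'` over a time range, both columns are usable. For a query that filters only on `amount`, no column matches, which is consistent with Q3 falling back to a sequential scan.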

The same experiment runs on MongoDB (pnpm indexing:mongo, 6 strategies × 4 queries via explain("executionStats")), and pnpm sql-nosql:report produces a careful SQL-vs-NoSQL comparison that leads with the structural metric (rows/documents examined), the apples-to-apples signal, with latency treated as indicative only (both engines at 1M rows/documents):

Query (structural)	PostgreSQL best	examined	MongoDB best	examined
Q1 user history	`I6` Bitmap Heap	16 rows	`M5` IXSCAN	11 docs
Q2 `status='failed'`	`I5` Bitmap Heap	27,740	`M4` IXSCAN	27,553
Q4 user+status+time	`I6` Index Scan	5 rows	`M5` IXSCAN	4 docs

🔬 Both engines reduce work along the same structural lines. Per the project's anti-goals, no "X is faster than Y" claim is made — conclusions are scoped to this dataset, access pattern, and hardware. See docs/mongodb-indexing-report.md and docs/sql-nosql-comparison.md.

🧭 Orchestration: Compose vs Swarm vs Kubernetes

pnpm orchestration:run deploys the same core stack on Docker Compose and Docker Swarm, measuring startup, scaling, recovery, and resource overhead live. The Kubernetes manifests are authored and validated (kubeconform, 19/19 resources) but not run here (no local cluster). A real result:

Target	Startup	Scale a service	Recover a killed instance	Config (core)
🐳 Compose	~34 s	❌ host-port conflict	❌ none (not a reconciler)	superset¹
🐝 Swarm	~25 s	✅ routing mesh (1→3)	✅ auto-reschedule (~15 s)	123 lines
☸️ Kubernetes²	n/a	✅ HPA in manifest	✅ ReplicaSet controller	356 lines

💡 Takeaway: Compose is the simplest to start but is not an orchestrator: it can't scale a host-port-published service and won't restart a killed container. Swarm adds a small deploy block and gets the routing mesh + self-healing. Kubernetes offers the strongest primitives for the most configuration. Full tables and charts: docs/orchestration-comparison.md.

¹ The Compose file includes the observability profiles (superset); the fair core-only comparison is Swarm (123) vs Kubernetes (356).
² Kubernetes is authored + statically validated, not run in this environment.

🚀 Quick Start

# 1. Install & verify
pnpm install
pnpm test # 42 unit tests
pnpm typecheck
pnpm lint
pnpm build
# 2. Start the base stack (Postgres, Redis, RabbitMQ + services)
docker compose -f infra/docker/compose/docker-compose.base.yml up --build -d
pnpm migrate:postgres
pnpm seed:postgres
# 3. Create a transaction
curl -X POST http://localhost:3000/transactions \
 -H "content-type: application/json" \
 -d '{"userId":"user-1","amount":42,"currency":"USD","description":"Demo"}'

Run services locally in watch mode instead: pnpm dev
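The JSON body sent in step 3 has four fields. A request validator for that shape might look like the sketch below. The field names come from the curl example; the rules themselves (positive amount, three-letter uppercase currency, optional description) and all type and function names are assumptions, not the actual checks in `@traceforge/contracts`.

```typescript
interface CreateTransactionRequest {
  userId: string;
  amount: number;
  currency: string;
  description?: string;
}

type ValidationResult =
  | { ok: true; value: CreateTransactionRequest }
  | { ok: false; errors: string[] };

function validateCreateTransaction(body: unknown): ValidationResult {
  if (typeof body !== "object" || body === null) {
    return { ok: false, errors: ["body must be a JSON object"] };
  }
  const fields = body as Record<string, unknown>;
  const errors: string[] = [];

  const { userId, amount, currency, description } = fields;
  if (typeof userId !== "string" || userId.length === 0) {
    errors.push("userId must be a non-empty string");
  }
  // Assumed rule: amounts are finite and strictly positive.
  if (typeof amount !== "number" || !Number.isFinite(amount) || amount <= 0) {
    errors.push("amount must be a positive number");
  }
  // Assumed rule: three uppercase letters, as in "USD".
  if (typeof currency !== "string" || !/^[A-Z]{3}$/.test(currency)) {
    errors.push("currency must be a three-letter uppercase code");
  }
  if (description !== undefined && typeof description !== "string") {
    errors.push("description must be a string when present");
  }

  if (errors.length > 0) return { ok: false, errors };
  return {
    ok: true,
    value: {
      userId: userId as string,
      amount: amount as number,
      currency: currency as string,
      ...(description === undefined ? {} : { description: description as string }),
    },
  };
}
```

Returning a result object instead of throwing keeps all validation errors together, so a gateway can report them in one response.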

🔬 Running the Experiments

Each observability mode runs the same k6 scenario repeatedly, samples Docker stats, captures telemetry volume, and writes raw + processed + report artifacts.

Per-mode experiment commands

# Phase 3 — Baseline (no observability)
OBS_MODE=none pnpm baseline:run
# Phase 4 — Metrics
OBS_MODE=metrics docker compose -f infra/docker/compose/docker-compose.base.yml --profile metrics up --build -d
pnpm migrate:postgres && pnpm metrics:run
# Phase 5 — Metrics + Logs
OBS_MODE=metrics_logs docker compose -f infra/docker/compose/docker-compose.base.yml --profile metrics --profile logs up --build -d
pnpm migrate:postgres && pnpm metrics-logs:run
# Phase 6 — Metrics + Logs + Traces
OBS_MODE=metrics_logs_traces docker compose -f infra/docker/compose/docker-compose.base.yml --profile metrics --profile logs --profile traces up --build -d
pnpm migrate:postgres && pnpm metrics-logs-traces:run
# Phase 7 — Full OpenTelemetry pipeline
OBS_MODE=otel_full OTEL_EXPORTER_OTLP_ENDPOINT=http://otel-collector:4318 \
 docker compose -f infra/docker/compose/docker-compose.base.yml --profile metrics --profile logs --profile traces --profile otel up --build -d
pnpm migrate:postgres && pnpm otel-full:run
# Phase 8 — Aggregate all modes into one comparison (no containers needed)
pnpm overhead:report
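The per-mode commands above differ only in which Compose profiles they enable. A small helper can derive the `docker compose` arguments from the mode, as sketched below. The profile names and file path are copied from the commands; the helper itself (`composeUpArgs`, `MODE_PROFILES`) is hypothetical and is not one of the repository's scripts.

```typescript
type ExperimentMode = "none" | "metrics" | "metrics_logs" | "metrics_logs_traces" | "otel_full";

const COMPOSE_FILE = "infra/docker/compose/docker-compose.base.yml";

// Profiles enabled per mode, as used in the commands in this section.
const MODE_PROFILES: Record<ExperimentMode, string[]> = {
  none: [],
  metrics: ["metrics"],
  metrics_logs: ["metrics", "logs"],
  metrics_logs_traces: ["metrics", "logs", "traces"],
  otel_full: ["metrics", "logs", "traces", "otel"],
};

// Builds the argument list for `docker`, ready to pass to a process spawner.
function composeUpArgs(mode: ExperimentMode): string[] {
  const profileArgs = MODE_PROFILES[mode].flatMap((profile) => ["--profile", profile]);
  return ["compose", "-f", COMPOSE_FILE, ...profileArgs, "up", "--build", "-d"];
}
```

For example, `composeUpArgs("metrics_logs").join(" ")` produces `compose -f infra/docker/compose/docker-compose.base.yml --profile metrics --profile logs up --build -d`. The `OBS_MODE` variable still has to be set in the environment of the spawned process.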

Load profiles & failure scenarios (k6)

# Load profiles — drive the same flow at different shapes
k6 run load-tests/k6/smoke.js
k6 run load-tests/k6/stress.js # 50→100→200→500 VUs
k6 run load-tests/k6/spike.js # 10→300→10 VUs
k6 run -e VUS=100 -e DURATION=60m load-tests/k6/soak.js
# Failure scenarios — start the stack with a fault flag, then drive load
k6 run load-tests/k6/failure-payment-slow.js
k6 run load-tests/k6/failure-payment-errors.js
k6 run load-tests/k6/failure-db-slow.js
k6 run load-tests/k6/failure-rabbitmq-consumer.js
k6 run load-tests/k6/failure-redis.js
pnpm failure:report # detection / root-cause report + charts

Dashboards when the stack is up: Grafana :3004 · Prometheus :9090 · Jaeger :16686 · RabbitMQ :15674

📊 Results & Artifacts

results/
├── raw/ # k6 summaries, Docker stats, telemetry-volume JSON per run
├── processed/ # comparison CSVs (observability-overhead, debuggability)
└── charts/ # dependency-free SVG charts (latency, CPU, memory, overhead, volumes)
docs/
├── observability-overhead-report.md # Phase 8 cross-mode analysis
├── failure-injection-report.md # Phase 9 debuggability report
└── failure-injection-protocol.md # manual measurement protocol

🧱 Shared Packages

Package	Purpose
`@traceforge/contracts`	Shared types, constants, and request validation
`@traceforge/config`	Typed env parsing — service, runtime, telemetry & fault config
`@traceforge/logger`	Structured JSON logs, correlation IDs, Loki/OTLP export
`@traceforge/metrics`	Prometheus registry + HTTP/DB/Redis/RabbitMQ instruments
`@traceforge/tracing`	OpenTelemetry SDK, W3C context propagation, span helpers
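W3C context propagation, mentioned for `@traceforge/tracing`, rests on the `traceparent` header: `version-traceid-parentid-flags`, all lowercase hex. The sketch below parses and formats the version `00` layout by hand to show what travels between services. It is illustrative only; the function names are hypothetical, and a service using the OpenTelemetry SDK would normally rely on the SDK's propagator rather than code like this.

```typescript
interface TraceParent {
  version: string;
  traceId: string;  // 32 hex characters
  parentId: string; // 16 hex characters
  sampled: boolean; // lowest bit of the flags byte
}

const TRACEPARENT_PATTERN = /^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/;

// Returns null for anything that is not a valid version-00-style header.
function parseTraceParent(header: string): TraceParent | null {
  const match = TRACEPARENT_PATTERN.exec(header.trim());
  if (match === null) return null;
  const [, version, traceId, parentId, flags] = match;
  if (version === "ff") return null; // reserved, invalid version
  if (/^0+$/.test(traceId) || /^0+$/.test(parentId)) return null; // all-zero ids are invalid
  return { version, traceId, parentId, sampled: (parseInt(flags, 16) & 0x01) === 1 };
}

// Builds the header a service would send downstream with its own span id as parent.
function formatTraceParent(traceId: string, parentId: string, sampled: boolean): string {
  return `00-${traceId}-${parentId}-${sampled ? "01" : "00"}`;
}
```

Each hop keeps the trace id, replaces the parent id with its own span id, and forwards the header, which is what lets a backend such as Jaeger stitch the gateway, transaction, payment, and worker spans into one trace.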

🗺️ Roadmap

v1.0 — Observability Laboratory (current) covers the full measurement story end-to-end:

Status	Phase	Description
✅	0	Research design — questions, hypotheses, KPIs
✅	1	Monorepo & service foundation
✅	2	Core business flow (HTTP + SQL + cache + async)
✅	3	Baseline without observability
✅	4	Metrics
✅	5	Metrics + Logs
✅	6	Metrics + Logs + Traces
✅	7	Full OpenTelemetry pipeline
✅	8	Observability-overhead experiments & charts
✅	9	Failure injection & debuggability tooling
✅	13	Final report & paper draft — `final-report.md` · `paper-draft.md`

Post-v1 research (in progress):

Status	Phase	Description
✅	10	PostgreSQL indexing experiments — `postgres-indexing-report.md`
✅	11	MongoDB / SQL-vs-NoSQL indexing — `mongodb-indexing-report.md` · `sql-nosql-comparison.md`
✅	12	Compose vs Swarm vs Kubernetes orchestration — `orchestration-comparison.md`

🔭 Optional extensions: running the Kubernetes manifests on a local cluster, and per-target k6 load tests.

🧰 Tech Stack

Backend	NestJS · TypeScript (strict) · pnpm workspaces
Data	PostgreSQL · MongoDB · Redis · RabbitMQ
Observability	OpenTelemetry · Prometheus · Grafana · Loki · Jaeger
Load & Orchestration	k6 · Docker Compose

🔁 Reproducibility & Citation

Every figure, table, and dataset is regenerable from source. The exact environment (hardware, runtimes, image digests) is recorded by pnpm env:capture into results/environment.json, and the full reproduction guide — prerequisites, a command for every result, determinism/seeds, and a data-availability statement — is in docs/reproducibility.md.

To cite this work, use CITATION.cff (GitHub's "Cite this repository" button). A versioned, DOI-archived snapshot is on Zenodo: 10.5281/zenodo.20561281.

📜 License

Code — MIT.
Experimental data & figures (results/ and the generated reports) — CC-BY-4.0.

Built as a measurement system from day one. 🔭
