Multi-layer audit framework for agent benchmark integrity
Updated Jun 8, 2026 - Python
TelecomAudit (CIKM 2026): origin-aware benchmark auditing & calibration for 5G/Open RAN anomaly detection — exposes the synthetic→controlled-real transfer gap and repairs it with a small calibration budget.
Failure-aware audit protocol for reasoning-trajectory analysis in HumanEval code generation.