#confidence-scoring

Here are 30 public repositories matching this topic...

Verification system that catches coding agents falsely claiming task completion. Runs four parallel checks (file integrity, test quality, scope narrowing, and an optional LLM judge) over the task, the claim, and the diff, and returns a weighted 0-100 confidence score with supporting evidence.

  • Updated May 21, 2026
  • Python
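
As a rough illustration of the idea in the description above, the following minimal sketch runs several checks in parallel and combines them into a weighted 0-100 score with evidence. It is not taken from the repository: the check heuristics, the `WEIGHTS` values, and the names `CheckResult`, `check_file_integrity`, `check_test_quality`, `check_scope_narrowing`, and `verify` are all assumptions made for this example, and the optional LLM judge is left out.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class CheckResult:
    name: str
    score: float   # 0.0 (check failed) to 1.0 (check passed)
    evidence: str  # short human-readable justification


# Hypothetical heuristics; a real verifier would inspect far more.
def check_file_integrity(task: str, claim: str, diff: str) -> CheckResult:
    ok = bool(diff.strip())
    return CheckResult("file_integrity", 1.0 if ok else 0.0,
                       "diff is non-empty" if ok else "diff is empty")


def check_test_quality(task: str, claim: str, diff: str) -> CheckResult:
    weak = "assert True" in diff
    return CheckResult("test_quality", 0.0 if weak else 1.0,
                       "found a trivially passing assert" if weak
                       else "no trivially passing assert found")


def check_scope_narrowing(task: str, claim: str, diff: str) -> CheckResult:
    hedges = [w for w in ("partial", "stub", "todo") if w in claim.lower()]
    return CheckResult("scope_narrowing", 0.0 if hedges else 1.0,
                       f"claim hedges with {hedges}" if hedges
                       else "claim does not narrow the task")


# Assumed integer weights (they sum to 100); not the repository's values.
WEIGHTS = {"file_integrity": 40, "test_quality": 35, "scope_narrowing": 25}
CHECKS = [check_file_integrity, check_test_quality, check_scope_narrowing]


def verify(task: str, claim: str, diff: str) -> tuple[int, list[CheckResult]]:
    """Run all checks concurrently and return (score 0-100, evidence)."""
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(lambda check: check(task, claim, diff), CHECKS))
    total_weight = sum(WEIGHTS[r.name] for r in results)
    weighted = sum(WEIGHTS[r.name] * r.score for r in results)
    return round(100 * weighted / total_weight), results
```

Calling `verify(task, claim, diff)` returns the score together with the per-check results, so a caller can show which check lowered the confidence and why.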

System that aggregates outputs from multiple Large Language Models (GPT-4, Claude-3, custom models) to generate reliable, high-confidence results through consensus-based reasoning evaluation. Demonstrates sophisticated AI orchestration with a reported 92.7% accuracy improvement over a single-model baseline.

  • Updated Dec 22, 2025
  • Python
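
One simple way to picture consensus across models is a majority vote, with the share of agreeing models used as the confidence. The sketch below shows that swapped-in technique only; it is not the repository's method, and the function name `consensus` and the model labels in the usage note are hypothetical.

```python
from collections import Counter


def consensus(answers: dict[str, str]) -> tuple[str, float]:
    """Return (majority answer, fraction of models that agree with it).

    `answers` maps a model label to that model's answer. Answers are
    compared after trimming whitespace and lowercasing, which is a
    deliberately crude notion of agreement.
    """
    if not answers:
        raise ValueError("at least one model answer is required")
    normalized = [text.strip().lower() for text in answers.values()]
    winner, votes = Counter(normalized).most_common(1)[0]
    return winner, votes / len(normalized)
```

For example, `consensus({"model_a": "Paris", "model_b": "paris ", "model_c": "Lyon"})` picks `"paris"` with two of the three models agreeing. A caller could treat a low agreement fraction as a signal to escalate or re-query.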

AI-powered concierge that normalises guest messages from WhatsApp, Booking.com, Airbnb, Instagram and direct channels, drafts a reply with Claude, and routes responses through a deterministic confidence-scoring pipeline. Built with FastAPI + Claude Sonnet 4.

  • Updated May 18, 2026
  • Python
