ai-eval

Open evaluation harness for mental health LLM responses. Five clinically grounded rubrics, LLM-as-judge with bias controls, and crisis-detection routing to 988 protocols.

psychology cbt ai-safety conversational-ai clinical-ai cohen-kappa ollama llm-evaluation llm-as-judge mental-health-ai ai-eval inter-rater-reliability eval-harness lifeline-988 open-source-eval

Updated May 29, 2026
Python

klausners / prompt-optimizer

Config-driven CLI that runs promptfoo evals, identifies low-scoring prompts, rewrites them via the Claude API, and re-evaluates them.

cli automation claude llm prompt-engineering llm-eval prompt-optimization promptfoo ai-eval

Updated Mar 26, 2026
TypeScript

Gcy-02 / ai-chat-coach

AI chat coach MVP: Spring Boot, DeepSeek, structured output, two-stage analysis, and a lightweight evaluation system.

java spring-boot chinese ai-chatbot prompt-engineering deepseek ai-eval

Updated Jun 5, 2026
Java

zendodx / evalkit-framework

🚀 An open-source AI automation evaluation framework based on Java

java framework java-8 eval eval-framework ai-eval agent-eval

Updated Jun 11, 2026
Java

ai-eval

Here are 7 public repositories matching this topic...

kasimmj / claude-code-test-runner

ianfh0 / deduce

sejin8905 / ai-rag-citation-check-kit-lite

KarmaEnchanter / mental-health-llm-eval

klausners / prompt-optimizer

Gcy-02 / ai-chat-coach

zendodx / evalkit-framework