chatbot-evaluation

Evaluation results and experimental data for TRACER, demonstrating its effectiveness in discovering chatbot functionalities and detecting errors with coverage analysis and mutation testing.

mutation-testing experimental-data conversational-ai chatbot-evaluation ai-testing

Updated Nov 28, 2025

aron-radvanyi / GENERATIVE_AI_Embedding_Metrics_Calculator

Star 1

A Jupyter-based tool for calculating the Greedy Matching, Vector Extrema, and Average Embedding evaluation metrics for generative AI chatbots.

nlp machine-learning natural-language-processing ai deep-learning word2vec chatbot question-answering chatbot-application embedding evaluation-metrics embedding-evaluation chatbot-evaluation generative-ai embedding-metrics

Updated Jul 31, 2023
Jupyter Notebook

zichenzha0 / machine-driven-evaluation

Star 1

AI chatbot evaluation benchmark for mental health and suicide prevention with rule-based ethical alignment, inclusivity scoring, and sentiment analysis. Research paper submitted to AI & Society (Springer Nature) under review.

inclusivity benchmark mental-health research-paper springernature ai-ethics suicide-prevention lgbtq social-work chatbot-evaluation ai-societies

Updated Jun 3, 2026
Python

ScarletHollow / Human-AI-Interaction-Design

Star 0

Privacy-aware chatbot evaluation pipeline for Human-AI Interaction Design

python nlp sentiment-analysis chatbot-evaluation human-ai-interaction privacy-aware-evaluation

Updated May 20, 2026
Python

Vedansh5545 / llm-shieldbench

Star 0

Trustworthy AI evaluation tool for testing chatbot safety, reliability, hallucination behavior, privacy risk, and instruction-following quality.

benchmark ai-safety streamlit chatbot-evaluation trustworthy-ai llm prompt-injection hallucination-detection privacy-safety vedansh-labs

Updated Jun 9, 2026
Python

animesh01 / cqs-evaluation

Star 0

LLM-as-a-judge evaluation demo for conversational AI: scores chats on a 4-dimension rubric into a single 0–100 quality score and calibrates the automated judge against human labels. Synthetic demo data.

nlp model-evaluation human-in-the-loop product-analytics conversational-ai streamlit chatbot-evaluation ai-evaluation llm-evaluation llm-as-a-judge rubric-scoring chat-quality

Updated Jun 10, 2026
Python

nhidang912 / ChatbotEvaluation

Star 0

First publication & part of Master’s thesis at HCMUS: End-to-end automatic evaluation framework for retrieval-augmented chatbots (accepted at ACIIDS 2026)

data-synthesis chatbot-evaluation hallucination-detection llm-as-a-judge

Updated Mar 12, 2026
Python

prajaktapandit7 / conversational-AI-evaluation

Star 0

Structured evaluation of 30 support bot conversations, measuring containment, escalation rate, intent accuracy, and CSAT correlation, with LLM-assisted qualitative coding, edge cases, and recommendations.

ai-agents conversational-ai chatbot-evaluation cx-analytics qualitative-coding ai-product-evaluation bot-performance containment-rate

Updated Feb 19, 2026

Eis4TY / Eval-Any-Agent

Star 0

Self-hosted LLM/Agent batch evaluation platform with OpenAI-compatible API testing, streaming response parsing, LLM-as-a-Judge scoring, Docker deployment, and CSV/XLSX export.

docker-compose sqlite nextjs self-hosted streaming-api model-evaluation prisma chatbot-evaluation ai-evaluation llm ai-testing private-deployment llm-evaluation llm-as-a-judge dataset-evaluation openai-compatible agent-evaluation llm-benchmark batch-evaluation eval-platform

Updated May 26, 2026
TypeScript

chatbot-evaluation

Here are 12 public repositories matching this topic...

SparkBeyond / agentune

ricky-ma / ChatbotAnalytics

alexostrovsky01 / chatbot-testing-framework

Chatbot-TRACER / TRACER-evaluation

aron-radvanyi / GENERATIVE_AI_Embedding_Metrics_Calculator

zichenzha0 / machine-driven-evaluation

ScarletHollow / Human-AI-Interaction-Design

Vedansh5545 / llm-shieldbench

animesh01 / cqs-evaluation

nhidang912 / ChatbotEvaluation

prajaktapandit7 / conversational-AI-evaluation

Eis4TY / Eval-Any-Agent
