skill-evaluation

Here are 19 public repositories matching this topic...

Languages: Python 7, TypeScript 5, HTML 2, JavaScript 2, Go 1, Shell 1

Evol-ai / SkillCompass

Star 224

Evaluate agent skill quality. Find the weakest link. Fix it. Prove it worked.

ai-agents skill-evaluation anthropic agent-skills claude-code skill-rating claude-code-skill openclaw openclaw-skill

Updated Apr 23, 2026
JavaScript

AndrewNgGirl / SkillLens

Star 64

Open-source self-hosted web tool for evaluating Agent Skills with rubric scores, Deep Review, and improvement suggestions.

typescript skills skill nextjs self-hosted developer-tools cursor ai-agents claude rubric skill-evaluation llm agent-skills claude-code openclaw

Updated May 17, 2026
TypeScript

mega-edo / mega-tron

The skill OS for Codex, Claude Code, and Gemini CLI. One pool, one router, one feedback loop, across all three hosts. Per-turn semantic top-K with dynamic context sizing, session-end self-evaluation, and evidence-blended re-ranking that gets better the more you use it.

skill-evaluation self-improving-ai skill-routing skill-eval

Updated Jun 8, 2026
Python

ALEX-nlp / OpenSkillEval

Star 12

OpenSkillEval: Automatically Auditing the Open Skill Ecosystem for LLM Agents

benchmark ai-agents skill-evaluation llm-eval agent-evaluation

Updated Jun 1, 2026
Python

lizhiyao / oh-my-knowledge

Star 11

Evaluation framework for LLM knowledge inputs — prompts, RAG corpora, skills, agent workflows. Fix the model, vary the artifact. Built-in statistical rigor: bootstrap CI, Krippendorff α, length-debias, saturation curves.

benchmark ai evaluation-framework claude knowledge-engineering skill-evaluation llm prompt-engineering prompt-testing llm-evaluation rag-evaluation llm-judge claude-code agent-evaluation bootstrap-ci krippendorff-alpha evaluation-as-code multi-judge-ensemble

Updated Jun 14, 2026
TypeScript

huajielong / skill-evaluator

Star 5

Your AI agent skill doctor - 5-dimension scoring + security gate + leaderboard

security evaluation cursor ai-agents skill-evaluation ai-agent llm claude-code codex-cli claude-skills meta-skill openclaw skill-review hemerss

Updated Jun 10, 2026
Shell

VKirill / mcp-annas-archive-create-skill

Star 2

MCP server for Claude Code: Anna's Archive search/download + Gemini methodology extraction → audited Claude Code SKILL.md. One tool call, end-to-end.

nodejs pdf typescript mcp gemini epub methodology rag knowledge-extraction skill-evaluation ai-agent annas-archive anthropic google-ai-studio model-context-protocol agent-skills claude-code stdio-server claude-skill book-extraction

Updated May 26, 2026
TypeScript

SirryChen / triage-skill-creator

Star 1

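Triage-trainer: build a customized triage Skill for your personal assistant from scratch, giving it the ability to accurately recommend which hospital department to visit.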

skill triage skill-evaluation skill-creator agent-trainer

Updated Mar 28, 2026
HTML

Earnest-clockworkuniverse497 / mcp-annas-archive-create-skill

Star 1

Convert methodology books from Anna's Archive into Claude Code skills using the Model Context Protocol.

Updated Jun 14, 2026
TypeScript

yadinae / agent-evolution

Star 1

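🧬 Agent self-evolution system - a data-driven platform for improving AI agent capabilities | ✨ Task monitoring / skill evaluation / intelligent scheduling / automatic evolution | 📊 95%+ test coverage, <20ms latency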

python open-source machine-learning automation data-driven performance-monitoring self-improvement skill-evaluation ai-optimization ai-agent intelligent-scheduling agent-evolution

Updated Jun 12, 2026
JavaScript

yoligehude14753 / heyi-eval-v10

Star 0

Automated model evaluation pipeline v10 (post-incident architecture rewrite)

benchmark mcp chinese skill-evaluation huggingface ai-agent vllm llm-evaluation claude-code agent-evaluation

Updated Jun 2, 2026
Python

WilliamWJHuang / agent-skill-evaluator

Star 0

Evaluate agent SKILL.md files for structure, security, quality, and domain correctness.

linter quality-assurance security-analysis ai-agents skill-evaluation agent-skills

Updated Apr 18, 2026
Python

duck-ai-yy / skill-safety-reviewer

Star 0

A skill that reviews whether skills found online are safe to install for non-tech-background developers

ai-safety cowork skill-evaluation tool-evaluation claude-ai claude-skills safety-reviewer

Updated Mar 22, 2026

saniyaacharya04 / interviewforge

Star 0

AI-powered mock interview platform with automated scoring, role-based questions, modern React UI, FastAPI backend, and a fully implemented freemium SaaS architecture.

machine-learning-application technical-interviews fastapi skill-evaluation ai-tools assessment-platform interview-simulator ai-scoring

Updated Dec 13, 2025
Python

zinan92 / repo-evals

Star 0

Claim-first repository evaluation framework. in: owner/repo + repo_type → out: eval scaffold + reliability bucket (unusable/usable/reusable/recommendable)

bash evaluation testing-framework skill-evaluation claim-first

Updated May 26, 2026
HTML

agentskillexchange / agent-skills-resources

Star 0

Source-backed companion guide for agent skills, AI agent workflows, framework resources, evaluation templates, and rollout playbooks.

mcp verification cursor codex ai-agents skill-evaluation github-copilot langchain langgraph coding-agents gemini-cli model-context-protocol agent-skills claude-code openai-agents agent-workflows agent-safety openclaw hermes-agent rollout-playbooks

Updated Jun 11, 2026
Python

SEKIRO009 / skillsentry

Star 0

Detect malicious code and security risks in AI skill files before installation to protect AI agents from hidden threats and obfuscation techniques.

performance-tracking learning-platform skills-assessment skill-management workforce-development skill-evaluation hr-software training-management employee-training online-assessments talent-development career-growth skill-gap-analysis competency-mapping training-dashboard

Updated Jun 14, 2026
Python

pasunboneleve / skilpel

Star 0

A focused Go evaluator for agent-skills CI

testing go cli ci developer-tools ai-agents skill-evaluation llm-evaluation agent-skills

Updated Jun 1, 2026
Go

jeremylongshore / j-rig-skill-binary-eval

Star 0

Binary-criteria evaluation harness for Claude skills with planned extension to plugins, agents, and MCP servers. Score every change yes/no across 7 layers — package integrity, trigger quality, functional quality, regression protection, baseline value, model variance, rollout safety. Never gradients.

mcp regression-testing skill-evaluation ai-evaluation llm-eval claude-code plugin-testing eval-harness agent-eval binary-criteria

Updated Jun 14, 2026
TypeScript
