llm-comparison

Here are 38 public repositories matching this topic.

Repositories by language: All 38, TypeScript 8, Python 6, HTML 5, Jupyter Notebook 3, Shell 3, Go 2, Blade 1, JavaScript 1, Astro 1

Sorted by: most stars

arc53 / llm-price-compass

Star 223

This project collects GPU benchmarks from various cloud providers and compares them to fixed per-token costs. Use the tool to choose GPUs for LLM inference efficiently and to pick cost-effective AI models. Features: LLM provider price comparison, conversion of GPU benchmarks into a price per token, and a GPU benchmark table.

benchmark gpu hacktoberfest llm llm-inference llm-price llm-comparison inference-comparison

Updated Dec 16, 2024
TypeScript
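Converting a GPU benchmark into a per-token price, as described for llm-price-compass, comes down to simple arithmetic. The sketch below is not the project's code; it is a minimal illustration assuming an hourly-billed GPU that stays fully busy at its benchmarked throughput. The function names and all prices are hypothetical.

```python
"""Sketch: turning a GPU throughput benchmark into a per-token price.

Assumption: the GPU is billed per hour and runs at the benchmarked
throughput for the whole hour (real utilization is usually lower).
"""


def price_per_million_tokens(gpu_price_per_hour: float, tokens_per_second: float) -> float:
    """Return the USD cost of generating one million tokens on a rented GPU."""
    if gpu_price_per_hour < 0:
        raise ValueError("gpu_price_per_hour must be non-negative")
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    tokens_per_hour = tokens_per_second * 3600
    return gpu_price_per_hour / tokens_per_hour * 1_000_000


def cheaper_than_api(gpu_price_per_hour: float, tokens_per_second: float,
                     api_price_per_million: float) -> bool:
    """True if self-hosting on this GPU beats a fixed per-token API price."""
    return price_per_million_tokens(gpu_price_per_hour, tokens_per_second) < api_price_per_million


if __name__ == "__main__":
    # Hypothetical numbers for illustration only.
    cost = price_per_million_tokens(gpu_price_per_hour=2.0, tokens_per_second=1000)
    print(f"${cost:.2f} per 1M tokens")
    print(cheaper_than_api(2.0, 1000, api_price_per_million=0.60))
```

With these made-up inputs, a $2.00/hour GPU producing 1,000 tokens per second yields 3.6 million tokens per hour, or roughly $0.56 per million tokens, which undercuts a hypothetical $0.60 API price. Lower real-world utilization raises the effective GPU cost proportionally.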

woniu9524 / ParallelChat

Star 191

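ParallelChat is an open-source desktop app for chatting with multiple AIs in parallel. It lets you use ChatGPT, Kimi, Qwen, DeepSeek, GLM, Doubao, Yuanbao, Grok, and other mainstream large models simultaneously in a single interface, with no API key required and completely free of charge.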

chatbot model-comparison no-api-key open-source-ai multi-ai llm-comparison parallel-chat free-ai-tool ai-aggregator

Updated Apr 14, 2026
TypeScript

Imboyeong / Fact-vs-Fiction-LLM-Test-

Star 92

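All-in-one learning hub platform based on an LLM comparison study | Based on an accuracy analysis of GPT-4o · Gemini 2.0 Flash · Claude 4.5 Sonnet, we propose a platform that helps students choose and use the model best suited to their goals.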

gemini model-selection korean gpt claude ai-education gpt-4 claude-ai student-learning llm-comparison interactive-platform

Updated Dec 16, 2025
Jupyter Notebook

Alorse / cc-compatible-models

Star 30

Complete guide for using alternative AI models with Claude Code — including DeepSeek, Qwen, MiniMax, Kimi, GLM, MiMo, StepFun, and more. Pricing, configs, and coding plans.

glm minimax model-comparison api-integration ai-models anthropic qwen deepseek llm-pricing llm-comparison claude-code ai-coding-assistant kimi-k2

Updated Apr 20, 2026

Supahands / llm-comparison-backend

Star 22

An open-source project for comparing two LLMs head-to-head on a given prompt. This repository contains the project's backend, which lets LLM APIs be incorporated and used in the front end.

ai llm chatgpt llm-eval llm-api llm-comparison

Updated Jan 13, 2026
Python

petmal / MindTrial

Star 13

MindTrial: Evaluate and compare AI language models (LLMs) on text-based tasks with optional file/image attachments and tool use. Supports multiple providers (OpenAI, Google, Anthropic, DeepSeek, Mistral AI, xAI, Alibaba, Moonshot AI, OpenRouter), custom tasks in YAML, and HTML/CSV/JSON reports.

opensource openai xai artificial-intelligence-projects anthropic ai-tool openrouter qwen deepseek mistral-ai llm-evaluation-framework google-gemini-ai llm-benchmarking moonshot-ai language-models-ai llm-comparison ai-benchmark ai-evaluation-tools grok-ai ai-model-comparison

Updated May 29, 2026
Go

desktop-commander / best-value-ai

Star 11

Where should you get your AI tokens from — local GPU, pay-per-token API, or flat-fee subscription? Compare 34+ models by quality-adjusted tokens per dollar.

ai price-comparison claude llm chatgpt local-llm ollama llm-comparison token-calculator

Updated May 11, 2026
HTML

jvrck / openrouterlist

Star 10

OpenRouter model information

github-pages datatables static-site cost-estimation cost-calculator github-actions llm openrouter llm-pricing openai-compatible-api llm-comparison openrouter-api

Updated Jun 14, 2026
HTML

ctala / ai-benchmarks-alternativos

Star 8

Open Spanish-language benchmark of 141 LLMs (89 with 13K+ real runs and an independent Phi-4 judge). Quality, cost, speed, long context, and credential leakage are scored as separate dimensions. Alternatives to Claude, GPT, and Gemini for n8n/OpenClaw agents. Interactive calculator that uses your own weights.

ai-agents startup-tools n8n ai-models emprendedores openrouter ollama llm-evaluation llm-comparison ai-benchmark llm-benchmark spanish-ai openclaw hermes-agent claude-alternatives gpt-alternatives phi4-judge benchmark-en-espanol emprendedores-ia

Updated Jun 12, 2026
Python

attogram / small-models

Star 6

Comparison of small open-source LLMs (8B parameters or fewer)

llm-evaluation llm-comparison ai-model-comparison llm-compare attogram-project

Updated Aug 30, 2025

sammy995 / Local-LLM-Arena

Star 5

Privacy-first local AI model comparison platform with blind evaluation, per-model hyperparameters, and multi-configuration testing. Compare 2-6 models side-by-side through Ollama with zero cloud dependencies.

chatbot hyperparameters-tuning ai-evaluation local-ai ollama llm-comparison blind-testing

Updated Apr 4, 2026
JavaScript

bejranonda / openclaw-eval

Star 4

Comprehensive benchmark of OpenRouter free-tier LLMs for practical applications. Evaluates models for coding, Thai language, and general use.

thai-language thai-nlp gemma model-evaluation openrouter ai-gateway nemotron llm-comparison llm-benchmark free-ai-models openclaw step-flash trinity-mini chatbot-benchmark free-tier-llm

Updated Feb 24, 2026

AkashKobal / aiverse

Star 3

Open-source multimodal AI comparison platform: send one prompt and compare responses from multiple AI models side-by-side in real time.

java open-source machine-learning ai spring-boot web-application developer-tools ai-platform ai-tools llm prompt-engineering ai-companion llm-comparison multimodal-ai

Updated Feb 20, 2026
HTML

doubleoevan / chatwar

Star 3

A fight club for LLMs. 🤫 Live demo → https://chatwar.ai

react ai google-maps chartjs http-streaming ndjson fastify vite llm llm-comparison

Updated Feb 24, 2026
TypeScript

lukecarr / litmus

Star 3

Specification testing for structured LLM responses.

specification-test openrouter llm-testing llm-comparison

Updated Jun 5, 2026
Go

shimafoolad / llm-quality-evaluator

Star 1

Automated evaluation framework for comparing LLM versions using multi-metric assessment and Opik integration.

evaluator automated-testing opik llm llm-comparison

Updated Apr 17, 2026
Python

rahatmoktadir03 / llm-evaluation-platform

Star 1

A full-stack web application for comparing and analyzing the performance of large language models (LLMs). Features include side-by-side prompt evaluation, performance metrics visualization, and an analytics dashboard. Built with React, Tailwind CSS, Node.js, and MongoDB.

react typescript ai fullstack-development tailwind-css large-language-models prompt-engineering llm-evaluation llm-comparison

Updated Jan 6, 2025
TypeScript

nickcarndt / document-summarizer

Star 1

Compare Claude and OpenAI responses side-by-side with a built-in evaluation framework. Upload PDFs, generate summaries, ask questions using RAG, and track metrics.