Open Source Continuous Inference Benchmarking - GB200 NVL72 vs MI355X vs B200 vs H200 vs MI325X & soon™ TPUv6e/v7/Trainium2/3/GB300 NVL72 - DeepSeek 670B MoE, GPTOSS

Python 407 68 Updated Dec 30, 2025

princepride / personal-website

My personal website source code

JavaScript 3 Updated Feb 28, 2024

A modern web interface for managing and interacting with vLLM servers (www.github.com/vllm-project/vllm). Supports both GPU and CPU modes, with special optimizations for macOS Apple Silicon and ent...

Python 244 40 Updated Dec 30, 2025

vllm-project / semantic-router

System Level Intelligent Router for Mixture-of-Models

Go 2,605 370 Updated Dec 30, 2025

lifan-yuan / OOD_NLP

[NeurIPS 2023 D&B Track] Code and data for paper "Revisiting Out-of-distribution Robustness in NLP: Benchmarks, Analysis, and LLMs Evaluations".

Python 36 4 Updated Jun 8, 2023

ShishirPatil / gorilla

Gorilla: Training and Evaluating LLMs for Function Calls (Tool Calls)

Python 12,653 1,302 Updated Dec 17, 2025

toon-format / toon

🎒 Token-Oriented Object Notation (TOON) – Compact, human-readable, schema-aware JSON for LLM prompts. Spec, benchmarks, TypeScript SDK.

TypeScript 21,238 936 Updated Dec 15, 2025

openai / openai-python

The official Python library for the OpenAI API

Python 29,583 4,488 Updated Dec 19, 2025

harvard-edge / cs249r_book

Introduction to Machine Learning Systems

JavaScript 12,095 1,362 Updated Dec 30, 2025

elongbug / cuda_by_example

cuda_by_example

C 1 1 Updated Sep 15, 2025

thu-coai / Glyph

Official Repository for "Glyph: Scaling Context Windows via Visual-Text Compression"

Python 541 50 Updated Nov 4, 2025

MILVLG / twigvlm

Implementation of ICCV 2025 paper "Growing a Twig to Accelerate Large Vision-Language Models".

Python 22 3 Updated Dec 29, 2025

xdit-project / xDiT

xDiT: A Scalable Inference Engine for Diffusion Transformers (DiTs) with Massive Parallelism

Python 2,483 296 Updated Dec 19, 2025

PipeFusion / PipeFusion

Forked from xdit-project/xDiT

A Suite for Parallel Inference of Diffusion Transformers (DiTs) on multi-GPU Clusters

Python 53 1 Updated Jul 23, 2024

vbdi / epdserve

[ICML 2025] Efficiently Serving Large Multimodal Models Using EPD Disaggregation

21 2 Updated May 29, 2025

ggml-org / llama.cpp

LLM inference in C/C++

C++ 92,206 14,304 Updated Dec 30, 2025

gokayfem / awesome-vlm-architectures

Famous Vision Language Models and Their Architectures

Markdown 1,131 52 Updated Feb 24, 2025

huawei-csl / SINQ

Welcome to the official repository of SINQ! A novel, fast and high-quality quantization method designed to make any Large Language Model smaller while preserving accuracy.

Python 586 50 Updated Dec 23, 2025

NExT-GPT / NExT-GPT

Code and models for ICML 2024 paper, NExT-GPT: Any-to-Any Multimodal Large Language Model

Python 3,603 359 Updated May 13, 2025

ovg-project / kvcached

Virtualized Elastic KV Cache for Dynamic GPU Sharing and Beyond

Python 730 75 Updated Nov 30, 2025

meta-pytorch / gpt-fast

Simple and efficient pytorch-native transformer text generation in <1000 LOC of python.

Python 6,172 569 Updated Aug 22, 2025

snowflakedb / ArcticInference

ArcticInference: vLLM plugin for high-throughput, low-latency inference

Python 356 41 Updated Dec 27, 2025

QwenLM / Qwen2.5-Omni

Qwen2.5-Omni is an end-to-end multimodal model by Qwen team at Alibaba Cloud, capable of understanding text, audio, vision, video, and performing real-time speech generation.

Jupyter Notebook 3,861 304 Updated Jun 12, 2025

L1aoXingyu / llm-infer-bench

Python 12 Updated Sep 1, 2023

async-profiler / async-profiler

Sampling CPU and HEAP profiler for Java featuring AsyncGetCallTrace + perf_events

C++ 8,756 957 Updated Dec 23, 2025

EfficientMoE / MoE-Infinity

PyTorch library for cost-effective, fast and easy serving of MoE models.

Python 268 20 Updated Oct 15, 2025

PicoTrex / Awesome-Nano-Banana-images

A curated collection of fun and creative examples generated with Nano Banana & Nano Banana Pro🍌, Gemini-2.5-flash-image based model. We also release Nano-consistent-150K openly to support the commu...

19,319 2,018 Updated Dec 12, 2025