길재은 KilJaeeun
Organizations
@AUSG
Starred repositories
Open Source Continuous Inference Benchmarking - GB200 NVL72 vs MI355X vs B200 vs H200 vs MI325X & soon™ TPUv6e/v7/Trainium2/3/GB300 NVL72 - DeepSeek 670B MoE, GPTOSS
A modern web interface for managing and interacting with vLLM servers (www.github.com/vllm-project/vllm). Supports both GPU and CPU modes, with special optimizations for macOS Apple Silicon and ent...
System Level Intelligent Router for Mixture-of-Models
[NeurIPS 2023 D&B Track] Code and data for paper "Revisiting Out-of-distribution Robustness in NLP: Benchmarks, Analysis, and LLMs Evaluations".
Gorilla: Training and Evaluating LLMs for Function Calls (Tool Calls)
Token-Oriented Object Notation (TOON) - Compact, human-readable, schema-aware JSON for LLM prompts. Spec, benchmarks, TypeScript SDK.
The official Python library for the OpenAI API
Introduction to Machine Learning Systems
Official Repository for "Glyph: Scaling Context Windows via Visual-Text Compression"
Implementation of ICCV 2025 paper "Growing a Twig to Accelerate Large Vision-Language Models".
xDiT: A Scalable Inference Engine for Diffusion Transformers (DiTs) with Massive Parallelism
PipeFusion / PipeFusion
Forked from xdit-project/xDiT
A Suite for Parallel Inference of Diffusion Transformers (DiTs) on multi-GPU Clusters
[ICML 2025] Efficiently Serving Large Multimodal Models Using EPD Disaggregation
Famous Vision Language Models and Their Architectures
Welcome to the official repository of SINQ! A novel, fast and high-quality quantization method designed to make any Large Language Model smaller while preserving accuracy.
Code and models for ICML 2024 paper, NExT-GPT: Any-to-Any Multimodal Large Language Model
Virtualized Elastic KV Cache for Dynamic GPU Sharing and Beyond
Simple and efficient pytorch-native transformer text generation in <1000 LOC of python.
ArcticInference: vLLM plugin for high-throughput, low-latency inference
Qwen2.5-Omni is an end-to-end multimodal model by Qwen team at Alibaba Cloud, capable of understanding text, audio, vision, video, and performing real-time speech generation.
Sampling CPU and HEAP profiler for Java featuring AsyncGetCallTrace + perf_events
PyTorch library for cost-effective, fast and easy serving of MoE models.
A curated collection of fun and creative examples generated with Nano Banana & Nano Banana Pro, Gemini-2.5-flash-image based model. We also release Nano-consistent-150K openly to support the commu...
fmchisel: Efficient Compression and Training Algorithms for Foundation Models
GenAI inference performance benchmarking tool
Slides, videos, and supporting files for my public talks