@KilJaeeun

길재은 KilJaeeun

🍒 Cherry

Organizations

@AUSG


Starred repositories

Open Source Continuous Inference Benchmarking - GB200 NVL72 vs MI355X vs B200 vs H200 vs MI325X & soonTM TPUv6e/v7/Trainium2/3/GB300 NVL72 - DeepSeek 670B MoE, GPTOSS

Python 407 68 Updated Dec 30, 2025

My personal website source code

JavaScript 3 Updated Feb 28, 2024

A modern web interface for managing and interacting with vLLM servers (www.github.com/vllm-project/vllm). Supports both GPU and CPU modes, with special optimizations for macOS Apple Silicon and ent...

Python 244 40 Updated Dec 30, 2025

System Level Intelligent Router for Mixture-of-Models

Go 2,605 370 Updated Dec 30, 2025

[NeurIPS 2023 D&B Track] Code and data for paper "Revisiting Out-of-distribution Robustness in NLP: Benchmarks, Analysis, and LLMs Evaluations".

Python 36 4 Updated Jun 8, 2023

Gorilla: Training and Evaluating LLMs for Function Calls (Tool Calls)

Python 12,653 1,302 Updated Dec 17, 2025

🎒 Token-Oriented Object Notation (TOON) – Compact, human-readable, schema-aware JSON for LLM prompts. Spec, benchmarks, TypeScript SDK.

TypeScript 21,238 936 Updated Dec 15, 2025

The official Python library for the OpenAI API

Python 29,583 4,488 Updated Dec 19, 2025

Introduction to Machine Learning Systems

JavaScript 12,095 1,362 Updated Dec 30, 2025

cuda_by_example

C 1 1 Updated Sep 15, 2025

Official Repository for "Glyph: Scaling Context Windows via Visual-Text Compression"

Python 541 50 Updated Nov 4, 2025

Implementation of ICCV 2025 paper "Growing a Twig to Accelerate Large Vision-Language Models".

Python 22 3 Updated Dec 29, 2025

xDiT: A Scalable Inference Engine for Diffusion Transformers (DiTs) with Massive Parallelism

Python 2,483 296 Updated Dec 19, 2025

A Suite for Parallel Inference of Diffusion Transformers (DiTs) on multi-GPU Clusters

Python 53 1 Updated Jul 23, 2024

[ICML 2025] Efficiently Serving Large Multimodal Models Using EPD Disaggregation

21 2 Updated May 29, 2025

LLM inference in C/C++

C++ 92,206 14,304 Updated Dec 30, 2025

Famous Vision Language Models and Their Architectures

Markdown 1,131 52 Updated Feb 24, 2025

Welcome to the official repository of SINQ! A novel, fast and high-quality quantization method designed to make any Large Language Model smaller while preserving accuracy.

Python 586 50 Updated Dec 23, 2025

Code and models for ICML 2024 paper, NExT-GPT: Any-to-Any Multimodal Large Language Model

Python 3,603 359 Updated May 13, 2025

Virtualized Elastic KV Cache for Dynamic GPU Sharing and Beyond

Python 730 75 Updated Nov 30, 2025

Simple and efficient pytorch-native transformer text generation in <1000 LOC of python.

Python 6,172 569 Updated Aug 22, 2025

ArcticInference: vLLM plugin for high-throughput, low-latency inference

Python 356 41 Updated Dec 27, 2025

Qwen2.5-Omni is an end-to-end multimodal model by Qwen team at Alibaba Cloud, capable of understanding text, audio, vision, video, and performing real-time speech generation.

Jupyter Notebook 3,861 304 Updated Jun 12, 2025

Python 12 Updated Sep 1, 2023

Sampling CPU and HEAP profiler for Java featuring AsyncGetCallTrace + perf_events

C++ 8,756 957 Updated Dec 23, 2025

PyTorch library for cost-effective, fast and easy serving of MoE models.

Python 268 20 Updated Oct 15, 2025

A curated collection of fun and creative examples generated with Nano Banana & Nano Banana Pro🍌, Gemini-2.5-flash-image based model. We also release Nano-consistent-150K openly to support the commu...

19,319 2,018 Updated Dec 12, 2025

fmchisel: Efficient Compression and Training Algorithms for Foundation Models

Python 81 9 Updated Oct 23, 2025

GenAI inference performance benchmarking tool

Python 137 55 Updated Dec 22, 2025

Slides, videos, and supporting files for my public talks

Python 33 4 Updated Dec 12, 2025