wyjoutstanding

The road is long; just keep walking~

20 followers · 12 following

Achievements

Starstruck · Arctic Code Vault Contributor

Stars

HW-whistleblower / True-Story-of-Pangu

The true story of the hardship and darkness behind the development of the Noah Pangu large model.

11,539 1,311 Updated Jul 9, 2025

InternLM / lmdeploy

LMDeploy is a toolkit for compressing, deploying, and serving LLMs.

Python 7,912 701 Updated Jun 23, 2026

tile-ai / tilelang

Domain-specific language designed to streamline the development of high-performance GPU/CPU/Accelerators kernels

Python 6,534 611 Updated Jun 23, 2026

NVIDIA / open-gpu-kernel-modules

NVIDIA Linux open GPU kernel module source

C 17,108 1,725 Updated Jun 17, 2026

Yinghan-Li / YHs_Sample

Yinghan's Code Sample

Cuda 365 62 Updated Jul 25, 2022

ScalingIntelligence / KernelBench

KernelBench: Can LLMs Write GPU Kernels? - Benchmark + Toolkit with Torch -> CUDA (+ more DSLs)

Jupyter Notebook 1,078 174 Updated Mar 24, 2026

sgl-project / sglang

SGLang is a high-performance serving framework for large language models and multimodal models.

Python 29,567 6,684 Updated Jun 23, 2026

ai-dynamo / dynamo

A Datacenter Scale Distributed Inference Serving Framework

Rust 7,328 1,266 Updated Jun 23, 2026

thu-pacman / chitu

High-performance inference framework for large language models, focusing on efficiency, flexibility, and availability.

Python 3,122 266 Updated Jun 23, 2026

flashinfer-ai / flashinfer

FlashInfer: Kernel Library for LLM Serving

Python 5,844 1,072 Updated Jun 23, 2026

deepseek-ai / DeepGEMM

DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling

Cuda 7,406 1,060 Updated Jun 23, 2026

flagos-ai / FlagGems

FlagGems is an operator library for large language models implemented in the Triton Language.

Python 1,031 419 Updated Jun 23, 2026

openai / transformer-debugger

Python 4,122 241 Updated Apr 15, 2026

linhu-nv / unitTestLocalEnergy

Cuda 1 Updated Nov 1, 2023

BBuf / how-to-optim-algorithm-in-cuda

how to optimize some algorithm in cuda.

Cuda 3,095 279 Updated Jun 23, 2026

mohuangrui / ucasthesis

LaTeX Thesis Template for the University of Chinese Academy of Sciences

TeX 3,881 952 Updated Feb 29, 2024

NVIDIA / cuCollections

Cuda 651 113 Updated Jun 23, 2026

NVIDIA / thrust

[ARCHIVED] The C++ parallel algorithms library. See https://github.com/NVIDIA/cccl

C++ 5,003 760 Updated Feb 8, 2024

llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.

LLVM 38,938 17,593 Updated Jun 23, 2026

triton-lang / triton

Development repository for the Triton language and compiler

MLIR 19,511 2,959 Updated Jun 23, 2026

QwenLM / Qwen

The official repo of Qwen (通义千问) chat & pretrained large language model proposed by Alibaba Cloud.

Python 21,321 1,834 Updated Mar 5, 2026

mryab / efficient-dl-systems

Efficient Deep Learning Systems course materials (HSE, YSDA)

Jupyter Notebook 1,006 149 Updated May 28, 2026

NVIDIA / Megatron-LM

Ongoing research training transformer models at scale

Python 16,809 4,111 Updated Jun 23, 2026

Light-City / CPlusPlusThings

Things about C++

C++ 43,237 8,829 Updated May 16, 2026

microsoft / LoRA

Code for loralib, an implementation of "LoRA: Low-Rank Adaptation of Large Language Models"

Python 13,612 912 Updated Dec 17, 2024

karpathy / minGPT

A minimal PyTorch re-implementation of the OpenAI GPT (Generative Pretrained Transformer) training

Python 24,605 3,276 Updated Aug 15, 2024

karpathy / nanoGPT

The simplest, fastest repository for training/finetuning medium-sized GPTs.

Python 60,052 10,344 Updated Nov 12, 2025

PaddlePaddle / Paddle

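PArallel Distributed Deep LEarning: Machine Learning Framework from Industrial Practice (core framework of PaddlePaddle (飞桨): high-performance single-machine and distributed training and cross-platform deployment for deep learning & machine learning)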

C++ 23,983 6,009 Updated Jun 23, 2026

Oldpan / Pytorch-Memory-Utils

pytorch memory track code

Python 1,013 152 Updated May 4, 2021

tongzhou80 / nanoPyC

Python 69 10 Updated Mar 19, 2023
