@wyjoutstanding
wyjoutstanding
The road is long; just keep walking~


The real story of the hardship and darkness behind the development of the Noah Pangu large model.

11,539 1,311 Updated Jul 9, 2025

LMDeploy is a toolkit for compressing, deploying, and serving LLMs.

Python 7,912 701 Updated Jun 23, 2026

Domain-specific language designed to streamline the development of high-performance GPU/CPU/accelerator kernels

Python 6,534 611 Updated Jun 23, 2026

NVIDIA Linux open GPU kernel module source

C 17,108 1,725 Updated Jun 17, 2026

Yinghan's Code Sample

Cuda 365 62 Updated Jul 25, 2022

KernelBench: Can LLMs Write GPU Kernels? - Benchmark + Toolkit with Torch -> CUDA (+ more DSLs)

Jupyter Notebook 1,078 174 Updated Mar 24, 2026

SGLang is a high-performance serving framework for large language models and multimodal models.

Python 29,567 6,684 Updated Jun 23, 2026

A Datacenter Scale Distributed Inference Serving Framework

Rust 7,328 1,266 Updated Jun 23, 2026

High-performance inference framework for large language models, focusing on efficiency, flexibility, and availability.

Python 3,122 266 Updated Jun 23, 2026

FlashInfer: Kernel Library for LLM Serving

Python 5,844 1,072 Updated Jun 23, 2026

DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling

Cuda 7,406 1,060 Updated Jun 23, 2026

FlagGems is an operator library for large language models implemented in the Triton Language.

Python 1,031 419 Updated Jun 23, 2026

Cuda 1 Updated Nov 1, 2023

How to optimize some algorithms in CUDA.

Cuda 3,095 279 Updated Jun 23, 2026

LaTeX Thesis Template for the University of Chinese Academy of Sciences

TeX 3,881 952 Updated Feb 29, 2024

Cuda 651 113 Updated Jun 23, 2026

[ARCHIVED] The C++ parallel algorithms library. See https://github.com/NVIDIA/cccl

C++ 5,003 760 Updated Feb 8, 2024

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.

LLVM 38,938 17,593 Updated Jun 23, 2026

Development repository for the Triton language and compiler

MLIR 19,511 2,959 Updated Jun 23, 2026

The official repo of Qwen (通义千问) chat & pretrained large language model proposed by Alibaba Cloud.

Python 21,321 1,834 Updated Mar 5, 2026

Efficient Deep Learning Systems course materials (HSE, YSDA)

Jupyter Notebook 1,006 149 Updated May 28, 2026

Ongoing research training transformer models at scale

Python 16,809 4,111 Updated Jun 23, 2026

Things about C++

C++ 43,237 8,829 Updated May 16, 2026

Code for loralib, an implementation of "LoRA: Low-Rank Adaptation of Large Language Models"

Python 13,612 912 Updated Dec 17, 2024

A minimal PyTorch re-implementation of the OpenAI GPT (Generative Pretrained Transformer) training

Python 24,605 3,276 Updated Aug 15, 2024

The simplest, fastest repository for training/finetuning medium-sized GPTs.

Python 60,052 10,344 Updated Nov 12, 2025

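PArallel Distributed Deep LEarning: Machine Learning Framework from Industrial Practice (PaddlePaddle core framework: high-performance single-machine and distributed training and cross-platform deployment for deep learning & machine learning)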

C++ 23,983 6,009 Updated Jun 23, 2026

PyTorch memory tracking code

Python 1,013 152 Updated May 4, 2021

Python 69 10 Updated Mar 19, 2023