HUANG Pengsheng fpshuang
Stars
NEO Series: Native Vision-Language Models from First Principles
SCOPE: Simulating Cross-game Operations in Playable Environments for FPS World Models
Official repo of "Chain-of-Visual-Thought: Teaching VLMs to See and Think Better with Continuous Visual Tokens"
SenseNova-U series: Native Unified Paradigm with NEO-unify from First Principles
[ICLR 2026] The official implementation of "Efficient-SAM2: Accelerating SAM2 with Object-Aware Visual Encoding and Memory Retrieval"
A lightweight inference engine supporting speculative speculative decoding (SSD).
Official code implementation of Context Cascade Compression: Exploring the Upper Limits of Text Compression
[ACL 2026] Render-of-Thought: Rendering Textual Chain-of-Thought as Images for Visual Latent Reasoning
SGLang is a high-performance serving framework for large language models and multimodal models.
MOVA: Towards Scalable and Synchronized Video–Audio Generation
Project page for "Seg-Zero: Reasoning-Chain Guided Segmentation via Cognitive Reinforcement"
Free up your Android WeChat internal storage: a one-click tool for reclaiming the space WeChat occupies.
LightRFT (Light Reinforcement Fine-Tuning) is an advanced reinforcement learning fine-tuning framework designed for Large Language Models (LLMs) and Vision-Language Models (VLMs).
The open-source CapCut alternative
RayGen: Multi-Modal Dataset Reinforcement for MobileCLIP and MobileCLIP2
[EMNLP 2024 & AAAI 2026] A powerful toolkit for compressing large models including LLMs, VLMs, and video generative models.
The easiest and laziest way to build multi-agent LLM applications.
PPL Quantization Tool (PPQ) is a powerful offline neural network quantization tool.
[ECCV 2024] Official implementation of the paper "Semantic-SAM: Segment and Recognize Anything at Any Granularity"
LAVIS - A One-stop Library for Language-Vision Intelligence
A Datacenter Scale Distributed Inference Serving Framework
OpenMMLab Model Compression Toolbox and Benchmark.
Prompt Learning for Vision-Language Models (IJCV'22, CVPR'22)
DeepGEMM: clean and efficient BLAS kernel library on GPU
FlashMLA: Efficient Multi-head Latent Attention Kernels