title: "ZeRO-Inference: 20X faster inference through weight quantization and KV cache offloading" excerpt: "" link: https://github.com/deepspeedai/DeepSpeedE...

DeepSpeed Ulysses: System Optimizations for Enabling Training of Extreme Long Sequence Transformer Models Permalink

August 23, 2023

DeepSpeed Ulysses: Transformerモデルを非常に長いシーケンスで訓練するための最適化 Permalink

August 23, 2023

DeepSpeed Ulysses: 训练极长序列Transformer模型的系统优化 Permalink

August 23, 2023

DeepSpeed ZeRO++: A leap in speed for LLM and chat model training with 4X less communication Permalink

June 21, 2023

DeepSpeed ZeRO++: LLMやチャットモデルの訓練を劇的に高速化 – 通信オーバヘッドを1/4に大幅削減 - Permalink

June 21, 2023

DeepSpeed ZeRO++:降低4倍网络通信,显著提高大模型及类ChatGPT模型训练效率 Permalink

June 21, 2023

DeepSpeed主要技術の概要紹介

June 6, 2023

DeepSpeed Chat: Easy, Fast and Affordable RLHF Training of ChatGPT-like Models at All Scales Permalink

April 23, 2023

DeepSpeed Chat: ChatGPTライクなモデルを簡単・高速・低コストに、あらゆるスケールで学習 Permalink

April 23, 2023

DeepSpeed Chat: 一键式RLHF训练,让你的类ChatGPT千亿大模型提速省钱15倍 Permalink

April 23, 2023

Scaling Large-Scale Generative Mixture-of-Expert Multimodal Model With VL-MoE

March 30, 2023

DeepSpeed Data Efficiency: A composable library that makes better use of data, increases training efficiency, and improves model quality

December 11, 2022

DeepSpeed-MII: instant speedup on 24,000+ open-source DL models with up to 40x cheaper inference

October 10, 2022

ZeRO-Inference: Democratizing massive model inference

September 9, 2022

Azure empowers easy-to-use, high-performance, and hyperscale model training using DeepSpeed

July 25, 2022

Supporting efficient large model training on AMD Instinct GPUs with DeepSpeed Permalink

March 20, 2022

DeepSpeed: Advancing MoE inference and training to power next-generation AI scale Permalink

January 18, 2022

DeepSpeed-MoE for NLG: Reducing the training cost of language models by 5 times

December 9, 2021

Autotuning: Automatically discover the optimal DeepSpeed configuration that delivers good training speed

November 16, 2021

DeepSpeed powers 8x larger MoE model training with high performance Permalink

August 17, 2021

DeepSpeed: Accelerating large-scale model inference and training via system optimizations and compression Permalink

May 14, 2021

Mixture-of-Quantization: A novel quantization approach for reducing model size with minimal accuracy impact

May 4, 2021

DeepSpeed Inference: Multi-GPU inference with customized inference kernels and quantization support

March 15, 2021

DeepSpeed ZeRO-3 Offload

March 7, 2021

Progressive Layer Dropping

October 28, 2020

DeepSpeed Sparse Attention

September 8, 2020

Training a Trillion Parameters with Pipeline Parallelism

September 8, 2020

Up to 5x less communication and 3.4x faster training through 1-bit Adam

September 8, 2020

DeepSpeed with 1-bit Adam: 5x less communication and 3.4x faster training

September 8, 2020

10x bigger model training on a single GPU with ZeRO-Offload

September 8, 2020

Powering 10x longer sequences and 6x faster execution through DeepSpeed Sparse Attention

September 8, 2020

DeepSpeed Microsoft Research Webinar is now on-demand Permalink

August 6, 2020

DeepSpeed Microsoft Research Webinar on August 6th, 2020 Permalink

July 23, 2020

Microsoft DeepSpeed achieves the fastest BERT training time

May 27, 2020

ZeRO-2 & DeepSpeed: Shattering Barriers of Deep Learning Speed & Scale Permalink

May 18, 2020

An Order-of-Magnitude Larger and Faster Training with ZeRO-2

May 18, 2020

The Fastest and Most Efficient BERT Training through Optimized Transformer Kernels

May 18, 2020

ZeRO stage 1 with reduced communication

March 17, 2020

Partition-aware ZeRO with up to 2x reduction in communication time!

Turing-NLG: A 17-billion-parameter language model by Microsoft Permalink

February 13, 2020

DeepSpeed was used to train the world’s largest language model.

ZeRO & DeepSpeed: New system optimizations enable training models with over 100 billion parameters Permalink

February 13, 2020