🎉 Unlock the core technology of efficient inference deployment, FastDeploy: the 2.2 hands-on testing campaign is live! #4081

ethan7zhanghx started this conversation in General

The FastDeploy 2.2 hands-on testing campaign is now live

FastDeploy 2.2 now adds support for baidu/ERNIE-4.5-21B-A3B-Thinking. Join the campaign to quickly deploy and try out this brand-new state-of-the-art model, and win rewards for your test results!

🎯 Efficient inference with ERNIE-4.5 in practice

🧑‍💻 Task description

Using FastDeploy 2.2, run efficient inference with any ERNIE 4.5 model, then test and verify the results.

💰 Completion criteria

Step 1: Run your test; the service must start successfully and be able to respond to chat requests.
Step 2: Submit through the questionnaire, and be sure to upload all of your deliverables (Python files, logs, screenshots, blog links, etc.) under the final question (questionnaire: https://www.wjx.top/vm/meSsp3L.aspx# ).
Deadline: October 30, 2025

📕 Reference tutorial:

Step 1: Environment setup

https://docs.nvidia.com/nvshmem/release-notes-install-guide/install-guide/abstract.html#hardware-requirements
Hardware environment

  1. GPUs: two H800 machines (8 GPUs per machine), one serving as the P (prefill) node and one as the D (decode) node
  2. Intra-node GPU interconnect: NVLink
  3. Inter-node GPU interconnect: GPUDirect RDMA (GDR), supported by the following network devices:
  • InfiniBand/RoCE with a Mellanox adapter (CX-4 or later)
  • Slingshot-11 (Libfabric CXI provider)
  • Amazon EFA (Libfabric EFA provider)

Bare-metal software environment

  1. OS: Ubuntu 22.04 recommended (for DeepEP stability)
  2. GPU driver: version 550.127.08 or later recommended
  3. IBGDA support: configure the driver and install GDRCopy (see https://github.com/deepseek-ai/DeepEP/blob/main/third-party/README.md)
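Before moving on, it helps to sanity-check the hardware and driver setup. A minimal sketch, assuming Mellanox OFED utilities (ibdev2netdev) are installed; adapt the NIC checks to your fabric:

# Check the GPU driver version (should be 550.127.08 or later)
nvidia-smi --query-gpu=driver_version --format=csv,noheader

# Confirm NVLink connectivity between GPUs inside each machine
nvidia-smi topo -m

# List RDMA NICs and their link state (Mellanox OFED utility)
ibdev2netdev

# Confirm the GDRCopy kernel module is loaded (needed for IBGDA/GDR)
lsmod | grep gdrdrv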

Step 2: Installation & model

https://github.com/PaddlePaddle/FastDeploy/blob/v2.1.1/docs/get_started/installation/nvidia_gpu.md
Using Docker

docker pull ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/fastdeploy-cuda-12.6:2.1.0

Enter the container & install the latest FastDeploy

git clone https://github.com/PaddlePaddle/FastDeploy.git
cd FastDeploy
bash build.sh
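To confirm the build is usable, a quick import check can be run inside the container (a sketch; the __version__ attribute is an assumption, but any error-free import is a good sign):

# Verify the freshly built package imports cleanly (__version__ assumed to exist)
python -c "import fastdeploy; print(fastdeploy.__version__)"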

Pull the model

You can pull ERNIE-4.5-300B-A47B-FP8-Paddle from PaddlePaddle AI Studio, ModelScope, or Hugging Face:
AI Studio: https://aistudio.baidu.com/modelsdetail/30644/intro
ModelScope: https://modelscope.cn/models/PaddlePaddle/ERNIE-4.5-300B-A47B-FP8-Paddle
Hugging Face: https://huggingface.co/baidu/ERNIE-4.5-300B-A47B-FP8-Paddle
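For example, downloading from Hugging Face with the official CLI (a sketch; the local directory name is chosen to match the --model path used in the launch commands below):

pip install -U "huggingface_hub[cli]"

# Download the FP8 Paddle weights into a local directory
huggingface-cli download baidu/ERNIE-4.5-300B-A47B-FP8-Paddle \
  --local-dir ./ERNIE-4_5-300B-A47B-FP8-Paddle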

Required components

apt-get install -y redis-server
redis-server --protected-mode no &
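To verify that Redis (which backs the splitwise scheduler below, see --scheduler-port 6379) is up:

# Should print PONG if the server started correctly
redis-cli ping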

Step 3: Launch & test

See also: #3883

Launch the P (prefill) node

export FD_LOG_DIR="log_prefill"
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
echo "set RDMA NICS"
export $(bash scripts/get_rdma_nics.sh gpu)
echo "KVCACHE_RDMA_NICS ${KVCACHE_RDMA_NICS}"
python -m fastdeploy.entrypoints.openai.api_server \
--model ERNIE-4_5-300B-A47B-FP8-Paddle \
--port 8180 --metrics-port 8181 \
--engine-worker-queue-port "25611,25621,25631,25641,25651,25661,25671,25681" \
--cache-queue-port 8183 \
--tensor-parallel-size 1 \
--data-parallel-size 4 \
--enable-expert-parallel \
--cache-transfer-protocol "rdma,ipc" \
--rdma-comm-ports "7671,7672,7673,7674,7675,7676,7677,7678" \
--pd-comm-port "2334" \
--splitwise-role "prefill" \
--scheduler-name "splitwise" \
--scheduler-host "127.0.0.1" \
--scheduler-port 6379 \
--scheduler-topic "test" \
--scheduler-ttl 9000
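Before starting the D node, check that the P node came up cleanly. A rough sketch (the log location follows FD_LOG_DIR above; the /metrics path is an assumption based on common Prometheus conventions):

# Scan the prefill logs for startup errors
grep -ri error log_prefill/ | head

# Probe the metrics port set via --metrics-port
curl -s http://127.0.0.1:8181/metrics | head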

Launch the D (decode) node

export FD_LOG_DIR="log_decode"
rm -rf $FD_LOG_DIR
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
echo "set RDMA NICS"
export $(bash scripts/get_rdma_nics.sh gpu)
echo "KVCACHE_RDMA_NICS ${KVCACHE_RDMA_NICS}"
python -m fastdeploy.entrypoints.openai.api_server \
--model /workspace/ERNIE-4_5-300B-A47B-FP8-Paddle \
--port 8180 --metrics-port 8181 \
--engine-worker-queue-port "25611,25621,25631,25641,25651,25661,25671,25681" \
--cache-queue-port 8183 \
--tensor-parallel-size 1 \
--data-parallel-size 8 \
--enable-expert-parallel \
--cache-transfer-protocol "rdma,ipc" \
--rdma-comm-ports "7671,7672,7673,7674,7675,7676,7677,7678" \
--pd-comm-port "2334" \
--splitwise-role "prefill" \
--scheduler-name "splitwise" \
--scheduler-host "127.0.0.1" \
--scheduler-port 6379 \
--scheduler-topic "test" \
--num-gpu-blocks-override 1024 \
--scheduler-ttl 9000 \
--splitwise-role "decode"

Test

curl -X POST "http://0.0.0.0:8188/v1/chat/completions" \
-H "Content-Type: application/json" \
-d '{
"messages": [{"role": "user", "content": "Hello!"}]}'

Replies: 1 comment 1 reply


Does this mean "bring-your-own-GPU" ?
Also, is this open for non-Chinese people?


Thank you for your interest!

Regarding your first question, yes, you will need to bring your own GPU, but both local and cloud-based GPUs are supported.
As for your second question, we warmly welcome participants from non-Chinese regions to join!

If you are unable to access the submission link above, please feel free to email your materials to zhanghaoxin0819@gmail.com
