🎉 Unlock the core technology of efficient inference deployment, FastDeploy: the 2.2 hands-on testing campaign is live! #4081

ethan7zhanghx started this conversation in General

The FastDeploy 2.2 hands-on testing campaign is now live

FastDeploy 2.2 now adds support for baidu/ERNIE-4.5-21B-A3B-Thinking. Join the campaign to quickly deploy and try out this brand-new state-of-the-art model, and win rewards for your test results!

🎯 Efficient inference with ERNIE-4.5 in practice

🧑‍💻 Task description

Using FastDeploy 2.2, run efficient inference with any ERNIE 4.5 model, then test and verify the results.

💰 Completion criteria

Step 1: Run your test; the service must start successfully and be able to respond to chat requests.
Step 2: Submit through the questionnaire, and be sure to upload all of your deliverables (Python files, logs, screenshots, blog links, etc.) under the final question (questionnaire: https://www.wjx.top/vm/meSsp3L.aspx# ).
Deadline: October 30, 2025

📕 Reference tutorial:

Step 1: Environment setup

https://docs.nvidia.com/nvshmem/release-notes-install-guide/install-guide/abstract.html#hardware-requirements
Hardware environment

  1. GPUs: two H800 machines (8 GPUs per machine), one serving as the P (prefill) node and one as the D (decode) node
  2. Intra-node GPU interconnect: NVLink
  3. Inter-node GPU interconnect: GPUDirect RDMA (GDR), supported by the following network devices:
  • InfiniBand/RoCE with a Mellanox adapter (CX-4 or later)
  • Slingshot-11 (Libfabric CXI provider)
  • Amazon EFA (Libfabric EFA provider)

Bare-metal software environment

  1. OS: Ubuntu 22.04 recommended (for DeepEP stability)
  2. GPU driver: version 550.127.08 or later recommended
  3. IBGDA support: configure the driver and install GDRCopy (see https://github.com/deepseek-ai/DeepEP/blob/main/third-party/README.md)
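Before moving on, it helps to sanity-check the hardware and driver setup. A minimal sketch, assuming Mellanox OFED utilities (ibdev2netdev) are installed; adapt the NIC checks to your fabric:

# Check the GPU driver version (should be 550.127.08 or later)
nvidia-smi --query-gpu=driver_version --format=csv,noheader

# Confirm NVLink connectivity between GPUs inside each machine
nvidia-smi topo -m

# List RDMA NICs and their link state (Mellanox OFED utility)
ibdev2netdev

# Confirm the GDRCopy kernel module is loaded (needed for IBGDA/GDR)
lsmod | grep gdrdrv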

Step 2: Installation & model

https://github.com/PaddlePaddle/FastDeploy/blob/v2.1.1/docs/get_started/installation/nvidia_gpu.md
Using Docker

docker pull ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/fastdeploy-cuda-12.6:2.1.0

Enter the container & install the latest FastDeploy

git clone https://github.com/PaddlePaddle/FastDeploy.git
cd FastDeploy
bash build.sh
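To confirm the build is usable, a quick import check can be run inside the container (a sketch; the __version__ attribute is an assumption, but any error-free import is a good sign):

# Verify the freshly built package imports cleanly (__version__ assumed to exist)
python -c "import fastdeploy; print(fastdeploy.__version__)"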

Pull the model

You can pull ERNIE-4.5-300B-A47B-FP8-Paddle from PaddlePaddle AI Studio, ModelScope, or Hugging Face:
AI Studio: https://aistudio.baidu.com/modelsdetail/30644/intro
ModelScope: https://modelscope.cn/models/PaddlePaddle/ERNIE-4.5-300B-A47B-FP8-Paddle
Hugging Face: https://huggingface.co/baidu/ERNIE-4.5-300B-A47B-FP8-Paddle
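For example, downloading from Hugging Face with the official CLI (a sketch; the local directory name is chosen to match the --model path used in the launch commands below):

pip install -U "huggingface_hub[cli]"

# Download the FP8 Paddle weights into a local directory
huggingface-cli download baidu/ERNIE-4.5-300B-A47B-FP8-Paddle \
  --local-dir ./ERNIE-4_5-300B-A47B-FP8-Paddle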

Required components

apt-get install -y redis-server
redis-server --protected-mode no &
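To verify that Redis (which backs the splitwise scheduler below, see --scheduler-port 6379) is up:

# Should print PONG if the server started correctly
redis-cli ping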

Step 3: Launch & test

See also: #3883

Launch the P (prefill) node

export FD_LOG_DIR="log_prefill"
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
echo "set RDMA NICS"
export $(bash scripts/get_rdma_nics.sh gpu)
echo "KVCACHE_RDMA_NICS ${KVCACHE_RDMA_NICS}"
python -m fastdeploy.entrypoints.openai.api_server \
--model ERNIE-4_5-300B-A47B-FP8-Paddle \
--port 8180 --metrics-port 8181 \
--engine-worker-queue-port "25611,25621,25631,25641,25651,25661,25671,25681" \
--cache-queue-port 8183 \
--tensor-parallel-size 1 \
--data-parallel-size 4 \
--enable-expert-parallel \
--cache-transfer-protocol "rdma,ipc" \
--rdma-comm-ports "7671,7672,7673,7674,7675,7676,7677,7678" \
--pd-comm-port "2334" \
--splitwise-role "prefill" \
--scheduler-name "splitwise" \
--scheduler-host "127.0.0.1" \
--scheduler-port 6379 \
--scheduler-topic "test" \
--scheduler-ttl 9000
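Before starting the D node, check that the P node came up cleanly. A rough sketch (the log location follows FD_LOG_DIR above; the /metrics path is an assumption based on common Prometheus conventions):

# Scan the prefill logs for startup errors
grep -ri error log_prefill/ | head

# Probe the metrics port set via --metrics-port
curl -s http://127.0.0.1:8181/metrics | head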

Launch the D (decode) node

export FD_LOG_DIR="log_decode"
rm -rf $FD_LOG_DIR
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
echo "set RDMA NICS"
export $(bash scripts/get_rdma_nics.sh gpu)
echo "KVCACHE_RDMA_NICS ${KVCACHE_RDMA_NICS}"
python -m fastdeploy.entrypoints.openai.api_server \
--model /workspace/ERNIE-4_5-300B-A47B-FP8-Paddle \
--port 8180 --metrics-port 8181 \
--engine-worker-queue-port "25611,25621,25631,25641,25651,25661,25671,25681" \
--cache-queue-port 8183 \
--tensor-parallel-size 1 \
--data-parallel-size 8 \
--enable-expert-parallel \
--cache-transfer-protocol "rdma,ipc" \
--rdma-comm-ports "7671,7672,7673,7674,7675,7676,7677,7678" \
--pd-comm-port "2334" \
--splitwise-role "prefill" \
--scheduler-name "splitwise" \
--scheduler-host "127.0.0.1" \
--scheduler-port 6379 \
--scheduler-topic "test" \
--num-gpu-blocks-override 1024 \
--scheduler-ttl 9000 \
--splitwise-role "decode"

Test

curl -X POST "http://0.0.0.0:8188/v1/chat/completions" \
-H "Content-Type: application/json" \
-d '{
"messages": [{"role": "user", "content": "Hello!"}]}'

Replies: 1 comment 1 reply


Does this mean "bring-your-own-GPU" ?
Also, is this open for non-Chinese people?


Thank you for your interest!

Regarding your first question, yes, you will need to bring your own GPU, but both local and cloud-based GPUs are supported.
As for your second question, we warmly welcome participants from non-Chinese regions to join!

If you are unable to access the submission link above, please feel free to email your materials to zhanghaoxin0819@gmail.com
