🎉 Unlock FastDeploy, the core technology for efficient inference deployment: the 2.2 hands-on testing campaign is live! #4081
FastDeploy 2.2 Hands-On Testing Campaign Is Now Live
FastDeploy 2.2 adds support for baidu/ERNIE-4.5-21B-A3B-Thinking. Join the campaign to quickly deploy and try out this new state-of-the-art model, and win rewards for your test reports!
🎯 ERNIE-4.5 Efficient Inference in Practice
🧑‍💻 Task Description
Using FastDeploy 2.2, deploy any ERNIE 4.5 model for efficient inference, then test and verify the results.
💰 Completion Criteria
Step 1: Run your test; the service must start successfully and be able to respond to chat requests.
Step 2: Submit via the questionnaire, and upload all of your deliverables (Python files, logs, screenshots, blog links, etc.) in the last question (questionnaire: https://www.wjx.top/vm/meSsp3L.aspx# ).
Deadline: October 30, 2025
📕 Reference Tutorial:
Step 1: Environment Preparation
- GPUs: two H800 machines (8 GPUs each); one serves as the P (prefill) node, the other as the D (decode) node
- Intra-node GPU interconnect: NVLink
- Inter-node GPU interconnect: GPUDirect RDMA (GDR); the following network devices are supported
- InfiniBand/RoCE with a Mellanox adapter (CX-4 or later)
- Slingshot-11 (Libfabric CXI provider)
- Amazon EFA (Libfabric EFA provider)
Host software environment
- OS: Ubuntu 22.04 recommended (for DeepEP stability)
- GPU driver: 550.127.08 or later recommended
- IBGDA support: configure the driver and install GDRCopy (see https://github.com/deepseek-ai/DeepEP/blob/main/third-party/README.md)
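A quick sanity-check sketch for the host environment above (assumes Mellanox OFED utilities and the GDRCopy/peer-memory kernel modules are already installed; adjust for your NIC vendor):
# GPU driver version (should be 550.127.08 or later)
nvidia-smi --query-gpu=driver_version --format=csv,noheader
# intra-node topology: NVLink links should appear between GPUs
nvidia-smi topo -m
# map RDMA devices to network interfaces (Mellanox OFED tool)
ibdev2netdev
# GDRCopy and NVIDIA peer-memory modules should be loaded for GDR
lsmod | grep -E "gdrdrv|nvidia_peermem"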
Step 2: Installation & Model
https://github.com/PaddlePaddle/FastDeploy/blob/v2.1.1/docs/get_started/installation/nvidia_gpu.md
Use Docker
docker pull ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/fastdeploy-cuda-12.6:2.1.0
Enter the container & install the latest FastDeploy
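One way to start and enter the container (a minimal sketch; it assumes the NVIDIA Container Toolkit is installed, and the mount path and container name are illustrative; RDMA deployments may additionally need --privileged or explicit IB device mounts):
docker run --gpus all --network host --shm-size 64g -it \
  -v /path/to/models:/workspace \
  --name fastdeploy-dev \
  ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/fastdeploy-cuda-12.6:2.1.0 /bin/bash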
git clone https://github.com/PaddlePaddle/FastDeploy.git
cd FastDeploy
bash build.sh
Pull the model
You can pull ERNIE-4.5-300B-A47B-FP8-Paddle from PaddlePaddle AI Studio, ModelScope, or Hugging Face:
PaddlePaddle AI Studio: https://aistudio.baidu.com/modelsdetail/30644/intro
ModelScope: https://modelscope.cn/models/PaddlePaddle/ERNIE-4.5-300B-A47B-FP8-Paddle
Hugging Face: https://huggingface.co/baidu/ERNIE-4.5-300B-A47B-FP8-Paddle
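For example, downloading from Hugging Face with the huggingface_hub CLI (a sketch; the local directory name is chosen to match the --model path used in the launch scripts below):
pip install -U huggingface_hub
huggingface-cli download baidu/ERNIE-4.5-300B-A47B-FP8-Paddle --local-dir ./ERNIE-4_5-300B-A47B-FP8-Paddle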
Required components
apt-get install -y redis-server
redis-server --protected-mode no &
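Redis backs the splitwise scheduler (the --scheduler-host/--scheduler-port flags below point at it). A quick check that it is reachable, using redis-cli (installed alongside redis-server on Ubuntu):
redis-cli ping   # expected reply: PONG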
Step 3: Launch & Test
Launch the P (prefill) node (run these commands from the FastDeploy repository root so that scripts/get_rdma_nics.sh can be found):
export FD_LOG_DIR="log_prefill"
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
echo "set RDMA NICS"
export $(bash scripts/get_rdma_nics.sh gpu)
echo "KVCACHE_RDMA_NICS ${KVCACHE_RDMA_NICS}"
python -m fastdeploy.entrypoints.openai.api_server \
--model ERNIE-4_5-300B-A47B-FP8-Paddle \
--port 8180 --metrics-port 8181 \
--engine-worker-queue-port "25611,25621,25631,25641,25651,25661,25671,25681" \
--cache-queue-port 8183 \
--tensor-parallel-size 1 \
--data-parallel-size 4 \
--enable-expert-parallel \
--cache-transfer-protocol "rdma,ipc" \
--rdma-comm-ports "7671,7672,7673,7674,7675,7676,7677,7678" \
--pd-comm-port "2334" \
--splitwise-role "prefill" \
--scheduler-name "splitwise" \
--scheduler-host "127.0.0.1" \
--scheduler-port 6379 \
--scheduler-topic "test" \
--scheduler-ttl 9000
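Before launching the D node, you can check that the P node's API server has come up (a sketch; the health route follows the OpenAI-compatible server convention and may vary by version):
curl -i http://0.0.0.0:8180/health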
Launch the D (decode) node
export FD_LOG_DIR="log_decode"
rm -rf $FD_LOG_DIR
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
echo "set RDMA NICS"
export $(bash scripts/get_rdma_nics.sh gpu)
echo "KVCACHE_RDMA_NICS ${KVCACHE_RDMA_NICS}"
python -m fastdeploy.entrypoints.openai.api_server \
--model /workspace/ERNIE-4_5-300B-A47B-FP8-Paddle \
--port 8180 --metrics-port 8181 \
--engine-worker-queue-port "25611,25621,25631,25641,25651,25661,25671,25681" \
--cache-queue-port 8183 \
--tensor-parallel-size 1 \
--data-parallel-size 8 \
--enable-expert-parallel \
--cache-transfer-protocol "rdma,ipc" \
--rdma-comm-ports "7671,7672,7673,7674,7675,7676,7677,7678" \
--pd-comm-port "2334" \
--scheduler-name "splitwise" \
--scheduler-host "127.0.0.1" \
--scheduler-port 6379 \
--scheduler-topic "test" \
--num-gpu-blocks-override 1024 \
--scheduler-ttl 9000 \
--splitwise-role "decode"
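Both scripts above use --scheduler-host "127.0.0.1", which only works if Redis is reachable locally. Since the P and D nodes run on different machines, they must point at the same Redis-backed scheduler; a sketch of the relevant flags, assuming Redis runs on the P node at the illustrative address 192.0.2.10:
--scheduler-host "192.0.2.10" \
--scheduler-port 6379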
Test
Send a chat request to the API server port configured above (--port 8180):
curl -X POST "http://0.0.0.0:8180/v1/chat/completions" \
-H "Content-Type: application/json" \
-d '{
"messages": [{"role": "user", "content": "Hello!"}]}'
Replies: 1 comment 1 reply
Does this mean "bring-your-own-GPU"?
Also, is this open for non-Chinese people?
Thank you for your interest!
Regarding your first question, yes, you will need to bring your own GPU, but both local and cloud-based GPUs are supported.
As for your second question, we warmly welcome participants from non-Chinese regions to join!
If you are unable to access the submission link in the main post, please feel free to send your submission to my email at zhanghaoxin0819@gmail.com.