Add inference and communication test support #27

Open
zzhfz wants to merge 2 commits into
master from
feat/unified-scripts

Conversation

@zzhfz (Collaborator) commented Feb 26, 2026

Description

Unify script management so that inference and communication tests can be run through a single workflow.

Changes

  • Enhance install_deps.sh to support installing the -/inference/comm components
  • Update run_tests.sh to support per-component dependency checks
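
For readers unfamiliar with the layout, component-based installation of this kind is usually a `case` dispatch on the first argument. The following is only a minimal sketch of that pattern, not the code in this PR: `install_inference` appears in the diff below, while `install_comm`, `install_component`, and all function bodies are placeholders assumed for illustration.

```shell
#!/usr/bin/env bash
# Illustrative dispatch only; the function bodies are placeholders and
# install_comm / install_component are hypothetical names.
install_inference() { echo "Inference Frameworks"; }
install_comm()      { echo "NCCL Tests (Communication)"; }

install_component() {
    case "$1" in
        inference) install_inference ;;
        comm)      install_comm ;;
        *)
            # Unknown component names are rejected instead of silently ignored.
            echo "Unknown component: $1" >&2
            return 1
            ;;
    esac
}
```

Calling `install_component comm` would run only the communication branch, which mirrors how `./scripts/common/install_deps.sh comm` is invoked in the test results below.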

Test Results

  • inference
(megatron) sunjinge@server:~/InfiniMetrics$ ./scripts/run_tests.sh --check inference test_inference.json
==========================================
InfiniMetrics Test Runner
==========================================
Checking inference dependencies:
 vLLM... [OK]
 InfiniLM... [OK] (version unknown)
Environment: INFINI_ROOT=/home/sunjinge/.infini
NCCL_ROOT=/home/sunjinge/.infini/nccl
CUDA_VISIBLE_DEVICES=not set
==========================================
Running: InfiniMetrics
Time: 20260226_024950
==========================================
Running 1 input path(s):
 - test_inference.json
 ... ... ... 
2026-02-26 02:51:17,875 - infinimetrics.utils.accelerator_monitor - INFO - nvidia monitoring stopped
2026-02-26 02:51:18,238 - infinimetrics.inference.frameworks.vllm_impl - INFO - vLLM model unloaded
2026-02-26 02:51:18,238 - infinimetrics.inference.direct - INFO - Model unloaded
2026-02-26 02:51:18,359 - infinimetrics.inference.inference_adapter - INFO - Inference adapter teardown complete
2026-02-26 02:51:18,361 - infinimetrics.executor - INFO - Executor: infer.vLLM.Direct completed with code=0
2026-02-26 02:51:18,361 - infinimetrics.dispatcher - INFO - Summary saved to summary_output/dispatcher_summary_20260226_025118.json
============================================================
Test Summary
============================================================
Total tests: 1
Successful: 1
Failed: 0
Success rate: 100.0%
============================================================
==========================================
[OK] Test completed: InfiniMetrics
Time: 20260226_025119
==========================================
  • comm
(megatron) sunjinge@server:~/InfiniMetrics$ ./scripts/run_tests.sh test_comm.json
==========================================
InfiniMetrics Test Runner
==========================================
Environment: INFINI_ROOT=/home/sunjinge/.infini
NCCL_ROOT=/home/sunjinge/.infini/nccl
CUDA_VISIBLE_DEVICES=not set
==========================================
Running: InfiniMetrics
Time: 20260226_025208
==========================================
Running 1 input path(s):
 - test_comm.json
2026-02-26 02:52:08,405 - infinimetrics.utils.input_loader - INFO - Loaded 1 input(s) from test_comm.json
2026-02-26 02:52:08,405 - infinimetrics.dispatcher - INFO - Processing 1 valid inputs (skipped 0 invalid)
2026-02-26 02:52:08,415 - infinimetrics.dispatcher - INFO - Validation complete: 1 valid, 0 skipped
2026-02-26 02:52:08,415 - infinimetrics.dispatcher - INFO - [1/1] Executing comm.NcclTest.AllReduce
2026-02-26 02:52:08,415 - infinimetrics.executor - INFO - Executor: Running comm.NcclTest.AllReduce
2026-02-26 02:52:08,416 - infinimetrics.communication.nccl_adapter - INFO - Set CUDA_VISIBLE_DEVICES=0,1
2026-02-26 02:52:08,416 - infinimetrics.communication.nccl_adapter - INFO - NCCL tests found at: /home/sunjinge/InfiniMetrics/submodules/nccl-tests
2026-02-26 02:52:11,508 - infinimetrics.communication.nccl_adapter - INFO - NCCL adapter teardown complete
2026-02-26 02:52:11,510 - infinimetrics.executor - INFO - Executor: comm.NcclTest.AllReduce completed with code=0
2026-02-26 02:52:11,510 - infinimetrics.dispatcher - INFO - Summary saved to summary_output/dispatcher_summary_20260226_025211.json
============================================================
Test Summary
============================================================
Total tests: 1
Successful: 1
Failed: 0
Success rate: 100.0%
============================================================
==========================================
[OK] Test completed: InfiniMetrics
Time: 20260226_025211
==========================================
  • install test
(megatron) sunjinge@server:~/InfiniMetrics$ ./scripts/common/install_deps.sh inference
==========================================
InfiniMetrics Dependency Manager
==========================================
==========================================
Inference Frameworks
==========================================
Installing vLLM...
[OK] vLLM already installed
Installing InfiniLM...
[OK] InfiniLM already installed
==========================================
[OK] All operations completed successfully!
==========================================
(megatron) sunjinge@server:~/InfiniMetrics$ ./scripts/common/install_deps.sh comm
==========================================
InfiniMetrics Dependency Manager
==========================================
==========================================
NCCL Tests (Communication)
==========================================
Checking dependencies...
 CUDA... [OK] (nvcc 13.0)
Checking NCCL tests...
[OK] NCCL tests already built
==========================================
[OK] All operations completed successfully!
==========================================

Comment thread scripts/common/install_deps.sh Outdated
export LD_LIBRARY_PATH="$INFINI_ROOT/lib:$LD_LIBRARY_PATH"
export NCCL_ROOT="$INFINI_ROOT/nccl"
export PATH="$NCCL_ROOT/bin:$PATH"
export LD_LIBRARY_PATH="$NCCL_ROOT/lib:$LD_LIBRARY_PATH"

@Chamberlain0w0 (Collaborator), Feb 27, 2026

This NCCL_ROOT assumption isn't appropriate: users won't install NCCL under INFINI_ROOT, so these lines can be removed.

Comment thread scripts/common/install_deps.sh Outdated
}

# Check NCCL
check_nccl() {

@Chamberlain0w0 (Collaborator), Feb 27, 2026

I think this part can be removed as well. NCCL should be a baseline requirement, so we can assume users already have it installed. Besides, a proper check would have to look for the header files first and then the shared library, which is rather tedious.
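
To make the "header first, then shared library" order concrete, here is a rough sketch of what such a check might look like if it were kept. It is not taken from the PR: `check_nccl_at` is a hypothetical helper, and the `include/` and `lib*/` layout under the given root is an assumption about where NCCL is installed.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of the PR: report whether an NCCL install
# rooted at $1 has both its header and its shared library.
check_nccl_at() {
    local root="$1"
    # Step 1: the header must exist, otherwise nccl-tests cannot be compiled.
    if [ ! -f "$root/include/nccl.h" ]; then
        echo "nccl.h not found under $root/include"
        return 1
    fi
    # Step 2: the shared library must exist for linking and at run time.
    # The glob covers both lib/ and lib64/ (assumed layouts).
    if ! ls "$root"/lib*/libnccl.so* >/dev/null 2>&1; then
        echo "libnccl.so not found under $root/lib or $root/lib64"
        return 2
    fi
    echo "NCCL found under $root"
    return 0
}
```

Even this short version has to guess at directory layouts, which supports the point that the check adds more complexity than it is worth.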


# Check Megatron-LM
check_megatron() {
echo -n " Megatron-LM... "

@Chamberlain0w0 (Collaborator), Feb 27, 2026

Please add a TODO or FIXME here. We may end up running Megatron directly from source rather than through its Python package, so this will need to change once that part is merged. I think it can simply be brought in as a submodule, the same way as nccl-tests.

Comment thread scripts/common/install_deps.sh Outdated
# Check InfiniTrain
check_infinitrain() {
echo -n " InfiniTrain... "
if check_python_package infinitrain; then

@Chamberlain0w0 (Collaborator), Feb 27, 2026

Add a TODO here too; InfiniTrain probably doesn't have a Python package for now either.

Comment thread scripts/common/install_deps.sh Outdated
local INFINITRAIN_PATH="${INFINITRAIN_PATH:-$PROJECT_ROOT/submodules/InfiniTrain}"
if [ -d "$INFINITRAIN_PATH" ]; then
cd "$INFINITRAIN_PATH"
if pip install -e .; then

@Chamberlain0w0 (Collaborator), Feb 27, 2026

Add a TODO here and replace this with the actual build method later.

Comment thread scripts/run_tests.sh Outdated
;;
comm)
echo "Checking communication dependencies:"
check_nccl

@Chamberlain0w0 (Collaborator), Feb 27, 2026

Remove this as well.

Comment thread scripts/run_tests.sh Outdated
all)
echo "Checking all dependencies:"
check_cuda
check_nccl

@Chamberlain0w0 (Collaborator), Feb 27, 2026

And here too.

# ========================================
# Inference Frameworks
# ========================================
install_inference() {

@baominghelly (Collaborator), Feb 27, 2026

InfiniCore is a dependency of InfiniLM. Shouldn't we call check_infinicore before installing InfiniLM?
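
One possible shape for that ordering is sketched below. Only the name `check_infinicore` comes from this thread; its body here is a stand-in driven by an illustrative `INFINICORE_OK` variable, and `install_infinilm` is a hypothetical function name, so treat this as a sketch rather than the script's actual logic.

```shell
#!/usr/bin/env bash
# Stand-in for the real check named in the review; the INFINICORE_OK
# variable is purely illustrative so the sketch can run on its own.
check_infinicore() {
    [ "${INFINICORE_OK:-0}" = "1" ]
}

# Hypothetical installer step: bail out before touching InfiniLM when its
# dependency is missing.
install_infinilm() {
    echo "Installing InfiniLM..."
    if ! check_infinicore; then
        echo "[ERROR] InfiniCore not found; install it before InfiniLM"
        return 1
    fi
    echo "[OK] InfiniCore present, continuing with InfiniLM"
    # ... the actual InfiniLM install steps would go here ...
}
```

Failing early keeps a missing dependency from surfacing later as a confusing build or import error inside the InfiniLM step.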

echo ""
echo "Please set INFINILM_PATH environment variable:"
echo "export INFINILM_PATH=/home/sunjinge/InfiniLM"
echo "export INFINILM_PATH=/home/sunjinge/InfiniLM"

@baominghelly (Collaborator), Mar 12, 2026

The hard-coded path here needs to be dealt with.
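
One way to address this, shown only as a sketch, is to print a neutral placeholder plus a `$HOME`-based example instead of one developer's home directory. `print_infinilm_hint` is a hypothetical helper name and the exact wording of the hint is an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the hint with a neutral placeholder and a
# $HOME-based example instead of a user-specific absolute path.
print_infinilm_hint() {
    echo ""
    echo "Please set INFINILM_PATH environment variable:"
    echo "  export INFINILM_PATH=/path/to/InfiniLM"
    # \$HOME is escaped so the literal text "$HOME" is shown to the user.
    echo "  # for example: export INFINILM_PATH=\"\$HOME/InfiniLM\""
}
```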

Comment on lines +427 to +431
if [ -n "$INFINILM_PATH" ] && [ -d "$INFINILM_PATH" ]; then
INFINILM_PATH="$INFINILM_PATH"
# Priority 2: Home directory
elif [ -d "$HOME/InfiniLM" ]; then
INFINILM_PATH="$HOME/InfiniLM"

@baominghelly (Collaborator), Mar 12, 2026

This can be simplified; the self-assignment is pointless.

if [ -z "$INFINILM_PATH" ] || [ ! -d "$INFINILM_PATH" ]; then
 if [ -d "$HOME/InfiniLM" ]; then
 INFINILM_PATH="$HOME/InfiniLM"
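
The suggestion above is shown truncated. A completed version might look like the sketch below, which wraps the same condition in a hypothetical `resolve_infinilm_path` function and stops at the home-directory fallback; any lower-priority locations in the real script are not reproduced here.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the reviewer's suggested condition.
resolve_infinilm_path() {
    # A valid user-provided INFINILM_PATH is kept as-is; no self-assignment.
    if [ -z "${INFINILM_PATH:-}" ] || [ ! -d "$INFINILM_PATH" ]; then
        # Fallback: the conventional checkout location in the home directory.
        if [ -d "$HOME/InfiniLM" ]; then
            INFINILM_PATH="$HOME/InfiniLM"
        else
            return 1
        fi
    fi
    echo "$INFINILM_PATH"
}
```

Inverting the condition means the "already set and valid" case needs no branch at all, which is the simplification the comment asks for.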

Reviewers

baominghelly left review comments

Awaiting requested review from Chamberlain0w0

Requested changes must be addressed to merge this pull request.
