[GAD] GAD training with Qwen3-VL is extremely slow #354

Open
When I apply black-box distillation (GAD, https://arxiv.org/abs/2511.10643) to Qwen3-VL, training is very slow. Does anyone know why? Both the actor and the critic are Qwen3-VL-2B-Instruct.

TRAIN_BATCH_SIZE=256
VAL_BATCH_SIZE=100
MAX_PROMPT_LENGTH=4096
MAX_RESPONSE_LENGTH=3072
PPO_MINI_BATCH_SIZE=128

ROLLOUT_N=4

echo "data.train_batch_size=$TRAIN_BATCH_SIZE"
echo "actor_rollout_ref.actor.ppo_mini_batch_size=$PPO_MINI_BATCH_SIZE"
echo "actor_rollout_ref.rollout.n=$ROLLOUT_N"

# Learning rate settings

ACTOR_LR=1e-6
CRITIC_LR=1e-6

# ============= Inference engine configuration =============

GEN_TP=1

SAVE_HF_MODEL=${SAVE_HF_MODEL:-True}

if [ "${SAVE_HF_MODEL}" = "True" ]; then
    CHECKPOINT_CONTENTS="['model','hf_model','optimizer','extra']"
else
    CHECKPOINT_CONTENTS="['model','optimizer','extra']"
fi

# ============= Run training =============

python3 -m verl.trainer.main_ppo \
    algorithm.adv_estimator=grpo \
    data.train_files=$TRAIN_FILES \
    data.train_batch_size=$TRAIN_BATCH_SIZE \
    data.val_files=$VAL_FILES \
    data.val_batch_size=$VAL_BATCH_SIZE \
    data.max_prompt_length=$MAX_PROMPT_LENGTH \
    data.max_response_length=$MAX_RESPONSE_LENGTH \
    data.filter_overlong_prompts=True \
    data.truncation=right \
    actor_rollout_ref.model.path=$MODEL_PATH \
    actor_rollout_ref.model.use_remove_padding=True \
    actor_rollout_ref.model.enable_gradient_checkpointing=True \
    actor_rollout_ref.actor.fsdp_config.model_dtype=bf16 \
    critic.model.fsdp_config.model_dtype=bf16 \
    actor_rollout_ref.actor.optim.lr=$ACTOR_LR \
    actor_rollout_ref.actor.grad_clip=0.2 \
    actor_rollout_ref.actor.ppo_mini_batch_size=$PPO_MINI_BATCH_SIZE \
    actor_rollout_ref.actor.use_dynamic_bsz=True \
    actor_rollout_ref.actor.ppo_max_token_len_per_gpu=24576 \
    actor_rollout_ref.actor.use_kl_loss=True \
    actor_rollout_ref.actor.entropy_coeff=0.0 \
    actor_rollout_ref.actor.kl_loss_coef=0.001 \
    actor_rollout_ref.actor.kl_loss_type=low_var_kl \
    actor_rollout_ref.actor.ulysses_sequence_parallel_size=1 \
    actor_rollout_ref.actor.fsdp_config.param_offload=False \
    actor_rollout_ref.actor.fsdp_config.optimizer_offload=False \
    actor_rollout_ref.actor.checkpoint.save_contents=${CHECKPOINT_CONTENTS} \
    actor_rollout_ref.ref.fsdp_config.param_offload=False \
    actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=4 \
    critic.ppo_micro_batch_size_per_gpu=4 \
    actor_rollout_ref.rollout.name=vllm \
    actor_rollout_ref.rollout.temperature=0.8 \
    actor_rollout_ref.rollout.gpu_memory_utilization=0.4 \
    actor_rollout_ref.rollout.top_p=0.9 \
    actor_rollout_ref.rollout.n=$ROLLOUT_N \
    +actor_rollout_ref.rollout.engine_kwargs.vllm.disable_mm_preprocessor_cache=True \
    actor_rollout_ref.rollout.enforce_eager=False \
    actor_rollout_ref.rollout.free_cache_engine=False \
    actor_rollout_ref.rollout.prompt_length=4096 \
    +actor_rollout_ref.rollout.repetition_penalty=1.05 \
    actor_rollout_ref.rollout.response_length=$MAX_RESPONSE_LENGTH \
    critic.model.path=$REWARD_MODEL_PATH \
    reward_model.use_reward_loop=False \
    critic.model.use_remove_padding=True \
    critic.model.enable_gradient_checkpointing=True \
    critic.use_dynamic_bsz=True \
    critic.optim.lr=$CRITIC_LR \
    critic.ppo_max_token_len_per_gpu=24576 \
    critic.grad_clip=0.2 \
    critic.enable=True \
    critic.model.fsdp_config.optimizer_offload=False \
    critic.checkpoint.save_contents=${CHECKPOINT_CONTENTS} \
    algorithm.kl_ctrl.kl_coef=0.001 \
    trainer.val_before_train=False \
    trainer.critic_warmup=0.01 \
    trainer.logger='["console"]' \
    trainer.n_gpus_per_node=$GPUs \
    trainer.nnodes=1 \
    trainer.save_freq=50 \
    trainer.test_freq=-1 \
    trainer.default_hdfs_dir=null \
    trainer.total_epochs=1 \
    trainer.default_local_dir=$SAVE_DIR "$@"

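My hardware is 8×H200 and the dataset has 40k samples, but the estimated time for one epoch is 30 h.
Peak GPU memory usage is around 120 GB, but most of the time only about 60 GB is in use. I have tuned several parameters without much effect; the estimated training time stays at 28-30 h.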
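
To put the 30 h figure in per-step terms, here is a rough back-of-the-envelope sketch using only the numbers reported above. It assumes the trainer drops the last partial batch (hence the floor division) and that every sequence reaches the maximum prompt and response length, so the token count is an upper bound rather than a measurement. The derived variable names are illustrative and are not verl options.

```shell
#!/usr/bin/env bash
# Values copied from the issue text and the launch script above.
DATASET_SIZE=40000          # "40k" samples
TRAIN_BATCH_SIZE=256
PPO_MINI_BATCH_SIZE=128
ROLLOUT_N=4
MAX_PROMPT_LENGTH=4096
MAX_RESPONSE_LENGTH=3072
EPOCH_HOURS=30              # reported estimate for one epoch

# Assumption: the last partial batch is dropped, so use integer (floor) division.
STEPS_PER_EPOCH=$(( DATASET_SIZE / TRAIN_BATCH_SIZE ))
# Average wall-clock seconds per training step implied by the 30 h estimate.
SECONDS_PER_STEP=$(( EPOCH_HOURS * 3600 / STEPS_PER_EPOCH ))
# Each prompt is sampled ROLLOUT_N times during rollout.
SEQS_PER_STEP=$(( TRAIN_BATCH_SIZE * ROLLOUT_N ))
# Upper bound: every sequence padded out to the maximum prompt + response length.
MAX_TOKENS_PER_STEP=$(( SEQS_PER_STEP * (MAX_PROMPT_LENGTH + MAX_RESPONSE_LENGTH) ))
# Number of mini-batches the prompt batch is split into per step.
MINI_BATCHES_PER_STEP=$(( TRAIN_BATCH_SIZE / PPO_MINI_BATCH_SIZE ))

echo "steps_per_epoch=$STEPS_PER_EPOCH"
echo "seconds_per_step=$SECONDS_PER_STEP"
echo "sequences_per_step=$SEQS_PER_STEP"
echo "max_tokens_per_step=$MAX_TOKENS_PER_STEP"
echo "mini_batches_per_step=$MINI_BATCHES_PER_STEP"
```

Under these assumptions the run is about 156 steps of roughly 692 s (about 11.5 minutes) each, with 1024 generated sequences per step. Comparing that per-step budget against the time spent in rollout, actor update, and critic update would show which phase dominates.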
