✏️ Hands-on LLM Fine-Tuning Project

Train a legal LLM on Colab step by step, based on microsoft/phi-1_5 and ChatGLM3-6B. Through this project you can get hands-on fine-tuning experience at zero cost. If you want to see how LLM fine-tuning is implemented in code, see the my_finetune project 🤓.

| Name | Colab | Datasets |
| --- | --- | --- |
| Self-cognition LoRA-SFT fine-tuning | Colab | self_cognition.json |
| Legal Q&A LoRA-SFT fine-tuning | Colab | DISC-LawLLM |
| Legal Q&A full-parameter SFT fine-tuning* | Colab | DISC-LawLLM |
| ChatGLM3-6B self-cognition LoRA-SFT fine-tuning* | Colab | self_cognition.json |

*If you are a Colab Pro subscriber, you can try full-parameter SFT fine-tuning on the high-RAM + T4 runtime; 1000 samples take roughly 20+ hours.
*If you are a Colab Pro subscriber, the ChatGLM3-6B self-cognition LoRA-SFT fine-tuning on high-RAM + T4 takes only a few minutes and gives fairly good results.

Goal

Use Colab's free T4 GPU to run instruction-supervised fine-tuning (SFT) of the microsoft/phi-1_5 model on legal Q&A data.

Self-cognition fine-tuning

Self-cognition data source: self_cognition.json

80 samples; LoRA fine-tuning of phi-1_5 on a T4 finishes in a few minutes.
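The dataset uses LLaMA-Factory's Alpaca-style fields; a hedged sketch of what a single record roughly looks like (field names follow the Alpaca format, and the <NAME>/<AUTHOR> placeholders are assumptions that you replace with your own identity before training):

{
  "instruction": "你好,请介绍一下你自己",
  "input": "",
  "output": "您好,我是 <NAME>,一个由 <AUTHOR> 训练的法律问答助手。"
}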

Fine-tuning parameters; see the Colab notebook for the detailed steps:

python src/train_bash.py \
 --stage sft \
 --model_name_or_path microsoft/phi-1_5 \
 --do_train True \
 --finetuning_type lora \
 --template vanilla \
 --flash_attn False \
 --shift_attn False \
 --dataset_dir data \
 --dataset self_cognition \
 --cutoff_len 1024 \
 --learning_rate 2e-04 \
 --num_train_epochs 20.0 \
 --max_samples 1000 \
 --per_device_train_batch_size 6 \
 --per_device_eval_batch_size 6 \
 --gradient_accumulation_steps 1 \
 --lr_scheduler_type cosine \
 --max_grad_norm 1.0 \
 --logging_steps 5 \
 --save_steps 100 \
 --warmup_steps 0 \
 --neft_alpha 0 \
 --train_on_prompt False \
 --upcast_layernorm False \
 --lora_rank 8 \
 --lora_dropout 0.1 \
 --lora_target Wqkv \
 --resume_lora_training True \
 --output_dir saves/Phi1.5-1.3B/lora/my \
 --fp16 True \
 --plot_loss True
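
To try the result outside LLaMA-Factory, the adapter can be loaded with transformers + peft; a minimal sketch, assuming the adapter was written to saves/Phi1.5-1.3B/lora/my as in the command above:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = "microsoft/phi-1_5"
adapter = "saves/Phi1.5-1.3B/lora/my"  # output_dir from the command above

tokenizer = AutoTokenizer.from_pretrained(base, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    base, torch_dtype=torch.float16, trust_remote_code=True, device_map="auto"
)
model = PeftModel.from_pretrained(model, adapter)  # attach the LoRA weights

# The template is "vanilla", so a plain prompt is enough
inputs = tokenizer("你是谁?", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(out[0], skip_special_tokens=True))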

Results

Legal Q&A fine-tuning

Legal Q&A data source: DISC-LawLLM
To save GPU memory, DeepSpeed ZeRO stage 2 is used; cutoff_len can go up to 1792 at most, any longer and the GPU runs out of memory.
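
Before training, the DISC-LawLLM data has to be registered in LLaMA-Factory's data/dataset_info.json under the law_sft_triplet name used below; a hedged sketch of such an entry (the file name and column mapping are assumptions, adjust them to the actual exported file):

{
  "law_sft_triplet": {
    "file_name": "law_sft_triplet.json",
    "columns": {
      "prompt": "instruction",
      "query": "input",
      "response": "output"
    }
  }
}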

DeepSpeed configuration

{
  "train_batch_size": "auto",
  "train_micro_batch_size_per_gpu": "auto",
  "gradient_accumulation_steps": "auto",
  "gradient_clipping": "auto",
  "zero_allow_untested_optimizer": true,
  "fp16": {
    "enabled": "auto",
    "loss_scale": 0,
    "initial_scale_power": 16,
    "loss_scale_window": 1000,
    "hysteresis": 2,
    "min_loss_scale": 1
  },
  "zero_optimization": {
    "stage": 2,
    "offload_optimizer": {
      "device": "cpu",
      "pin_memory": true
    },
    "allgather_partitions": true,
    "allgather_bucket_size": 2e8,
    "reduce_scatter": true,
    "reduce_bucket_size": 2e8,
    "overlap_comm": false,
    "contiguous_gradients": true
  }
}

Fine-tuning parameters

1000 samples take roughly 60 minutes on a T4.

deepspeed --num_gpus 1 --master_port=9901 src/train_bash.py \
 --deepspeed ds_config.json \
 --stage sft \
 --model_name_or_path microsoft/phi-1_5 \
 --do_train True \
 --finetuning_type lora \
 --template vanilla \
 --flash_attn False \
 --shift_attn False \
 --dataset_dir data \
 --dataset self_cognition,law_sft_triplet \
 --cutoff_len 1792 \
 --learning_rate 2e-04 \
 --num_train_epochs 5.0 \
 --max_samples 1000 \
 --per_device_train_batch_size 1 \
 --per_device_eval_batch_size 1 \
 --gradient_accumulation_steps 1 \
 --lr_scheduler_type cosine \
 --max_grad_norm 1.0 \
 --logging_steps 5 \
 --save_steps 1000 \
 --warmup_steps 0 \
 --neft_alpha 0 \
 --train_on_prompt False \
 --upcast_layernorm False \
 --lora_rank 8 \
 --lora_dropout 0.1 \
 --lora_target Wqkv \
 --resume_lora_training True \
 --output_dir saves/Phi1.5-1.3B/lora/law \
 --fp16 True \
 --plot_loss True
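
For deployment, the legal-QA adapter can be merged back into the base weights so the result loads as a plain transformers model; a minimal sketch with peft, assuming the adapter landed in saves/Phi1.5-1.3B/lora/law as above (the merged output directory name is arbitrary):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = "microsoft/phi-1_5"
adapter = "saves/Phi1.5-1.3B/lora/law"  # output_dir from the command above

model = AutoModelForCausalLM.from_pretrained(base, torch_dtype=torch.float16, trust_remote_code=True)
model = PeftModel.from_pretrained(model, adapter)
merged = model.merge_and_unload()  # fold the LoRA deltas into the base weights
merged.save_pretrained("phi-1_5-law-merged")
AutoTokenizer.from_pretrained(base, trust_remote_code=True).save_pretrained("phi-1_5-law-merged")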

Full-parameter fine-tuning

You can use estimate_zero3_model_states_mem_needs_all_live to check how much memory the different DeepSpeed ZeRO configurations need.

# Estimate the CPU/GPU memory needed for ZeRO-3 under the different offload options
from transformers import AutoModelForCausalLM
from deepspeed.runtime.zero.stage3 import estimate_zero3_model_states_mem_needs_all_live

model_name = "microsoft/phi-1_5"
model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True)
estimate_zero3_model_states_mem_needs_all_live(model, num_gpus_per_node=1, num_nodes=1)

As the estimator output shows, with offload_optimizer -> cpu, microsoft/phi-1_5 needs about 32 GB of CPU RAM; Colab's high-RAM runtime provides 52 GB, which is enough.
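
The configuration below actually runs ZeRO stage 2 with optimizer offload, and DeepSpeed ships a matching stage-2 estimator; a hedged sketch (the import path follows DeepSpeed's memory-estimation docs, verify it against your installed version):

from transformers import AutoModelForCausalLM
from deepspeed.runtime.zero.stage_1_and_2 import estimate_zero2_model_states_mem_needs_all_live

# Prints CPU/GPU memory estimates for ZeRO-2, with and without optimizer offload
model = AutoModelForCausalLM.from_pretrained("microsoft/phi-1_5", trust_remote_code=True)
estimate_zero2_model_states_mem_needs_all_live(model, num_gpus_per_node=1, num_nodes=1)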

DeepSpeed configuration

{
  "train_batch_size": "auto",
  "train_micro_batch_size_per_gpu": "auto",
  "gradient_accumulation_steps": "auto",
  "gradient_clipping": "auto",
  "zero_allow_untested_optimizer": true,
  "fp16": {
    "enabled": "auto",
    "loss_scale": 0,
    "initial_scale_power": 16,
    "loss_scale_window": 1000,
    "hysteresis": 2,
    "min_loss_scale": 1
  },
  "zero_optimization": {
    "stage": 2,
    "offload_optimizer": {
      "device": "cpu",
      "pin_memory": true
    },
    "allgather_partitions": true,
    "allgather_bucket_size": 2e8,
    "reduce_scatter": true,
    "reduce_bucket_size": 2e8,
    "overlap_comm": false,
    "contiguous_gradients": true
  }
}
Fine-tuning parameters

deepspeed --num_gpus 1 --master_port=9901 src/train_bash.py \
 --deepspeed ds_config.json \
 --stage sft \
 --model_name_or_path microsoft/phi-1_5 \
 --do_train True \
 --finetuning_type full \
 --template vanilla \
 --flash_attn False \
 --shift_attn False \
 --dataset_dir data \
 --dataset self_cognition,law_sft_triplet \
 --cutoff_len 1024 \
 --learning_rate 2e-04 \
 --num_train_epochs 10.0 \
 --max_samples 1000 \
 --per_device_train_batch_size 1 \
 --per_device_eval_batch_size 1 \
 --gradient_accumulation_steps 1 \
 --lr_scheduler_type cosine \
 --max_grad_norm 1.0 \
 --logging_steps 5 \
 --save_steps 1000 \
 --warmup_steps 0 \
 --neft_alpha 0 \
 --train_on_prompt False \
 --upcast_layernorm False \
 --lora_rank 8 \
 --lora_dropout 0.1 \
 --lora_target Wqkv \
 --resume_lora_training True \
 --output_dir saves/Phi1.5-1.3B/lora/law_full \
 --fp16 True \
 --plot_loss True

You can also consider Kaggle, which offers about 30 GPU hours per week and lets you attach 2× T4 GPUs, enough to run full-parameter fine-tuning with ZeRO stage 3.

DeepSpeed configuration

{
  "train_batch_size": "auto",
  "train_micro_batch_size_per_gpu": "auto",
  "gradient_accumulation_steps": "auto",
  "gradient_clipping": "auto",
  "zero_allow_untested_optimizer": true,
  "fp16": {
    "enabled": "auto",
    "loss_scale": 0,
    "initial_scale_power": 16,
    "loss_scale_window": 1000,
    "hysteresis": 2,
    "min_loss_scale": 1
  },
  "zero_optimization": {
    "stage": 3,
    "overlap_comm": false,
    "contiguous_gradients": true,
    "sub_group_size": 5e7,
    "reduce_bucket_size": "auto",
    "stage3_prefetch_bucket_size": "auto",
    "stage3_param_persistence_threshold": "auto",
    "stage3_max_live_parameters": 5e7,
    "stage3_max_reuse_distance": 5e7,
    "stage3_gather_16bit_weights_on_model_save": true
  }
}
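
On Kaggle the launch command is essentially the same as the full-parameter run above, just with two GPUs; a hedged sketch (ds_config_zero3.json and the output_dir are illustrative names, the remaining flags mirror the command above):

deepspeed --num_gpus 2 --master_port=9901 src/train_bash.py \
 --deepspeed ds_config_zero3.json \
 --stage sft \
 --model_name_or_path microsoft/phi-1_5 \
 --do_train True \
 --finetuning_type full \
 --template vanilla \
 --dataset_dir data \
 --dataset self_cognition,law_sft_triplet \
 --cutoff_len 1024 \
 --learning_rate 2e-04 \
 --num_train_epochs 10.0 \
 --max_samples 1000 \
 --per_device_train_batch_size 1 \
 --per_device_eval_batch_size 1 \
 --gradient_accumulation_steps 1 \
 --lr_scheduler_type cosine \
 --output_dir saves/Phi1.5-1.3B/full/law_kaggle \
 --fp16 True \
 --plot_loss True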
