
0.5.dev pr #84


Merged
chencyudel merged 15 commits into main from 0.5.dev
Nov 13, 2024
Commits
e827775
0.5.dev, first refactor, add offline_tokenization for pretraining, add ...
chencyudel Oct 28, 2024
7898fa6
remove default writer
chencyudel Oct 28, 2024
a68a716
remove default writer
chencyudel Oct 28, 2024
08ab5ee
update coba loss
cocoocoder Oct 29, 2024
65f4511
update
cocoocoder Oct 29, 2024
a4cd0b9
Merge pull request #81 from codefuse-ai/support_coba_loss
chencyudel Oct 29, 2024
912e792
update the tutorial of CoBa
cocoocoder Oct 29, 2024
1a10e49
project readme
chencyudel Oct 30, 2024
1dc28f5
Merge pull request #82 from codefuse-ai/support_coba_loss
chencyudel Oct 30, 2024
96aaa1e
update tutorial of CoBa arguments
cocoocoder Oct 30, 2024
b658546
Merge pull request #83 from codefuse-ai/support_coba_loss
GoneZ5 Oct 30, 2024
46daf5d
manual readme
chencyudel Oct 30, 2024
6de58d3
readme
chencyudel Oct 30, 2024
2cd7712
readme
chencyudel Oct 30, 2024
ae61b4d
readme
chencyudel Oct 31, 2024
readme
chencyudel committed Oct 31, 2024
commit ae61b4d00f674b0bd948fbff459f7b1ef96ce7ed
4 changes: 2 additions & 2 deletions README.md
@@ -46,9 +46,9 @@


## News
-🔥🔥🔥 [2024/11/01] We released **MFTCoder v0.5** mainly for MFTCoder-accelerate, which is now supporting preference alignment methods like **DPO/RPO/ORPO** in the new **xxpo** module, adding full-parameter continue-training in the additional **mpt** module along with its **offline_tokenization** module, updating selfpaced method to new convergence balance(CoBa) method for MFT in the original **pefts** module.
+🔥🔥🔥 [2024/10/31] We released **MFTCoder v0.5** mainly for MFTCoder-accelerate, which is now supporting preference alignment methods like **DPO/RPO/ORPO** in the new **xxpo** module, adding full-parameter continue-training in the additional **mpt** module along with its **offline_tokenization** module, updating selfpaced method to new convergence balance(CoBa) method for MFT in the original **pefts** module.

-🔥🔥🔥 [2024/11/01] Our paper [CoBa: Convergence Balancer for Multitask Finetuning of Large Language Models](https://arxiv.org/abs/2410.06741) has been accepted by EMNLP-2024, which achieves balanced convergence across various tasks.
+🔥🔥🔥 [2024/10/31] Our paper [CoBa: Convergence Balancer for Multitask Finetuning of Large Language Models](https://arxiv.org/abs/2410.06741) has been accepted by EMNLP-2024, which achieves balanced convergence across various tasks.

🔥🔥🔥 [2024年05月20日] We released **MFTCoder v0.4**, mainly for MFTCoder-accelerate. It supports **QLoRA + DeepSpeed Zero3** and **QLoRA + FSDP** as options allowing you training very large models. It now supports new models like Qwen2, Qwen2-MoE, Starcoder2, Gemma, etc.

Expand Down
4 changes: 2 additions & 2 deletions README_cn.md
@@ -45,9 +45,9 @@


## 新闻
-🔥🔥🔥 [2024/11/01] **MFTCoder-v0.5**发布,新增**xxpo**模块支持偏好对齐DPO/RPO/ORPO;新增**mpt**和**offline_tokenization**模块支持全量参数的加训;在原本的**pefts**模块(MFT)更新selfpaced收敛均衡技术并更名CoBa。
+🔥🔥🔥 [2024/10/31] **MFTCoder-v0.5**发布,新增**xxpo**模块支持偏好对齐DPO/RPO/ORPO;新增**mpt**和**offline_tokenization**模块支持全量参数的加训;在原本的**pefts**模块(MFT)更新selfpaced收敛均衡技术并更名CoBa。

-🔥🔥🔥 [2024/11/01] 我们的论文 [CoBa: Convergence Balancer for Multitask Finetuning of Large Language Models](https://arxiv.org/abs/2410.06741) 已被 EMNLP 2024 接收,可以实现多任务收敛均衡。
+🔥🔥🔥 [2024/10/31] 我们的论文 [CoBa: Convergence Balancer for Multitask Finetuning of Large Language Models](https://arxiv.org/abs/2410.06741) 已被 EMNLP 2024 接收,可以实现多任务收敛均衡。

🔥🔥🔥 [2024年05月20日] **MFTCoder-v0.4**发布。新增支持**QLoRA+ DeepSpeed Zero3**, **QLoRA + FSDP**训练模式,可以更好的支持微调更大的模型,比如Qwen1.5-70B等。新增对Qwen2, Qwen2-MoE, Starcoder2, Gemma等模型的支持。

Expand Down
10 changes: 5 additions & 5 deletions mftcoder_accelerate/README.md
@@ -15,13 +15,13 @@

🔥 MFTCoder-accelerate now support these modes: QLoRA/LoRA + DeepSpeed ZeRO2, QLoRA + DeepSpeed ZeRO3, Full-parameter + DeepSpeed ZeRO3, QLoRA + FSDP, Full-parameter + FSDP.

-🔥 MFTCoder-accelerate supports QLoRA + DeepSpeed ZeRO3 and QLoRA + FSDP, which both work for larger models;
+🔥 MFTCoder-accelerate supports QLoRA + DeepSpeed ZeRO3 and QLoRA + FSDP, which both work for larger models.

-🔥 MFTCoder-accelerate supports MFT/SFT on more new mainstream open-source base models: mistral, mixtral-8x7b(Mixture of Experts), deepseek, chatglm3;
+🔥 MFTCoder-accelerate supports MFT/SFT on more new mainstream open-source base models: mistral, mixtral-8x7b(Mixture of Experts), deepseek, chatglm3.

-🔥 MFTCoder-accelerate supports Self-Paced Loss for Convergence Balance;
+🔥 MFTCoder-accelerate supports Self-Paced Loss for Convergence Balance.

-🔥 MFTCoder-accelerate supports Full-parameters/QLoRA/LoRA using accelerate + DeepSpeed Framework;
+🔥 MFTCoder-accelerate supports Full-parameters/QLoRA/LoRA using accelerate + DeepSpeed Framework.

🔥 MFTCoder-accelerate supports Multitask Fine-Tuning(MFT), which is able to balance diffenrent tasks in data level.

@@ -94,7 +94,7 @@ User nth round input
When applying inference, you always make your input string end with ```<s>bot\n``` to request the model generating answers.

### 2.3 DPO训练数据格式
-The training data is required to be a uniformed JSONL format, in which each line of data has the following JSON format. The "chosen" and "rejected" fields are required as ```chosen``` and ```rejected``` in DPO training and both includes "chatml-style" contents.
+The training data is required to be a uniformed JSONL format, in which each line of data has the following JSON format. The "chosen" and "rejected" fields are required as ```chosen``` and ```rejected``` in DPO training and both includes "chatml-style" contents(only last content of bot differs).
```json
{
"chosen":[
    ...
```
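To make the expected shape concrete, here is a minimal Python sketch of assembling one line of such a DPO JSONL file. The role names ("human"/"bot") and the message contents are illustrative assumptions rather than values taken from the repository; the point is only the structure: "chosen" and "rejected" share every turn except the final bot content.

```python
import json

# Hypothetical shared turns; only the structure matters here.
shared_turns = [
    {"role": "human", "content": "Write a function that adds two numbers."},
]

sample = {
    "chosen": shared_turns + [
        {"role": "bot", "content": "def add(a, b):\n    return a + b"},
    ],
    "rejected": shared_turns + [
        {"role": "bot", "content": "def add(a, b):\n    return a - b"},
    ],
}

# Each line of the training file is one such JSON object.
line = json.dumps(sample, ensure_ascii=False)
print(line)
```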
20 changes: 10 additions & 10 deletions mftcoder_accelerate/README_cn.md
@@ -15,19 +15,19 @@

🔥 MFTCoder-accelerate 最新支持的训练模式包括: QLoRA/LoRA + DeepSpeed ZeRO2, QLoRA + DeepSpeed ZeRO3, 全量 + DeepSpeed ZeRO3, QLoRA + FSDP, 全量 + FSDP。

-🔥 MFTCoder-accelerate 新增支持QLoRA + DeepSpeed ZeRO3, 支持QLoRA + FSDP, 可以训练更大的模型;
+🔥 MFTCoder-accelerate 新增支持QLoRA + DeepSpeed ZeRO3, 支持QLoRA + FSDP, 可以训练更大的模型。

-🔥 MFTCoder-accelerate 新增支持accelerate + FSDP框架, 支持全量微调和LoRA;
+🔥 MFTCoder-accelerate 新增支持accelerate + FSDP框架, 支持全量微调和LoRA。

-🔥 MFTCoder-accelerate 支持最新更多主流开源模型: mistral, mixtral-8x7b(Mixture of Experts), deepseek, chatglm3;
+🔥 MFTCoder-accelerate 支持最新更多主流开源模型: mistral, mixtral-8x7b(Mixture of Experts), deepseek, chatglm3。

-🔥 MFTCoder-accelerate 新增self-paced Loss, 用于收敛均衡;
+🔥 MFTCoder-accelerate 新增self-paced Loss, 用于收敛均衡。

-🔥 MFTCoder-accelerate 支持使用accelerate + DeepSpeed框架下支持 全量参数/QLoRA/LoRA微调;
+🔥 MFTCoder-accelerate 支持使用accelerate + DeepSpeed框架下支持 全量参数/QLoRA/LoRA微调。

-🔥 MFTCoder-accelerate 在训练中支持了多任务微调MFT, 可以同时平衡多个任务的训练,训练的模型支持多任务推理;
+🔥 MFTCoder-accelerate 在训练中支持了多任务微调MFT, 可以同时平衡多个任务的训练,训练的模型支持多任务推理。

-🔥 MFTCoder-accelerate 在训练中支持多种模型基座: codellama, llama2, llama, starcoder, codegeex2, chatglm2, qwen等
+🔥 MFTCoder-accelerate 在训练中支持多种模型基座: codellama, llama2, llama, starcoder, codegeex2, chatglm2, qwen等

## 2. 数据格式
### 2.1 MFT训练数据格式
@@ -87,7 +87,7 @@
```

### 2.3 DPO训练数据格式
-训练数据为jsonl格式,每一行的数据格式如下,其中chosen字段和rejected字段分别代表偏好对齐中的```chosen```和```rejected```,其内部依然是MFT的chatml格式。
+训练数据为jsonl格式,每一行的数据格式如下,其中chosen字段和rejected字段分别代表偏好对齐中的```chosen```和```rejected```,其内部依然是MFT的chatml格式,并且只有最后一轮对话的bot content不同。
```json
{
"chosen":[
    ...
```
@@ -292,8 +292,8 @@ _**训练需要的参数配置在```configs/*_train_config```中,主要参数
- **coba_sample_valid_num**: CoBa每一步要取的valid batch数。理论上当该值等于valid batch总数量时,拟合出的收敛斜率最逼近真实情况,但考虑到计算需求,建议设置为1。

#### DPO 相关参数配置
-- **xxpo**: 偏好对齐方法, "dpo" 或者 "orpo".
-- **beta**: DPO beta, beta 越小,允许对齐后的dpo模型与ref模型的距离越远
+- **xxpo**: 偏好对齐方法, "dpo" 或者 "orpo"。
+- **beta**: DPO beta, beta 越小,允许对齐后的dpo模型与ref模型的距离越远。
- **rpo_alpha**: 加到dpo损失的```chosen``` NLL损失的系数,0的话就是原始DPO。
-
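As a rough illustration of how **beta** and **rpo_alpha** interact, the following Python sketch computes a per-example loss in the standard DPO formulation with an optional chosen-NLL term. This is a sketch under assumptions, not MFTCoder's actual implementation: a smaller beta weakens the implicit constraint tying the policy to the reference model, and rpo_alpha=0 recovers plain DPO.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp,
             chosen_nll, beta=0.1, rpo_alpha=0.0):
    """Per-example DPO loss with an optional RPO-style NLL term."""
    # Log-ratio margin between chosen and rejected, relative to the
    # reference model; beta scales how strongly the margin is enforced.
    logits = beta * ((policy_chosen_logp - ref_chosen_logp)
                     - (policy_rejected_logp - ref_rejected_logp))
    loss = -math.log(1.0 / (1.0 + math.exp(-logits)))  # -log(sigmoid(x))
    # rpo_alpha weights the chosen-response NLL added on top of DPO.
    return loss + rpo_alpha * chosen_nll

# With zero margin the DPO term is -log(0.5); rpo_alpha=0 is plain DPO.
print(dpo_loss(-1.0, -2.0, -1.0, -2.0, chosen_nll=1.0))
```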
## 4. 模型使用