Update 5.token及模型参数.md #25

Status: Open

XihWang wants to merge 1 commit into wdndev:main from XihWang:patch-2
The commit fixes a typo in section 6.5 of `5.token及模型参数.md`: 「是神什么」 becomes 「是什么」 ("predict *what* the next word is"), removing a stray character.

```diff
@@ -110,7 +110,7 @@ Taylor在训练Galactica模型时候认为他之所以用4 epochs能提高训练

 #### 6.5 多样的训练目标可以减轻多Epoch下降吗?

-目前大语言模型的训练目标有很多,例如预测下一个单词是神什么的生成式目标,也有把单词masked之后用来判断是什么单词的判别式目标。**如果语言模型的训练目标多样化,那么实际上更加可能受到多epoch带来的性能损失**。
+目前大语言模型的训练目标有很多,例如预测下一个单词是什么的生成式目标,也有把单词masked之后用来判断是什么单词的判别式目标。**如果语言模型的训练目标多样化,那么实际上更加可能受到多epoch带来的性能损失**。

 例如,UL2这种模型就不适合多Epoch的训练,MLM这种模型受到的影响反而更小。
```

AltStyle によって変換されたページ (->オリジナル) /