This repository was archived by the owner on Jan 15, 2024. It is now read-only.

[Bug][Fix][WIP] Fix pre-layernormalization in Transformer #1488

Open

sxjscience wants to merge 8 commits into dmlc:master from sxjscience:fix_pre_ln

Conversation

Member

@sxjscience sxjscience commented Jan 18, 2021

Description

Fix the addition of the residual connection. The previous implementation was not correct. I'm rerunning the Transformer-Big-pre-ln experiment.

@yongyi-wu
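
For context, the sketch below shows where the residual addition belongs in a pre-LN sub-layer versus the original post-LN arrangement. This is a minimal illustration in plain NumPy with hypothetical helper names, not the gluonnlp code touched by this PR: the skip connection has to add the untouched input to the sub-layer output, with layer normalization applied only inside the branch.

```python
# Illustrative sketch only (plain NumPy, hypothetical helper names),
# not the gluonnlp implementation changed in this PR.
import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalize over the last (feature) axis.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def pre_ln_sublayer(x, sublayer):
    # Pre-LN: normalize first, run the sub-layer, then add the *original*
    # input as the residual. The skip connection bypasses the layer norm.
    return x + sublayer(layer_norm(x))

def post_ln_sublayer(x, sublayer):
    # Post-LN (original Transformer): add the residual first, then normalize.
    return layer_norm(x + sublayer(x))
```

With the pre-LN arrangement, a single final layer norm is typically applied once after the last layer of the stack rather than inside each residual branch.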

Checklist

Essentials

  • PR's title starts with a category (e.g. [BUGFIX], [MODEL], [TUTORIAL], [FEATURE], [DOC], etc)
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage
  • Code is well-documented

Changes

  • Feature1, tests, (and when applicable, API doc)
  • Feature2, tests, (and when applicable, API doc)

Comments

  • If this change is a backward incompatible change, why must this change be made.
  • Interesting edge cases to note here

cc @dmlc/gluon-nlp-team

@sxjscience sxjscience requested a review from a team as a code owner January 18, 2021 01:06
@sxjscience sxjscience changed the title from [Bug][Fix] Fix pre-layernormalization in Transformer to [Bug][Fix][WIP] Fix pre-layernormalization in Transformer on Jan 18, 2021

codecov bot commented Jan 18, 2021 (edited)

Codecov Report

Merging #1488 (3c7c4c1) into master (c582b64) will increase coverage by 3.21%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1488      +/-   ##
==========================================
+ Coverage   81.98%   85.19%   +3.21%
==========================================
  Files          52       52
  Lines        6909     6822      -87
==========================================
+ Hits         5664     5812     +148
+ Misses       1245     1010     -235
Impacted Files Coverage Δ
src/gluonnlp/data/batchify.py 88.72% <ø> (ø)
src/gluonnlp/layers.py 87.15% <100.00%> (+0.03%) ⬆️
src/gluonnlp/models/transformer.py 98.93% <100.00%> (-0.01%) ⬇️
conftest.py 76.31% <0.00%> (-8.69%) ⬇️
src/gluonnlp/data/loading.py 75.75% <0.00%> (-7.64%) ⬇️
src/gluonnlp/utils/lazy_imports.py 58.42% <0.00%> (-2.25%) ⬇️
src/gluonnlp/utils/misc.py 52.51% <0.00%> (-1.06%) ⬇️
src/gluonnlp/data/tokenizers/yttm.py 81.73% <0.00%> (-1.02%) ⬇️
src/gluonnlp/data/tokenizers/spacy.py 65.33% <0.00%> (-0.91%) ⬇️
src/gluonnlp/data/tokenizers/huggingface.py 71.06% <0.00%> (-0.78%) ⬇️
... and 22 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c582b64...3c7c4c1. Read the comment docs.

Member

Looks good. It seems all issues related to pre-norm and skip connection have been fixed.

Member Author

I noticed that the performance became worse after I changed the implementation. Still investigating the issue.

