This repository was archived by the owner on Jan 15, 2024. It is now read-only.

[Bug][Fix][WIP] Fix pre-layernormalization in Transformer #1488

Open

sxjscience wants to merge 8 commits into dmlc:master from sxjscience:fix_pre_ln

Conversation

Member

@sxjscience sxjscience commented Jan 18, 2021

Description

Fix the addition of the residual connection. The previous implementation was not correct. I'm rerunning the Transformer-Big-pre-ln experiment.

@yongyi-wu
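
For context, the sketch below shows where the residual addition belongs in a pre-LN sub-layer versus the original post-LN arrangement. This is a minimal illustration in plain NumPy with hypothetical helper names, not the gluonnlp code touched by this PR: the skip connection has to add the untouched input to the sub-layer output, with layer normalization applied only inside the branch.

```python
# Illustrative sketch only (plain NumPy, hypothetical helper names),
# not the gluonnlp implementation changed in this PR.
import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalize over the last (feature) axis.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def pre_ln_sublayer(x, sublayer):
    # Pre-LN: normalize first, run the sub-layer, then add the *original*
    # input as the residual. The skip connection bypasses the layer norm.
    return x + sublayer(layer_norm(x))

def post_ln_sublayer(x, sublayer):
    # Post-LN (original Transformer): add the residual first, then normalize.
    return layer_norm(x + sublayer(x))
```

With the pre-LN arrangement, a single final layer norm is typically applied once after the last layer of the stack rather than inside each residual branch.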

Checklist

Essentials

  • PR's title starts with a category (e.g. [BUGFIX], [MODEL], [TUTORIAL], [FEATURE], [DOC], etc)
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage
  • Code is well-documented

Changes

  • Feature1, tests, (and when applicable, API doc)
  • Feature2, tests, (and when applicable, API doc)

Comments

  • If this change is a backward incompatible change, why must this change be made.
  • Interesting edge cases to note here

cc @dmlc/gluon-nlp-team

@sxjscience sxjscience requested a review from a team as a code owner January 18, 2021 01:06
@sxjscience sxjscience changed the title from [Bug][Fix] Fix pre-layernormalization in Transformer to [Bug][Fix][WIP] Fix pre-layernormalization in Transformer on Jan 18, 2021

codecov bot commented Jan 18, 2021 (edited)

Codecov Report

Merging #1488 (3c7c4c1) into master (c582b64) will increase coverage by 3.21%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1488      +/-   ##
==========================================
+ Coverage   81.98%   85.19%   +3.21%
==========================================
  Files          52       52
  Lines        6909     6822      -87
==========================================
+ Hits         5664     5812     +148
+ Misses       1245     1010     -235
Impacted Files Coverage Δ
src/gluonnlp/data/batchify.py 88.72% <ø> (ø)
src/gluonnlp/layers.py 87.15% <100.00%> (+0.03%) ⬆️
src/gluonnlp/models/transformer.py 98.93% <100.00%> (-0.01%) ⬇️
conftest.py 76.31% <0.00%> (-8.69%) ⬇️
src/gluonnlp/data/loading.py 75.75% <0.00%> (-7.64%) ⬇️
src/gluonnlp/utils/lazy_imports.py 58.42% <0.00%> (-2.25%) ⬇️
src/gluonnlp/utils/misc.py 52.51% <0.00%> (-1.06%) ⬇️
src/gluonnlp/data/tokenizers/yttm.py 81.73% <0.00%> (-1.02%) ⬇️
src/gluonnlp/data/tokenizers/spacy.py 65.33% <0.00%> (-0.91%) ⬇️
src/gluonnlp/data/tokenizers/huggingface.py 71.06% <0.00%> (-0.78%) ⬇️
... and 22 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c582b64...3c7c4c1. Read the comment docs.

Member

Looks good. It seems all issues related to pre-norm and skip connection have been fixed.

Member Author

I noticed that the performance became worse after I changed the implementation. Still investigating the issue.

