This repository was archived by the owner on Jan 15, 2024. It is now read-only.

[Tutorial] add KoBERT tutorial #1230

Open
jamiekang wants to merge 38 commits into dmlc:v0.x from jamiekang:v0.9.x

Contributor

@jamiekang jamiekang commented May 12, 2020

added kobert_naver_movie for KoBERT tutorial.

Description

(Brief description on what this PR is about)

Checklist

Essentials

  • PR's title starts with a category (e.g. [BUGFIX], [MODEL], [TUTORIAL], [FEATURE], [DOC], etc)
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage
  • Code is well-documented

Changes

  • Feature1, tests, (and when applicable, API doc)
  • Feature2, tests, (and when applicable, API doc)

Comments

  • If this change is a backward incompatible change, why must this change be made.
  • Interesting edge cases to note here

cc @dmlc/gluon-nlp-team

szha reacted with hooray emoji
leezu and others added 10 commits February 10, 2020 18:15
* [CI] Update MXNet master version tested on CI (dmlc#1113)
* [CI] Update MXNet master version tested on CI
* Disable horovod test on master
* update link
Co-authored-by: Leonard Lausen <leonard@lausen.nl>
Co-authored-by: Lin <haibilin@a483e7be4c92.ant.amazon.com>
added kobert_naver_movie for KoBERT tutorial.
@jamiekang jamiekang requested a review from a team as a code owner May 12, 2020 09:00
* Update Jenkinsfile_py3_cpu_unittest
* Update Jenkinsfile_py3-master_cpu_unittest
Member

chenw23 commented May 13, 2020

Hello Jiyang, would you please merge the latest commit to your branch for this pull request?
#1229 This fixes a cpu-unittest timing restriction which is preventing your commit from being built.
Thanks!

Contributor Author

Hello Jiyang, would you please merge the latest commit to your branch for this pull request?
#1229 This fixes a cpu-unittest timing restriction which is preventing your commit from being built.
Thanks!

Hello, my branch is v0.9.x and I made the latest commit to that branch. I don't have any other branches. Can you tell me which further steps are required? Thanks.

Member

chenw23 commented May 14, 2020

Hello Jiyang, would you please merge the latest commit to your branch for this pull request?
#1229 This fixes a cpu-unittest timing restriction which is preventing your commit from being built.
Thanks!

Hello, my branch is v0.9.x and I made the latest commit to that branch. I don't have any other branches. Can you tell me which further steps are required? Thanks.

Hello, this commit was merged into the v0.9.x branch yesterday, and it seems that your pull request was opened 2 days ago. So maybe your pull request does not include this commit?

Member

chenw23 commented May 14, 2020

Sorry, but I wonder whether there is an actual need to merge into v0.9.x (the release branch) rather than master (the development branch)?
I am noticing the gpu-doc failures. On the master branch there are some new features that might improve the stability of the doc build and help us debug errors.

Member

chenw23 commented May 14, 2020

Hello, I think you need to change the pull request target branch to dmlc:master. Currently you are still targeting dmlc:v0.9.x
Thanks!

@jamiekang jamiekang changed the base branch from v0.9.x to master May 14, 2020 04:17
logGroupName = '/aws/batch/job'

-jobName = re.sub('[^A-Za-z0-9_\-]', '', args.name)[:128] # Enforce AWS Batch jobName rules
+jobName = re.sub(r'[^A-Za-z0-9_\-]', '', args.name)[:128] # Enforce AWS Batch jobName rules
Member

Hello,
Did you type this character by mistake?

Contributor Author

@jamiekang jamiekang May 14, 2020

I don't know anything about this.

Member

I have checked and found out this is from a change in v0.9.x branch but not in master branch. #1219
It's good and please keep it!
Thanks!
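
For context on the one-character change discussed in this thread: in a regular Python string literal, `\-` is not a recognized escape sequence, and Python 3.6+ reports it with a DeprecationWarning (a SyntaxWarning from 3.12). The `r` prefix keeps the backslash literal, so the regex engine receives the same pattern without the warning. A minimal sketch, in which the helper name `sanitize_job_name` is hypothetical and only the `re.sub` expression is taken from the diff:

```python
import re


def sanitize_job_name(name: str) -> str:
    """Drop characters outside [A-Za-z0-9_-] and truncate to 128 characters.

    Hypothetical helper wrapping the expression from the diff; the raw string
    (r'...') avoids the invalid-escape warning for '\\-'.
    """
    return re.sub(r'[^A-Za-z0-9_\-]', '', name)[:128]


# The slash and the spaces are removed; letters, digits, '_' and '-' are kept.
print(sanitize_job_name("PR-1230/14 docs build"))  # PR-123014docsbuild
```

The behavior of the substitution is unchanged by the prefix; only the way Python parses the literal differs.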

Member

chenw23 commented May 14, 2020

Hello Jiyang,
One of the tests is failing due to unclear errors. Please wait patiently while we work on the fixes.

@leezu This gpu doc test cannot pass doctest, but the error seems to be due to a connection error. Maybe we need to make some changes elsewhere?

Contributor Author

any update?

Member

@eric-haibin-lin eric-haibin-lin left a comment

The test failed due to some external dependency. Let me trigger it again and see if it passes

Member

chenw23 commented May 30, 2020

any update?

Hello Jiyang, would you please merge the latest master branch into your pull request, especially to include #1236, so that gpu-doc can pass?
Thanks!

Contributor Author

jamiekang commented Jun 2, 2020

Is this okay?

Merge pull request #1 from dmlc/master ... 19c4045

Member

Is this okay?

Merge pull request #1 from dmlc/master ... 19c4045

yes

Member

@eric-haibin-lin eric-haibin-lin left a comment

@leezu any idea why the err log is missing?

fatal error: An error occurred (404) when calling the HeadObject operation: Key "batch/PR-1230/14/docs/examples/sentiment_analysis/kobert_naver_movie.stderr.log" does not exist

Member

chenw23 commented Jun 5, 2020

@leezu any idea why the err log is missing?

fatal error: An error occurred (404) when calling the HeadObject operation: Key "batch/PR-1230/14/docs/examples/sentiment_analysis/kobert_naver_movie.stderr.log" does not exist

I think this is because ci/batch/submit-job.py failed.
That failure was caused by the failure of ci/batch/docker/gluon_nlp_job.sh,
which in turn was caused by the failure of docs/md2ipynb.py.

So the root cause is that the conversion of the newly added md file to an ipynb file didn't succeed.

Contributor

leezu commented Sep 9, 2020

@jamiekang The failure now is in the next and final step of the build process. The rm command helped to get to the final step.
Currently the CI fails due to a sphinx warning.

[2020-09-09T06:57:48.867Z] /var/lib/jenkins/gluon-nlp-cpu-py3-master/docs/examples/sentiment_analysis/kobert_naver_movie.ipynb:Could not lex literal_block as "python". Highlighting skipped.

To reproduce locally, you should be able to run MD2IPYNB_OPTION=--disable_compute make docs_local on your computer

Member

szha commented Sep 9, 2020

adding this line generated new error: !rm -rf dataset_folder

should I keep this line or remove it?

You need to update the folder name with the actual dataset path.

Contributor Author

adding this line generated new error: !rm -rf dataset_folder

should I keep this line or remove it?

You need to update the folder name with the actual dataset path.

OK, I will fix it. Any idea how to overcome the timeout?

Contributor Author

Could not lex literal_block as "python". Highlighting skipped.

Contributor Author

kernel.cu(1084): Error: Formal parameter space overflowed (4648 bytes required, max 4096 bytes allowed) in function

Contributor Author

What does this mean?
Reshape_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_kernel.cu(1084): Error: Formal parameter space overflowed (4648 bytes required, max 4096 bytes allowed) in function

Member

szha commented Sep 24, 2020

cc @ptrendx, looks like a failed fusion case.

Contributor

ptrendx commented Sep 24, 2020

Hmmm, yeeeaah... So, just out of curiosity - why do you call there expand_dims over a 100 times on a single thing?

Contributor Author

Hmmm, yeeeaah... So, just out of curiosity - why do you call there expand_dims over a 100 times on a single thing?

There's no explicit call to expand_dims() in the source (.md or .ipynb).

Contributor

ptrendx commented Sep 25, 2020

I don't think this error comes from your PR - it happens in test_xlnet_finetune_glue[MRPC].

Contributor Author

Any update?

Member

szha commented Nov 9, 2020

here's a summary of the blocking issues:

  • (resolved) fusion RTC bug for expand_dims (thanks @ptrendx)
  • (resolved) conda dependency resolution failure (I upgraded conda and its python versions on all workers)
  • (ongoing) horovod installation issue

Once the master-gpu-doc pipeline passes I will merge this PR first and we can unblock the horovod issue separately.

Contributor Author

here's a summary of the blocking issues:

  • (resolved) fusion RTC bug for expand_dims (thanks @ptrendx)
  • (resolved) conda dependency resolution failure (I upgraded conda and its python versions on all workers)
  • (ongoing) horovod installation issue

Once the master-gpu-doc pipeline passes I will merge this PR first and we can unblock the horovod issue separately.

Thanks. Let's see how the master-gpu-doc pipeline works.

Member

szha commented Nov 9, 2020

Looks like there is still some error in the new notebook that needs to be resolved first:

[2020-11-09T22:10:54.320Z] Warning, treated as error:
[2020-11-09T22:10:54.320Z] /var/lib/jenkins/gluon-nlp-cpu-py3-master/docs/examples/sentiment_analysis/kobert_naver_movie.ipynb:Could not lex literal_block as "python". Highlighting skipped.

Checking what's causing it.

```{.python .input}
!rm -rf nsmc # clean up
```
Member

I'm guessing this is what's troubling the python lexer. Trying to do the same inside python
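
One way to do the same cleanup in plain Python, so that the cell contains no `!` shell escape for the lexer to trip over, is `shutil.rmtree` from the standard library. This is only a sketch of the idea, not the change that was actually committed; the helper name `clean_up` and the temporary directory are assumptions made so the example is self-contained, and in the notebook the path would simply be the dataset folder (`nsmc`).

```python
import os
import shutil
import tempfile


def clean_up(path: str) -> None:
    """Remove a directory tree; do nothing if it does not exist (like rm -rf)."""
    shutil.rmtree(path, ignore_errors=True)


# Stand-in for the dataset folder, created in a temporary location for the demo.
workdir = tempfile.mkdtemp()
dataset_dir = os.path.join(workdir, "nsmc")
os.makedirs(dataset_dir)

clean_up(dataset_dir)
print(os.path.exists(dataset_dir))  # False
```

With `ignore_errors=True`, calling the helper on a path that is already gone is a no-op, which matches the forgiving behavior of `rm -rf`.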

Contributor Author

It seems we still have the multiple expand_dims error.

Member

I'll later also take a look at how to port KoBERT + Tutorial to the master version.

Contributor Author

I'll later also take a look at how to port KoBERT + Tutorial to the master version.

Thanks!

Contributor Author

/var/lib/jenkins/workspace/gluon-nlp-cpu-py3/conda/cpu/py3/lib/python3.5/site-packages/mxnet/include/mxnet/ndarray.h:41:10: fatal error: mkldnn.hpp: No such file or directory
Can anyone help with this? Thanks in advance.

Reviewers

3 more reviewers

@szha left review comments

@eric-haibin-lin left review comments

@chenw23 requested changes
