
RTX 30-series GPU, CUDA 11 environment: PaddleNLP fails when loading a model #373

Answered by BlueP0int
BlueP0int asked this question in Q&A

OS:Ubuntu 20.04
CUDA:11.1
Driver Version: 455.23.04
GPU Compute Capability: 8.6
Driver API Version: 11.1
Runtime API Version: 10.2
cuDNN Version: 7.6.
PaddlePaddle: tried both 2.0.2 and 2.0.1, same problem

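I downloaded the code from the baseline notebook for the event extraction task of the 2021 Language and Intelligence Challenge (2021语言与智能技术竞赛) and ran it on my own machine. It runs extremely slowly, and running the code raises: parallel_for failed: cudaErrorNoKernelImageForDevice: no kernel image is available for execution on the device. Can anyone explain the cause?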

from paddlenlp.transformers import ErnieForTokenClassification, ErnieForSequenceClassification
from utils import load_dict  # helper from the baseline code's utils.py
# Load the trigger tag dictionary and build an id -> label mapping
label_map = load_dict('./conf/DuEE-Fin/trigger_tag.dict')
id2label = {val: key for key, val in label_map.items()}
print(id2label)
# model = ErnieForTokenClassification.from_pretrained("ernie-1.0", num_classes=len(label_map))
# The error below is raised here, while the model's embedding weights are initialized on the GPU
model = ErnieForSequenceClassification.from_pretrained("ernie-1.0", num_classes=len(label_map))

Line 13 of the script is the last line above. Full error output:
W0509 13:09:59.510846 14922 device_context.cc:362] Please NOTE: device: 0, GPU Compute Capability: 8.6, Driver API Version: 11.1, Runtime API Version: 10.2
W0509 13:09:59.515894 14922 device_context.cc:372] device: 0, cuDNN Version: 7.6.
Traceback (most recent call last):
  File "dueStep2.py", line 13, in <module>
    model = ErnieForSequenceClassification.from_pretrained("ernie-1.0", num_classes=len(label_map))
  File "/home/XXX/miniconda3/lib/python3.8/site-packages/paddlenlp/transformers/model_utils.py", line 229, in from_pretrained
    if k in base_parameters_dict:
  File "/home/XXX/miniconda3/lib/python3.8/site-packages/paddlenlp/transformers/utils.py", line 83, in __impl__
    init_func(self, *args, **kwargs)
  File "/home/XXX/miniconda3/lib/python3.8/site-packages/paddlenlp/transformers/ernie/modeling.py", line 203, in __init__
    self.pad_token_id = pad_token_id
  File "/home/XXX/miniconda3/lib/python3.8/site-packages/paddlenlp/transformers/ernie/modeling.py", line 41, in __init__
    self.word_embeddings = nn.Embedding(
  File "/home/XXX/miniconda3/lib/python3.8/site-packages/paddle/nn/layer/common.py", line 1348, in __init__
    self.weight = self.create_parameter(
  File "/home/XXX/miniconda3/lib/python3.8/site-packages/paddle/fluid/dygraph/layers.py", line 407, in create_parameter
    return self._helper.create_parameter(temp_attr, shape, dtype, is_bias,
  File "/home/XXX/miniconda3/lib/python3.8/site-packages/paddle/fluid/layer_helper_base.py", line 367, in create_parameter
    return self.main_program.global_block().create_parameter(
  File "/home/XXX/miniconda3/lib/python3.8/site-packages/paddle/fluid/framework.py", line 2988, in create_parameter
    initializer(param, self)
  File "/home/XXX/miniconda3/lib/python3.8/site-packages/paddle/fluid/initializer.py", line 557, in __call__
    op = block._prepend_op(
  File "/home/XXX/miniconda3/lib/python3.8/site-packages/paddle/fluid/framework.py", line 3100, in _prepend_op
    _dygraph_tracer().trace_op(type,
  File "/home/XXX/miniconda3/lib/python3.8/site-packages/paddle/fluid/dygraph/tracer.py", line 43, in trace_op
    self.trace(type, inputs, outputs, attrs,
SystemError: (Fatal) Operator uniform_random raises an thrust::system::system_error exception.
The exception content is
:parallel_for failed: cudaErrorNoKernelImageForDevice: no kernel image is available for execution on the device. (at /paddle/paddle/fluid/imperative/tracer.cc:172)
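The first warning line already carries the key numbers: the GPU's compute capability (8.6) and the CUDA runtime the installed Paddle build uses (10.2). As a minimal sketch, assuming the warning keeps the format shown above, these can be pulled out of a log with a regular expression. `parse_device_warning` is a hypothetical helper, not part of Paddle:

```python
import re
from typing import Dict, Optional, Tuple

# Matches the device summary Paddle logs before the traceback, e.g.
# "GPU Compute Capability: 8.6, Driver API Version: 11.1, Runtime API Version: 10.2"
_DEVICE_LINE = re.compile(
    r"GPU Compute Capability: (\d+)\.(\d+), "
    r"Driver API Version: (\d+)\.(\d+), "
    r"Runtime API Version: (\d+)\.(\d+)"
)


def parse_device_warning(log_text: str) -> Optional[Dict[str, Tuple[int, int]]]:
    """Return (major, minor) pairs from Paddle's device warning, or None if absent."""
    match = _DEVICE_LINE.search(log_text)
    if match is None:
        return None
    n = [int(g) for g in match.groups()]
    return {
        "compute_capability": (n[0], n[1]),  # the GPU architecture
        "driver_api": (n[2], n[3]),          # newest CUDA the driver supports
        "runtime_api": (n[4], n[5]),         # CUDA runtime the Paddle build uses
    }


if __name__ == "__main__":
    log = ("W0509 13:09:59.510846 14922 device_context.cc:362] Please NOTE: device: 0, "
           "GPU Compute Capability: 8.6, Driver API Version: 11.1, Runtime API Version: 10.2")
    print(parse_device_warning(log))
```

On the log above this yields compute capability (8, 6) and runtime (10, 2). A 10.2 runtime predates CUDA's support for Ampere (compute capability 8.x), which arrived in CUDA 11.0, so the "no kernel image" error is consistent with that mismatch.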



Replies: 10 comments


We can't tell the exact cause yet. Could you share the details of your GPU so we can use them to narrow the problem down?


The GPU is an NVIDIA GeForce RTX 3090 with 24 GB of memory:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 455.23.04    Driver Version: 455.23.04    CUDA Version: 11.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce RTX 3090    Off  | 00000000:1A:00.0 Off |                  N/A |
| 49%   54C    P2   223W / 350W |   3823MiB / 24268MiB |     60%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  GeForce RTX 3090    Off  | 00000000:1B:00.0 Off |                  N/A |
| 30%   26C    P8     8W / 350W |      2MiB / 24268MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

This looks like it's caused by the CUDA 11 driver. There's a similar issue, #262. You can also try installing the matching Paddle version with conda first.


OK, I'll try installing the Paddle build for CUDA 10.2 directly.


conda install paddlepaddle-gpu==2.0.2 cudatoolkit=10.2 -c paddle
Changing it this way didn't help; the problem persists. I don't have permission to change the system CUDA version (CUDA Version: 11.1).
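The `CUDA Version: 11.1` in the nvidia-smi header is the newest CUDA version the installed driver supports, not a toolkit that has to be replaced. The conda `cudatoolkit` package installs the CUDA runtime libraries inside the environment, so no system-level permission is needed; the driver only has to support that version. A minimal sketch of that check, assuming the nvidia-smi header format shown earlier in this thread. The function names are hypothetical, and the comparison is deliberately conservative: it ignores CUDA 11's minor-version compatibility.

```python
import re
from typing import Optional, Tuple


def driver_max_cuda(nvidia_smi_header: str) -> Optional[Tuple[int, int]]:
    """Read the 'CUDA Version: X.Y' field, i.e. the newest CUDA the driver supports."""
    match = re.search(r"CUDA Version: (\d+)\.(\d+)", nvidia_smi_header)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def toolkit_within_driver(toolkit: str, nvidia_smi_header: str) -> bool:
    """Conservative check that a conda cudatoolkit version (e.g. '11.0' or
    '11.0.221') is not newer than what the driver reports."""
    max_cuda = driver_max_cuda(nvidia_smi_header)
    if max_cuda is None:
        raise ValueError("no 'CUDA Version' field found")
    # Compare only major.minor; extra components such as '.221' are ignored
    major, minor = (int(part) for part in toolkit.split(".")[:2])
    return (major, minor) <= max_cuda


if __name__ == "__main__":
    header = "| NVIDIA-SMI 455.23.04 Driver Version: 455.23.04 CUDA Version: 11.1 |"
    for toolkit in ("10.2", "11.0", "11.2"):
        print(toolkit, toolkit_within_driver(toolkit, header))
```

With this driver both 10.2 and 11.0 pass, so the driver was not the blocker here; the remaining problem is GPU architecture support, as the next reply points out.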


conda install paddlepaddle-gpu==2.0.2 cudatoolkit=11.0
Use CUDA 11: 30-series cards don't support CUDA 10 and require at least CUDA 11.
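The underlying reason is architecture support: a CUDA toolkit can only emit native kernels (cubins) for GPU architectures it knows about, and a cubin built for sm_XY also runs on sm_XZ when Z >= Y (same major revision). The sketch below encodes that rule for the three CUDA versions in this thread. It is a simplification under stated assumptions: it assumes a build ships cubins for every architecture up to the newest its toolkit supports and embeds no PTX (PTX could be JIT-compiled for a newer GPU; the error in this thread suggests the 10.2 wheel had no usable kernel for 8.6). The table and `has_native_kernels` are illustrative, not taken from Paddle.

```python
from typing import Dict, Tuple

Version = Tuple[int, int]

# Newest GPU architecture (compute capability) each CUDA toolkit can build
# native kernels for; only the versions discussed in this thread.
NEWEST_ARCH: Dict[Version, Version] = {
    (10, 2): (7, 5),  # Turing
    (11, 0): (8, 0),  # adds Ampere sm_80
    (11, 1): (8, 6),  # adds sm_86 (RTX 30 series)
}


def has_native_kernels(device_cc: Version, cuda: Version) -> bool:
    """Simplified rule: assumes the build targets every architecture up to
    NEWEST_ARCH[cuda], embeds no PTX, and ignores architectures a toolkit dropped."""
    newest = NEWEST_ARCH[cuda]
    if device_cc <= newest:
        return True
    # A newer GPU can still run cubins built for an older minor revision of the
    # same major architecture (e.g. sm_80 kernels on an sm_86 card).
    return device_cc[0] == newest[0]


if __name__ == "__main__":
    rtx_3090 = (8, 6)
    for cuda in sorted(NEWEST_ARCH):
        print(cuda, has_native_kernels(rtx_3090, cuda))
```

This matches the thread's outcome: an RTX 3090 (8.6) has no kernel to run from a 10.2 build, while a CUDA 11.0 build with sm_80 cubins can run on it, which is why cudatoolkit=11.0 with the post110 wheel succeeds.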


I changed it to conda install paddlepaddle-gpu==2.0.2 cudatoolkit=11.0, but it still doesn't work. Running import paddle and then paddle.utils.run_check() raises the same error:
RuntimeError: parallel_for failed: cudaErrorNoKernelImageForDevice: no kernel image is available for execution on the device


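After retrying with the two configurations below, it finally works! My guess is that the key is installing paddlepaddle-gpu==2.0.2.post110: the post110 suffix cannot be left off. I had been using the Tsinghua mirror, which couldn't find that version, so I switched to plain paddlepaddle-gpu==2.0.2, and that wouldn't run. The current setup may not support multi-GPU parallelism, but at least a single card works now. (To verify the installation, start the Python interpreter with python or python3, enter import paddle, then paddle.utils.run_check(). If it prints PaddlePaddle is installed successfully!, the installation succeeded.) Thanks, everyone, for your patience and support!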
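The `run_check()` step can be wrapped so a script reports which of the failure modes in this thread it hit. A minimal sketch, assuming Paddle 2.0's `paddle.is_compiled_with_cuda()` and `paddle.utils.run_check()`; `check_paddle_gpu` and its return strings are hypothetical, and the `paddle` parameter exists only so the logic can be tried with a stand-in object:

```python
import importlib
from typing import Any, Optional


def check_paddle_gpu(paddle: Optional[Any] = None, module_name: str = "paddle") -> str:
    """Classify a Paddle install. Pass `paddle` only to try the logic with a
    stand-in object; by default the real module is imported."""
    if paddle is None:
        try:
            paddle = importlib.import_module(module_name)
        except ImportError:
            return "not installed"
    if not paddle.is_compiled_with_cuda():
        return "CPU-only build"
    try:
        # The check recommended in this thread; on success it prints
        # "PaddlePaddle is installed successfully!"
        paddle.utils.run_check()
    except Exception as exc:  # e.g. RuntimeError: ... cudaErrorNoKernelImageForDevice ...
        return f"GPU build present but run_check failed: {exc}"
    return "ok"


if __name__ == "__main__":
    print(check_paddle_gpu())
```

On the broken setups earlier in the thread, `run_check()` raised `RuntimeError: parallel_for failed: cudaErrorNoKernelImageForDevice ...`, which this sketch reports as a failed check instead of crashing.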
Either of the following two options works. For other installation methods, see https://www.paddlepaddle.org.cn/install/quick?docurl=/documentation/docs/zh/install/conda/linux-conda.html
Option 1:

conda create -n pd python=3.7 -y
conda activate pd
conda install cudatoolkit=11.0 -y
python -m pip install paddlepaddle-gpu==2.0.2.post110 -f https://paddlepaddle.org.cn/whl/mkl/stable.html
pip install --upgrade paddlenlp -i https://pypi.org/simple
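The `.post110` suffix is what selects the CUDA 11.0 build. In Paddle 2.0's install instructions the digits after `post` encode the CUDA version (`post101` for 10.1, `post110` for 11.0), and in this thread the suffix-less `2.0.2` wheel reported Runtime API Version 10.2. A small sketch, assuming that naming convention holds; `cuda_from_wheel_version` is a hypothetical helper:

```python
import re
from typing import Optional, Tuple


def cuda_from_wheel_version(version: str) -> Optional[Tuple[int, int]]:
    """Decode the CUDA version from a paddlepaddle-gpu '.postXYZ' suffix,
    assuming the last digit is the minor version: 'post110' -> (11, 0),
    'post101' -> (10, 1). Returns None when there is no suffix."""
    match = re.search(r"\.post(\d{2,3})$", version)
    if match is None:
        return None
    digits = match.group(1)
    return int(digits[:-1]), int(digits[-1])


if __name__ == "__main__":
    for v in ("2.0.2.post110", "2.0.2.post101", "2.0.2"):
        print(v, cuda_from_wheel_version(v))
```

A `None` result means the default wheel, whose CUDA version has to be looked up separately (10.2 for 2.0.2, judging by the log in this thread).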

Option 2:
Save the following as env.yaml, change the prefix on the last line to the corresponding path of your own conda/miniconda environment, then run conda env create -f env.yaml

name: paddle
channels:
 - defaults
dependencies:
 - _libgcc_mutex=0.1=main
 - backcall=0.2.0=pyhd3eb1b0_0
 - blas=1.0=mkl
 - ca-certificates=2021.4.13=h06a4308_1
 - certifi=2020.12.5=py37h06a4308_0
 - cudatoolkit=11.0.221=h6bb024c_0
 - intel-openmp=2021.2.0=h06a4308_610
 - ipykernel=5.3.4=py37h5ca1d4c_0
 - ipython=7.22.0=py37hb070fc8_0
 - ipython_genutils=0.2.0=pyhd3eb1b0_1
 - jedi=0.17.0=py37_0
 - jupyter_client=6.1.12=pyhd3eb1b0_0
 - jupyter_core=4.7.1=py37h06a4308_0
 - ld_impl_linux-64=2.33.1=h53a641e_7
 - libffi=3.3=he6710b0_2
 - libgcc-ng=9.1.0=hdf63c60_0
 - libsodium=1.0.18=h7b6447c_0
 - libstdcxx-ng=9.1.0=hdf63c60_0
 - mkl=2021.2.0=h06a4308_296
 - mkl-service=2.3.0=py37h27cfd23_1
 - mkl_fft=1.3.0=py37h42c9631_2
 - mkl_random=1.2.1=py37ha9443f7_2
 - ncurses=6.2=he6710b0_1
 - numpy-base=1.20.1=py37h7d8b39e_0
 - openssl=1.1.1k=h27cfd23_0
 - parso=0.8.2=pyhd3eb1b0_0
 - pexpect=4.8.0=pyhd3eb1b0_3
 - pickleshare=0.7.5=pyhd3eb1b0_1003
 - pip=21.0.1=py37h06a4308_0
 - prompt-toolkit=3.0.17=pyh06a4308_0
 - ptyprocess=0.7.0=pyhd3eb1b0_2
 - pygments=2.8.1=pyhd3eb1b0_0
 - python=3.7.10=hdb3f193_0
 - python-dateutil=2.8.1=pyhd3eb1b0_0
 - pyzmq=20.0.0=py37h2531618_1
 - readline=8.1=h27cfd23_0
 - setuptools=52.0.0=py37h06a4308_0
 - sqlite=3.35.4=hdfb4753_0
 - tk=8.6.10=hbc83047_0
 - tornado=6.1=py37h27cfd23_0
 - tqdm=4.59.0=pyhd3eb1b0_1
 - traitlets=5.0.5=pyhd3eb1b0_0
 - wcwidth=0.2.5=py_0
 - wheel=0.36.2=pyhd3eb1b0_0
 - xz=5.2.5=h7b6447c_0
 - zeromq=4.3.4=h2531618_0
 - zlib=1.2.11=h7b6447c_3
 - pip:
    # Assumption: the .post110 wheel is not on PyPI, so pip is pointed at the
    # Paddle wheel index used in option 1
    - --find-links https://paddlepaddle.org.cn/whl/mkl/stable.html
    - appdirs==1.4.4
    - astor==0.8.1
    - babel==2.9.1
    - bce-python-sdk==0.8.60
    - cached-property==1.5.2
    - cfgv==3.2.0
    - chardet==4.0.0
    - click==7.1.2
    - colorama==0.4.4
    - colorlog==5.0.1
    - decorator==5.0.7
    - dill==0.3.3
    - distlib==0.3.1
    - filelock==3.0.12
    - flake8==3.9.2
    - flask==1.1.2
    - flask-babel==2.0.0
    - future==0.18.2
    - gast==0.4.0
    - h5py==3.2.1
    - identify==2.2.4
    - idna==2.10
    - importlib-metadata==4.0.1
    - itsdangerous==1.1.0
    - jieba==0.42.1
    - jinja2==2.11.3
    - joblib==1.0.1
    - markupsafe==1.1.1
    - mccabe==0.6.1
    - multiprocess==0.70.11.1
    - nodeenv==1.6.0
    - numpy==1.20.2
    - paddlenlp==2.0.0rc21
    - paddlepaddle-gpu==2.0.2.post110
    - pillow==8.2.0
    - pre-commit==2.12.1
    - protobuf==3.16.0
    - pycodestyle==2.7.0
    - pycryptodome==3.10.1
    - pyflakes==2.3.1
    - pytz==2021.1
    - pyyaml==5.4.1
    - requests==2.25.1
    - scikit-learn==0.24.2
    - scipy==1.6.3
    - seqeval==1.2.2
    - shellcheck-py==0.7.2.1
    - six==1.16.0
    - threadpoolctl==2.1.0
    - toml==0.10.2
    - typing-extensions==3.10.0.0
    - urllib3==1.26.4
    - virtualenv==20.4.6
    - visualdl==2.1.1
    - werkzeug==1.0.1
    - zipp==3.4.1
prefix: /home/XXX/miniconda3/envs/paddle
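Editing the `prefix:` line can also be scripted. A minimal sketch, assuming the env file has a single top-level `prefix:` line as above; `set_env_prefix` is a hypothetical helper, and the replacement uses a function so that backslashes in Windows-style paths are not treated as regex escapes:

```python
import re

_PREFIX_LINE = re.compile(r"^prefix:.*$", re.MULTILINE)


def set_env_prefix(yaml_text: str, new_prefix: str) -> str:
    """Point the top-level 'prefix:' line of a conda env file at new_prefix,
    appending the line if the file has none."""
    if _PREFIX_LINE.search(yaml_text) is None:
        return yaml_text.rstrip("\n") + f"\nprefix: {new_prefix}\n"
    # A replacement function avoids regex-escape handling of backslashes in paths.
    return _PREFIX_LINE.sub(lambda _m: f"prefix: {new_prefix}", yaml_text, count=1)
```

To use it, read env.yaml, pass its text and your own environment directory (for example a path under your miniconda3/envs) to `set_env_prefix`, and write the result back before running conda env create -f env.yaml.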
Answer selected by ZeyuChen

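With the Tsinghua mirror, you need to add the Paddle channel to your configuration file, and it must be spelled with an uppercase P (the official channel uses a lowercase p).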



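Thank you for sharing this valuable experience. We will also consider adding an FAQ section to the documentation to make things easier for future users.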


This discussion was converted from issue #348 on May 12, 2021 11:06.
