This project converts ERNIE to Hugging Face's format.
ERNIE is based on the BERT model and performs better on Chinese NLP tasks.
Update: we now support ernie-2.0 (base & large) and ernie-tiny.
You can download a model I have already converted, load it directly with Hugging Face's transformers, or convert it yourself.
| model | identifier in transformers | description | download url |
|---|---|---|---|
| ernie-1.0 (Chinese) | nghuyong/ernie-1.0 | Layer:12, Hidden:768, Heads:12 | http://pan.nghuyong.top/#/s/y7Uz |
| ernie-2.0-en (English) | nghuyong/ernie-2.0-en | Layer:12, Hidden:768, Heads:12 | http://pan.nghuyong.top/#/s/BXh9 |
| ernie-2.0-large-en (English) | nghuyong/ernie-2.0-large-en | Layer:24, Hidden:1024, Heads:16 | http://pan.nghuyong.top/#/s/DxiK |
| ernie-tiny (English) | nghuyong/ernie-tiny | Layer:3, Hidden:1024, Heads:16 | http://pan.nghuyong.top/#/s/AOf3 |
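The description column uses the same quantities as the `num_hidden_layers`, `hidden_size` and `num_attention_heads` fields of the converted BERT config. As a quick sanity check (our own illustration; `ERNIE_ARCHS` and `head_size` are hypothetical names), every model's hidden size splits evenly across its attention heads, giving 64 dimensions per head in each row:

```python
# Architecture figures copied from the table above, expressed with the
# BertConfig field names that appear in the converted config.json.
ERNIE_ARCHS = {
    "nghuyong/ernie-1.0": {"num_hidden_layers": 12, "hidden_size": 768, "num_attention_heads": 12},
    "nghuyong/ernie-2.0-en": {"num_hidden_layers": 12, "hidden_size": 768, "num_attention_heads": 12},
    "nghuyong/ernie-2.0-large-en": {"num_hidden_layers": 24, "hidden_size": 1024, "num_attention_heads": 16},
    "nghuyong/ernie-tiny": {"num_hidden_layers": 3, "hidden_size": 1024, "num_attention_heads": 16},
}


def head_size(arch: dict) -> int:
    """Return the per-head dimension; BERT-style attention needs an exact split."""
    hidden, heads = arch["hidden_size"], arch["num_attention_heads"]
    if hidden % heads != 0:
        raise ValueError(f"hidden_size {hidden} is not divisible by {heads} heads")
    return hidden // heads


for name, arch in ERNIE_ARCHS.items():
    print(name, head_size(arch))
```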
To load a model directly with Hugging Face's transformers (taking ernie-1.0 as an example):
```python
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("nghuyong/ernie-1.0")
model = AutoModel.from_pretrained("nghuyong/ernie-1.0")
```
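To convert the model yourself:

1. Download the PaddlePaddle version of the ERNIE model from here, move it to this project path and unzip the file.
2. Install the dependencies: `pip install -r requirements.txt`
3. Run the conversion script: `python convert.py`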
```
====================save config file====================
====================save vocab file====================
====================extract weights====================
mlm_bias -> cls.predictions.bias (18000,)
ln.weight -> bert.embeddings.LayerNorm.gamma (768,)
ln.bias -> bert.embeddings.LayerNorm.beta (768,)
word_emb.weight -> bert.embeddings.word_embeddings.weight (18000, 768)
pos_emb.weight -> bert.embeddings.position_embeddings.weight (513, 768)
sent_emb.weight -> bert.embeddings.token_type_embeddings.weight (2, 768)
encoder_stack.block.0.attn.q.weight -> bert.encoder.layer.0.attention.self.query.weight (768, 768)
encoder_stack.block.0.attn.q.bias -> bert.encoder.layer.0.attention.self.query.bias (768,)
encoder_stack.block.0.attn.k.weight -> bert.encoder.layer.0.attention.self.key.weight (768, 768)
encoder_stack.block.0.attn.k.bias -> bert.encoder.layer.0.attention.self.key.bias (768,)
encoder_stack.block.0.attn.v.weight -> bert.encoder.layer.0.attention.self.value.weight (768, 768)
encoder_stack.block.0.attn.v.bias -> bert.encoder.layer.0.attention.self.value.bias (768,)
encoder_stack.block.0.attn.o.weight -> bert.encoder.layer.0.attention.output.dense.weight (768, 768)
encoder_stack.block.0.attn.o.bias -> bert.encoder.layer.0.attention.output.dense.bias (768,)
encoder_stack.block.0.ln1.weight -> bert.encoder.layer.0.attention.output.LayerNorm.gamma (768,)
encoder_stack.block.0.ln1.bias -> bert.encoder.layer.0.attention.output.LayerNorm.beta (768,)
encoder_stack.block.0.ffn.i.weight -> bert.encoder.layer.0.intermediate.dense.weight (3072, 768)
encoder_stack.block.0.ffn.i.bias -> bert.encoder.layer.0.intermediate.dense.bias (3072,)
encoder_stack.block.0.ffn.o.weight -> bert.encoder.layer.0.output.dense.weight (768, 3072)
encoder_stack.block.0.ffn.o.bias -> bert.encoder.layer.0.output.dense.bias (768,)
encoder_stack.block.0.ln2.weight -> bert.encoder.layer.0.output.LayerNorm.gamma (768,)
encoder_stack.block.0.ln2.bias -> bert.encoder.layer.0.output.LayerNorm.beta (768,)
...
encoder_stack.block.11.ffn.o.bias -> bert.encoder.layer.11.output.dense.bias (768,)
encoder_stack.block.11.ln2.weight -> bert.encoder.layer.11.output.LayerNorm.gamma (768,)
encoder_stack.block.11.ln2.bias -> bert.encoder.layer.11.output.LayerNorm.beta (768,)
pooler.weight -> bert.pooler.dense.weight (768, 768)
pooler.bias -> bert.pooler.dense.bias (768,)
mlm.weight -> cls.predictions.transform.dense.weight (768, 768)
mlm.bias -> cls.predictions.transform.dense.bias (768,)
mlm_ln.weight -> cls.predictions.transform.LayerNorm.gamma (768,)
mlm_ln.bias -> cls.predictions.transform.LayerNorm.beta (768,)
```
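The log shows that the conversion is mostly a renaming exercise. The sketch below reproduces the mapping printed above with a lookup table plus one regular expression for the encoder blocks. It is our own illustration, not the code in `convert.py`: `map_name` and `needs_transpose` are hypothetical helpers, and the transpose rule assumes Paddle stores linear-layer weights as (in_features, out_features) while PyTorch's `nn.Linear` expects (out_features, in_features), which is consistent with the PyTorch-side shapes such as `(3072, 768)` in the log.

```python
import re

# Non-encoder parameters, taken from the conversion log above.
TOP_LEVEL = {
    "mlm_bias": "cls.predictions.bias",
    "ln.weight": "bert.embeddings.LayerNorm.gamma",
    "ln.bias": "bert.embeddings.LayerNorm.beta",
    "word_emb.weight": "bert.embeddings.word_embeddings.weight",
    "pos_emb.weight": "bert.embeddings.position_embeddings.weight",
    "sent_emb.weight": "bert.embeddings.token_type_embeddings.weight",
    "pooler.weight": "bert.pooler.dense.weight",
    "pooler.bias": "bert.pooler.dense.bias",
    "mlm.weight": "cls.predictions.transform.dense.weight",
    "mlm.bias": "cls.predictions.transform.dense.bias",
    "mlm_ln.weight": "cls.predictions.transform.LayerNorm.gamma",
    "mlm_ln.bias": "cls.predictions.transform.LayerNorm.beta",
}

# Sub-modules inside each encoder block: Paddle name -> Hugging Face name.
BLOCK = {
    "attn.q": "attention.self.query",
    "attn.k": "attention.self.key",
    "attn.v": "attention.self.value",
    "attn.o": "attention.output.dense",
    "ln1": "attention.output.LayerNorm",
    "ffn.i": "intermediate.dense",
    "ffn.o": "output.dense",
    "ln2": "output.LayerNorm",
}

BLOCK_RE = re.compile(r"encoder_stack\.block\.(\d+)\.(.+)\.(weight|bias)$")


def map_name(paddle_name: str) -> str:
    """Translate one Paddle parameter name into its Hugging Face BERT name."""
    if paddle_name in TOP_LEVEL:
        return TOP_LEVEL[paddle_name]
    match = BLOCK_RE.match(paddle_name)
    if match is None:
        raise KeyError(f"no mapping for {paddle_name}")
    layer, module, kind = match.groups()
    target = BLOCK[module]
    if target.endswith("LayerNorm"):
        # This checkpoint format names LayerNorm parameters gamma/beta
        kind = "gamma" if kind == "weight" else "beta"
    return f"bert.encoder.layer.{layer}.{target}.{kind}"


def needs_transpose(paddle_name: str) -> bool:
    """Assumed rule: linear weights need transposing; embeddings and LayerNorm do not."""
    if not paddle_name.endswith(".weight"):
        return False
    is_embedding = paddle_name.endswith("_emb.weight")
    is_layer_norm = map_name(paddle_name).endswith("gamma")
    return not (is_embedding or is_layer_norm)


print(map_name("encoder_stack.block.0.attn.q.weight"))
```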
After `convert.py` finishes, a folder named `convert` appears in the project path containing three files: `config.json`, `pytorch_model.bin` and `vocab.txt`.
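Before loading the converted model, it can help to confirm that all three files were written. A minimal sketch using only the standard library; `missing_files` and `EXPECTED_FILES` are hypothetical names, and the default path assumes the `./convert` location used throughout this README:

```python
from pathlib import Path

# The three files convert.py is expected to produce.
EXPECTED_FILES = ("config.json", "pytorch_model.bin", "vocab.txt")


def missing_files(convert_dir: str = "./convert") -> list:
    """Return the expected files that are absent from convert_dir."""
    folder = Path(convert_dir)
    return [name for name in EXPECTED_FILES if not (folder / name).is_file()]


if __name__ == "__main__":
    missing = missing_files()
    if missing:
        print("conversion incomplete, missing:", ", ".join(missing))
    else:
        print("./convert is ready to be loaded with from_pretrained('./convert')")
```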
PaddlePaddle's Official Quick Start
```python
#!/usr/bin/env python
# encoding: utf-8
import numpy as np
import paddle.fluid.dygraph as D
from ernie.tokenizing_ernie import ErnieTokenizer
from ernie.modeling_ernie import ErnieModel

D.guard().__enter__()  # activate paddle `dygraph` mode

model = ErnieModel.from_pretrained('ernie-1.0')  # Try to get pretrained model from server, make sure you have network connection
model.eval()
tokenizer = ErnieTokenizer.from_pretrained('ernie-1.0')

ids, _ = tokenizer.encode('hello world')
ids = D.to_variable(np.expand_dims(ids, 0))  # insert extra `batch` dimension
pooled, encoded = model(ids)  # eager execution
print(pooled.numpy())  # convert results to numpy

"""
output:
[[-1. -1. 0.99479663 -0.99986964 -0.7872066 -1. -0.99919444 0.985997
  -0.22648102 0.97202295 -0.9994965 -0.982234 -0.6821966 -0.9998574
  -0.83046496 -0.9804977 -1. 0.9999509 -0.55144966 0.48973152 -1. 1.
  0.14248642 -0.71969527
  ...
  0.93848914 0.8418771 1. 0.99999803 0.9800671 0.99886674 0.9999988
  0.99946415 0.9849099 0.9996924 -0.79442227 -0.9999412 0.99827075 1.
  -0.05767363 0.99999857 0.8176171 0.7983498 -0.14292054 1. -0.99759513
  -0.9999982 -0.99973375 -0.9993742 ]]
"""
```
Use Hugging Face's transformers with our converted ERNIE model
```python
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained('./convert')
model = BertModel.from_pretrained('./convert')
input_ids = torch.tensor([tokenizer.encode("hello world", add_special_tokens=True)])
with torch.no_grad():
    outputs = model(input_ids)
    # Index instead of tuple-unpacking: newer transformers versions return a
    # ModelOutput object rather than a plain tuple
    sequence_output, pooled_output = outputs[0], outputs[1]
print(pooled_output.cpu().numpy())

"""
output:
[[-1. -1. 0.99479663 -0.99986964 -0.78720796 -1. -0.9991946 0.98599714
  -0.22648017 0.972023 -0.9994966 -0.9822342 -0.682196 -0.9998575
  -0.83046496 -0.9804982 -1. 0.99995095 -0.551451 0.48973027 -1. 1.
  0.14248991 -0.71969616
  ...
  0.9384899 0.84187615 1. 0.999998 0.9800671 0.99886674 0.9999988
  0.99946433 0.98491037 0.9996923 -0.7944245 -0.99994105 0.9982707 1.
  -0.05766615 0.9999987 0.81761867 0.7983511 -0.14292456 1. -0.9975951
  -0.9999982 -0.9997338 -0.99937415]]
"""
```
The pooled output of our converted model matches the official PaddlePaddle version, apart from small floating-point differences in the last few digits.
Here we only show ernie-1.0 as an example; the converted ernie-tiny, ernie-2.0-en and ernie-2.0-large-en models match their official versions in the same way.
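To make "matches" concrete, the first ten pooled values printed by the two snippets above can be compared numerically. This sketch uses only those printed numbers; the `1e-5` absolute tolerance is our own choice for float32 outputs, not a threshold from the original project. The largest difference among these ten values is on the order of 1e-6.

```python
import math

# First ten pooled-output values printed by the PaddlePaddle and PyTorch snippets above.
paddle_pooled = [-1.0, -1.0, 0.99479663, -0.99986964, -0.7872066,
                 -1.0, -0.99919444, 0.985997, -0.22648102, 0.97202295]
torch_pooled = [-1.0, -1.0, 0.99479663, -0.99986964, -0.78720796,
                -1.0, -0.9991946, 0.98599714, -0.22648017, 0.972023]


def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return max(abs(x - y) for x, y in zip(a, b))


diff = max_abs_diff(paddle_pooled, torch_pooled)
close = all(math.isclose(x, y, abs_tol=1e-5) for x, y in zip(paddle_pooled, torch_pooled))
print(f"max |paddle - torch| = {diff:.2e}, all close: {close}")
```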
We use `BertForMaskedLM` from transformers to reproduce the Cloze Test in ERNIE's paper (Section 4.6).
We also compare ERNIE's results with Google's Chinese BERT and with bert-wwm and bert-wwm-ext from Chinese-BERT-wwm.
Code
```python
#!/usr/bin/env python
# encoding: utf-8
import torch
from transformers import BertTokenizer, BertForMaskedLM

tokenizer = BertTokenizer.from_pretrained('./convert')
input_tx = "[CLS] [MASK] [MASK] [MASK] 是中国神魔小说的经典之作,与《三国演义》《水浒传》《红楼梦》并称为中国古典四大名著。[SEP]"
# [CLS], [MASK] and [SEP] in the text are recognized as special tokens
tokenized_text = tokenizer.tokenize(input_tx)
indexed_tokens = tokenizer.convert_tokens_to_ids(tokenized_text)

tokens_tensor = torch.tensor([indexed_tokens])
segments_tensors = torch.tensor([[0] * len(tokenized_text)])  # single sentence: every segment id is 0

model = BertForMaskedLM.from_pretrained('./convert')
model.eval()

with torch.no_grad():
    outputs = model(tokens_tensor, token_type_ids=segments_tensors)
    predictions = outputs[0]  # prediction scores, shape (1, seq_len, vocab_size)

# Most likely token at each position, then keep everything between [CLS] and [SEP]
predicted_index = [torch.argmax(predictions[0, i]).item() for i in range(0, (len(tokenized_text) - 1))]
predicted_token = [tokenizer.convert_ids_to_tokens([predicted_index[x]])[0] for x in range(1, (len(tokenized_text) - 1))]
print('Predicted token is:', predicted_token)
```
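The script above prints a prediction for every position between `[CLS]` and `[SEP]`, while the cloze test only cares about the three `[MASK]` slots. The sketch below isolates that selection step with plain Python and invented toy scores (`decode_masks`, the four-word `vocab` and the `scores` rows are all hypothetical). With the real model, `tokenized_text` would play the role of `tokens`, `predictions[0, i]` the role of `scores[i]`, and `tokenizer.convert_ids_to_tokens` would replace the `vocab` lookup.

```python
def decode_masks(tokens, scores, vocab, mask_token="[MASK]"):
    """Return the highest-scoring vocabulary entry at each mask position, in order."""
    predictions = []
    for position, token in enumerate(tokens):
        if token != mask_token:
            continue  # only the masked slots matter in a cloze test
        row = scores[position]
        best_id = max(range(len(row)), key=row.__getitem__)  # argmax over the vocabulary
        predictions.append(vocab[best_id])
    return predictions


# Invented toy data: a four-word vocabulary and one score row per input position.
vocab = ["西", "游", "记", "神"]
tokens = ["[CLS]", "[MASK]", "[MASK]", "[MASK]", "[SEP]"]
scores = [
    [0.0, 0.0, 0.0, 0.0],  # [CLS]
    [2.0, 0.1, 0.3, 1.0],  # first mask
    [0.2, 1.5, 0.1, 0.9],  # second mask
    [0.1, 0.2, 3.0, 0.4],  # third mask
    [0.0, 0.0, 0.0, 0.0],  # [SEP]
]

print(decode_masks(tokens, scores, vocab))  # ['西', '游', '记']
```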
Result
```
input:
[CLS] [MASK] [MASK] [MASK] 是中国神魔小说的经典之作,与《三国演义》《水浒传》《红楼梦》并称为中国古典四大名著。[SEP]
output:
{
  "bert-base": "《 神 》",
  "bert-wwm": "天 神 奇",
  "bert-wwm-ext": "西 游 记",
  "ernie-1.0": "西 游 记"
}
```
We can use Hugging Face's `convert_pytorch_checkpoint_to_tf` tool to convert the PyTorch model into a TensorFlow checkpoint.
```python
from transformers import BertModel
# In transformers >= 4.0 this module lives at
# transformers.models.bert.convert_bert_pytorch_checkpoint_to_original_tf
from transformers.convert_bert_pytorch_checkpoint_to_original_tf import convert_pytorch_checkpoint_to_tf

model = BertModel.from_pretrained('./convert')
convert_pytorch_checkpoint_to_tf(model=model, ckpt_dir='./tf_convert', model_name='ernie')
```
Output

```
I0715 09:15:37.493660 4524387776 configuration_utils.py:262] loading configuration file ./convert/config.json
I0715 09:15:37.494213 4524387776 configuration_utils.py:300] Model config BertConfig {
  "attention_probs_dropout_prob": 0.1,
  "gradient_checkpointing": false,
  "hidden_act": "relu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "layer_norm_eps": 1e-05,
  "max_position_embeddings": 513,
  "model_type": "bert",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pad_token_id": 0,
  "type_vocab_size": 2,
  "vocab_size": 18000
}
I0715 09:15:37.495160 4524387776 modeling_utils.py:665] loading weights file ./convert/pytorch_model.bin
I0715 09:15:39.599742 4524387776 modeling_utils.py:765] All model checkpoint weights were used when initializing BertModel.
I0715 09:15:39.599884 4524387776 modeling_utils.py:774] All the weights of BertModel were initialized from the model checkpoint at ./convert.
If your task is similar to the task the model of the ckeckpoint was trained on, you can already use BertModel for predictions without further training.
2020-07-15 09:15:39.613287: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
Successfully created bert/embeddings/word_embeddings: True
Successfully created bert/embeddings/position_embeddings: True
Successfully created bert/embeddings/token_type_embeddings: True
Successfully created bert/embeddings/LayerNorm/gamma: True
Successfully created bert/embeddings/LayerNorm/beta: True
Successfully created bert/encoder/layer_0/attention/self/query/kernel: True
Successfully created bert/encoder/layer_0/attention/self/query/bias: True
Successfully created bert/encoder/layer_0/attention/self/key/kernel: True
Successfully created bert/encoder/layer_0/attention/self/key/bias: True
Successfully created bert/encoder/layer_0/attention/self/value/kernel: True
Successfully created bert/encoder/layer_0/attention/self/value/bias: True
Successfully created bert/encoder/layer_0/attention/output/dense/kernel: True
Successfully created bert/encoder/layer_0/attention/output/dense/bias: True
Successfully created bert/encoder/layer_0/attention/output/LayerNorm/gamma: True
Successfully created bert/encoder/layer_0/attention/output/LayerNorm/beta: True
...
Successfully created bert/encoder/layer_11/intermediate/dense/bias: True
Successfully created bert/encoder/layer_11/output/dense/kernel: True
Successfully created bert/encoder/layer_11/output/dense/bias: True
Successfully created bert/encoder/layer_11/output/LayerNorm/gamma: True
Successfully created bert/encoder/layer_11/output/LayerNorm/beta: True
Successfully created bert/pooler/dense/kernel: True
Successfully created bert/pooler/dense/bias: True
```

The above code generates a `tf_convert` directory containing the TensorFlow checkpoint:
```
└── tf_convert
    ├── checkpoint
    ├── ernie.ckpt.data-00000-of-00001
    ├── ernie.ckpt.index
    └── ernie.ckpt.meta
```
The TensorFlow version uses the same `config.json` and `vocab.txt` as the PyTorch version in the `convert` directory.
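The TensorFlow variable names in the conversion log follow a simple renaming of the `BertModel` parameter names (`encoder.layer.0.attention.self.query.weight` becomes `bert/encoder/layer_0/attention/self/query/kernel`, and so on). The sketch below reproduces the names shown in that log. It is our own illustration: `VAR_MAP`, `to_tf_var_name` and `needs_transpose` are hypothetical, and the helper inside transformers may differ in detail. The transpose rule reflects that PyTorch linear weights are (out, in) while TensorFlow dense kernels are (in, out).

```python
# (pattern, replacement) pairs applied in order, chosen to reproduce the
# variable names in the conversion log; transformers' own helper may differ.
VAR_MAP = (
    ("layer.", "layer_"),
    ("word_embeddings.weight", "word_embeddings"),
    ("position_embeddings.weight", "position_embeddings"),
    ("token_type_embeddings.weight", "token_type_embeddings"),
    (".", "/"),
    ("LayerNorm/weight", "LayerNorm/gamma"),
    ("LayerNorm/bias", "LayerNorm/beta"),
    ("weight", "kernel"),
)


def to_tf_var_name(torch_name: str) -> str:
    """Map a BertModel state_dict key to a Google-BERT-style TF variable name."""
    name = torch_name
    for pattern, replacement in VAR_MAP:
        name = name.replace(pattern, replacement)
    return f"bert/{name}"


def needs_transpose(torch_name: str) -> bool:
    """Linear weights are (out, in) in PyTorch but dense kernels are (in, out) in TF."""
    return torch_name.endswith(("dense.weight", "query.weight", "key.weight", "value.weight"))


for key in ("embeddings.word_embeddings.weight",
            "encoder.layer.0.attention.self.query.weight",
            "encoder.layer.0.attention.output.LayerNorm.weight",
            "pooler.dense.bias"):
    print(f"{key} -> {to_tf_var_name(key)} (transpose: {needs_transpose(key)})")
```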
If you use this work in a scientific publication, I would appreciate a reference to the following BibTeX entry:

```
@misc{nghuyong2019@ERNIE-Pytorch,
  title={ERNIEPytorch},
  author={Yong Hu},
  howpublished={\url{https://github.com/nghuyong/ERNIE-Pytorch}},
  year={2019}
}
```