tokenizer = AutoTokenizer.from_pretrained("uer/gpt2-chinese-cluecorpussmall", cache_dir="./gpt2_ch")
Debug information shows the tokenizer class is <class 'transformers.models.bert.tokenization_bert.BertTokenizer'>.
model = GPT2LMHeadModel.from_pretrained("uer/gpt2-chinese-cluecorpussmall", cache_dir="./gpt2_ch")
Is this a mismatch, and how do I solve it? GPT2Tokenizer has no vocabulary for "uer/gpt2-chinese-cluecorpussmall".
The output of the trained model contains special tokens such as [CLS] and [SEP]; would simply removing them work?
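For context: the UER Chinese GPT-2 checkpoints ship a BERT-style vocabulary, so AutoTokenizer resolving to BertTokenizer is expected rather than a mismatch. For the special tokens, transformers' `tokenizer.decode(..., skip_special_tokens=True)` is the usual way to drop them during decoding. If you are post-processing already-decoded strings instead, a minimal sketch of manual removal might look like this (the helper name and the exact token set are assumptions, not from the original post):

```python
import re

# BERT-style special tokens that may leak into decoded text.
# (Assumed set; adjust to match tokenizer.all_special_tokens.)
_SPECIAL_TOKEN_RE = re.compile(r"\[(?:CLS|SEP|PAD|UNK|MASK)\]")

def strip_special_tokens(text: str) -> str:
    """Remove BERT-style special tokens and collapse leftover whitespace."""
    cleaned = _SPECIAL_TOKEN_RE.sub(" ", text)
    return re.sub(r"\s+", " ", cleaned).strip()

print(strip_special_tokens("[CLS] 今 天 天 气 很 好 [SEP]"))
```

When you have the token IDs, prefer `skip_special_tokens=True` at decode time over string cleanup, since it uses the tokenizer's own special-token list.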