jadepeng/bertTokenizer
Bert Tokenizer

This repository contains a Java implementation of a BERT tokenizer. The implementation is adapted from https://github.com/ankiteciitkgp/bertTokenizer

It supports producing ONNX tensors as input for ONNX model inference.

Usage

To get tokens from text:

String text = "Text to tokenize";
BertTokenizer bertTokenizer = new BertTokenizer("D:\\model\\vocab.txt");
List<String> tokens = bertTokenizer.tokenize(text);
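Under the hood, BERT tokenizers split each word with the greedy longest-match-first WordPiece algorithm. The following is a minimal, self-contained sketch of that algorithm with a toy vocabulary (the real `vocab.txt` has tens of thousands of entries); it illustrates the idea, not this library's exact implementation:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class WordPieceSketch {
    // Toy vocabulary for illustration only; "##" marks a word-continuation piece
    static final Set<String> VOCAB = Set.of("token", "##ize", "##s", "to", "text");

    // Greedy longest-match-first WordPiece split of one lowercase word
    static List<String> wordPiece(String word) {
        List<String> pieces = new ArrayList<>();
        int start = 0;
        while (start < word.length()) {
            int end = word.length();
            String match = null;
            // Shrink the candidate substring until it is found in the vocabulary
            while (start < end) {
                String sub = word.substring(start, end);
                if (start > 0) sub = "##" + sub;   // continuation marker
                if (VOCAB.contains(sub)) { match = sub; break; }
                end--;
            }
            if (match == null) return List.of("[UNK]"); // no piece matches -> unknown token
            pieces.add(match);
            start = end;
        }
        return pieces;
    }

    public static void main(String[] args) {
        System.out.println(wordPiece("tokenize"));  // [token, ##ize]
    }
}
```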

To get token ids using the BERT vocab:

List<Integer> token_ids = bertTokenizer.convert_tokens_to_ids(tokens);
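Conceptually, `convert_tokens_to_ids` is a map lookup: in a BERT `vocab.txt` each token's id is its 0-based line number, and unknown tokens fall back to the id of `[UNK]`. A hedged sketch of that lookup (the class and loading logic here are illustrative, not this library's code):

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class VocabLookupSketch {
    // token -> id; in BERT vocab files the id is the 0-based line number
    private final Map<String, Integer> vocab = new LinkedHashMap<>();

    VocabLookupSketch(List<String> vocabLines) {
        for (int i = 0; i < vocabLines.size(); i++) {
            vocab.put(vocabLines.get(i), i);
        }
    }

    // Unknown tokens map to the id of [UNK]
    List<Integer> convertTokensToIds(List<String> tokens) {
        List<Integer> ids = new ArrayList<>();
        for (String token : tokens) {
            ids.add(vocab.getOrDefault(token, vocab.get("[UNK]")));
        }
        return ids;
    }

    public static void main(String[] args) {
        var v = new VocabLookupSketch(List.of("[PAD]", "[UNK]", "[CLS]", "[SEP]", "hello", "world"));
        System.out.println(v.convertTokensToIds(List.of("hello", "world", "foo")));  // [4, 5, 1]
    }
}
```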


To get an ONNX tensor input map:

var inputMap = bertTokenizer.tokenizeOnnxTensor(Arrays.asList("hello world 你好", "肿瘤治疗未来发展趋势"));
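Before a batch of sentences can become ONNX tensors, the id sequences must be padded to a common length and paired with an attention mask (1 for real tokens, 0 for padding). This plain-Java sketch shows that padding step only; the actual tensor creation in this library happens inside `tokenizeOnnxTensor`, and the helper below is an illustrative assumption, not its real code:

```java
import java.util.Arrays;
import java.util.List;

public class BatchPaddingSketch {
    // Pads a batch of token-id sequences to equal length and builds the
    // attention mask, as must happen before creating ONNX input tensors.
    // Returns { inputIds, attentionMask }.
    static long[][][] pad(List<long[]> batch) {
        int maxLen = batch.stream().mapToInt(a -> a.length).max().orElse(0);
        long[][] inputIds = new long[batch.size()][maxLen];
        long[][] attentionMask = new long[batch.size()][maxLen];
        for (int i = 0; i < batch.size(); i++) {
            long[] seq = batch.get(i);
            for (int j = 0; j < seq.length; j++) {
                inputIds[i][j] = seq[j];
                attentionMask[i][j] = 1;   // 1 = real token; padding stays 0
            }
        }
        return new long[][][] { inputIds, attentionMask };
    }

    public static void main(String[] args) {
        // 101 = [CLS], 102 = [SEP] in the standard BERT vocab
        var out = pad(List.of(new long[]{101, 7592, 102}, new long[]{101, 102}));
        System.out.println(Arrays.deepToString(out[0]));  // [[101, 7592, 102], [101, 102, 0]]
        System.out.println(Arrays.deepToString(out[1]));  // [[1, 1, 1], [1, 1, 0]]
    }
}
```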

Full example:

import ai.onnxruntime.OrtEnvironment;
import ai.onnxruntime.OrtException;
import ai.onnxruntime.OrtSession;
import com.alibaba.fastjson.JSON;

import java.io.IOException;
import java.util.Arrays;

public class OnnxTests {
    public static void main(String[] args) throws IOException, OrtException {
        // Load the BERT vocabulary and create the tokenizer
        BertTokenizer bertTokenizer = new BertTokenizer("D:\\model\\vocab.txt");
        var env = OrtEnvironment.getEnvironment();
        var session = env.createSession("D:\\model\\output\\onnx\\fp16_model.onnx",
                new OrtSession.SessionOptions());
        // Tokenize a batch of sentences into the ONNX input map
        var inputMap = bertTokenizer.tokenizeOnnxTensor(Arrays.asList("hello world 你好", "肿瘤治疗未来发展趋势"));
        try (var results = session.run(inputMap)) {
            System.out.println(results);
            var embeddings = (float[][]) results.get(0).getValue();
            for (var embedding : embeddings) {
                System.out.println(JSON.toJSONString(embedding));
            }
        }
    }
}
