
Commit 4ebc201

Fixing predictMaskedToken and updating README.

1 parent 3a2c48f commit 4ebc201

3 files changed (+55, -4 lines)


README.md

Lines changed: 4 additions & 2 deletions

@@ -71,12 +71,14 @@ Download or [clone](https://www.mathworks.com/help/matlab/matlab_prog/use-source
 ## Example: Classify Text Data Using BERT
 The simplest use of a pretrained BERT model is to use it as a feature extractor. In particular, you can use the BERT model to convert documents to feature vectors which you can then use as inputs to train a deep learning classification network.
 
-The example [`ClassifyTextDataUsingBERT.m`](./ClassifyTextDataUsingBERT.m) shows how to use a pretrained BERT model to classify failure events given a data set of factory reports.
+The example [`ClassifyTextDataUsingBERT.m`](./ClassifyTextDataUsingBERT.m) shows how to use a pretrained BERT model to classify failure events given a data set of factory reports. This example requires the `factoryReports.csv` data set from the Text Analytics example [Prepare Text Data for Analysis](https://www.mathworks.com/help/textanalytics/ug/prepare-text-data-for-analysis.html).
 
 ## Example: Fine-Tune Pretrained BERT Model
 To get the most out of a pretrained BERT model, you can retrain and fine-tune the BERT parameter weights for your task.
 
-The example [`FineTuneBERT.m`](./FineTuneBERT.m) shows how to fine-tune a pretrained BERT model to classify failure events given a data set of factory reports.
+The example [`FineTuneBERT.m`](./FineTuneBERT.m) shows how to fine-tune a pretrained BERT model to classify failure events given a data set of factory reports. This example requires the `factoryReports.csv` data set from the Text Analytics example [Prepare Text Data for Analysis](https://www.mathworks.com/help/textanalytics/ug/prepare-text-data-for-analysis.html).
+
+The example [`FineTuneBERTJapanese.m`](./FineTuneBERTJapanese.m) shows the same workflow using a pretrained Japanese-BERT model. This example requires the `factoryReportsJP.csv` data set from the Text Analytics example [Analyze Japanese Text Data](https://www.mathworks.com/help/textanalytics/ug/analyze-japanese-text.html), available in R2023a or later.
 
 ## Example: Analyze Sentiment with FinBERT
 FinBERT is a sentiment analysis model trained on financial text data and fine-tuned for sentiment analysis.
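
The feature-extractor workflow the README points to is demonstrated in full in ClassifyTextDataUsingBERT.m. As a rough sketch of the idea only: the `bert("Model",...)` call is confirmed by the tests in this commit, while `encode(mdl.Tokenizer,...)`, `bert.model`, and the [CLS] pooling step are assumptions about the repo's API rather than code from this diff:

    % Sketch: BERT as a document feature extractor (API details assumed).
    mdl = bert("Model","tiny");                       % load a pretrained BERT model
    str = "Pump overheated and tripped the breaker."; % one factory report
    seq = encode(mdl.Tokenizer,str);                  % assumed: token codes, one cell per input
    feats = bert.model(seq{1},mdl.Parameters);        % assumed: encoder hidden states, one column per token
    docVec = feats(:,1);                              % pool by taking the [CLS] token's vector
    % docVec can then serve as the feature vector for training a classifier.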

predictMaskedToken.m

Lines changed: 2 additions & 2 deletions

@@ -6,7 +6,7 @@
 % replaces instances of mdl.Tokenizer.MaskToken in the string text with
 % the most likely token according to the BERT model mdl.
 
-% Copyright 2021 The MathWorks, Inc.
+% Copyright 2021-2023 The MathWorks, Inc.
 arguments
     mdl {mustBeA(mdl,'struct')}
     str {mustBeText}
@@ -44,7 +44,7 @@
         tokens = fulltok.tokenize(pieces(i));
         if ~isempty(tokens)
             % "" tokenizes to empty - awkward
-            x = cat(2,x,fulltok.encode(tokens));
+            x = cat(2,x,fulltok.encode(tokens{1}));
         end
         if i<numel(pieces)
             x = cat(2,x,maskCode);
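
The second hunk is the actual fix: judging by the call pattern, `fulltok.tokenize` returns a cell array with one element per input string, so `encode` was previously handed the whole cell rather than the token list itself; `tokens{1}` unwraps the single piece first (an inference from the diff, not documented behavior). A minimal usage sketch of the repaired function, with an illustrative output:

    % Load a pretrained BERT model; "tiny" is among the names the new tests use.
    mdl = bert("Model","tiny");

    % Replace each [MASK] with the model's most likely token.
    out = predictMaskedToken(mdl,"The cat [MASK] soundly.")
    % e.g. out = "The cat slept soundly."  (illustrative, not a guaranteed output)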

test/tpredictMaskedToken.m

Lines changed: 49 additions & 0 deletions

@@ -0,0 +1,49 @@
+classdef(SharedTestFixtures={
+        DownloadBERTFixture, DownloadJPBERTFixture}) tpredictMaskedToken < matlab.unittest.TestCase
+    % tpredictMaskedToken Unit test for predictMaskedToken
+
+    % Copyright 2023 The MathWorks, Inc.
+
+    properties(TestParameter)
+        AllModels = {"base","multilingual-cased","medium",...
+            "small","mini","tiny","japanese-base",...
+            "japanese-base-wwm"}
+        ValidText = iGetValidText;
+    end
+
+    methods(Test)
+        function verifyOutputDimSizes(test, AllModels, ValidText)
+            inSize = size(ValidText);
+            mdl = bert("Model", AllModels);
+            outputText = predictMaskedToken(mdl,ValidText);
+            test.verifyEqual(size(outputText), inSize);
+        end
+
+        function maskTokenIsRemoved(test, AllModels)
+            text = "This has a [MASK] token.";
+            mdl = bert("Model", AllModels);
+            outputText = predictMaskedToken(mdl,text);
+            test.verifyFalse(contains(outputText, "[MASK]"));
+        end
+
+        function inputWithoutMASKRemainsTheSame(test, AllModels)
+            text = "This has no mask token.";
+            mdl = bert("Model", AllModels);
+            outputText = predictMaskedToken(mdl,text);
+            test.verifyEqual(text, outputText);
+        end
+    end
+end
+
+function validText = iGetValidText
+manyStrs = ["Accelerating the pace of [MASK] and science";
+    "The cat [MASK] soundly.";
+    "The [MASK] set beautifully."];
+singleStr = "Artificial intelligence continues to shape the future of industries," + ...
+    " as innovative applications emerge in fields such as healthcare, transportation," + ...
+    " entertainment, and finance, driving productivity and enhancing human capabilities.";
+validText = struct('StringsAsColumns',manyStrs,...
+    'StringsAsRows',manyStrs',...
+    'ManyStrings',repmat(singleStr,3),...
+    'SingleString',singleStr);
+end
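
To run the new suite locally, MATLAB's standard runner is enough (a sketch: it assumes the repository root is on the path so the shared fixtures can download the models, and note that the full AllModels sweep fetches every listed model on first run):

    % Run the parameterized tests; each Test method expands over the
    % AllModels (and, where used, ValidText) parameters.
    results = runtests(fullfile("test","tpredictMaskedToken.m"));
    table(results)   % summarize pass/fail per parameterization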
