
Commit 5b13873

Update results with reference to https://github.com/oshizo/JapaneseEmbeddingEval
1 parent 32cd1c8 commit 5b13873

File tree

2 files changed: +72 -42 lines changed

README.md

Lines changed: 10 additions & 2 deletions
```diff
@@ -336,7 +336,8 @@ As the base models for fine-tuning, we use `cl-tohoku/bert-large-japanese-v2` and `c
 
 
 The following table compares the evaluation results of our released models with those of existing Japanese-capable sentence embedding models.
-We use `src/evaluate.py` for evaluation.
+As a supplementary point of comparison, we also include results for pretrained language models used as sentence embedding models as-is, without fine-tuning.
+We used `src/evaluate.py` for evaluation.
 
 | Model | JSICK (val) | JSICK (test) | JSTS (train) | JSTS (val) | Avg. |
 | -------------------------------------------------------------------------------------------------------------------------------- | :---------: | :----------: | :----------: | :--------: | :-------: |
```
```diff
@@ -349,6 +350,7 @@ As the base models for fine-tuning, we use `cl-tohoku/bert-large-japanese-v2` and `c
 | [pkshatech/simcse-ja-bert-base-clcmlp](https://huggingface.co/pkshatech/simcse-ja-bert-base-clcmlp) | 74.47 | 73.46 | 78.05 | 80.14 | 77.21 |
 | [colorfulscoop/sbert-base-ja](https://huggingface.co/colorfulscoop/sbert-base-ja) | 67.19 | 65.73 | 74.16 | 74.24 | 71.38 |
 | [sonoisa/sentence-luke-japanese-base-lite](https://huggingface.co/sonoisa/sentence-luke-japanese-base-lite) | 78.76 | 77.26 | 80.55 | 82.54 | 80.11 |
+| [oshizo/sbert-jsnli-luke-japanese-base-lite](https://huggingface.co/oshizo/sbert-jsnli-luke-japanese-base-lite) | 72.96 | 72.60 | 77.88 | 81.09 | 77.19 |
 | | | | | | |
 | [MU-Kindai/Japanese-SimCSE-BERT-large-sup](https://huggingface.co/MU-Kindai/Japanese-SimCSE-BERT-large-sup) | 77.06 | 77.48 | 70.83 | 75.83 | 74.71 |
 | [MU-Kindai/Japanese-SimCSE-BERT-base-sup](https://huggingface.co/MU-Kindai/Japanese-SimCSE-BERT-base-sup) | 74.10 | 74.19 | 70.08 | 73.26 | 72.51 |
```
```diff
@@ -357,6 +359,10 @@ As the base models for fine-tuning, we use `cl-tohoku/bert-large-japanese-v2` and `c
 | [MU-Kindai/Japanese-MixCSE-BERT-base](https://huggingface.co/MU-Kindai/Japanese-MixCSE-BERT-base) | 76.72 | 76.94 | 72.40 | 76.23 | 75.19 |
 | [MU-Kindai/Japanese-DiffCSE-BERT-base](https://huggingface.co/MU-Kindai/Japanese-DiffCSE-BERT-base) | 75.61 | 75.83 | 71.62 | 75.81 | 74.42 |
 | | | | | | |
+| [intfloat/multilingual-e5-small](https://huggingface.co/intfloat/multilingual-e5-small) | 82.01 | 81.38 | 74.48 | 78.92 | 78.26 |
+| [intfloat/multilingual-e5-base](https://huggingface.co/intfloat/multilingual-e5-base) | 81.25 | 80.56 | 76.04 | 79.65 | 78.75 |
+| [intfloat/multilingual-e5-large](https://huggingface.co/intfloat/multilingual-e5-large) | 80.57 | 79.39 | 79.16 | 81.85 | 80.13 |
+| | | | | | |
 | [sentence-transformers/LaBSE](https://huggingface.co/sentence-transformers/LaBSE) | 76.54 | 76.77 | 72.15 | 76.12 | 75.02 |
 | [sentence-transformers/stsb-xlm-r-multilingual](https://huggingface.co/sentence-transformers/stsb-xlm-r-multilingual) | 73.09 | 72.00 | 77.83 | 78.43 | 76.09 |
 | | | | | | |
```
```diff
@@ -373,7 +379,9 @@ As the base models for fine-tuning, we use `cl-tohoku/bert-large-japanese-v2` and `c
 | [text-embedding-ada-002](https://platform.openai.com/docs/api-reference/embeddings) | 79.31 | 78.95 | 74.52 | 79.01 | 77.49 |
 
 The table shows that, overall, the models released here achieve the best performance.
-It is also worth noting that they are more performance than OpenAI's text-embedding-ada-002.
+It is also worth noting that they achieve higher performance than OpenAI's text-embedding-ada-002.
+
+Furthermore, it is notable that sentence embedding models fine-tuned with unsupervised SimCSE, such as [cl-nagoya/unsup-simcse-ja-large](https://huggingface.co/cl-nagoya/unsup-simcse-ja-large), perform on par with other models trained with supervision.
 
 Note that PKSHA's sentence embedding models use the JSTS validation set as their development set during training, so they are not directly comparable with the results of this experiment.
 Also, these evaluation results are limited to the STS task and do not guarantee generality to other tasks such as information retrieval.
```
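As background for these scores: STS evaluation conventionally embeds both sentences of each pair, scores the pair by cosine similarity, and reports Spearman's rank correlation (×100) against human similarity judgments. The sketch below illustrates that protocol only; the toy pairs and gold ratings are invented, and the actual evaluation uses the `STSEvaluation` class (`src/sts.py`) on the JSICK/JSTS data under `./datasets/sts`, not this code.

```python
# Illustrative sketch of the standard STS protocol, not the repository's loader.
import numpy as np
from scipy.stats import spearmanr
from sentence_transformers import SentenceTransformer

# Invented toy pairs with hypothetical human similarity ratings (0-5 scale).
pairs = [
    ("猫がソファで眠っている。", "猫がソファの上で寝ている。"),
    ("男性がギターを弾いている。", "男性が楽器を演奏している。"),
    ("犬が走っている。", "女性が料理をしている。"),
]
gold = np.array([4.8, 3.9, 0.5])

model = SentenceTransformer("cl-nagoya/sup-simcse-ja-base")
emb1 = model.encode([a for a, _ in pairs])
emb2 = model.encode([b for _, b in pairs])

# Row-wise cosine similarity between the two embedding matrices.
emb1 = emb1 / np.linalg.norm(emb1, axis=1, keepdims=True)
emb2 = emb2 / np.linalg.norm(emb2, axis=1, keepdims=True)
sims = (emb1 * emb2).sum(axis=1)

# The table reports Spearman correlations multiplied by 100.
print(spearmanr(sims, gold).correlation * 100)
```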

src/evaluate.py

Lines changed: 62 additions & 40 deletions
```diff
@@ -1,27 +1,29 @@
-import os
-from concurrent.futures import ThreadPoolExecutor
-
-import numpy as np
-import openai
 import torch.nn as nn
-from more_itertools import chunked
-from openai.openai_object import OpenAIObject
 from sentence_transformers import SentenceTransformer, models
 from src.sts import STSEvaluation
 from transformers import AutoModel, BertModel
 
-openai.api_key = os.environ["OPENAI_API_KEY"]
-
 # MODEL_PATH = "cl-nagoya/sup-simcse-ja-large"
 # MODEL_PATH = "cl-nagoya/sup-simcse-ja-base"
 # MODEL_PATH = "MU-Kindai/Japanese-SimCSE-BERT-large-sup"
 # MODEL_PATH = "colorfulscoop/sbert-base-ja"
-MODEL_PATH = "pkshatech/GLuCoSE-base-ja"
+# MODEL_PATH = "pkshatech/GLuCoSE-base-ja"
+# MODEL_PATH = "oshizo/sbert-jsnli-luke-japanese-base-lite"
+MODEL_PATH = "intfloat/multilingual-e5-large"
+
+
+sts = STSEvaluation(sts_dir="./datasets/sts")
+
 
+def evaluate():
+    model = SentenceTransformer(MODEL_PATH).eval().cuda()
+    print(sts.dev(encode=model.encode))
+    print(sts(encode=model.encode))
 
-def load_jcse(model_name: str):
-    backbone = models.Transformer(model_name)
-    pretrained_model: BertModel = AutoModel.from_pretrained(model_name)
+
+def evaluate_jcse():
+    backbone = models.Transformer(MODEL_PATH)
+    pretrained_model: BertModel = AutoModel.from_pretrained(MODEL_PATH)
     hidden_size = pretrained_model.config.hidden_size
 
     # load weights of Transformer layers
```
```diff
@@ -31,7 +33,7 @@ def load_jcse(model_name: str):
         pooling_mode="cls",
     )
 
-    if "unsup" in model_name:
+    if "unsup" in MODEL_PATH:
         model = SentenceTransformer(modules=[backbone, pooling]).eval().cuda()
 
     else:
```
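A note on the `"unsup"` branch above: in the SimCSE recipe, unsupervised models discard the training-time MLP head at inference and use CLS pooling alone, while supervised models keep the MLP. The sketch below shows the two module stacks this function builds; it assumes the MLP is a `sentence-transformers` `Dense` layer with a Tanh activation (its actual construction sits in unchanged lines outside this diff), and the base model name is just an example.

```python
import torch.nn as nn
from sentence_transformers import SentenceTransformer, models

# Example base model; evaluate_jcse() uses MODEL_PATH instead.
backbone = models.Transformer("cl-tohoku/bert-base-japanese-v3")
pooling = models.Pooling(
    word_embedding_dimension=backbone.get_word_embedding_dimension(),
    pooling_mode="cls",
)

# Unsupervised SimCSE at inference: CLS pooling only, no projection head.
unsup_model = SentenceTransformer(modules=[backbone, pooling])

# Supervised SimCSE: keep a Dense (Linear + Tanh) projection on top; its
# weights would be loaded from the checkpoint via load_state_dict.
hidden_size = backbone.get_word_embedding_dimension()
mlp = models.Dense(
    in_features=hidden_size,
    out_features=hidden_size,
    activation_function=nn.Tanh(),
)
sup_model = SentenceTransformer(modules=[backbone, pooling, mlp])
```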
```diff
@@ -49,44 +51,64 @@ def load_jcse(model_name: str):
         mlp.load_state_dict(mlp_state_dict)
         model = SentenceTransformer(modules=[backbone, pooling, mlp]).eval().cuda()
 
-    return model
+    print(sts.dev(encode=model.encode))
+    print(sts(encode=model.encode))
 
 
-def load_vanilla(model_name: str):
-    backbone = models.Transformer(model_name)
+def evaluate_vanilla():
+    backbone = models.Transformer(MODEL_PATH)
     pooling = models.Pooling(
         word_embedding_dimension=backbone.auto_model.config.hidden_size,
         pooling_mode="cls",
     )
-    return SentenceTransformer(modules=[backbone, pooling]).eval().cuda()
+    model = SentenceTransformer(modules=[backbone, pooling]).eval().cuda()
+    print(sts.dev(encode=model.encode))
+    print(sts(encode=model.encode))
 
 
-sts = STSEvaluation(sts_dir="./datasets/sts")
+def evaluate_openai():
+    import os
+    import openai
+    import numpy as np
+    from concurrent.futures import ThreadPoolExecutor
+    from more_itertools import chunked
+    from openai.openai_object import OpenAIObject
+
+    openai.api_key = os.environ["OPENAI_API_KEY"]
+
+    def encode_openai(batch: list[str]):
+        res: OpenAIObject = openai.Embedding.create(
+            model="text-embedding-ada-002",
+            input=batch,
+        )
+        return [d.embedding for d in res.data]
+
+    def encode(sentences: list[str], batch_size: int = 128):
+        embs = []
+        with ThreadPoolExecutor(max_workers=32) as executor:
+            batches = chunked(list(sentences), batch_size)
+            for emb in executor.map(encode_openai, batches):
+                embs += emb
+        embs = np.array(embs)
+        return embs
 
-# model = load_jcse(MODEL_PATH)
-# model = load_vanilla("cl-tohoku/bert-base-japanese-v3")
-model = SentenceTransformer(MODEL_PATH).eval().cuda()
-print(sts.dev(encode=model.encode))
-print(sts(encode=model.encode))
+    print(sts.dev(encode=encode))
+    print(sts(encode=encode))
 
 
-# def encode_openai(batch: list[str]):
-#     res: OpenAIObject = openai.Embedding.create(
-#         model="text-embedding-ada-002",
-#         input=batch,
-#     )
-#     return [d.embedding for d in res.data]
+def evaluate_e5():
+    model = SentenceTransformer(MODEL_PATH).eval().cuda()
 
+    def encode(sentences: list[str]):
+        sentences = [f"query: {s}" for s in sentences]
+        return model.encode(sentences)
 
-# def encode(sentences: list[str], batch_size: int = 128):
-#     embs = []
-#     with ThreadPoolExecutor(max_workers=32) as executor:
-#         batches = chunked(list(sentences), batch_size)
-#         for emb in executor.map(encode_openai, batches):
-#             embs += emb
-#     embs = np.array(embs)
-#     return embs
+    print(sts.dev(encode=encode))
+    print(sts(encode=encode))
 
 
-# print(sts.dev(encode=encode))
-# print(sts(encode=encode))
+if __name__ == "__main__":
+    # evaluate()
+    # evaluate_vanilla()
+    # evaluate_openai()
+    evaluate_e5()
```
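The new `evaluate_e5()` wraps `model.encode` so that every sentence is prefixed with `"query: "`. The multilingual E5 models are trained with `"query: "`/`"passage: "` input prefixes, and for a symmetric task like STS it is natural to encode both sides of each pair as queries. A minimal usage sketch with invented sentences:

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("intfloat/multilingual-e5-large")

# Invented example pair; the real evaluation feeds JSICK/JSTS sentences.
sentences = ["犬が公園を走っている。", "公園で犬が走り回っている。"]

# E5 expects a task prefix; STS uses "query: " on both sides of the pair.
embeddings = model.encode([f"query: {s}" for s in sentences])
print(embeddings.shape)  # (2, 1024) for the large model
```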
