Commit 619b94c

Merge pull request PaddlePaddle#2026 from luyaojie/develop
[DuUIE] encoding fixing for windows
2 parents fcd3d1a + 0e4b2ca commit 619b94c

7 files changed

Lines changed: 87 additions & 62 deletions


‎examples/information_extraction/DuUIE/README.md‎

Lines changed: 2 additions & 2 deletions
@@ -153,7 +153,7 @@ python3 run_seq2struct.py \
 --learning_rate=5e-4 \
 --seed=42 \
 --overwrite_output_dir \
---gradient_accumulation_steps 1
+--gradient_accumulation_steps 2
 ```

 After training completes, the corresponding folder `output/duuie_multi_task_b32_lr5e-4` is generated.
@@ -165,7 +165,7 @@ python3 run_seq2struct.py \

 ``` bash
 python process_data.py split-test
-python inference.py --data data/duuie_test_a/* --model output/duuie_multi_task_b32_lr5e-4
+python inference.py --data data/duuie_test_a --model output/duuie_multi_task_b32_lr5e-4
 ```

 ### Quick baseline step 4: post-process and submit results

‎examples/information_extraction/DuUIE/config/multi-task-duuie.yaml‎

Lines changed: 21 additions & 21 deletions
@@ -4,61 +4,61 @@ T1:
   sel2record: longer_first_zh
   eval_match_mode: set
   metrics:
-    - string-rel-strict-F1
+    - string-rel-boundary-F1

 T2:
   name: DUIE_ORG_SPO
   path: data/duuie_pre/DUIE_ORG_SPO
   sel2record: longer_first_zh
   eval_match_mode: set
   metrics:
-    - string-rel-strict-F1
+    - string-rel-boundary-F1

 T3:
   name: 体育竞赛
   path: data/duuie_pre/体育竞赛
   sel2record: longer_first_zh
-  eval_match_mode: normal
+  eval_match_mode: set
   metrics:
     - string-evt-role-F1

 T4:
   name: 灾害意外
   path: data/duuie_pre/灾害意外
   sel2record: longer_first_zh
-  eval_match_mode: normal
+  eval_match_mode: set
   metrics:
     - string-evt-role-F1

 T5:
   name: CONV-ASA
   path: data/duuie_pre/CONV-ASA
   sel2record: longer_first_zh
-  eval_match_mode: normal
+  eval_match_mode: set
   metrics:
     - string-rel-strict-F1

 T6:
   name: MSRA
   path: data/duuie_pre/MSRA
   sel2record: longer_first_zh
-  eval_match_mode: normal
+  eval_match_mode: set
   metrics:
     - string-ent-F1

 T7:
   name: PEOPLE_DAILY
   path: data/duuie_pre/PEOPLE_DAILY
   sel2record: longer_first_zh
-  eval_match_mode: normal
+  eval_match_mode: set
   metrics:
     - string-ent-F1

 T8:
   name: 金融信息_中标
   path: data/duuie_pre/金融信息_中标
   sel2record: longer_first_zh
-  eval_match_mode: normal
+  eval_match_mode: set
   metrics:
     - string-evt-role-F1

@@ -67,7 +67,7 @@ T9:
   name: 金融信息_企业融资
   path: data/duuie_pre/金融信息_企业融资
   sel2record: longer_first_zh
-  eval_match_mode: normal
+  eval_match_mode: set
   metrics:
     - string-evt-role-F1

@@ -76,7 +76,7 @@ T10:
   name: 金融信息_股份回购
   path: data/duuie_pre/金融信息_股份回购
   sel2record: longer_first_zh
-  eval_match_mode: normal
+  eval_match_mode: set
   metrics:
     - string-evt-role-F1

@@ -85,7 +85,7 @@ T11:
   name: 金融信息_中标
   path: data/duuie_pre/金融信息_中标
   sel2record: longer_first_zh
-  eval_match_mode: normal
+  eval_match_mode: set
   metrics:
     - string-evt-role-F1

@@ -94,7 +94,7 @@ T12:
   name: 金融信息_高管变动
   path: data/duuie_pre/金融信息_高管变动
   sel2record: longer_first_zh
-  eval_match_mode: normal
+  eval_match_mode: set
   metrics:
     - string-evt-role-F1

@@ -103,70 +103,70 @@ T13:
   name: 金融信息_亏损
   path: data/duuie_pre/金融信息_亏损
   sel2record: longer_first_zh
-  eval_match_mode: normal
+  eval_match_mode: set
   metrics:
     - string-evt-role-F1

 T14:
   name: 金融信息_公司上市
   path: data/duuie_pre/金融信息_公司上市
   sel2record: longer_first_zh
-  eval_match_mode: normal
+  eval_match_mode: set
   metrics:
     - string-evt-role-F1

 T15:
   name: 金融信息_被约谈
   path: data/duuie_pre/金融信息_被约谈
   sel2record: longer_first_zh
-  eval_match_mode: normal
+  eval_match_mode: set
   metrics:
     - string-evt-role-F1

 T16:
   name: 金融信息_企业收购
   path: data/duuie_pre/金融信息_企业收购
   sel2record: longer_first_zh
-  eval_match_mode: normal
+  eval_match_mode: set
   metrics:
     - string-evt-role-F1

 T17:
   name: 金融信息_股东减持
   path: data/duuie_pre/金融信息_股东减持
   sel2record: longer_first_zh
-  eval_match_mode: normal
+  eval_match_mode: set
   metrics:
     - string-evt-role-F1

 T18:
   name: 金融信息_解除质押
   path: data/duuie_pre/金融信息_解除质押
   sel2record: longer_first_zh
-  eval_match_mode: normal
+  eval_match_mode: set
   metrics:
     - string-evt-role-F1

 T19:
   name: 金融信息_企业破产
   path: data/duuie_pre/金融信息_企业破产
   sel2record: longer_first_zh
-  eval_match_mode: normal
+  eval_match_mode: set
   metrics:
     - string-evt-role-F1

 T20:
   name: 金融信息_股东增持
   path: data/duuie_pre/金融信息_股东增持
   sel2record: longer_first_zh
-  eval_match_mode: normal
+  eval_match_mode: set
   metrics:
     - string-evt-role-F1

 T21:
   name: 金融信息_质押
   path: data/duuie_pre/金融信息_质押
   sel2record: longer_first_zh
-  eval_match_mode: normal
+  eval_match_mode: set
   metrics:
     - string-evt-role-F1
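
The config change above switches most tasks from `eval_match_mode: normal` to `eval_match_mode: set`. The sketch below illustrates one plausible reading of the difference, assuming `set` collapses duplicate records before counting and `normal` counts every instance. The function `f1_score` and its arguments are hypothetical names used only for illustration; this is not the scorer shipped with DuUIE.

```python
from collections import Counter


def f1_score(gold, pred, match_mode='normal'):
    """Illustrative F1 over extracted records (hypothetical helper).

    Assumption: 'set' de-duplicates gold and predicted records first,
    while 'normal' treats them as multisets and counts duplicates.
    """
    if match_mode == 'set':
        gold, pred = set(gold), set(pred)
    gold, pred = list(gold), list(pred)

    # Multiset intersection: each gold record can be matched at most once.
    overlap = sum((Counter(gold) & Counter(pred)).values())
    precision = overlap / len(pred) if pred else 0.0
    recall = overlap / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


gold = [('PER', '张三')]
pred = [('PER', '张三'), ('PER', '张三')]  # the same entity predicted twice

normal_f1 = f1_score(gold, pred, match_mode='normal')
set_f1 = f1_score(gold, pred, match_mode='set')
```

Under this assumption, a duplicated prediction lowers precision in `normal` mode but is harmless in `set` mode, which may be why the commit applies `set` uniformly.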

‎examples/information_extraction/DuUIE/inference.py‎

Lines changed: 13 additions & 8 deletions
@@ -16,7 +16,7 @@


 def read_json_file(file_name):
-    return [json.loads(line) for line in open(file_name)]
+    return [json.loads(line) for line in open(file_name, encoding='utf8')]


 def schema_to_ssi(schema: RecordSchema):
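
A minimal sketch of why the explicit `encoding='utf8'` matters: without it, `open()` falls back to the platform's preferred encoding, which on Windows is often a legacy code page rather than UTF-8, so JSON lines written with `ensure_ascii=False` may fail to round-trip. The helper names `write_json_lines` and `round_trip` are hypothetical and exist only for this illustration; `read_json_file` mirrors the function in the diff but uses a `with` block so the file is closed.

```python
import json
import os
import tempfile


def write_json_lines(file_name, records):
    # ensure_ascii=False keeps Chinese text as-is in the file, so the
    # file encoding must be stated explicitly instead of left to the OS.
    with open(file_name, 'w', encoding='utf8') as output:
        for record in records:
            output.write(json.dumps(record, ensure_ascii=False) + '\n')


def read_json_file(file_name):
    # Read one JSON object per line using the same explicit encoding.
    with open(file_name, encoding='utf8') as lines:
        return [json.loads(line) for line in lines]


def round_trip(records):
    # Write records to a temporary file and read them back.
    with tempfile.TemporaryDirectory() as folder:
        file_name = os.path.join(folder, 'pred.json')
        write_json_lines(file_name, records)
        return read_json_file(file_name)
```

Because both sides name the same encoding, the round trip should behave the same on Windows, Linux, and macOS.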
@@ -75,15 +75,19 @@ def to_tensor(x):
         return [post_processing(x) for x in pred]


+def find_to_predict_folder(folder_name):
+    for root, dirs, _ in os.walk(folder_name):
+        for dirname in dirs:
+            data_name = os.path.join(root, dirname)
+            if os.path.exists(os.path.join(data_name, 'record.schema')):
+                yield data_name
+
+
 def main():
     import argparse
     parser = argparse.ArgumentParser()
     parser.add_argument(
-        '--data',
-        '-d',
-        required=True,
-        nargs='+',
-        help='Folder need to been predicted.')
+        '--data', '-d', required=True, help='Folder need to been predicted.')
     parser.add_argument(
         '--model', '-m', required=True, help='Trained model for inference')
     parser.add_argument(
@@ -102,7 +106,8 @@ def main():
     parser.add_argument('--verbose', action='store_true')
     options = parser.parse_args()

-    data_folder = options.data
+    # Find the folder need to be predicted with `record.schema`
+    data_folder = find_to_predict_folder(options.data)
     model_path = options.model

     predictor = Predictor(
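
The change above replaces a shell glob (`data/duuie_test_a/*`) with a single folder argument that the script expands itself, presumably because not every shell expands wildcards before the program sees them. The sketch below reproduces `find_to_predict_folder` from the diff and exercises it on a throwaway directory tree; the `demo` function and the folder names `task_a`, `task_b`, and `no_schema` are made up for illustration.

```python
import os
import tempfile


def find_to_predict_folder(folder_name):
    # Yield every sub-folder, at any depth, that contains `record.schema`.
    for root, dirs, _ in os.walk(folder_name):
        for dirname in dirs:
            data_name = os.path.join(root, dirname)
            if os.path.exists(os.path.join(data_name, 'record.schema')):
                yield data_name


def demo():
    # Build a temporary tree with two task folders and one unrelated folder.
    with tempfile.TemporaryDirectory() as base:
        for task in ('task_a', 'task_b'):
            os.makedirs(os.path.join(base, task))
            with open(os.path.join(base, task, 'record.schema'), 'w'):
                pass
        os.makedirs(os.path.join(base, 'no_schema'))
        found = find_to_predict_folder(base)
        return sorted(os.path.relpath(path, base) for path in found)
```

Two properties of this approach are worth noting: the function returns a generator, so `data_folder` can be iterated only once, and the top-level folder itself is never yielded even if it contains a `record.schema`.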
@@ -142,7 +147,7 @@ def main():
             records += [sel2record.sel2record(pred=p, text=text, tokens=tokens)]

         pred_filename = os.path.join(f"{task_folder}", "pred.json")
-        with open(pred_filename, 'w') as output:
+        with open(pred_filename, 'w', encoding='utf8') as output:
             for record in records:
                 output.write(json.dumps(record, ensure_ascii=False) + '\n')

