NLP paper implementations relevant to classification with PyTorch
The papers were implemented using Korean corpora.
pyenv virtualenv 3.7.7 nlp
pyenv activate nlp
pip install -r requirements.txt
python build_dataset.py
python build_vocab.py
python train.py # default training parameters
python evaluate.py # default evaluation parameters
Single sentence classification (sentiment classification task)
Using the Naver sentiment movie corpus v1.0 (a.k.a. nsmc)
Configuration
conf/model/{type}.json (e.g. type = ["sencnn", "charcnn",...])
conf/dataset/nsmc.json
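The configuration layout above could be read with a small helper like the following. This is a minimal sketch, not the repository's own loader: `load_config` and its parameters are hypothetical names, and the keys stored in each JSON file are not assumed here.

```python
import json
from pathlib import Path


def load_config(conf_dir, kind, name):
    """Read <conf_dir>/<kind>/<name>.json and return its contents as a dict.

    `kind` is "model" or "dataset"; `name` is e.g. "sencnn" or "nsmc".
    (Hypothetical helper, written only for illustration.)
    """
    path = Path(conf_dir) / kind / f"{name}.json"
    with path.open(encoding="utf-8") as f:
        return json.load(f)
```

For example, `load_config("conf", "model", "sencnn")` would read conf/model/sencnn.json, and `load_config("conf", "dataset", "nsmc")` would read conf/dataset/nsmc.json.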
Structure
# example: Convolutional_Neural_Networks_for_Sentence_Classification
├── build_dataset.py
├── build_vocab.py
├── conf
│   ├── dataset
│   │   └── nsmc.json
│   └── model
│       └── sencnn.json
├── evaluate.py
├── experiments
│   └── sencnn
│       └── epochs_5_batch_size_256_learning_rate_0.001
├── model
│   ├── data.py
│   ├── __init__.py
│   ├── metric.py
│   ├── net.py
│   ├── ops.py
│   ├── split.py
│   └── utils.py
├── nsmc
│   ├── ratings_test.txt
│   ├── ratings_train.txt
│   ├── test.txt
│   ├── train.txt
│   ├── validation.txt
│   └── vocab.pkl
├── train.py
└── utils.py
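The run directory under experiments encodes the hyperparameters in its name. One way such a path could be derived is sketched below; `experiment_dir` and its arguments are hypothetical, and train.py may build the name differently.

```python
from pathlib import Path


def experiment_dir(root, model_type, epochs, batch_size, learning_rate):
    """Build a run directory path whose name encodes the hyperparameters.

    (Hypothetical helper; the naming pattern is copied from the tree above.)
    """
    run_name = (
        f"epochs_{epochs}_batch_size_{batch_size}_learning_rate_{learning_rate}"
    )
    return Path(root) / model_type / run_name


# Reproduces the directory shown in the tree above.
print(experiment_dir("experiments", "sencnn", 5, 256, 0.001).as_posix())
```

Keeping one directory per hyperparameter combination lets several runs of the same model sit side by side without overwriting each other's checkpoints or summaries.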
| Model \ Accuracy | Train (120,000) | Validation (30,000) | Test (50,000) | Date |
|---|---|---|---|---|
| SenCNN | 91.95% | 86.54% | 85.84% | 20/05/30 |
| CharCNN | 86.29% | 81.69% | 81.38% | 20/05/30 |
| ConvRec | 86.23% | 82.93% | 82.43% | 20/05/30 |
| VDCNN | 86.59% | 84.29% | 84.10% | 20/05/30 |
| SAN | 90.71% | 86.70% | 86.37% | 20/05/30 |
| ETRIBERT | 91.12% | 89.24% | 88.98% | 20/05/30 |
| SKTBERT | 92.20% | 89.08% | 88.96% | 20/05/30 |
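The split sizes in the table are consistent with holding out 20% of the 150,000 reviews in ratings_train.txt for validation (120,000 + 30,000) and using the 50,000 reviews in ratings_test.txt as the test set. A minimal sketch of such a split follows; the function name, the seed, and the shuffling strategy are assumptions, and build_dataset.py may do this differently.

```python
import random


def split_train_validation(rows, validation_ratio=0.2, seed=42):
    """Shuffle rows and split them into (train, validation) lists.

    (Hypothetical helper; ratio and seed are illustrative defaults.)
    """
    rows = list(rows)
    random.Random(seed).shuffle(rows)  # seeded so the split is reproducible
    n_validation = round(len(rows) * validation_ratio)
    return rows[n_validation:], rows[:n_validation]


# Stand-in for the 150,000 rows of ratings_train.txt.
train, validation = split_train_validation(range(150_000))
print(len(train), len(validation))  # 120000 30000
```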
Pairwise-text-classification (paraphrase detection task)
# example: Siamese_recurrent_architectures_for_learning_sentence_similarity
├── build_dataset.py
├── build_vocab.py
├── conf
│   ├── dataset
│   │   └── qpair.json
│   └── model
│       └── siam.json
├── evaluate.py
├── experiments
│   └── siam
│       └── epochs_5_batch_size_64_learning_rate_0.001
├── model
│   ├── data.py
│   ├── __init__.py
│   ├── metric.py
│   ├── net.py
│   ├── ops.py
│   ├── split.py
│   └── utils.py
├── qpair
│   ├── kor_pair_test.csv
│   ├── kor_pair_train.csv
│   ├── test.txt
│   ├── train.txt
│   ├── validation.txt
│   └── vocab.pkl
├── train.py
└── utils.py
| Model \ Accuracy | Train (6,136) | Validation (682) | Test (758) | Date |
|---|---|---|---|---|
| Siam | 93.00% | 83.13% | 83.64% | 20/05/30 |
| SAN | 89.47% | 82.11% | 81.53% | 20/05/30 |
| Stochastic | 89.26% | 82.69% | 80.07% | 20/05/30 |
| ETRIBERT | 95.07% | 94.42% | 94.06% | 20/05/30 |
| SKTBERT | 95.43% | 92.52% | 93.93% | 20/05/30 |
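The accuracy figures reported above are the share of examples whose predicted label matches the gold label. A framework-free sketch of that metric is shown below; `accuracy` is a hypothetical helper, and the repository's own model/metric.py may compute it differently (for example, batch-wise on tensors).

```python
def accuracy(predictions, labels):
    """Return the fraction of predictions that equal their labels."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


# Three of four predictions match their labels.
print(f"{accuracy([1, 0, 1, 1], [1, 0, 0, 1]):.2%}")  # 75.00%
```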