Neural Grammatical Error Correction for Romanian using Transformer

teodor-cotet/RoGEC

Grammatical Error Correction for Romanian

This repository contains the code and data for Romanian grammatical error correction (GEC) on the RONACC corpus.

Download Data

Download the RONACC corpus: RONACC

Tokenized RONACC corpus: RONACC extra

Download pre-trained models

Download the language model: 30mil_wiki_lm
Download the synthetic corpus: 10m_synthetic
Download the trained, fine-tuned Transformer-based model: transformer-base-fine-tune

Run Experiment

Install the Python dependencies:
pip3 install -r requirements.txt
If you want to use LM predictions, install the KenLM library: kenlm
To run decoding with an existing model, run:
python3 transformer.py --checkpoint=path_to_model_checkpoint --lm_path=path_to_lm --d_model=size_of_model --decode_mode=True
(for the fine-tuned model, the model size is 768, i.e. --d_model=768)
To train a model, run:
python3 transformer.py --checkpoint=path_to_model_checkpoint --separate=False --d_model=size_of_model --use_txt=True --dataset_file=path_to_txt_file_wrong_gold --train_mode=True
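The two invocations above can also be assembled from a script, which may help when trying several checkpoints. This is a minimal sketch, assuming it is run from the repository root. The helper names `decode_command` and `train_command` and every path in the example are hypothetical; only the flags shown in the commands above are used.

```python
import shlex


def decode_command(checkpoint, lm_path, d_model=768):
    """Build the decoding command; 768 is the size stated for the fine-tuned model."""
    return [
        "python3", "transformer.py",
        f"--checkpoint={checkpoint}",
        f"--lm_path={lm_path}",
        f"--d_model={d_model}",
        "--decode_mode=True",
    ]


def train_command(checkpoint, dataset_file, d_model):
    """Build the training command for a text dataset file (wrong/gold sentences)."""
    return [
        "python3", "transformer.py",
        f"--checkpoint={checkpoint}",
        "--separate=False",
        f"--d_model={d_model}",
        "--use_txt=True",
        f"--dataset_file={dataset_file}",
        "--train_mode=True",
    ]


if __name__ == "__main__":
    # Hypothetical paths: replace them with the locations of your downloads.
    cmd = decode_command("checkpoints/transformer-base-fine-tune", "lm/30mil_wiki_lm")
    # Print the command for inspection; pass `cmd` to subprocess.run(cmd, check=True)
    # to actually launch it.
    print(shlex.join(cmd))
```

Keeping the arguments as a list avoids shell quoting problems when a path contains spaces.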

If you want to run on a TPU, pass the --use_tpu=True argument; you first need to generate the TFRecords file.

ERRANT

Install ERRANT

You can use ERRANT as usual; just pass the argument -lang ro to use it for Romanian. More details are in the ERRANT README.
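As an illustration, a typical ERRANT evaluation first aligns the original and corrected sentences into an M2 file with `errant_parallel`, then scores it against a reference M2 file with `errant_compare`. The sketch below only builds those two commands. It assumes that `-lang ro` is accepted by `errant_parallel` in this version of ERRANT (the text above does not say which command takes it), and the helper name `errant_commands` and all file names are hypothetical.

```python
import shlex


def errant_commands(orig_file, cor_file, ref_m2, hyp_m2="hyp.m2", lang="ro"):
    """Return (parallel, compare) argument lists for an ERRANT evaluation."""
    # Step 1: extract edits between original and system-corrected sentences.
    # Assumption: the -lang flag belongs on errant_parallel.
    parallel = [
        "errant_parallel",
        "-orig", orig_file,
        "-cor", cor_file,
        "-out", hyp_m2,
        "-lang", lang,
    ]
    # Step 2: compare the hypothesis edits with the reference edits.
    compare = ["errant_compare", "-hyp", hyp_m2, "-ref", ref_m2]
    return parallel, compare


if __name__ == "__main__":
    # Hypothetical file names, one sentence per line in the text files.
    for cmd in errant_commands("test.wrong.txt", "test.system.txt", "test.ref.m2"):
        print(shlex.join(cmd))
```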

Citing

@inproceedings{cotet2020neural,
 title={Neural grammatical error correction for romanian},
 author={Cotet, Teodor-Mihai and Ruseti, Stefan and Dascalu, Mihai},
 booktitle={2020 IEEE 32nd International Conference on Tools with Artificial Intelligence (ICTAI)},
 pages={625--631},
 year={2020},
 organization={IEEE}
}
