Code, models, and data for "Enhancing Text Editing for Grammatical Error Correction: Arabic as a Case Study", ACL 2025

Enhancing Text Editing for Grammatical Error Correction

This repo contains code and pretrained models to reproduce the results in our paper Enhancing Text Editing for Grammatical Error Correction: Arabic as a Case Study.

Requirements:

The code was written for Python >= 3.10, PyTorch 1.12.1, and Transformers 4.30.0. You will need a few additional packages. Here is how you can set up the environment using conda (assuming you have conda and CUDA installed):

git clone https://github.com/CAMeL-Lab/text-editing.git
cd text-editing
conda create -n text-editing python=3.10
conda activate text-editing
pip install -e .

Experiments and Reproducibility:

All the datasets we used throughout the paper to train and test the various systems can be downloaded from here.

This repo is organized as follows:

  1. edits: includes the scripts needed to extract edits from parallel GEC corpora and to create the different edit representations.
  2. gec: includes the scripts needed to train and evaluate our text editing GEC systems.
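To illustrate the first step, here is a minimal sketch of extracting token-level edits from a parallel (erroneous, corrected) pair using Python's difflib. The edit labels (K = keep, R_x = replace with x, D = delete, *_A_x = append x) are hypothetical and chosen for readability; this is not the repo's actual extraction algorithm or edit representation, which lives in the edits scripts:

```python
# A minimal, hypothetical sketch of word-level edit extraction from a
# parallel (source, corrected) pair. NOT the repo's actual algorithm.
import difflib

def extract_edits(src_tokens, tgt_tokens):
    """Return (token, edit) pairs: 'K' keep, 'R_x' replace with x,
    'D' delete, and '*_A_x' append x after the previous token."""
    edits = []
    matcher = difflib.SequenceMatcher(a=src_tokens, b=tgt_tokens, autojunk=False)
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op == 'equal':
            edits.extend((tok, 'K') for tok in src_tokens[i1:i2])
        elif op == 'replace':
            # Naive 1:1 pairing; real systems handle m:n alignments
            for k, tok in enumerate(src_tokens[i1:i2]):
                tgt = tgt_tokens[j1 + k] if j1 + k < j2 else None
                edits.append((tok, f'R_{tgt}' if tgt is not None else 'D'))
        elif op == 'delete':
            edits.extend((tok, 'D') for tok in src_tokens[i1:i2])
        elif op == 'insert':
            # Attach inserted words to the previous token as an append edit
            if edits:
                tok, e = edits[-1]
                edits[-1] = (tok, e + '_A_' + ' '.join(tgt_tokens[j1:j2]))
    return edits

src = 'he go to school yesterday'.split()
tgt = 'he went to school yesterday'.split()
print(extract_edits(src, tgt))
# [('he', 'K'), ('go', 'R_went'), ('to', 'K'), ('school', 'K'), ('yesterday', 'K')]
```

Training a token classifier over such labels is what turns GEC into a text editing (sequence tagging) task rather than sequence-to-sequence generation.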

Hugging Face Integration:

We make our text editing models publicly available on Hugging Face.

from transformers import BertTokenizer, BertForTokenClassification
import torch
import torch.nn.functional as F
from gec.tag import rewrite

# Non-punctuation (nopnx) and punctuation (pnx) edit models and their tokenizers
nopnx_tokenizer = BertTokenizer.from_pretrained('CAMeL-Lab/text-editing-qalb14-nopnx')
nopnx_model = BertForTokenClassification.from_pretrained('CAMeL-Lab/text-editing-qalb14-nopnx')
pnx_tokenizer = BertTokenizer.from_pretrained('CAMeL-Lab/text-editing-qalb14-pnx')
pnx_model = BertForTokenClassification.from_pretrained('CAMeL-Lab/text-editing-qalb14-pnx')

def predict(model, tokenizer, text, decode_iter=1):
    for _ in range(decode_iter):
        if isinstance(text, str):
            # After the first iteration, text is a rewritten string; re-split it into words
            text = text.split()
        tokenized_text = tokenizer(text, return_tensors="pt", is_split_into_words=True)
        with torch.no_grad():
            logits = model(**tokenized_text).logits
        preds = F.softmax(logits.squeeze(), dim=-1)
        preds = torch.argmax(preds, dim=-1).cpu().numpy()
        # Map predicted label ids to edits, dropping the [CLS] and [SEP] positions
        edits = [model.config.id2label[p] for p in preds[1:-1]]
        assert len(edits) == len(tokenized_text['input_ids'][0][1:-1])
        subwords = tokenizer.convert_ids_to_tokens(tokenized_text['input_ids'][0][1:-1])
        # Apply the predicted edits to the subwords to obtain the rewritten sentence
        text = rewrite(subwords=[subwords], edits=[edits])[0][0]
    return text

# Example input with spelling, word-boundary, and punctuation errors
text = 'يجب الإهتمام ب الصحه و لا سيما ف ي الصحه النفسيه ياشباب المستقبل،،'.split()

# Two passes of non-punctuation edits, then one pass of punctuation edits
output_sent = predict(nopnx_model, nopnx_tokenizer, text, decode_iter=2)
output_sent = predict(pnx_model, pnx_tokenizer, output_sent.split(), decode_iter=1)
print(output_sent) # يجب الاهتمام بالصحة ولا سيما في الصحة النفسية يا شباب المستقبل .

License:

This repo is available under the MIT license. See the LICENSE file for more information.

Citation:

If you find the code or data in this repo helpful, please cite our paper:

@misc{alhafni-habash-2025-enhancing,
    title={Enhancing Text Editing for Grammatical Error Correction: Arabic as a Case Study},
    author={Bashar Alhafni and Nizar Habash},
    year={2025},
    eprint={2503.00985},
    archivePrefix={arXiv},
    primaryClass={cs.CL},
    url={https://arxiv.org/abs/2503.00985},
}
