MLRS/BERTu

BERTu: A BERT-based language model for the Maltese language 🇲🇹

This repository contains code and information for the paper "Pre-training Data Quality and Quantity for a Low-Resource Language: New Corpus and BERT Models for Maltese".

The pre-trained language models can be accessed through the Hugging Face Hub using MLRS/BERTu or MLRS/mBERTu. For details on how pre-training was done, see the pretrain directory.
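As a minimal sketch of what accessing the models could look like, the snippet below loads either checkpoint with the `transformers` library. It assumes `transformers` and `torch` are installed and that the Hub checkpoints ship a masked-language-modelling head. The `MODEL_IDS` mapping, the `load_model` helper, and the example sentence are illustrative and not part of this repository.

```python
# Hub identifiers taken from the README; the dictionary itself is illustrative.
MODEL_IDS = {
    "BERTu": "MLRS/BERTu",
    "mBERTu": "MLRS/mBERTu",
}


def load_model(variant="BERTu"):
    """Return (tokenizer, model) for the requested variant (hypothetical helper)."""
    model_id = MODEL_IDS[variant]  # raises KeyError for an unknown variant
    # Imported lazily so the constants above can be used without transformers.
    from transformers import AutoModelForMaskedLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    # Assumption: the checkpoint includes a masked-LM head.
    model = AutoModelForMaskedLM.from_pretrained(model_id)
    return tokenizer, model


if __name__ == "__main__":
    tokenizer, model = load_model("BERTu")
    # Example Maltese sentence chosen for illustration only.
    inputs = tokenizer("Il-Belt Valletta hija l-kapitali ta' Malta.", return_tensors="pt")
    outputs = model(**inputs)
    # Logits have shape (batch, sequence_length, vocabulary_size).
    print(outputs.logits.shape)
```

Passing `"mBERTu"` instead of `"BERTu"` selects the other checkpoint. The first call downloads the weights from the Hub, so it needs network access.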

The models were trained on Korpus Malti v4.0, which can be accessed through the Hugging Face Hub using MLRS/korpus_malti.
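The corpus can probably be loaded in a similar way with the `datasets` library. The sketch below is written under assumptions: `datasets` is installed, the default configuration is acceptable, and streaming is used to avoid downloading the full corpus up front. The `load_korpus_malti` helper is hypothetical; the available configuration names are not listed here, and some `datasets` versions may require extra arguments for this dataset, so check the dataset card on the Hub.

```python
# Hub identifier taken from the README.
DATASET_ID = "MLRS/korpus_malti"


def load_korpus_malti(config=None, streaming=True):
    """Load Korpus Malti from the Hugging Face Hub (hypothetical helper).

    config: optional configuration name; None uses the dataset's default.
    streaming: if True, iterate over the data without a full download.
    """
    # Imported lazily so the constant above can be used without datasets.
    from datasets import load_dataset

    return load_dataset(DATASET_ID, name=config, streaming=streaming)


if __name__ == "__main__":
    corpus = load_korpus_malti()
    # Printing shows the splits that the chosen configuration provides.
    print(corpus)
```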

Citation

Cite this work as follows:

@inproceedings{BERTu,
 title = "Pre-training Data Quality and Quantity for a Low-Resource Language: New Corpus and {BERT} Models for {M}altese",
 author = "Micallef, Kurt and
 Gatt, Albert and
 Tanti, Marc and
 van der Plas, Lonneke and
 Borg, Claudia",
 booktitle = "Proceedings of the Third Workshop on Deep Learning for Low-Resource Natural Language Processing",
 month = jul,
 year = "2022",
 address = "Hybrid",
 publisher = "Association for Computational Linguistics",
 url = "https://aclanthology.org/2022.deeplo-1.10",
 doi = "10.18653/v1/2022.deeplo-1.10",
 pages = "90--101",
}
