CAMeL Tools is a suite of Arabic natural language processing tools developed by the CAMeL Lab at New York University Abu Dhabi.
Please use GitHub Issues to report a bug or to ask for help using CAMeL Tools.
You will need Python 3.8 - 3.12 (64-bit) as well as the Rust compiler installed.
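As a quick sanity check before installing, the requirements above can be verified from Python itself. This is a minimal sketch, not part of CAMeL Tools: the helper names `is_supported_python` and `has_rust_compiler` are hypothetical, and the Rust check only looks for a `rustc` executable on the PATH.

```python
import shutil
import struct
import sys

# Supported range taken from the requirement stated above (3.8 through 3.12).
MIN_VERSION = (3, 8)
MAX_VERSION = (3, 12)


def is_supported_python(version=None, pointer_bits=None):
    """Return True for a 64-bit interpreter in the supported version range.

    Hypothetical helper for illustration; the arguments default to the
    running interpreter and exist only to make the check easy to test.
    """
    major_minor = tuple(sys.version_info if version is None else version)[:2]
    if pointer_bits is None:
        # Size of a C pointer in bits: 64 on a 64-bit interpreter.
        pointer_bits = struct.calcsize("P") * 8
    return MIN_VERSION <= major_minor <= MAX_VERSION and pointer_bits == 64


def has_rust_compiler():
    """Return True if a `rustc` executable is found on the PATH (assumption:
    the Rust compiler is installed in a way that exposes `rustc`)."""
    return shutil.which("rustc") is not None
```

Calling `is_supported_python()` and `has_rust_compiler()` with no arguments checks the current environment.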
You will need to install some additional dependencies on Linux and macOS, primarily CMake and Boost.
On Ubuntu/Debian you can install these dependencies by running:
sudo apt-get install cmake libboost-all-dev
On macOS you can install them using Homebrew by running:
brew install cmake boost
pip install camel-tools
# or run the following if you already have camel_tools installed
pip install camel-tools --upgrade
On Apple silicon Macs you may have to run the following instead:
CMAKE_OSX_ARCHITECTURES=arm64 pip install camel-tools
# or run the following if you already have camel_tools installed
CMAKE_OSX_ARCHITECTURES=arm64 pip install camel-tools --upgrade
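If you are unsure whether the Apple silicon variant applies to your machine, a check along these lines may help. It is a sketch under assumptions: `needs_arm64_flag` is a hypothetical helper, and it relies on `platform.machine()` reporting `arm64`, which is what a native (non-Rosetta) Python on Apple silicon reports.

```python
import platform


def needs_arm64_flag(system=None, machine=None):
    """Return True when running natively on an Apple silicon Mac.

    Hypothetical helper for illustration; arguments default to the values
    reported for the current machine.
    """
    system = platform.system() if system is None else system
    machine = platform.machine() if machine is None else machine
    # macOS reports "Darwin"; Apple silicon reports "arm64" for native builds.
    return system == "Darwin" and machine == "arm64"
```

A Python interpreter running under Rosetta reports `x86_64`, so this check returns False there even on Apple silicon hardware.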
# Clone the repo
git clone https://github.com/CAMeL-Lab/camel_tools.git
cd camel_tools

# Install from source
pip install .

# or run the following if you already have camel_tools installed
pip install --upgrade .
To install the datasets required by CAMeL Tools components run one of the following:
# To install all datasets
camel_data -i all

# or just the datasets for morphology and MLE disambiguation only
camel_data -i light

# or just the default datasets for each component
camel_data -i defaults
See Available Packages for a list of all available datasets.
By default, data is stored in ~/.camel_tools.
Alternatively, if you would like to install the data in a different location, you need to set the CAMELTOOLS_DATA environment variable to the desired path.
Add the following to your .bashrc, .zshrc, .profile, etc.:
export CAMELTOOLS_DATA=/path/to/camel_tools_data
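The lookup rule described above (use CAMELTOOLS_DATA if it is set, otherwise fall back to ~/.camel_tools) can be sketched as follows. This only mirrors the documented behavior on Linux and macOS; `resolve_data_dir` is a hypothetical name and is not how CAMeL Tools necessarily implements it internally.

```python
import os
from pathlib import Path


def resolve_data_dir(environ=None):
    """Return the data directory implied by the rule described above.

    Hypothetical helper: `environ` defaults to the real process environment
    and can be replaced with a plain dict for testing.
    """
    env = os.environ if environ is None else environ
    override = env.get("CAMELTOOLS_DATA")
    if override:
        # An explicit CAMELTOOLS_DATA value takes precedence.
        return Path(override).expanduser()
    # Documented default location on Linux and macOS.
    return Path.home() / ".camel_tools"
```

Printing `resolve_data_dir()` in a new shell is one way to confirm that the `export` line has been picked up.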
Note: CAMeL Tools has been tested on Windows 10. The Dialect Identification component is not available on Windows at this time.
pip install camel-tools -f https://download.pytorch.org/whl/torch_stable.html
# or run the following if you already have camel_tools installed
pip install --upgrade -f https://download.pytorch.org/whl/torch_stable.html camel-tools
# Clone the repo
git clone https://github.com/CAMeL-Lab/camel_tools.git
cd camel_tools

# Install from source
pip install -f https://download.pytorch.org/whl/torch_stable.html .

# or run the following if you already have camel_tools installed
pip install --upgrade -f https://download.pytorch.org/whl/torch_stable.html .
To install the data packages required by CAMeL Tools components, run one of the following commands:
# To install all datasets
camel_data -i all

# or just the datasets for morphology and MLE disambiguation only
camel_data -i light

# or just the default datasets for each component
camel_data -i defaults
See Available Packages for a list of all available datasets.
By default, data is stored in C:\Users\your_user_name\AppData\Roaming\camel_tools.
Alternatively, if you would like to install the data in a different location, you need to set the CAMELTOOLS_DATA environment variable to the desired path. Below are the instructions to do so (on Windows 10):
- Press the Windows button and type env.
- Click on Edit the system environment variables (Control panel).
- Click on the Environment Variables... button.
- Click on the New... button under the User variables panel.
- Type CAMELTOOLS_DATA in the Variable name input box and the desired data path in Variable value. Alternatively, you can browse for the data directory by clicking on the Browse Directory... button.
- Click OK on all the opened windows.
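After following the steps above, the resulting data location on Windows can be sketched like this. It is an illustration under assumptions: `windows_data_dir` is a hypothetical helper, and it assumes the default shown above corresponds to the `APPDATA` environment variable (which normally points to `C:\Users\your_user_name\AppData\Roaming`).

```python
import os
from pathlib import PureWindowsPath


def windows_data_dir(environ=None):
    """Return the Windows data directory implied by the description above.

    Hypothetical helper: `environ` defaults to the process environment and
    can be replaced with a plain dict for testing.
    """
    env = os.environ if environ is None else environ
    override = env.get("CAMELTOOLS_DATA")
    if override:
        # The user variable set through the dialog takes precedence.
        return PureWindowsPath(override)
    appdata = env.get("APPDATA")
    if not appdata:
        raise KeyError("APPDATA is not set; this sketch only covers Windows")
    # Assumed default: the camel_tools folder under the roaming AppData folder.
    return PureWindowsPath(appdata) / "camel_tools"
```

Environment variables set through the dialog are only visible to programs started afterwards, so run the check from a newly opened terminal.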
To get started, you can follow along with the Guided Tour for a quick overview of the components provided by CAMeL Tools.
You can find the full online documentation here for both the command-line tools and the Python API.
Alternatively, you can build your own local copy of the documentation as follows:
# Install dependencies
pip install sphinx myst-parser sphinx-rtd-theme

# Go to docs subdirectory
cd docs

# Build HTML docs
make html

This should compile all the HTML documentation into docs/build/html.
If you find CAMeL Tools useful in your research, please cite our paper:
@inproceedings{obeid-etal-2020-camel,
    title = "{CAM}e{L} Tools: An Open Source Python Toolkit for {A}rabic Natural Language Processing",
    author = "Obeid, Ossama and Zalmout, Nasser and Khalifa, Salam and Taji, Dima and Oudah, Mai and Alhafni, Bashar and Inoue, Go and Eryani, Fadhl and Erdmann, Alexander and Habash, Nizar",
    booktitle = "Proceedings of the 12th Language Resources and Evaluation Conference",
    month = may,
    year = "2020",
    address = "Marseille, France",
    publisher = "European Language Resources Association",
    url = "https://www.aclweb.org/anthology/2020.lrec-1.868",
    pages = "7022--7032",
    abstract = "We present CAMeL Tools, a collection of open-source tools for Arabic natural language processing in Python. CAMeL Tools currently provides utilities for pre-processing, morphological modeling, Dialect Identification, Named Entity Recognition and Sentiment Analysis. In this paper, we describe the design of CAMeL Tools and the functionalities it provides.",
    language = "English",
    ISBN = "979-10-95546-34-4",
}
CAMeL Tools is available under the MIT license. See the LICENSE file for more info.
If you would like to contribute to CAMeL Tools, please read the CONTRIBUTE.rst file.