Shawon Ashraf a9c4219b0e typo fix
2025-06-24 11:06:58 +06:00
notebook moved to notebook dir 2025-06-24 10:41:08 +06:00
src a small log message 2025-06-24 11:04:06 +06:00
.gitignore init 2025-06-24 02:55:24 +06:00
.python-version init 2025-06-24 02:55:24 +06:00
pyproject.toml added logging and ruff checks 2025-06-24 11:03:24 +06:00
README.md typo fix 2025-06-24 11:06:58 +06:00
uv.lock added logging and ruff checks 2025-06-24 11:03:24 +06:00

Language Detection from documents using n-gram profiles

This notebook builds a language detector based on n-gram profiles, inspired by "N-gram-based text categorization" (Cavnar and Trenkle, 1994).
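For orientation, here is a minimal sketch of the general idea from the cited paper: rank the most frequent character n-grams of a text into a profile, then pick the language whose profile has the smallest "out-of-place" rank distance to the document's profile. This is not the code in this repo. The function names (`ngram_profile`, `out_of_place`, `detect`), the default sizes, and the whitespace tokenization are assumptions made for illustration only.

```python
from collections import Counter


def ngram_profile(text, profile_size=300, max_n=5):
    """Return the top `profile_size` character n-grams of `text`, most frequent first."""
    counts = Counter()
    # Simplification (assumption): lowercase and split on whitespace only.
    for token in text.lower().split():
        padded = f"_{token}_"  # underscores mark word boundaries
        for n in range(1, max_n + 1):
            for i in range(len(padded) - n + 1):
                counts[padded[i:i + n]] += 1
    # Sort by descending count; break ties alphabetically so ranks are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [gram for gram, _ in ranked[:profile_size]]


def out_of_place(doc_profile, lang_profile):
    """Sum of rank differences; n-grams missing from `lang_profile` get a maximum penalty."""
    rank = {gram: i for i, gram in enumerate(lang_profile)}
    max_penalty = len(lang_profile)
    return sum(
        abs(i - rank[gram]) if gram in rank else max_penalty
        for i, gram in enumerate(doc_profile)
    )


def detect(text, lang_profiles, profile_size=300):
    """Return the key of the language profile closest to the profile of `text`."""
    doc_profile = ngram_profile(text, profile_size)
    return min(lang_profiles, key=lambda lang: out_of_place(doc_profile, lang_profiles[lang]))


# Toy training texts (hypothetical); real profiles need far more data per language.
samples = {
    "en": "the quick brown fox jumps over the lazy dog and runs through the forest",
    "de": "der schnelle braune fuchs springt über den faulen hund und läuft durch den wald",
}
profiles = {lang: ngram_profile(text) for lang, text in samples.items()}
print(detect("the dog runs over the fox", profiles))
```

A smaller profile size keeps only the most common n-grams, which are dominated by short function words and affixes; a larger one adds more topic-specific n-grams.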

BibTeX entry

@inproceedings{Cavnar1994NgrambasedTC,
 title={N-gram-based text categorization},
 author={William B. Cavnar and John M. Trenkle},
 year={1994},
 url={https://api.semanticscholar.org/CorpusID:170740}
}

Env Setup

Make sure to have uv installed before you proceed.

uv sync
source .venv/bin/activate

To run the example notebook, start Jupyter:

jupyter notebook

Alternatively, run the CLI script, passing the profile size as its argument:

uv run src/main.py PROFILE_SIZE
# example
uv run src/main.py 200