AUX-441/Language-Detector-Model

Folders and files

- .idea
- Confusion_matrix
- Model
- Cleaner_DS.py
- LICENSE
- README.md
- Test.py
- Train.py
- requirements.txt


🌐 Language Detector Model

This project is a Language Detection Model built in Python.
It loads and cleans a multilingual text dataset, trains a machine learning model, and allows users to test language predictions on custom input.

The model uses TF-IDF Vectorization and Logistic Regression to classify text into its respective language.
Additionally, the project includes data cleaning, model evaluation, and visualization of results.

✨ Features

🚀 Fast Language Detection
Detects the language of any given text within milliseconds using a trained ML model.

🧹 Automated Data Cleaning
Removes duplicate or over-represented text entries to improve model accuracy.
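A minimal pandas sketch of this cleaning step. It assumes hypothetical `Text` and `Language` columns and reads the rule as "drop every row whose exact text appears more than 3 times" (the threshold of 3 repeats comes from the Dataset Analysis & Cleaning list); the `min_samples` cut-off for rare languages is an invented placeholder because the README gives no number. This is not the code in Cleaner_DS.py.

```python
import pandas as pd


def clean_dataset(df: pd.DataFrame,
                  text_col: str = "Text",        # hypothetical column name
                  label_col: str = "Language",   # hypothetical column name
                  max_repeats: int = 3,
                  min_samples: int = 2) -> pd.DataFrame:  # placeholder threshold
    # Drop rows with a missing text or label.
    df = df.dropna(subset=[text_col, label_col])
    # Count how often each exact text occurs; drop texts seen more than max_repeats times.
    text_counts = df[text_col].map(df[text_col].value_counts())
    df = df[text_counts <= max_repeats]
    # Drop languages that have fewer than min_samples rows left.
    lang_counts = df[label_col].map(df[label_col].value_counts())
    df = df[lang_counts >= min_samples]
    return df.reset_index(drop=True)


# Example with invented data: "spam" appears 4 times, German has a single row.
data = pd.DataFrame({
    "Text": ["spam"] * 4 + ["bonjour", "salut", "hello", "hi", "hallo"],
    "Language": ["English"] * 4 + ["French", "French", "English", "English", "German"],
})
cleaned = clean_dataset(data)
print(cleaned)  # 4 rows: bonjour, salut, hello, hi
```

Removing the over-repeated text first and only then counting languages means a language kept alive only by duplicates is also filtered out.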

📊 Model Evaluation & Reporting
Generates the accuracy score, a classification report, and a detailed confusion matrix.
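These three outputs correspond to functions in scikit-learn's `sklearn.metrics` module. The sketch computes them for a handful of made-up labels standing in for a real 30% test split; it assumes, without confirmation from the README, that the project uses these same functions.

```python
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Invented true and predicted labels standing in for a real test split.
y_true = ["English", "French", "French", "German", "English", "German"]
y_pred = ["English", "French", "English", "German", "English", "German"]

labels = sorted(set(y_true))  # fixes the row/column order of the matrix
acc = accuracy_score(y_true, y_pred)
cm = confusion_matrix(y_true, y_pred, labels=labels)  # rows = true, columns = predicted

print(f"Accuracy: {acc:.3f}")  # 5 of 6 correct -> 0.833
print(classification_report(y_true, y_pred, labels=labels))
print(cm)
```

The per-language precision and recall in the classification report are usually more informative than the single accuracy number when some languages have far fewer samples than others.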

🎨 Beautiful Visualizations
Creates heatmaps, histograms, and line plots to visualize model performance.
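A minimal confusion-matrix heatmap drawn with plain matplotlib. The README does not say which plotting library the project uses, so this is only a sketch; the function name `plot_confusion_heatmap`, the example matrix, and the output file name are hypothetical. The figure is rendered to PNG bytes in memory so it also runs without a display.

```python
import io

import matplotlib
matplotlib.use("Agg")  # off-screen backend, no display required
import matplotlib.pyplot as plt
import numpy as np


def plot_confusion_heatmap(cm: np.ndarray, labels: list[str]) -> bytes:
    """Draw a confusion matrix as an annotated heatmap and return it as PNG bytes."""
    fig, ax = plt.subplots(figsize=(6, 5))
    im = ax.imshow(cm, cmap="Blues")
    fig.colorbar(im, ax=ax)
    ticks = range(len(labels))
    ax.set_xticks(ticks)
    ax.set_xticklabels(labels, rotation=45, ha="right")
    ax.set_yticks(ticks)
    ax.set_yticklabels(labels)
    ax.set_xlabel("Predicted language")
    ax.set_ylabel("True language")
    # Write the raw count into every cell.
    for i in range(cm.shape[0]):
        for j in range(cm.shape[1]):
            ax.text(j, i, str(cm[i, j]), ha="center", va="center")
    fig.tight_layout()
    buf = io.BytesIO()
    fig.savefig(buf, format="png")
    plt.close(fig)
    return buf.getvalue()


# Example: a 3-language matrix (rows = true, columns = predicted), values invented.
cm = np.array([[2, 0, 0], [1, 1, 0], [0, 0, 2]])
png = plot_confusion_heatmap(cm, ["English", "French", "German"])
with open("confusion_matrix.png", "wb") as f:  # hypothetical output path
    f.write(png)
```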

💾 Model Persistence
Saves the trained model as a .pkl file for easy re-use without retraining.
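A minimal sketch of saving and reloading with Python's built-in `pickle` module, which produces a `.pkl` file. It assumes the vectorizer and classifier are bundled into one scikit-learn pipeline; the toy data and the file name `language_model.pkl` are hypothetical, and the project may equally use another serializer such as `joblib`.

```python
import pickle

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Tiny toy training set, invented for illustration.
texts = ["bonjour", "merci beaucoup", "hello there", "thank you"]
labels = ["French", "French", "English", "English"]

model = make_pipeline(TfidfVectorizer(analyzer="char"), LogisticRegression(max_iter=1000))
model.fit(texts, labels)

# Save the fitted pipeline (vectorizer + classifier together) to a .pkl file...
with open("language_model.pkl", "wb") as f:  # hypothetical file name
    pickle.dump(model, f)

# ...and load it later without retraining.
with open("language_model.pkl", "rb") as f:
    restored = pickle.load(f)

print(restored.predict(["merci"]))
```

Saving the vectorizer together with the classifier matters: the classifier's weights only make sense with the exact vocabulary and IDF values learned during training. Only unpickle files from sources you trust, since loading a pickle can execute arbitrary code.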

🖥 Interactive Testing
Accepts user input directly from the terminal to test predictions instantly.
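A minimal sketch of such a terminal loop. The prompt and the quit word `'exist'` mirror the sample session under Installation; the function name `interactive_loop` and its `read`/`write` parameters are hypothetical, added so the loop can be driven by scripted input as well as the keyboard. It is not the code in Test.py.

```python
from typing import Callable


def interactive_loop(predict: Callable[[str], str],
                     read: Callable[[str], str] = input,
                     write: Callable[[str], None] = print,
                     quit_word: str = "exist") -> None:
    """Repeatedly read a line and print the predicted language; stop on quit_word or EOF."""
    prompt = f"Enter Your Language here (type '{quit_word}' to quit): "
    while True:
        try:
            text = read(prompt)
        except EOFError:  # e.g. Ctrl-D or piped input running out
            break
        if text.strip().lower() == quit_word.lower():
            break
        if not text.strip():  # ignore empty lines
            continue
        write(f"Predicted Language : {predict(text)}")


# Example with scripted input and a stand-in predictor instead of a trained model.
scripted = iter(["Bonjour", "exist"])
interactive_loop(lambda text: "French", read=lambda prompt: next(scripted))
# prints: Predicted Language : French
```

With a fitted scikit-learn pipeline called `model`, the predictor would be `lambda text: model.predict([text])[0]`.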

🛠 Customizable Training Pipeline
Easily adjust vectorization method, features, and classifier for experimentation.
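One way to keep the vectorizer and classifier swappable is a scikit-learn `Pipeline` with named steps. Whether Train.py is organised this way is not stated in the README; the step names `tfidf` and `clf` and the alternative classifier (`MultinomialNB`) are hypothetical choices for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Baseline configuration, using the settings listed under Model Details.
pipeline = Pipeline([
    ("tfidf", TfidfVectorizer(analyzer="char", max_features=2500)),
    ("clf", LogisticRegression(max_iter=2500, solver="saga")),
])

# Experiment: add character bigrams/trigrams and swap in a Naive Bayes classifier.
# Nested parameters use the "<step>__<param>" syntax; a whole step is replaced by name.
pipeline.set_params(tfidf__ngram_range=(1, 3), clf=MultinomialNB())

print(pipeline.named_steps["tfidf"].ngram_range)  # (1, 3)
print(type(pipeline.named_steps["clf"]).__name__)  # MultinomialNB
```

The modified pipeline is then fitted and evaluated exactly like the baseline, for example with `pipeline.fit(train_texts, train_labels)`.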

📁 Multi-Language Dataset Support
Handles large multilingual datasets with millions of entries efficiently.

⚙️ Pipeline Overview

Dataset Analysis & Cleaning
- Loads the CSV dataset
- Removes text entries repeated more than 3 times
- Filters out languages with very few samples

Model Training
- Character-level TF-IDF features
- Logistic Regression classifier
- Accuracy score and classification report
- Saves the trained model as a .pkl file

Visualization
- Confusion matrix heatmap
- True positives line plot
- True positives histogram

Model Testing
- Interactive user input
- Instant language prediction
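The true positives per language are the diagonal of the confusion matrix (correct predictions for each class). A minimal matplotlib sketch of the line plot and histogram listed above, assuming a confusion matrix with rows as true labels; the function names and the example matrix are hypothetical, and the project's own plots may differ.

```python
import io

import matplotlib
matplotlib.use("Agg")  # off-screen backend, no display required
import matplotlib.pyplot as plt
import numpy as np


def true_positives(cm: np.ndarray) -> np.ndarray:
    # Diagonal entries = samples whose predicted language equals the true language.
    return np.diag(cm)


def plot_true_positives(tp: np.ndarray, labels: list[str]) -> bytes:
    """Line plot of true positives per language plus a histogram of those counts."""
    fig, (ax_line, ax_hist) = plt.subplots(1, 2, figsize=(10, 4))
    ax_line.plot(labels, tp, marker="o")
    ax_line.set_title("True positives per language")
    ax_line.tick_params(axis="x", labelrotation=45)
    ax_hist.hist(tp, bins=min(10, len(tp)))
    ax_hist.set_title("Distribution of true-positive counts")
    ax_hist.set_xlabel("True positives")
    ax_hist.set_ylabel("Number of languages")
    fig.tight_layout()
    buf = io.BytesIO()
    fig.savefig(buf, format="png")
    plt.close(fig)
    return buf.getvalue()


# Example with an invented 3-language matrix (rows = true, columns = predicted).
cm = np.array([[2, 0, 0], [1, 1, 0], [0, 0, 2]])
tp = true_positives(cm)  # array([2, 1, 2])
png = plot_true_positives(tp, ["English", "French", "German"])
```

Raw true-positive counts also reflect how many test samples each language has; dividing by the row sums (`tp / cm.sum(axis=1)`) gives per-language recall, which is easier to compare across unevenly sized languages.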

🧠 Model Details

Vectorizer: TfidfVectorizer(analyzer='char', max_features=2500)

Classifier: LogisticRegression(max_iter=2500, solver='saga')

Split Ratio: 70% train / 30% test

Evaluation Metrics: Accuracy, classification report, confusion matrix
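The settings above can be sketched as a short scikit-learn training function. The vectorizer and classifier parameters and the 70/30 split come from this section; `stratify=labels`, the random seed, the function name `train_language_model`, and the toy data are assumptions added for illustration, so this is not the code in Train.py.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


def train_language_model(texts, labels, seed=42):
    """Fit TF-IDF + logistic regression with the settings listed in Model Details."""
    # 70% train / 30% test; stratify is an added assumption so every language
    # shows up in both splits.
    X_train, X_test, y_train, y_test = train_test_split(
        texts, labels, test_size=0.3, random_state=seed, stratify=labels)
    vectorizer = TfidfVectorizer(analyzer="char", max_features=2500)
    X_train_vec = vectorizer.fit_transform(X_train)  # learn vocabulary/IDF on training data only
    X_test_vec = vectorizer.transform(X_test)
    clf = LogisticRegression(max_iter=2500, solver="saga")
    clf.fit(X_train_vec, y_train)
    acc = accuracy_score(y_test, clf.predict(X_test_vec))
    return vectorizer, clf, acc


# Toy data, invented for illustration (the project itself loads a CSV dataset).
texts = ["the cat sat down", "hello my friend", "where is the station",
         "good morning to you", "thank you very much",
         "le chat est assis", "bonjour mon ami", "où est la gare",
         "bonne journée à vous", "merci beaucoup"]
labels = ["English"] * 5 + ["French"] * 5

vectorizer, clf, acc = train_language_model(texts, labels)
print(f"Accuracy on the toy test split: {acc:.2f}")
print(clf.predict(vectorizer.transform(["merci mon ami"])))
```

Fitting the vectorizer only on the training split keeps test-set statistics out of the IDF weights, so the reported accuracy is not inflated.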

📦 Installation

Clone the repository and install the dependencies:

git clone https://github.com/AUX-441/Language-Detector-Model.git
cd Language-Detector-Model
pip install -r requirements.txt

Run the interactive tester:

python Test.py

Example session:

Enter Your Language here (type 'exist' to quit): Bonjour
Predicted Language : French
---

About

Language Detector: loads and cleans text data, trains a language classification model using TF-IDF and Logistic Regression, evaluates it, and enables interactive language prediction by reusing the saved model.

Releases

1 tag

Languages

Python 100.0%
