Language Detector: loads and cleans text data, trains a language classification model using TF-IDF and Logistic Regression, evaluates it, and supports interactive language prediction with a reusable saved model.


AUX-441/Language-Detector-Model


🌐 Language Detector Model

This project is a Language Detection Model built in Python.
It loads and cleans a multilingual text dataset, trains a machine learning model, and allows users to test language predictions on custom input.

The model uses TF-IDF Vectorization and Logistic Regression to classify text into its respective language.
Additionally, the project includes data cleaning, model evaluation, and visualization of results.


✨ Features

🚀 Fast Language Detection
Detects the language of any given text within milliseconds using a trained ML model.

🧹 Automated Data Cleaning
Removes duplicate or over-represented text entries to improve model accuracy.

📊 Model Evaluation & Reporting
Generates accuracy score, classification report, and detailed confusion matrix.

🎨 Beautiful Visualizations
Creates heatmaps, histograms, and line plots to visualize model performance.

💾 Model Persistence
Saves the trained model as a .pkl file for easy re-use without retraining.

🖥 Interactive Testing
Accepts user input directly from the terminal to test predictions instantly.

🛠 Customizable Training Pipeline
Easily adjust vectorization method, features, and classifier for experimentation.

Multi-Language Dataset Support
Handles large multilingual datasets with millions of entries efficiently.
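
As a rough illustration of the model persistence feature listed above, the sketch below saves an object to a .pkl file and loads it back using Python's standard pickle module. The file name language_model.pkl and the helper names save_model and load_model are hypothetical, and a plain dictionary stands in for the trained model. The project itself may use a different serializer.

```python
import pickle
import tempfile
from pathlib import Path


def save_model(model, path):
    """Serialize any picklable object (for example a fitted model) to disk."""
    with open(path, "wb") as handle:
        pickle.dump(model, handle)


def load_model(path):
    """Load a previously saved object so it can be reused without retraining."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


# Stand-in for a trained model (assumption: the real object is a fitted classifier).
stand_in_model = {"vectorizer": "tfidf-char", "classifier": "logistic-regression"}

with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = Path(tmp_dir) / "language_model.pkl"  # hypothetical file name
    save_model(stand_in_model, model_path)
    restored_model = load_model(model_path)
```

Pickle files can execute arbitrary code when loaded, so only load .pkl files from a source you trust.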


⚙️ Features

  • Dataset Analysis & Cleaning

    • Loads CSV dataset
    • Removes text entries that are repeated more than 3 times
    • Filters languages with very low frequency
  • Model Training

    • TF-IDF character-level features
    • Logistic Regression classifier
    • Accuracy score and classification report
    • Saves trained model as .pkl
  • Visualization

    • Confusion matrix heatmap
    • True positives line plot
    • True positives histogram
  • Model Testing

    • Interactive user input
    • Predicts language instantly
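
The two cleaning steps listed above can be sketched with the standard library alone. This is a minimal illustration and not the project's actual code: the row layout (text, language), the helper names, and the min_samples threshold are assumptions, and it reads "repeated more than 3 times" as dropping every row whose text occurs more than three times.

```python
from collections import Counter
from typing import List, Tuple

Row = Tuple[str, str]  # (text, language); assumed layout for illustration


def drop_repeated_texts(rows: List[Row], max_repeats: int = 3) -> List[Row]:
    """Drop every row whose text occurs more than max_repeats times."""
    text_counts = Counter(text for text, _ in rows)
    return [(text, lang) for text, lang in rows if text_counts[text] <= max_repeats]


def drop_rare_languages(rows: List[Row], min_samples: int) -> List[Row]:
    """Drop rows whose language has fewer than min_samples examples."""
    lang_counts = Counter(lang for _, lang in rows)
    return [(text, lang) for text, lang in rows if lang_counts[lang] >= min_samples]


# Made-up sample rows; "hello" appears four times and Spanish has one example.
rows = [("hello", "English")] * 4 + [
    ("good morning", "English"),
    ("see you soon", "English"),
    ("bonjour", "French"),
    ("merci beaucoup", "French"),
    ("hola", "Spanish"),
]

cleaned = drop_rare_languages(drop_repeated_texts(rows), min_samples=2)
```

Running the repeat filter first matters here: removing over-represented texts can push a language below the frequency threshold, so the language filter should see the counts after deduplication.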

🧠 Model Details

Vectorizer: TfidfVectorizer(analyzer='char', max_features=2500)

Classifier: LogisticRegression(max_iter=2500, solver='saga')

Split Ratio: 70% train / 30% test

Evaluation Metrics: Accuracy, classification report, confusion matrix
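
The settings above could be wired together roughly as follows with scikit-learn. This is a sketch under assumptions rather than the repository's training script: the toy sentences, random_state=0, and the stratified split are made up for illustration, while the vectorizer, classifier, and 70/30 split follow the values listed above.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Toy data for illustration only; the real project loads a CSV dataset.
texts = [
    "good morning everyone", "where is the station", "thank you very much",
    "see you tomorrow", "i would like some water",
    "bonjour tout le monde", "merci beaucoup mon ami", "je voudrais de l'eau",
    "comment allez-vous", "bonne nuit et a demain",
]
labels = ["English"] * 5 + ["French"] * 5
label_names = sorted(set(labels))

# 70% train / 30% test, as listed in the model details.
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.30, random_state=0, stratify=labels
)

# Character-level TF-IDF features feeding a logistic regression classifier.
model = make_pipeline(
    TfidfVectorizer(analyzer="char", max_features=2500),
    LogisticRegression(max_iter=2500, solver="saga"),
)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
report = classification_report(y_test, y_pred, zero_division=0)
matrix = confusion_matrix(y_test, y_pred, labels=label_names)

print(f"Accuracy: {accuracy:.2f}")
print(report)
```

The diagonal of the confusion matrix holds the true positives per language, which is the quantity the line plot and histogram in the visualization feature are described as showing.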


📦 Installation

Clone the repository and enter the project directory:

git clone https://github.com/AUX-441/Language-Detector-Model.git
cd Language-Detector-Model

Run the interactive tester:

python Test_Model.py

Example session:

Enter Your Language here (type 'exist' to quit): Bonjour
Predicted Language : French
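
The prompt loop shown in the example session might look something like the sketch below. It is not the contents of Test_Model.py: the function name interactive_session, the fake_predict stand-in, and the choice to pass inputs as a list (so the loop can run without a terminal) are assumptions. The quit word follows the prompt text shown above.

```python
from typing import Callable, Iterable, List

QUIT_WORD = "exist"  # quit word as it appears in the prompt text


def interactive_session(predict: Callable[[str], str], lines: Iterable[str]) -> List[str]:
    """Predict a language for each input line until the quit word is seen."""
    outputs = []
    for line in lines:
        text = line.strip()
        if text.lower() == QUIT_WORD:
            break
        if not text:
            continue  # ignore empty input instead of predicting on it
        outputs.append(f"Predicted Language : {predict(text)}")
    return outputs


def fake_predict(text: str) -> str:
    """Hypothetical stand-in for the trained model's prediction."""
    known = {"bonjour": "French", "hello": "English"}
    return known.get(text.lower(), "Unknown")


session_output = interactive_session(fake_predict, ["Bonjour", "", "exist", "Hello"])
for message in session_output:
    print(message)
```

In a real terminal session the list could be replaced by repeated calls to input(), with predict wrapping the loaded model.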
