AidenTran900/ml-library-cpp

ML Models

A C++ machine learning library, built from the ground up, that implements classic ML algorithms, neural networks, and a transformer stack.

Features

Models

  • Linear Regression with gradient descent optimization
  • Logistic Regression for binary classification
  • K-Nearest Neighbors (KNN) with Euclidean and Manhattan distance metrics
  • Support Vector Machines (SVM) with multiple kernels (Linear, Polynomial, RBF, Sigmoid)
  • Decision Trees with Gini and Entropy impurity measures
  • Random Forests with bootstrap aggregation
  • K-Means Clustering for unsupervised learning
  • Neural Networks with backpropagation and configurable layers
  • Residual Networks (ResNet) with skip connections
  • Transformer with multi-head self-attention, KV cache, and autoregressive generation
  • Perceptron for binary classification
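The two impurity measures named for Decision Trees can be sketched as follows. This is a minimal illustration working on per-class sample counts, not the library's implementation; the function names gini_impurity and entropy_impurity are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Gini impurity: 1 - sum_k p_k^2, where p_k is the fraction of samples in class k.
double gini_impurity(const std::vector<std::size_t>& class_counts) {
    double total = 0.0;
    for (std::size_t c : class_counts) total += static_cast<double>(c);
    assert(total > 0.0 && "node must contain at least one sample");
    double sum_sq = 0.0;
    for (std::size_t c : class_counts) {
        const double p = static_cast<double>(c) / total;
        sum_sq += p * p;
    }
    return 1.0 - sum_sq;
}

// Shannon entropy in bits: -sum_k p_k * log2(p_k), skipping empty classes.
double entropy_impurity(const std::vector<std::size_t>& class_counts) {
    double total = 0.0;
    for (std::size_t c : class_counts) total += static_cast<double>(c);
    assert(total > 0.0 && "node must contain at least one sample");
    double h = 0.0;
    for (std::size_t c : class_counts) {
        if (c == 0) continue;  // treat 0 * log(0) as 0
        const double p = static_cast<double>(c) / total;
        h -= p * std::log2(p);
    }
    return h;
}
```

Both measures are 0 for a pure node and largest for an even class split, which is why a tree can pick the split that lowers them most.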

Core Components

  • Matrix Operations: Addition, multiplication, transpose, inverse, Hadamard product, and determinant, templated for float and double (Matrix<float> / Matrix<double>)
  • Activation Functions: ReLU, Sigmoid, Tanh, Linear, Softplus, Softmax, Step, Sign
  • Loss Functions: MSE, MAE, RMSE, Binary Cross-Entropy, Categorical Cross-Entropy
  • Optimizers: SGD, Mini-Batch GD, Momentum, AdaGrad, RMSProp, Adam
  • Normalization: Layer Norm, RMS Norm
  • Regularization: L1 (Lasso) & L2 (Ridge)
  • Metrics:
    • Regression: R2, Adjusted R2, MSE, MAE, RMSE
    • Classification: Accuracy, Precision, Recall, F1 Score, Confusion Matrix, ROC Curve, AUC
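As an illustration of one optimizer from the list above, here is a minimal sketch of a single Adam update with bias correction. The Adam struct and its field names are assumptions made for this example and do not mirror the library's optimizer interface; the default hyperparameters are the commonly used ones.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical standalone Adam optimizer (not the library's class).
struct Adam {
    double lr = 1e-3, beta1 = 0.9, beta2 = 0.999, eps = 1e-8;
    std::vector<double> m, v;  // running first and second moment estimates
    int step_count = 0;

    // Apply one update to params in place, given the gradient of the loss.
    void step(std::vector<double>& params, const std::vector<double>& grads) {
        assert(params.size() == grads.size());
        if (m.empty()) {
            m.assign(params.size(), 0.0);
            v.assign(params.size(), 0.0);
        }
        ++step_count;
        // Bias-correction terms compensate for the zero-initialized moments.
        const double c1 = 1.0 - std::pow(beta1, static_cast<double>(step_count));
        const double c2 = 1.0 - std::pow(beta2, static_cast<double>(step_count));
        for (std::size_t i = 0; i < params.size(); ++i) {
            m[i] = beta1 * m[i] + (1.0 - beta1) * grads[i];
            v[i] = beta2 * v[i] + (1.0 - beta2) * grads[i] * grads[i];
            const double m_hat = m[i] / c1;
            const double v_hat = v[i] / c2;
            params[i] -= lr * m_hat / (std::sqrt(v_hat) + eps);
        }
    }
};
```

On the first step the corrected moments reduce to the gradient and its square, so each parameter moves by roughly lr in the direction opposite to its gradient, regardless of the gradient's magnitude.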

NLP / Transformer Components

  • Tokenizer: Word, Character, BPE (Byte Pair Encoding), and Sentence tokenization
  • Embedding Layer: Trainable word embeddings
  • Multi-Head Attention: Scaled dot-product attention with KV cache for efficient inference
  • Positional Encoding: Sinusoidal and Rotary (RoPE)
  • Transformer Blocks: Pre-norm architecture with residual connections
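The sinusoidal variant of positional encoding listed above can be sketched in a few lines. This follows the standard formulation (sine on even dimensions, cosine on odd ones, with base 10000); the function name sinusoidal_encoding and the nested-vector return type are assumptions for illustration and are not taken from the library.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Returns a seq_len x d_model table where
//   pe[pos][2k]   = sin(pos / 10000^(2k / d_model))
//   pe[pos][2k+1] = cos(pos / 10000^(2k / d_model))
std::vector<std::vector<double>> sinusoidal_encoding(std::size_t seq_len,
                                                     std::size_t d_model) {
    assert(d_model % 2 == 0 && "this sketch assumes an even model dimension");
    std::vector<std::vector<double>> pe(seq_len, std::vector<double>(d_model, 0.0));
    for (std::size_t pos = 0; pos < seq_len; ++pos) {
        for (std::size_t i = 0; i < d_model; i += 2) {
            // i is already the even index 2k, so the exponent is i / d_model.
            const double freq = std::pow(
                10000.0, -static_cast<double>(i) / static_cast<double>(d_model));
            const double angle = static_cast<double>(pos) * freq;
            pe[pos][i] = std::sin(angle);
            pe[pos][i + 1] = std::cos(angle);
        }
    }
    return pe;
}
```

The resulting rows are typically added to the token embeddings before the first transformer block.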

Precision Support

All core classes are templated on scalar type (template<typename T = double>), enabling both float (f32) and double (f64) precision:

  • Matrix<float> / MatrixF32 for memory-efficient inference
  • Matrix<double> / MatrixF64 for training precision (default)
  • Classical ML models default to double; the transformer stack supports both
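The templating pattern described above might look roughly like this. The class below is a deliberately small stand-in written for illustration, not the library's Matrix; only the template<typename T = double> signature and the MatrixF32 / MatrixF64 alias names come from this README, and every member function here is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical minimal matrix templated on the scalar type.
template <typename T = double>
class Matrix {
public:
    Matrix(std::size_t rows, std::size_t cols, T fill = T{})
        : rows_(rows), cols_(cols), data_(rows * cols, fill) {}

    T& operator()(std::size_t r, std::size_t c) { return data_[r * cols_ + c]; }
    const T& operator()(std::size_t r, std::size_t c) const { return data_[r * cols_ + c]; }

    std::size_t rows() const { return rows_; }
    std::size_t cols() const { return cols_; }

    // Bytes used by element storage; depends on sizeof(T).
    std::size_t bytes() const { return data_.size() * sizeof(T); }

    // Plain triple-loop matrix product.
    Matrix operator*(const Matrix& rhs) const {
        assert(cols_ == rhs.rows_ && "inner dimensions must match");
        Matrix out(rows_, rhs.cols_);
        for (std::size_t i = 0; i < rows_; ++i)
            for (std::size_t k = 0; k < cols_; ++k)
                for (std::size_t j = 0; j < rhs.cols_; ++j)
                    out(i, j) += (*this)(i, k) * rhs(k, j);
        return out;
    }

private:
    std::size_t rows_, cols_;
    std::vector<T> data_;  // row-major storage
};

using MatrixF32 = Matrix<float>;   // smaller footprint, suited to inference
using MatrixF64 = Matrix<double>;  // default precision
```

Because the scalar type is a template parameter with a default, Matrix<> and MatrixF64 name the same type, and code written against Matrix<T> compiles unchanged for either precision.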

Language Bindings

  • Python bindings via pybind11 with NumPy array support (both float32 and float64)

Prerequisites

  • C++17 or later
  • CMake 3.16+
  • A C++17-capable compiler (GCC, Clang, or MSVC)

Building

Linux/macOS

# Clone the repository
git clone https://github.com/ProdigiousPersonn/ML-Models
cd ML-Models
# Create and enter build directory
mkdir build && cd build
# Configure with CMake
cmake ..
# Build the project
cmake --build .
# Run the executable
./Build

Windows

# Clone the repository
git clone https://github.com/ProdigiousPersonn/ML-Models
cd ML-Models
# Create and enter build directory
mkdir build
cd build
# Configure with CMake
cmake ..
# Build the project
cmake --build . --config Release
# Run the executable
.\Release\Build.exe

Project Structure

LinearModel/
├── source/
│   ├── main.cpp                     # Entry point
│   ├── math/                        # Matrix operations
│   ├── core/                        # Loss, optimizer, regularizer, metrics, tokenizer, embedding
│   ├── models/                      # ML model implementations
│   └── utils/                       # CSV utilities
├── include/ml_lib/                  # Public headers
├── examples/
│   ├── c++/                         # C++ examples
│   │   ├── linear-regression/housing/
│   │   └── language-model/
│   ├── logistic-regression/         # Heart disease classification example
│   ├── python/                      # Python examples
│   └── datasets/                    # Example datasets
├── python/                          # Python bindings (pybind11)
├── tests/                           # Unit tests (doctest)
├── external/                        # Dependencies (fmt, spdlog, doctest)
├── csv-parser/                      # CSV parsing library
├── pybind11/                        # Python bindings library
└── CMakeLists.txt                   # Build configuration

Examples

Housing Price Prediction (Linear Regression)

A complete example demonstrating linear regression on a real-world housing dataset (https://www.kaggle.com/datasets/yasserh/housing-prices-dataset):

  • Dataset: 545 housing samples with 12 features (area, bedrooms, bathrooms, etc.)
  • Preprocessing: Z-score normalization
  • Model: Linear regression with L2 regularization
  • Optimizer: Batch gradient descent
  • Metrics: MSE, RMSE, MAE, R2
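The pipeline listed above (Z-score normalization, then linear regression with an L2 penalty trained by full-batch gradient descent) can be sketched without any library types. Everything named here (Row, zscore, LinearModel, predict, fit_ridge) is hypothetical and chosen for this example; the actual example in the repository uses the library's own classes.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Row = std::vector<double>;

// Z-score normalize each column in place: x <- (x - mean) / stddev.
void zscore(std::vector<Row>& X) {
    assert(!X.empty());
    const double n = static_cast<double>(X.size());
    const std::size_t d = X[0].size();
    for (std::size_t j = 0; j < d; ++j) {
        double mean = 0.0, var = 0.0;
        for (const Row& r : X) mean += r[j];
        mean /= n;
        for (const Row& r : X) var += (r[j] - mean) * (r[j] - mean);
        const double sd = std::sqrt(var / n);  // population standard deviation
        for (Row& r : X) r[j] = sd > 0.0 ? (r[j] - mean) / sd : 0.0;
    }
}

struct LinearModel {
    Row w;
    double b = 0.0;
};

double predict(const LinearModel& m, const Row& x) {
    double y = m.b;
    for (std::size_t j = 0; j < x.size(); ++j) y += m.w[j] * x[j];
    return y;
}

// Full-batch gradient descent on MSE + lambda * ||w||^2 (bias is not penalized).
LinearModel fit_ridge(const std::vector<Row>& X, const Row& y,
                      double lr, double lambda, int epochs) {
    assert(!X.empty() && X.size() == y.size());
    const double n = static_cast<double>(X.size());
    const std::size_t d = X[0].size();
    LinearModel m{Row(d, 0.0), 0.0};
    for (int e = 0; e < epochs; ++e) {
        Row gw(d, 0.0);
        double gb = 0.0;
        for (std::size_t i = 0; i < X.size(); ++i) {
            const double err = predict(m, X[i]) - y[i];
            for (std::size_t j = 0; j < d; ++j) gw[j] += 2.0 * err * X[i][j] / n;
            gb += 2.0 * err / n;
        }
        for (std::size_t j = 0; j < d; ++j)
            m.w[j] -= lr * (gw[j] + 2.0 * lambda * m.w[j]);
        m.b -= lr * gb;
    }
    return m;
}
```

Normalizing first puts all features on a comparable scale, which lets one learning rate work for every weight and makes the L2 penalty treat the features evenly.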

Llama 3.2-1B Instruct (Language Model)

A text generation example using Llama 3.2-1B Instruct loaded from a GGUF file:

  • Model: Llama 3.2-1B Instruct (GGUF format)
  • Supported quantizations: F32, F16, Q4_0, Q4_1, Q5_0, Q5_1, Q8_0, Q8_1
  • Features: Tokenizer encoding/decoding, streaming output, temperature and top-p sampling
  • Inference: Autoregressive generation with KV cache
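Temperature and top-p (nucleus) sampling, as mentioned in the feature list above, can be sketched as follows. This is a generic formulation under stated assumptions, not the sampler used by the example; top_p_distribution and sample_token are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Temperature-scaled softmax followed by top-p filtering.
// Tokens outside the nucleus get probability 0; the rest are renormalized.
std::vector<double> top_p_distribution(const std::vector<double>& logits,
                                       double temperature, double top_p) {
    assert(!logits.empty() && temperature > 0.0 && top_p > 0.0 && top_p <= 1.0);
    const double max_logit = *std::max_element(logits.begin(), logits.end());
    std::vector<double> probs(logits.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < logits.size(); ++i) {
        probs[i] = std::exp((logits[i] - max_logit) / temperature);
        sum += probs[i];
    }
    for (double& p : probs) p /= sum;

    // Visit tokens from most to least likely.
    std::vector<std::size_t> order(logits.size());
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::sort(order.begin(), order.end(),
              [&](std::size_t a, std::size_t b) { return probs[a] > probs[b]; });

    // Keep the smallest prefix whose cumulative mass reaches top_p.
    std::vector<double> kept(logits.size(), 0.0);
    double mass = 0.0;
    for (std::size_t idx : order) {
        kept[idx] = probs[idx];
        mass += probs[idx];
        if (mass >= top_p) break;
    }
    for (double& p : kept) p /= mass;
    return kept;
}

// Draw one token id from the filtered distribution.
std::size_t sample_token(const std::vector<double>& logits, double temperature,
                         double top_p, std::mt19937& rng) {
    const std::vector<double> dist = top_p_distribution(logits, temperature, top_p);
    std::discrete_distribution<std::size_t> pick(dist.begin(), dist.end());
    return pick(rng);
}
```

Lower temperatures sharpen the distribution before filtering, and a smaller top_p shrinks the candidate set, so the two knobs together trade diversity for determinism.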

Downloading the model:

Model weights are not included in the repository. Download a GGUF file from Hugging Face:

# Install the Hugging Face CLI
pip install huggingface-hub
# Q8_0 quantized (~1.1 GB)
huggingface-cli download bartowski/Llama-3.2-1B-Instruct-GGUF Llama-3.2-1B-Instruct-Q8_0.gguf --local-dir examples/datasets/language-model/
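For orientation, the Q8_0 format used by the file above stores weights in blocks of 32 signed 8-bit values that share one half-precision scale, and each weight is recovered as scale * quant. The sketch below assumes that layout as defined by GGML; the struct and function names are hypothetical and this is not the repository's loader.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

constexpr std::size_t kBlockSize = 32;

// Assumed Q8_0 block: one fp16 scale followed by 32 int8 quantized values.
struct BlockQ8_0 {
    std::uint16_t scale_f16;         // raw IEEE 754 half-precision bits
    std::int8_t quants[kBlockSize];
};

// Convert raw half-precision bits to float.
float half_to_float(std::uint16_t h) {
    const std::uint32_t sign = (h >> 15) & 0x1u;
    const std::uint32_t exponent = (h >> 10) & 0x1Fu;
    const std::uint32_t mantissa = h & 0x3FFu;
    float value;
    if (exponent == 0) {
        // Zero or subnormal: mantissa * 2^-24.
        value = std::ldexp(static_cast<float>(mantissa), -24);
    } else if (exponent == 31) {
        value = mantissa ? std::numeric_limits<float>::quiet_NaN()
                         : std::numeric_limits<float>::infinity();
    } else {
        // Normal: (1024 + mantissa) * 2^(exponent - 25).
        value = std::ldexp(static_cast<float>(mantissa | 0x400u),
                           static_cast<int>(exponent) - 25);
    }
    return sign ? -value : value;
}

// Expand every block into 32 floats.
std::vector<float> dequantize_q8_0(const std::vector<BlockQ8_0>& blocks) {
    std::vector<float> out;
    out.reserve(blocks.size() * kBlockSize);
    for (const BlockQ8_0& b : blocks) {
        const float scale = half_to_float(b.scale_f16);
        for (std::size_t i = 0; i < kBlockSize; ++i)
            out.push_back(scale * static_cast<float>(b.quants[i]));
    }
    assert(out.size() == blocks.size() * kBlockSize);
    return out;
}
```

A real loader would read these blocks directly from the tensor data section of the GGUF file; here they are plain in-memory structs to keep the example self-contained.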

Heart Disease Prediction (Logistic Regression)

A binary classification example using logistic regression on the Framingham Heart Study dataset (https://www.kaggle.com/datasets/dileep070/heart-disease-prediction-using-logistic-regression):

  • Dataset: Framingham Heart Study - 10 Year CHD Risk
  • Features: 15 clinical features (age, sex, cholesterol, blood pressure, BMI, etc.)
  • Preprocessing: Z-score normalization
  • Model: Logistic regression with L2 regularization
  • Loss: Binary Cross-Entropy (BCE)
  • Optimizer: Batch gradient descent
  • Metrics: Accuracy, Precision, Recall, F1 Score, Confusion Matrix, ROC Curve, AUC
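Most of the classification metrics listed above derive from the four cells of a binary confusion matrix. The following sketch shows that relationship; the ConfusionMatrix struct, the 0.5 default threshold, and the free functions are assumptions for illustration rather than the library's metrics API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct ConfusionMatrix {
    std::size_t tp = 0, fp = 0, tn = 0, fn = 0;
};

// Threshold predicted probabilities and count each outcome. Labels are 0 or 1.
ConfusionMatrix confusion_matrix(const std::vector<int>& y_true,
                                 const std::vector<double>& y_prob,
                                 double threshold = 0.5) {
    assert(y_true.size() == y_prob.size());
    ConfusionMatrix cm;
    for (std::size_t i = 0; i < y_true.size(); ++i) {
        const bool predicted = y_prob[i] >= threshold;
        const bool actual = y_true[i] == 1;
        if (predicted && actual) ++cm.tp;
        else if (predicted && !actual) ++cm.fp;
        else if (!predicted && actual) ++cm.fn;
        else ++cm.tn;
    }
    return cm;
}

// Returns 0 when the denominator is 0 instead of dividing by zero.
double safe_div(double num, double den) { return den == 0.0 ? 0.0 : num / den; }

double accuracy(const ConfusionMatrix& cm) {
    return safe_div(static_cast<double>(cm.tp + cm.tn),
                    static_cast<double>(cm.tp + cm.tn + cm.fp + cm.fn));
}

double precision(const ConfusionMatrix& cm) {
    return safe_div(static_cast<double>(cm.tp), static_cast<double>(cm.tp + cm.fp));
}

double recall(const ConfusionMatrix& cm) {
    return safe_div(static_cast<double>(cm.tp), static_cast<double>(cm.tp + cm.fn));
}

double f1_score(const ConfusionMatrix& cm) {
    const double p = precision(cm), r = recall(cm);
    return safe_div(2.0 * p * r, p + r);
}
```

Sweeping the threshold from 1 down to 0 and recording recall against the false positive rate at each step yields the points of the ROC curve.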

Run the examples:

./Build

Roadmap

Regression [X]

  • Linear Regression
  • Evaluation Metrics (Regression): MSE, MAE, RMSE, R-squared
  • Regularization: L1 (Lasso) & L2 (Ridge)

Classification [X]

  • Logistic Regression
  • Evaluation Metrics (Classification):
    • Accuracy, Precision, Recall, FPR, F1-Score
    • Confusion Matrix
    • ROC Curve and AUC
  • K-Nearest Neighbors (KNN)
  • Support Vector Machines (SVMs)

Tree-Based Models [X]

  • Decision Trees
  • Random Forests

Unsupervised Learning [X]

  • K-Means Clustering

Deep Learning [In Progress]

  • Neural Networks (Feedforward)
  • Backpropagation
  • Activation Functions: ReLU, Sigmoid, Tanh, Linear, Softplus, Softmax, Step, Sign
  • Optimizers:
    • Mini-Batch Gradient Descent
    • Adam Optimizer
    • RMSProp
    • AdaGrad
    • Momentum SGD
  • Model Serialization
  • Batch Normalization
  • Layer Normalization
  • RMS Normalization
  • Dropout Regularization

NLP / Transformers [In Progress]

  • Tokenizer: Word, Character, BPE, Sentence
  • Embedding Layer
  • Attention Mechanisms: Multi-head self-attention with KV cache
  • Positional Encoding: Sinusoidal, Rotary (RoPE)
  • Transformer Blocks: Pre-norm with residual connections
  • Transformer Model: Autoregressive generation with token sampling
  • Language Models (GGUF loading / Llama inference)

Precision [X]

  • f64 (double): Default precision for all operations
  • f32 (float): Template support across the full stack
  • f16 / Quantization: F16, Q4_0, Q4_1, Q5_0, Q5_1, Q8_0, Q8_1 dequantization for GGUF loading

DL Architectures [ ]

  • Convolutional Neural Networks (CNNs) for images
  • Recurrent Neural Networks (RNNs) for sequences

About

A C++/Python machine learning library built from scratch. Features classic ML algorithms and a GGUF-compatible inference loader for transformers.
