AidenTran900/ml-library-cpp


ML Models

A C++ machine learning library, built from the ground up, that implements a range of ML algorithms and models.

Features

Models

Linear Regression with gradient descent optimization
Logistic Regression for binary classification
K-Nearest Neighbors (KNN) with Euclidean and Manhattan distance metrics
Support Vector Machines (SVM) with multiple kernels (Linear, Polynomial, RBF, Sigmoid)
Decision Trees with Gini and Entropy impurity measures
Random Forests with bootstrap aggregation
K-Means Clustering for unsupervised learning
Neural Networks with backpropagation and configurable layers
Residual Networks (ResNet) with skip connections
Transformer with multi-head self-attention, KV cache, and autoregressive generation
Perceptron for binary classification

Core Components

Matrix Operations: Addition, multiplication, transpose, inverse, Hadamard product, and determinant, templated for float and double (Matrix<float> / Matrix<double>)
Activation Functions: ReLU, Sigmoid, Tanh, Linear, Softplus, Softmax, Step, Sign
Loss Functions: MSE, MAE, RMSE, Binary Cross-Entropy, Categorical Cross-Entropy
Optimizers: SGD, Mini-Batch GD, Momentum, AdaGrad, RMSProp, Adam
Normalization: Layer Norm, RMS Norm
Regularization: L1 (Lasso) & L2 (Ridge)
Metrics:
- Regression: R2, Adjusted R2, MSE, MAE, RMSE
- Classification: Accuracy, Precision, Recall, F1 Score, Confusion Matrix, ROC Curve, AUC
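
As an illustration of how an activation function and a loss function from the lists above fit together, here is a minimal sketch of Softmax followed by Categorical Cross-Entropy. The function names `softmax` and `categorical_cross_entropy` are hypothetical and use `std::vector` rather than the library's `Matrix<T>`; this is not the library's actual API.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Numerically stable softmax: subtract the largest logit before
// exponentiating so that std::exp never overflows.
std::vector<double> softmax(const std::vector<double>& logits) {
    std::vector<double> out(logits.size());
    if (logits.empty()) return out;
    const double max_logit = *std::max_element(logits.begin(), logits.end());
    double sum = 0.0;
    for (std::size_t i = 0; i < logits.size(); ++i) {
        out[i] = std::exp(logits[i] - max_logit);
        sum += out[i];
    }
    for (double& v : out) v /= sum;
    return out;
}

// Categorical cross-entropy for one sample whose true class is `target`
// (equivalent to a one-hot label). `eps` keeps log() away from zero.
double categorical_cross_entropy(const std::vector<double>& probs,
                                 std::size_t target,
                                 double eps = 1e-12) {
    return -std::log(std::max(probs[target], eps));
}
```

For two equal logits the softmax output is 0.5 per class, so the loss for either class is ln 2.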

NLP / Transformer Components

Tokenizer: Word, Character, BPE (Byte Pair Encoding), and Sentence tokenization
Embedding Layer: Trainable word embeddings
Multi-Head Attention: Scaled dot-product attention with KV cache for efficient inference
Positional Encoding: Sinusoidal and Rotary (RoPE)
Transformer Blocks: Pre-norm architecture with residual connections

Precision Support

All core classes are templated on the scalar type (template<typename T = double>), so both float (f32) and double (f64) precision are available:

Matrix<float> / MatrixF32 for memory-efficient inference
Matrix<double> / MatrixF64 for training precision (default)
Classical ML models default to double; the transformer stack supports both
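
The pattern described above can be sketched as a class template with a default scalar type plus two aliases. This is a minimal illustration only: the `sketch` namespace, the member functions, and `storage_bytes` are assumptions made for the example and do not reproduce the library's actual `Matrix` interface.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

namespace sketch {

// Minimal row-major matrix templated on the scalar type, defaulting to double.
template <typename T = double>
class Matrix {
public:
    Matrix(std::size_t rows, std::size_t cols, T fill = T{})
        : rows_(rows), cols_(cols), data_(rows * cols, fill) {}

    std::size_t rows() const { return rows_; }
    std::size_t cols() const { return cols_; }

    T& operator()(std::size_t r, std::size_t c) { return data_[r * cols_ + c]; }
    const T& operator()(std::size_t r, std::size_t c) const { return data_[r * cols_ + c]; }

    // Standard matrix product; throws if the inner dimensions disagree.
    Matrix operator*(const Matrix& rhs) const {
        if (cols_ != rhs.rows_) throw std::invalid_argument("shape mismatch");
        Matrix out(rows_, rhs.cols_);
        for (std::size_t i = 0; i < rows_; ++i)
            for (std::size_t k = 0; k < cols_; ++k)
                for (std::size_t j = 0; j < rhs.cols_; ++j)
                    out(i, j) += (*this)(i, k) * rhs(k, j);
        return out;
    }

    // Bytes used by element storage; on typical platforms float needs
    // half as many bytes as double.
    std::size_t storage_bytes() const { return data_.size() * sizeof(T); }

private:
    std::size_t rows_, cols_;
    std::vector<T> data_;
};

using MatrixF32 = Matrix<float>;   // memory-efficient inference
using MatrixF64 = Matrix<double>;  // training precision (default)

}  // namespace sketch
```

Because the default template argument is double, `sketch::Matrix<>` and `sketch::MatrixF64` name the same type.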

Language Bindings

Python bindings via pybind11 with NumPy array support (both float32 and float64)

Prerequisites

C++17 or higher
CMake 3.16+
A C++ compiler (GCC, Clang, or MSVC)

Building

Linux/macOS

# Clone the repository
git clone https://github.com/ProdigiousPersonn/ML-Models
cd ML-Models
# Create and enter build directory
mkdir build && cd build
# Configure with CMake
cmake ..
# Build the project
cmake --build .
# Run the executable
./Build

Windows

# Clone the repository
git clone https://github.com/ProdigiousPersonn/ML-Models
cd ML-Models
# Create and enter build directory
mkdir build
cd build
# Configure with CMake
cmake ..
# Build the project
cmake --build . --config Release
# Run the executable
.\Release\Build.exe

Project Structure

LinearModel/
├── source/
│ ├── main.cpp # Entry point
│ ├── math/ # Matrix operations
│ ├── core/ # Loss, optimizer, regularizer, metrics, tokenizer, embedding
│ ├── models/ # ML model implementations
│ └── utils/ # CSV utilities
├── include/ml_lib/ # Public headers
├── examples/
│ ├── c++/ # C++ examples
│ │ ├── linear-regression/housing/
│ │ └── language-model/
│ ├── logistic-regression/ # Heart disease classification example
│ ├── python/ # Python examples
│ └── datasets/ # Example datasets
├── python/ # Python bindings (pybind11)
├── tests/ # Unit tests (doctest)
├── external/ # Dependencies (fmt, spdlog, doctest)
├── csv-parser/ # CSV parsing library
├── pybind11/ # Python bindings library
└── CMakeLists.txt # Build configuration

Examples

Housing Price Prediction (Linear Regression)

A complete example demonstrating linear regression on a real-world housing dataset (https://www.kaggle.com/datasets/yasserh/housing-prices-dataset):

Dataset: 545 housing samples with 12 features (area, bedrooms, bathrooms, etc.)
Preprocessing: Z-score normalization
Model: Linear regression with L2 regularization
Optimizer: Batch gradient descent
Metrics: MSE, RMSE, MAE, R2
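
The pipeline above (Z-score normalization, then linear regression with L2 regularization trained by batch gradient descent) can be sketched in a few dozen lines. The names `zscore_normalize`, `RidgeFit`, and `fit_ridge` are hypothetical, and the choice to leave the bias unpenalized is an assumption; the example in the repository may differ in both API and details.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

using Row = std::vector<double>;

// Standardize each column in place: (x - mean) / stddev.
// Constant columns are mapped to zero to avoid dividing by zero.
void zscore_normalize(std::vector<Row>& X) {
    if (X.empty()) return;
    const std::size_t n = X.size(), d = X[0].size();
    for (std::size_t j = 0; j < d; ++j) {
        double mean = 0.0, var = 0.0;
        for (const Row& r : X) mean += r[j];
        mean /= n;
        for (const Row& r : X) var += (r[j] - mean) * (r[j] - mean);
        const double sd = std::sqrt(var / n);
        for (Row& r : X) r[j] = sd > 0.0 ? (r[j] - mean) / sd : 0.0;
    }
}

struct RidgeFit {
    Row w;     // one weight per feature
    double b;  // bias term
};

// Full-batch gradient descent on MSE + lambda * ||w||^2.
// Assumption: the bias is not regularized.
RidgeFit fit_ridge(const std::vector<Row>& X, const Row& y,
                   double lr, double lambda, int epochs) {
    const std::size_t n = X.size(), d = X[0].size();
    RidgeFit m{Row(d, 0.0), 0.0};
    for (int e = 0; e < epochs; ++e) {
        Row gw(d, 0.0);
        double gb = 0.0;
        // Accumulate the MSE gradient over the whole dataset.
        for (std::size_t i = 0; i < n; ++i) {
            double pred = m.b;
            for (std::size_t j = 0; j < d; ++j) pred += m.w[j] * X[i][j];
            const double err = pred - y[i];
            for (std::size_t j = 0; j < d; ++j) gw[j] += 2.0 * err * X[i][j] / n;
            gb += 2.0 * err / n;
        }
        // One update per epoch, with the L2 penalty gradient 2 * lambda * w.
        for (std::size_t j = 0; j < d; ++j) m.w[j] -= lr * (gw[j] + 2.0 * lambda * m.w[j]);
        m.b -= lr * gb;
    }
    return m;
}
```

With standardized features and lambda set to 0, the bias converges to the mean of the targets; a positive lambda shrinks the weights toward zero.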

Llama 3.2-1B Instruct (Language Model)

A text generation example using Llama 3.2-1B Instruct loaded from a GGUF file:

Model: Llama 3.2-1B Instruct (GGUF format)
Supported quantizations: F32, F16, Q4_0, Q4_1, Q5_0, Q5_1, Q8_0, Q8_1
Features: Tokenizer encoding/decoding, streaming output, temperature and top-p sampling
Inference: Autoregressive generation with KV cache
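
Temperature and top-p (nucleus) sampling, mentioned in the feature list above, can be sketched as follows. The function name `sample_top_p` and its signature are hypothetical and not taken from the example's source; the sketch assumes temperature > 0 and 0 < top_p <= 1.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Sample a token id from logits using temperature scaling followed by
// nucleus (top-p) filtering. Assumes temperature > 0 and 0 < top_p <= 1.
std::size_t sample_top_p(const std::vector<float>& logits, float temperature,
                         float top_p, std::mt19937& rng) {
    const std::size_t n = logits.size();

    // Softmax of logits / temperature (max subtracted for stability).
    const float max_logit = *std::max_element(logits.begin(), logits.end());
    std::vector<float> probs(n);
    float sum = 0.0f;
    for (std::size_t i = 0; i < n; ++i) {
        probs[i] = std::exp((logits[i] - max_logit) / temperature);
        sum += probs[i];
    }
    for (float& p : probs) p /= sum;

    // Token ids sorted by descending probability.
    std::vector<std::size_t> order(n);
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::sort(order.begin(), order.end(),
              [&](std::size_t a, std::size_t b) { return probs[a] > probs[b]; });

    // Keep the smallest prefix whose cumulative probability reaches top_p.
    std::size_t keep = 0;
    float cumulative = 0.0f;
    while (keep < n) {
        cumulative += probs[order[keep]];
        ++keep;
        if (cumulative >= top_p) break;
    }

    // Draw from the kept tokens; `cumulative` is their total mass.
    std::uniform_real_distribution<float> dist(0.0f, cumulative);
    float r = dist(rng);
    for (std::size_t i = 0; i < keep; ++i) {
        r -= probs[order[i]];
        if (r <= 0.0f) return order[i];
    }
    return order[keep - 1];  // guard against floating-point round-off
}
```

Lower temperatures sharpen the distribution toward the most likely token, while a smaller top_p discards more of the low-probability tail before sampling.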

Downloading the model:

Model weights are not included in the repository. Download a GGUF file from Hugging Face:

# Install the Hugging Face CLI
pip install huggingface-hub
# Q8_0 quantized (~1.1 GB)
huggingface-cli download bartowski/Llama-3.2-1B-Instruct-GGUF Llama-3.2-1B-Instruct-Q8_0.gguf --local-dir examples/datasets/language-model/

Heart Disease Prediction (Logistic Regression)

A binary classification example using logistic regression on the Framingham Heart Study dataset (https://www.kaggle.com/datasets/dileep070/heart-disease-prediction-using-logistic-regression):

Dataset: Framingham Heart Study - 10 Year CHD Risk
Features: 15 clinical features (age, sex, cholesterol, blood pressure, BMI, etc.)
Preprocessing: Z-score normalization
Model: Logistic regression with L2 regularization
Loss: Binary Cross-Entropy (BCE)
Optimizer: Batch gradient descent
Metrics: Accuracy, Precision, Recall, F1 Score, Confusion Matrix, ROC Curve, AUC

Run the examples:

./Build

Roadmap

Regression [X]

Linear Regression
Evaluation Metrics (Regression): MSE, MAE, RMSE, R-squared
Regularization: L1 (Lasso) & L2 (Ridge)

Classification [X]

Logistic Regression
Evaluation Metrics (Classification):
- Accuracy, Precision, Recall, FPR, F1-Score
- Confusion Matrix
- ROC Curve and AUC
K-Nearest Neighbors (KNN)
Support Vector Machines (SVMs)

Tree-Based Models [X]

Decision Trees
Random Forests

Unsupervised Learning [X]

K-Means Clustering

Deep Learning [In Progress]

Neural Networks (Feedforward)
Backpropagation
Activation Functions: ReLU, Sigmoid, Tanh, Linear, Softplus, Softmax, Step, Sign
Optimizers:
- Mini-Batch Gradient Descent
- Adam Optimizer
- RMSProp
- AdaGrad
- Momentum SGD
Model Serialization
Batch Normalization
Layer Normalization
RMS Normalization
Dropout Regularization

NLP / Transformers [In Progress]

Tokenizer: Word, Character, BPE, Sentence
Embedding Layer
Attention Mechanisms: Multi-head self-attention with KV cache
Positional Encoding: Sinusoidal, Rotary (RoPE)
Transformer Blocks: Pre-norm with residual connections
Transformer Model: Autoregressive generation with token sampling
Language Models (GGUF loading / Llama inference)

Precision [X]

f64 (double): Default precision for all operations
f32 (float): Template support across the full stack
f16 / Quantization: F16, Q4_0, Q4_1, Q5_0, Q5_1, Q8_0, Q8_1 dequantization for GGUF loading
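
To give a sense of what dequantization for GGUF loading involves, here is a sketch for Q8_0 only. It assumes the ggml Q8_0 block layout (a little-endian fp16 scale followed by 32 signed 8-bit weights, 34 bytes per block); that layout comes from the ggml format rather than from this repository's source, and the names `half_to_float` and `dequantize_q8_0` are hypothetical.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Convert an IEEE 754 binary16 bit pattern to float.
float half_to_float(std::uint16_t h) {
    const int sign = (h >> 15) & 0x1;
    const int exponent = (h >> 10) & 0x1F;
    const int mantissa = h & 0x3FF;
    float value;
    if (exponent == 0) {
        // Zero or subnormal: mantissa * 2^-24.
        value = std::ldexp(static_cast<float>(mantissa), -24);
    } else if (exponent == 31) {
        value = mantissa == 0 ? INFINITY : NAN;
    } else {
        // Normal: (1024 + mantissa) * 2^(exponent - 25).
        value = std::ldexp(static_cast<float>(mantissa + 1024), exponent - 25);
    }
    return sign ? -value : value;
}

// Assumed Q8_0 layout: 2-byte fp16 scale, then 32 int8 quantized weights.
constexpr std::size_t kBlockSize = 32;
constexpr std::size_t kBlockBytes = 2 + kBlockSize;

// Expand n_blocks Q8_0 blocks into floats: weight = scale * q.
std::vector<float> dequantize_q8_0(const std::uint8_t* data, std::size_t n_blocks) {
    std::vector<float> out;
    out.reserve(n_blocks * kBlockSize);
    for (std::size_t b = 0; b < n_blocks; ++b) {
        const std::uint8_t* block = data + b * kBlockBytes;
        // Assemble the little-endian fp16 scale byte by byte.
        const std::uint16_t scale_bits =
            static_cast<std::uint16_t>(block[0] | (block[1] << 8));
        const float scale = half_to_float(scale_bits);
        for (std::size_t i = 0; i < kBlockSize; ++i) {
            std::int8_t q;
            std::memcpy(&q, block + 2 + i, 1);  // reinterpret the byte as signed
            out.push_back(scale * static_cast<float>(q));
        }
    }
    return out;
}
```

The 4-bit and 5-bit formats follow the same per-block scaling idea but pack several weights per byte, so they need extra bit unpacking that this sketch does not cover.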

DL Architectures [ ]

Convolutional Neural Networks (CNNs) (For images)
Recurrent Neural Networks (RNNs) (For sequences)

About

A C++/Python machine learning library built from scratch. Features classic ML algorithms and a GGUF-compatible inference loader for transformers.
