NexNet is a neural network framework implemented from scratch using NumPy. It provides functionalities similar to PyTorch and TensorFlow, including various activation functions, loss functions, optimizers, and more. This README will guide you through the setup, usage, and features of the framework.
Models:
- FNN: Feedforward Neural Network for classification and regression.
- Sequential: PyTorch-like sequential container for building models layer by layer.
- CNN: Convolutional Neural Network model for image tasks.
- RNNModel: Recurrent Neural Network model for sequence tasks.
- Transformer: GPT-style transformer model for language modeling.

Activation functions:
- ReLU: Rectified Linear Unit, introduces non-linearity by zeroing out negative values.
- Softmax: Converts logits to probabilities, commonly used in the output layer for classification tasks.
- PReLU: Parametric ReLU, allows for a learnable slope for negative values.
- Sigmoid: Maps values to a range between 0 and 1, often used in binary classification.
- Tanh: Maps values to a range between -1 and 1, helping with centering data.
- LeakyReLU: Similar to ReLU but allows a small gradient when inputs are negative.
- ELU: Exponential Linear Unit, helps speed up learning by smoothing the activation function.
- Swish: Smooth, non-monotonic activation function that can improve model performance.
- Softplus: A smooth approximation to ReLU, improving gradient flow.
- GELU: Gaussian Error Linear Unit, used in GPT/BERT transformer models.

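To make a few of these definitions concrete, here is a minimal sketch in plain Python. It is not NexNet's implementation; the function names and the `negative_slope` default are assumptions chosen for illustration.

```python
import math

def relu(x):
    """Zero out negative inputs."""
    return max(0.0, x)

def leaky_relu(x, negative_slope=0.01):
    """Like ReLU, but keep a small slope for negative inputs."""
    return x if x > 0 else negative_slope * x

def gelu(x):
    """Exact GELU: x * Phi(x), where Phi is the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def softmax(logits):
    """Convert logits to probabilities; subtracting the max avoids overflow."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])  # probabilities that sum to 1
```

Subtracting the maximum logit before exponentiating does not change the result but keeps `math.exp` from overflowing on large inputs.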
Loss functions:
- CrossEntropyLoss: For multi-class classification with built-in softmax.
- BinaryCrossEntropyLoss: For binary classification tasks.
- MSE: Mean Squared Error for regression tasks.
- MAE: Mean Absolute Error for regression tasks.
- HuberLoss: Combines MSE and MAE advantages, robust to outliers.
- PoissonLoss: For count-based prediction tasks.
- CosineSimilarityLoss: Measures angular distance between vectors.

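As an illustration of what "built-in softmax" and "combines MSE and MAE" mean, the sketch below computes both losses for a single sample in plain Python. The function names and the `delta` default are assumptions; NexNet's classes may differ in reduction and argument order.

```python
import math

def cross_entropy_from_logits(logits, target_index):
    """Softmax followed by negative log-likelihood, fused via log-sum-exp."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_sum_exp - logits[target_index]

def huber(y_true, y_pred, delta=1.0):
    """Quadratic (MSE-like) for small errors, linear (MAE-like) for large ones."""
    err = abs(y_true - y_pred)
    if err <= delta:
        return 0.5 * err ** 2
    return delta * (err - 0.5 * delta)
```

Fusing softmax into the loss avoids taking the logarithm of a probability that has underflowed to zero.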
Optimizers:
- SGD: Stochastic Gradient Descent.
- Momentum: SGD with momentum for faster convergence.
- AdaGrad: Adaptive learning rates based on gradient history.
- RMSProp: Adaptive learning rates with moving average.
- AdaDelta: Extension of AdaGrad with reduced learning rate decay.
- Adam: Adaptive moment estimation, combines AdaGrad and RMSProp.
- AdamW: Adam with decoupled weight decay regularization.
- NAdam: Adam with Nesterov momentum.

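The Adam update can be sketched for a single scalar parameter as follows. This is the standard textbook rule written in plain Python under assumed default hyperparameters, not a copy of NexNet's `Adam` class.

```python
import math

def adam_minimize(grad_fn, w, steps, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    """Minimize a 1-D function given its gradient, using the Adam update rule."""
    m = 0.0  # running mean of gradients (first moment)
    v = 0.0  # running mean of squared gradients (second moment)
    for t in range(1, steps + 1):
        g = grad_fn(w)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        m_hat = m / (1 - beta1 ** t)  # bias correction for the zero init
        v_hat = v / (1 - beta2 ** t)
        w -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return w

# Minimize f(w) = (w - 3)^2, whose gradient is 2 * (w - 3)
w_final = adam_minimize(lambda w: 2.0 * (w - 3.0), w=0.0, steps=500)
```

AdamW differs only in that the weight decay term is applied directly to `w` instead of being added to the gradient `g`.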
Core layers:
- Linear: Fully connected layer with optional activation function.

Convolutional layers:
- Conv2D: 2D convolutional layer with configurable kernel, stride, and padding.
- MaxPool2D: Max pooling layer for spatial downsampling.
- AvgPool2D: Average pooling layer for smooth downsampling.

Recurrent and sequence layers:
- RNN: Vanilla recurrent neural network for sequence processing.
- LSTM: Long Short-Term Memory for learning long-term dependencies.
- GRU: Gated Recurrent Unit, a simpler alternative to LSTM.
- Embedding: Converts integer indices to dense vectors for NLP tasks.

Transformer components:
- MultiHeadAttention: Multi-head self-attention mechanism.
- ScaledDotProductAttention: Core attention operation with masking support.
- TransformerDecoderBlock: GPT-style decoder block with causal masking.
- TransformerEncoderBlock: BERT-style encoder block.
- FeedForward: Position-wise feed-forward network.
- LayerNorm: Layer normalization (different from BatchNorm).
- SinusoidalPositionalEncoding: Fixed positional encoding from "Attention Is All You Need".
- LearnedPositionalEncoding: Learnable position embeddings (GPT/BERT style).

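For reference, the fixed encoding from "Attention Is All You Need" can be sketched in plain Python as below. The function name and the list-of-lists layout are assumptions for illustration, and `d_model` is assumed to be even.

```python
import math

def sinusoidal_positional_encoding(max_len, d_model):
    """Return a max_len x d_model table of sine/cosine position features."""
    table = []
    for pos in range(max_len):
        row = []
        for i in range(0, d_model, 2):
            angle = pos / (10000 ** (i / d_model))
            row.append(math.sin(angle))  # even dimension i
            row.append(math.cos(angle))  # odd dimension i + 1
        table.append(row)
    return table

pe = sinusoidal_positional_encoding(max_len=4, d_model=4)
```

Each row is added to the token embedding at that position, which gives the model position information without any learned parameters.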
Utility layers:
- Dropout: Regularization layer that randomly drops units during training.
- BatchNorm: Batch normalization for faster and more stable training.
- Flatten: Reshapes input to 2D for transition to fully connected layers.

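A common way to implement dropout is the "inverted" variant, sketched below in plain Python. Whether NexNet's `Dropout` rescales in the same way is an assumption; the function name and arguments are illustrative.

```python
import random

def dropout(values, rate, training=True, rng=random):
    """Inverted dropout: zero each unit with probability `rate` and rescale
    the survivors so the expected activation is unchanged."""
    if not training or rate == 0.0:
        return list(values)  # identity at inference time
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in values]

activations = dropout([1.0] * 8, rate=0.3, rng=random.Random(0))
```

Because the rescaling happens during training, no correction is needed at inference time.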
Regularization and weight constraints:
- L1Regularization: Lasso regularization for sparse weights.
- L2Regularization: Ridge regularization for small weights.
- ElasticNetRegularization: Combination of L1 and L2.
- WeightDecay: Direct weight decay during optimization.
- MaxNormConstraint: Clip weights by max norm.
- UnitNormConstraint: Normalize weights to unit norm.

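The penalties and the max-norm constraint can be sketched for a flat weight vector as follows. The helper names are hypothetical, and the exact scaling (for example a factor of 0.5 on the L2 term) is an assumption that may not match NexNet.

```python
import math

def l1_penalty(weights, lambda_reg):
    """Lasso term: lambda * sum(|w|), which pushes weights toward zero."""
    return lambda_reg * sum(abs(w) for w in weights)

def l2_penalty(weights, lambda_reg):
    """Ridge term: lambda * sum(w^2), which keeps weights small."""
    return lambda_reg * sum(w * w for w in weights)

def apply_max_norm(weights, max_norm):
    """Rescale the vector so its L2 norm does not exceed max_norm."""
    norm = math.sqrt(sum(w * w for w in weights))
    if norm <= max_norm:
        return list(weights)
    scale = max_norm / norm
    return [w * scale for w in weights]
```

An elastic-net penalty is simply the sum of the two terms, each with its own coefficient.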
Gradient clipping:
- clip_grad_norm: Clip gradient norm to prevent exploding gradients.
- clip_grad_value: Clip gradient values to a range.

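The difference between the two clipping modes can be sketched in plain Python. Gradients are represented here as a list of per-layer lists, which is an assumption for illustration; NexNet's functions operate on `model.layers` instead.

```python
import math

def clip_by_global_norm(grads, max_norm):
    """Rescale all gradients together if their combined L2 norm exceeds max_norm.
    Returns the (possibly rescaled) gradients and the norm before clipping."""
    total_norm = math.sqrt(sum(g * g for layer in grads for g in layer))
    if total_norm > max_norm:
        scale = max_norm / total_norm
        grads = [[g * scale for g in layer] for layer in grads]
    return grads, total_norm

def clip_by_value(grads, clip_value):
    """Clamp each gradient entry independently to [-clip_value, clip_value]."""
    return [[max(-clip_value, min(clip_value, g)) for g in layer] for layer in grads]
```

Norm clipping preserves the direction of the overall update, while value clipping can change it because each entry is clamped on its own.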
Weight initializers:
- Xavier: For sigmoid and tanh activations.
- He: For ReLU-based activations.
- Random: Simple random initialization.
- Zero: Zero initialization.

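The usual formulas behind Xavier and He initialization are sketched below in plain Python. Whether NexNet uses the uniform or normal variant of each is an assumption; the function names are illustrative.

```python
import math
import random

def xavier_uniform(fan_in, fan_out, rng):
    """Uniform in [-limit, limit] with limit = sqrt(6 / (fan_in + fan_out))."""
    limit = math.sqrt(6.0 / (fan_in + fan_out))
    return [[rng.uniform(-limit, limit) for _ in range(fan_out)] for _ in range(fan_in)]

def he_normal(fan_in, fan_out, rng):
    """Zero-mean Gaussian with standard deviation sqrt(2 / fan_in)."""
    std = math.sqrt(2.0 / fan_in)
    return [[rng.gauss(0.0, std) for _ in range(fan_out)] for _ in range(fan_in)]

rng = random.Random(0)
w_xavier = xavier_uniform(64, 32, rng)  # 64 x 32 weight matrix
w_he = he_normal(64, 32, rng)
```

Both schemes scale the weights by the layer width so that activation variance stays roughly constant from layer to layer.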
Learning rate schedulers:
- StepLR: Decay learning rate by factor every N epochs.
- ExponentialLR: Exponential decay every epoch.
- CosineAnnealingLR: Cosine annealing schedule.
- ReduceLROnPlateau: Reduce LR when metric stops improving.

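The three closed-form schedules can be written as pure functions of the epoch. This is a sketch of the standard formulas; the parameter names (`step_size`, `gamma`, `t_max`, `eta_min`) follow common convention and are assumptions about NexNet's API.

```python
import math

def step_lr(base_lr, epoch, step_size, gamma=0.1):
    """Multiply the learning rate by gamma once every step_size epochs."""
    return base_lr * gamma ** (epoch // step_size)

def exponential_lr(base_lr, epoch, gamma):
    """Multiply the learning rate by gamma every epoch."""
    return base_lr * gamma ** epoch

def cosine_annealing_lr(base_lr, epoch, t_max, eta_min=0.0):
    """Follow half a cosine wave from base_lr down to eta_min over t_max epochs."""
    return eta_min + 0.5 * (base_lr - eta_min) * (1 + math.cos(math.pi * epoch / t_max))

schedule = [cosine_annealing_lr(0.1, epoch, t_max=10) for epoch in range(11)]
```

ReduceLROnPlateau is not a function of the epoch alone: it depends on the monitored metric, so it needs state between calls.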
Callbacks:
- EarlyStopping: Stop training when metric stops improving.
- ModelCheckpoint: Save model when metric improves.
- History: Record and plot training history.

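The patience logic behind early stopping can be sketched as a small class. `SimpleEarlyStopping` is a hypothetical name used for illustration, it assumes a metric where lower is better, and it is not NexNet's `EarlyStopping` class.

```python
class SimpleEarlyStopping:
    """Signal a stop after `patience` consecutive epochs without improvement."""

    def __init__(self, patience=5, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.wait = 0

    def should_stop(self, value):
        if value < self.best - self.min_delta:
            self.best = value  # improvement: remember it and reset the counter
            self.wait = 0
        else:
            self.wait += 1
        return self.wait >= self.patience

stopper = SimpleEarlyStopping(patience=2)
val_losses = [1.0, 0.8, 0.9, 0.85, 0.7]
stopped_at = None
for epoch, val_loss in enumerate(val_losses):
    if stopper.should_stop(val_loss):
        stopped_at = epoch
        break
```

With a patience of 2, training halts at the second epoch in a row that fails to beat the best loss seen so far, so the later improvement to 0.7 is never reached.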
Metrics:
- accuracy: Classification accuracy.
- precision: Precision score with averaging options.
- recall: Recall score with averaging options.
- f1_score: F1 score (harmonic mean of precision and recall).
- confusion_matrix: Confusion matrix for classification.
- mean_squared_error: MSE metric for regression.
- mean_absolute_error: MAE metric for regression.
- r2_score: R-squared coefficient of determination.

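For the binary case, precision, recall, and F1 reduce to counts of true positives, false positives, and false negatives. The helper below is a hypothetical plain-Python sketch, not the signature of NexNet's metric functions.

```python
def binary_precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall, and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 0, 1, 1, 1]
precision, recall, f1 = binary_precision_recall_f1(y_true, y_pred)
```

In this example there are 3 true positives, 1 false positive, and 1 false negative, so all three scores equal 0.75.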
Data utilities:
- DataLoader: Batch data loading with shuffling support.
- train_test_split: Split data into train and test sets.
- OneHotEncoder: Encode/decode one-hot vectors.

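What one-hot encoding and a shuffled split do can be sketched in plain Python. The helper names are hypothetical and work on lists; NexNet's `OneHotEncoder` and `train_test_split` operate on arrays and may handle rounding of the test size differently.

```python
import random

def one_hot_encode(labels, num_classes):
    """Turn integer labels into rows with a single 1.0 at the label index."""
    return [[1.0 if i == label else 0.0 for i in range(num_classes)] for label in labels]

def one_hot_decode(rows):
    """Recover integer labels as the index of the largest entry in each row."""
    return [row.index(max(row)) for row in rows]

def split_indices(n_samples, test_size, seed):
    """Shuffle sample indices reproducibly and split them into train and test."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = int(round(n_samples * test_size))
    return indices[n_test:], indices[:n_test]

encoded = one_hot_encode([0, 2, 1], num_classes=3)
train_idx, test_idx = split_indices(10, test_size=0.2, seed=42)
```

Fixing the seed makes the split reproducible, which is the role `random_state` plays in the usage example further down.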
Base classes:
- Module: Base class for all modules (similar to nn.Module).
- Parameter: Wrapper for trainable parameters.
- init_weights: Weight initialization utility function.

Clone the repository:

    git clone https://github.com/chiruu12/NexNet.git
    cd NexNet

Install the dependencies:

    pip install -r requirements.txt

Import the components you need:

    from Models import FNN, Sequential, CNN, RNNModel, Transformer
    from Losses import CrossEntropyLoss, MSE, MAE, HuberLoss
    from Layers import Linear, Dropout, BatchNorm, Flatten, Conv2D, MaxPool2D, RNN, LSTM, GRU, Embedding
    from Activation_classes import ReLu, Softmax, PReLU, Sigmoid, Tanh, LeakyReLu, ELU, Swish, Softplus, GELU
    from utils import OneHotEncoder, Initializer, clip_grad_norm, L1Regularization, L2Regularization
    from Optimizer import SGD, Momentum, AdaGrad, Adam, AdamW, NAdam, RMSProp, AdaDelta
    from schedulers import StepLR, CosineAnnealingLR, ReduceLROnPlateau
    from callbacks import EarlyStopping, ModelCheckpoint, History
    from metrics import accuracy, precision, recall, f1_score, confusion_matrix
    from data import DataLoader, train_test_split
    from core import Module, Parameter, init_weights

Prepare the data:

    from data import train_test_split
    from utils import OneHotEncoder

    # X and y are your feature and label arrays
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

    # One-hot encode the integer class labels (10 classes here)
    encoder = OneHotEncoder(num_classes=10)
    y_train_encoded = encoder.encode(y_train)
    y_test_encoded = encoder.encode(y_test)

Build and train a feedforward network (FNN):

    from Models import FNN
    from Layers import Linear, Dropout, BatchNorm
    from Activation_classes import ReLu
    from Losses import CrossEntropyLoss
    from Optimizer import Adam

    model = FNN(loss=CrossEntropyLoss(), optimizer=Adam(learning_rate=0.001))

    # 784 inputs (e.g. flattened 28x28 images) -> 256 -> 128 -> 10 classes
    model.add_layer(Linear(input_dim=784, output_dim=256, activation=ReLu()))
    model.add_layer(BatchNorm(256))
    model.add_layer(Dropout(rate=0.3))
    model.add_layer(Linear(input_dim=256, output_dim=128, activation=ReLu()))
    model.add_layer(Dropout(rate=0.3))
    model.add_layer(Linear(input_dim=128, output_dim=10))

    model.summary()

    history = model.train(
        X_train, y_train_encoded,
        epochs=30,
        batch_size=64,
        validation_split=0.1,
        verbose=True
    )

    # Named test_accuracy to avoid shadowing the accuracy metric function
    test_loss, test_accuracy = model.evaluate(X_test, y_test_encoded)

Build a model with the Sequential container:

    from Models import Sequential
    from Layers import Linear, Dropout, BatchNorm
    from Activation_classes import ReLu, Softmax
    from Losses import CrossEntropyLoss
    from Optimizer import Adam

    # Build model using Sequential container
    model = Sequential(
        Linear(784, 256),
        ReLu(),
        BatchNorm(256),
        Dropout(rate=0.3),
        Linear(256, 128),
        ReLu(),
        Dropout(rate=0.3),
        Linear(128, 10),
        Softmax()
    )

    # Compile and train (X_val, y_val are a held-out validation split)
    model.compile(optimizer=Adam(learning_rate=0.001), loss=CrossEntropyLoss())
    history = model.fit(X_train, y_train, epochs=30, batch_size=64,
                        validation_data=(X_val, y_val))

    # Or build incrementally
    model = Sequential()
    model.add(Linear(784, 256))
    model.add(ReLu())
    model.add(Linear(256, 10))

Build a convolutional network (CNN):

    from Models import CNN
    from Layers import Conv2D, MaxPool2D, Flatten, Linear, Dropout
    from Activation_classes import ReLu, Softmax
    from Losses import CrossEntropyLoss
    from Optimizer import Adam

    # Build CNN for MNIST
    model = CNN([
        Conv2D(in_channels=1, out_channels=32, kernel_size=3, padding=1),
        ReLu(),
        MaxPool2D(pool_size=2, stride=2),
        Conv2D(in_channels=32, out_channels=64, kernel_size=3, padding=1),
        ReLu(),
        MaxPool2D(pool_size=2, stride=2),
        Flatten(),
        Linear(64 * 7 * 7, 128),
        ReLu(),
        Dropout(rate=0.5),
        Linear(128, 10),
        Softmax()
    ])

    model.compile(optimizer=Adam(learning_rate=0.001), loss=CrossEntropyLoss())
    history = model.fit(X_train, y_train, epochs=10, batch_size=32, clip_grad_norm=1.0)

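The `64 * 7 * 7` input size of the first `Linear` layer follows from the standard output-size formulas for convolution and pooling. The sketch below traces a 28x28 MNIST image through the layers above; the helper names are hypothetical, and a convolution stride of 1 is assumed since the example does not set one.

```python
def conv2d_output_size(size, kernel_size, stride=1, padding=0):
    """Spatial output size of a convolution along one dimension."""
    return (size + 2 * padding - kernel_size) // stride + 1

def pool2d_output_size(size, pool_size, stride):
    """Spatial output size of a pooling layer along one dimension."""
    return (size - pool_size) // stride + 1

size = 28                                                   # MNIST images are 28x28
size = conv2d_output_size(size, kernel_size=3, padding=1)   # padding=1 keeps 28
size = pool2d_output_size(size, pool_size=2, stride=2)      # halves to 14
size = conv2d_output_size(size, kernel_size=3, padding=1)   # stays 14
size = pool2d_output_size(size, pool_size=2, stride=2)      # halves to 7
flattened_features = 64 * size * size                       # 64 channels * 7 * 7
```

If you change the kernel size, padding, or number of pooling layers, recompute this value before setting the input dimension of the first `Linear` layer.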
Build a recurrent network (RNN/LSTM):

    from Models import RNNModel
    from Layers import Embedding, LSTM, Linear
    from Activation_classes import ReLu, Softmax
    from Losses import CrossEntropyLoss
    from Optimizer import Adam

    # Build RNN for text classification
    # (num_classes is the number of target classes in your dataset)
    model = RNNModel([
        Embedding(vocab_size=10000, embed_dim=128),
        LSTM(input_size=128, hidden_size=256, return_sequences=False),
        Linear(256, 64),
        ReLu(),
        Linear(64, num_classes),
        Softmax()
    ])

    model.compile(optimizer=Adam(learning_rate=0.001), loss=CrossEntropyLoss())
    history = model.fit(X_train, y_train, epochs=10, batch_size=32, clip_grad_norm=1.0)

    # Generate sequences (start_tokens is a sequence of token indices)
    generated = model.generate(start_tokens, max_length=100, temperature=0.8)

Build a GPT-style Transformer:

    from Models import Transformer
    from Losses import CrossEntropyLoss
    from Optimizer import AdamW

    # Build GPT-style transformer
    model = Transformer(
        vocab_size=50000,
        d_model=256,
        n_heads=8,
        n_layers=6,
        d_ff=1024,
        max_seq_len=512,
        dropout=0.1,
        causal=True
    )

    model.compile(optimizer=AdamW(learning_rate=1e-4, weight_decay=0.01), loss=CrossEntropyLoss())
    model.summary()

    # Train on language modeling task
    history = model.fit(X_train, y_train, epochs=10, batch_size=16, clip_grad_norm=1.0)

    # Generate text
    generated = model.generate(
        start_tokens=start_ids,
        max_length=100,
        temperature=0.8,
        top_k=40,
        top_p=0.9
    )

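The `temperature` and `top_k` arguments of `generate` control how the next token is sampled from the model's logits. The sketch below shows one common way to combine them in plain Python; it is an assumption about the general technique rather than NexNet's code, the function name is hypothetical, and `top_p` (nucleus) filtering is left out for brevity.

```python
import math
import random

def sample_next_token(logits, temperature=1.0, top_k=None, rng=random):
    """Sample a token index after temperature scaling and optional top-k filtering."""
    # Temperatures below 1 sharpen the distribution; above 1 they flatten it
    scaled = [v / temperature for v in logits]
    # Rank token indices from most to least likely
    candidates = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)
    if top_k is not None:
        candidates = candidates[:top_k]  # keep only the k most likely tokens
    # Unnormalized softmax weights over the remaining candidates
    m = max(scaled[i] for i in candidates)
    weights = [math.exp(scaled[i] - m) for i in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]

token = sample_next_token([2.0, 1.0, 0.5, -1.0], temperature=0.8, top_k=2,
                          rng=random.Random(0))
```

With `top_k=1` this reduces to greedy decoding, since only the most likely token remains a candidate.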
Clip gradients:

    from utils import clip_grad_norm, clip_grad_value

    # Clip during training
    history = model.fit(X_train, y_train, epochs=10, clip_grad_norm=1.0)

    # Or manually
    output = model.forward(X_batch)
    loss = loss_fn.forward(output, y_batch)
    grad = loss_fn.backward()
    model.backward(grad)

    # Clip gradients before optimizer step
    total_norm = clip_grad_norm(model.layers, max_norm=1.0)
    clip_grad_value(model.layers, clip_value=0.5)
    optimizer.step(model.layers)

Apply regularization and weight constraints in a custom training loop:

    from utils import L1Regularization, L2Regularization, MaxNormConstraint

    l2_reg = L2Regularization(lambda_reg=0.001)
    max_norm = MaxNormConstraint(max_norm=3.0)

    for epoch in range(epochs):
        # Forward and backward pass
        output = model.forward(X_batch)
        loss = loss_fn.forward(output, y_batch)

        # Add regularization loss
        reg_loss = l2_reg.loss(model.layers)
        total_loss = loss + reg_loss

        grad = loss_fn.backward()
        model.backward(grad)

        # Apply regularization gradients
        l2_reg.apply_gradients(model.layers)
        optimizer.step(model.layers)

        # Apply weight constraints
        max_norm.apply(model.layers)

Set up callbacks and a learning rate scheduler:

    from callbacks import EarlyStopping, ModelCheckpoint
    from schedulers import ReduceLROnPlateau

    early_stop = EarlyStopping(patience=5, mode='min')
    checkpoint = ModelCheckpoint('best_model.npz', monitor='val_loss')
    scheduler = ReduceLROnPlateau(optimizer, patience=3, factor=0.5)

Save and load model weights:

    model.save('model_weights.npz')

    # Recreate the model, then load the saved weights into it
    new_model = FNN(loss=CrossEntropyLoss(), optimizer=Adam())
    new_model.load('model_weights.npz')

Project structure:

    NexNet/
    ├── Activation_classes/   # Activation functions
    ├── Layers/               # Neural network layers
    ├── Losses/               # Loss functions
    ├── Models/               # Model architectures (FNN, Sequential, CNN, RNN, Transformer)
    ├── Optimizer/            # Optimization algorithms
    ├── callbacks/            # Training callbacks
    ├── core/                 # Base classes (Module, Parameter)
    ├── data/                 # Data utilities
    ├── metrics/              # Evaluation metrics
    ├── schedulers/           # Learning rate schedulers
    ├── utils/                # Utilities (initializers, regularization, grad_clip)
    ├── implementation/       # Example implementations
    ├── NLP/                  # NLP implementations
    └── requirements.txt

For NLP-related implementations (Word2Vec, GloVe, NER), see Readme_NLP.md.
To contribute:
- Fork the repository
- Create a feature branch:
  git checkout -b feature/your-feature
- Commit changes:
  git commit -m "Add your feature"
- Push to branch:
  git push origin feature/your-feature
- Open a Pull Request

NexNet is licensed under the MIT License.