Shitty-Python-Neural-Net

read the title lmao

more concretely, this is a framework for building multilayer perceptrons

I don't plan to update this in the foreseeable future. Pull requests/issues welcome (assuming anyone even comes across this repo lmao).

Todo

  • Make the someActivationFunction.derivative thing work instead of manually setting it in layers
    • now automatically detects and sets derivatives
  • Instead of MSE cost, use cross entropy
    • added cost_function parameter with MSE, MAE, and bin/cat CE options
  • Perhaps instead of tanh as the output layer activation function, use softmax, or maybe even sigmoid
    • output activation is no longer hardcoded, defaults to tanh but can be overridden with softmax, sigmoid, or any activation
  • Allow the training to choose a certain subset of the total data to train with for a single epoch
    • added samples_per_epoch parameter to randomly sample a subset each epoch
  • Optimize double forward propagation in training
    • backprop now returns both gradients and predictions, so training no longer has to run a separate forward pass
  • Allow custom alpha values for leaky relu (currently hardcoded to 0.01)
    • now supports an activation_params dict for all parametric activations, plus weight_init_params/bias_init_params for initializers (see the sketch after this list)
  • Allow setting a random seed for reproducibility when initializing a network (extremely low priority)
  • Add noise to training data during processing (e.g. MNIST) when the data is heavily regularized, so that more robust networks can be trained (extremely low priority)
  • Add optimization so the network doesn't train too slowly on large datasets, like mini-batch training? (meh priority)
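
Here's a minimal sketch of what the custom leaky ReLU slope means numerically. The activation_params={'alpha': ...} spelling comes straight from the todo item above, but exactly where that dict gets passed (layer vs. network constructor) isn't shown in this README, so treat the wiring comment as an assumption.

import numpy as np

# Leaky ReLU with a configurable negative slope (the old default was a hardcoded 0.01).
def leaky_relu(x, alpha=0.01):
    return np.where(x > 0, x, alpha * x)

def leaky_relu_derivative(x, alpha=0.01):
    return np.where(x > 0, 1.0, alpha)

# Per the todo item, something like activation_params={'alpha': 0.05} feeds alpha into
# the framework's parametric activations (the exact call site is an assumption here).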

Known Issues

If you set...

  • the learning rate too high (>0.001)
  • the clipping threshold too high (haven't tested this exhaustively, but it's a given, since before I added clipping the network just killed itself)
  • or initialize the layer weights too large (I originally used a normal distribution with mean 0 and std 1, but had to lower the std)

the network will diverge due to exploding gradients, or stall due to vanishing ones (see /docs/GRADIENT_ANALYSIS.md for additional details). This apparently is a common issue with neural networks, and is usually solved by clipping the gradients or by using a lower learning rate.
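
For intuition, here's what the clipping idea looks like in plain numpy. Whether the framework clips element-wise or by norm isn't stated in this README, so the element-wise version below is an assumption; clip_value itself is the same parameter shown in the Usage section.

import numpy as np

# Element-wise gradient clipping: each gradient component is limited to
# [-clip_value, clip_value] before the weight update, so one huge component
# can't blow up the whole network.
clip_value = 5.0
gradient = np.array([0.3, -120.0, 42.0])
clipped = np.clip(gradient, -clip_value, clip_value)  # -> [0.3, -5.0, 5.0]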

He/Xavier initialization requires normalized data! If you use weight_init='he' or weight_init='xavier' with unnormalized data (e.g., raw RGB 0-255), the large inputs × large weights = exploding activations and training fails. Either normalize your data first (divide RGB by 255.0 to get 0-1 range), or use a weight init like weight_init='normal', weight_init_params={'std': 0.01}.
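
A quick sketch of the normalization fix; the numpy part is framework-independent, and the weight_init comment just restates the options named above (where exactly those arguments go depends on the layer/network constructor, which isn't shown here).

import numpy as np

# Raw RGB inputs in the 0-255 range will blow up He/Xavier-initialized activations.
raw_rgb = np.array([[255, 0, 0], [12, 200, 90]], dtype=np.float64)
normalized_rgb = raw_rgb / 255.0  # 0-1 range, now safe to pair with weight_init='he' or 'xavier'

# If you'd rather not normalize, the alternative named above is a small normal init,
# e.g. weight_init='normal', weight_init_params={'std': 0.01}, wherever your layers
# take their initialization arguments.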

If the overall dataset size is small, or if the data is regularized/preprocessed a certain way (like MNIST cough cough), the network may overfit. Consider adding noise/scaling/fucking around with the data, and using samples_per_epoch to train on a random subset each epoch for regularization. Or, if the network trains too slowly per epoch, just use sampling for that as well.

The network initializes weights and biases randomly, so training results may vary between runs. So I guess you can just initialize multiple times, pick the best one, and continue training from there. You could also set a random seed for reproducibility if desired (I haven't implemented this, but you can set np.random.seed(your_seed) at the start of your script).
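
Since numpy is the only dependency, seeding numpy's global RNG at the top of your script (as the note above suggests) is enough to make the weight/bias initialization and any random sampling reproducible:

import numpy as np

# Seed before constructing the network so every run starts from the same weights.
np.random.seed(42)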

Notes

NumPy is the only dependency. The code is written noobishly, so optimization would be nice.

An epoch is one iteration through the entire training set.

See /docs/ for some more specific and supplementary documentation.

Example models that showcase all features are in main_create_and_train.py. Currently, I have 7 networks there, each trained on a different problem:

  • RGB color classification (is this color "red" or "not red"?)
  • XOR (+normalized noise) problem
  • Sine wave approximation
  • Checkerboard pattern classification
  • Quadrant classification (which quadrant does this 2D point belong to?)
  • Iris flower classification
  • Linear regression (simple y=mx+b fitting, which apparently is a thing)

Some of these models are good, others are pretty bad, but they should serve as decent and comprehensive examples of how to use the code.

More optimized models I've made are in main_optimized_mlps.py; these have better architectures overall (since they're not constrained to showcase features). I have 8 of them, each trained on a different problem:

  • RGB color classification (is this color "red" or "not red"?)
  • XOR (+normalized noise) problem
  • Sine wave approximation
  • Checkerboard pattern classification
  • Quadrant classification (which quadrant does this 2D point belong to?)
  • Iris flower classification
  • Linear regression (simple y=mx+b fitting, which apparently is a thing)
  • MNIST digit classification

The MNIST model in particular is a standard example of MLP usage, and this one actually trains decently well: on my first attempt it got ~94% accuracy after 500 epochs of initial training, and ~96% accuracy after 200 epochs of fine-tuning.

The main_create_and_train.py, main_load.py, and visualize scripts only reflect my most recent edits/training attempts, so be sure to check and modify them so you correctly train/visualize what you want.

Training everything is lowkey kinda finicky; you might need to restart multiple times to get an initialization that trains well. Refining the network further may need targeted data generation. For example, if you want to refine the boundary of the sine categorization problem, you can generate extra data clustered around the boundary en masse and train with that, which forces the network to improve its boundary performance.
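
A hypothetical sketch of that boundary-focused data generation for the sine problem. The feature range and label encoding below are assumptions (the repo's actual data generation scripts may differ); the point is just to sample points that hug the y = sin(x) curve.

import numpy as np

# Generate points clustered tightly around the sine boundary, labeled by which
# side of the curve they fall on, and train on these to sharpen the boundary.
n = 2000
x = np.random.uniform(-np.pi, np.pi, size=n)
y = np.sin(x) + np.random.normal(0.0, 0.05, size=n)  # small offsets hug the curve
inputs = np.stack([x, y], axis=1)
targets = (y > np.sin(x)).astype(np.float64).reshape(-1, 1)  # 1 = above the curve, 0 = below
# then fine-tune with something like training.train(inputs, targets, epochs=500)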

MNIST data json files are too bulky and annoying, so I've excluded them from the repo in the gitignore. You can generate them yourself using utility_mnist_processing.py.

Usage

Run the data generation scripts to get data.

Run the main scripts lmao and change them as you'd like to make your own MLP neural net.

Fluttershy MLP

Basic Training

# Train on all data each epoch (default behavior)
training.train(input_data, target_data, epochs=500)

Subset Training

Apparently subset training adds regularization through data sampling, so the network overfits less on small datasets. The cost will jump around more than with basic training, though.

# Train on 400 randomly selected samples per epoch (out of some total)
training.train(input_data, target_data, epochs=500, samples_per_epoch=400)

Automatic Best Model Checkpointing

To prevent losing the best model to overfitting or cost spikes, enable automatic checkpointing. The best model (lowest cost) is saved to the specified file whenever a new best cost is achieved during training. However, I guess you could argue this might prevent grokking, but idk if small models can even "grok" lol (also weight decay would need to be implemented :p).

# The best model (lowest cost) is automatically saved during training
training = Training(neural_net,
 learning_rate=0.001,
 clip_value=5,
 cost_function='mse',
 checkpoint_path='models/model_best.json') # Auto-saves best here
training.train(input_data, target_data, epochs=5000)

If you disable checkpointing, you have to manually save your model at the end of training.

training = Training(neural_net,
 learning_rate=0.001,
 clip_value=5,
 cost_function='mse')
 # No checkpoint_path specified
# Train the model
training.train(input_data, target_data, epochs=5000)
# Manually save the final model after training
neural_net.save('models/model_final.json')

Please refer to the /docs/ folder for more detailed documentation:

  1. Activation Functions Documentation
  2. Cost Functions Documentation
  3. Gradient Analysis Documentation
  4. Training Guide Documentation

Make sure to also check out /visualization/ :3

here's some images of my trained mlps:

  • Checkerboard Visualization
  • House Price Visualization
  • MNIST Visualization
  • Quadrant Visualization
  • Red Color Visualization
  • Sine Wave Visualization
  • XOR Visualization

here are html files that let you play with the trained mlps interactively
download the raw file and open it in your browser to see :3

And lastly, there's an interactive MNIST classifier. You have to manually upload model_mnist.json, and make sure you have the 4 font files in ./visualization/fonts/ to render the page properly, but you can draw on a canvas and see what the trained model predicts. Beware that since MNIST data is regularized (centered, size normalized, etc.), a model trained purely on the database alone will overfit to that specific regularization, and probably won't entirely live up to accuracy expectations when tested on your own handwritten digits. You can do some finicky things like adding noise, uncentering, scaling, &c. to the dataset entries to make a more robust network, but I'm too lazy to do that (also training takes forever lol).
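
If you do want to try that, here's a rough augmentation sketch. It assumes the MNIST images are flattened 28x28 arrays scaled to 0-1 (how the repo's json actually stores them is an assumption), and none of this is implemented in the repo.

import numpy as np

# Randomly shift and add noise to a flattened 28x28 digit so the network stops
# relying on MNIST's perfect centering/size normalization.
def augment(image_flat, shift_max=2, noise_std=0.05):
    img = image_flat.reshape(28, 28)
    dy, dx = np.random.randint(-shift_max, shift_max + 1, size=2)
    img = np.roll(np.roll(img, dy, axis=0), dx, axis=1)           # crude translation
    img = img + np.random.normal(0.0, noise_std, size=img.shape)  # additive noise
    return np.clip(img, 0.0, 1.0).reshape(-1)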
