Uses an N-tuple Classifier to classify handwritten images of numbers (0-9) using the MNIST data set.


kupad/ntuple-classifier


N-tuple Classifier

Use the N-tuple Classifier to classify images of handwritten digits (0-9) from the MNIST data set. The program trains the classifier on the training data, evaluates it on the test data, and then prints a confusion matrix and the overall accuracy.

Requirements

To run the program, you will need to install the numpy and numba packages, which are used for performance.

You'll also need the MNIST data, a copy of which I've checked into this repository.

To Execute

The main program is mnist_ntuple.py

$ python3 mnist_ntuple.py data/train-images-idx3-ubyte data/train-labels-idx1-ubyte data/t10k-images-idx3-ubyte data/t10k-labels-idx1-ubyte 
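The four files on the command line use the IDX format in which MNIST is distributed: a big-endian header (magic number 2051 for images, 2049 for labels) followed by raw unsigned bytes. The sketch below shows one way to parse such files with only the standard library. The function names are hypothetical and are not taken from mnist_ntuple.py, which may load the data differently (for example with numpy).

```python
import struct

def parse_idx_images(raw: bytes):
    """Parse the bytes of an IDX3 image file into (rows, cols, images).

    Each image is returned as a flat bytes object of rows * cols pixels.
    """
    magic, count, rows, cols = struct.unpack(">IIII", raw[:16])
    if magic != 2051:
        raise ValueError(f"not an IDX3 image file (magic={magic})")
    size = rows * cols
    images = [raw[16 + i * size: 16 + (i + 1) * size] for i in range(count)]
    return rows, cols, images

def parse_idx_labels(raw: bytes):
    """Parse the bytes of an IDX1 label file into a list of integer labels."""
    magic, count = struct.unpack(">II", raw[:8])
    if magic != 2049:
        raise ValueError(f"not an IDX1 label file (magic={magic})")
    return list(raw[8: 8 + count])

# Example use with the files in this repository:
# with open("data/train-images-idx3-ubyte", "rb") as f:
#     rows, cols, images = parse_idx_images(f.read())
# with open("data/train-labels-idx1-ubyte", "rb") as f:
#     labels = parse_idx_labels(f.read())
```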

Description

The N-tuple classifier is pretty neat. It's relatively simple, potentially quite fast, and even with a naive implementation has an accuracy of around 91% on the MNIST data. But it isn't very popular, and is not typically discussed in introductory data science material.

The N-tuple classifier is built from subclassifiers called modules. During training, each module samples a small, fixed set of pixel positions from every image and determines which of those pixels are "active" (have a value above a threshold). Each module uses a table to count how many times each label was encountered for each combination of active sampled pixels.
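Here is a minimal sketch of that training step in plain Python. It assumes the common n-tuple layout in which the active/inactive states of a module's sampled pixels are packed into one integer that indexes the module's table. The names, the random choice of positions, and the threshold of 128 are illustrative assumptions, not the values used by mnist_ntuple.py.

```python
import random

NUM_LABELS = 10  # digits 0-9

def make_module(num_pixels, n, rng):
    """Create a module: n sampled pixel positions and a table with
    2**n rows (one per active/inactive pattern) of per-label counts."""
    positions = rng.sample(range(num_pixels), n)
    table = [[0] * NUM_LABELS for _ in range(2 ** n)]
    return positions, table

def pattern_index(image, positions, threshold):
    """Pack the active/inactive state of each sampled pixel into an integer."""
    index = 0
    for pos in positions:
        index = (index << 1) | (1 if image[pos] > threshold else 0)
    return index

def train(modules, images, labels, threshold=128):
    """For every image, bump the count of its label in every module's table."""
    for image, label in zip(images, labels):
        for positions, table in modules:
            table[pattern_index(image, positions, threshold)][label] += 1

# Tiny example: 4-pixel "images" and one module that samples pixels 0 and 3.
modules = [([0, 3], [[0] * NUM_LABELS for _ in range(4)])]
train(modules, [[255, 0, 0, 0], [255, 0, 0, 255]], [1, 7])
```

With real MNIST images, `num_pixels` would be 28 * 28 = 784, and many modules would be created with `make_module`.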

When classifying an image, each module samples the same pixel positions it used during training and looks up how many times each label was seen with that combination of active pixels. These counts are summed across all modules, and the label with the highest total is chosen.
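The classification step can be sketched the same way. This example is self-contained and uses hand-filled tables in place of trained ones. As before, the names, the bit-packed table index, and the threshold of 128 are assumptions for illustration rather than the actual implementation.

```python
NUM_LABELS = 10  # digits 0-9

def pattern_index(image, positions, threshold):
    """Pack the active/inactive state of each sampled pixel into an integer."""
    index = 0
    for pos in positions:
        index = (index << 1) | (1 if image[pos] > threshold else 0)
    return index

def classify(modules, image, threshold=128):
    """Sum each module's per-label counts for this image's pixel pattern
    and return the label with the highest total (lowest label on ties)."""
    totals = [0] * NUM_LABELS
    for positions, table in modules:
        counts = table[pattern_index(image, positions, threshold)]
        for label, count in enumerate(counts):
            totals[label] += count
    return max(range(NUM_LABELS), key=totals.__getitem__)

# Two hypothetical modules over 2-pixel "images", with hand-filled tables.
table_a = [[0] * NUM_LABELS for _ in range(4)]
table_a[0b10][3] = 5   # pattern "pixel 0 active, pixel 1 inactive" seen 5x with label 3
table_a[0b10][8] = 2   # ... and 2x with label 8
table_b = [[0] * NUM_LABELS for _ in range(4)]
table_b[0b01][8] = 1   # module b samples the pixels in the opposite order
modules = [([0, 1], table_a), ([1, 0], table_b)]

predicted = classify(modules, [200, 0])  # totals: label 3 -> 5, label 8 -> 3
```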

For more information on n-tuple classifiers:

USAGE:

$ python3 mnist_ntuple.py -h
usage: mnist_ntuple.py [-h] train_images_file train_labels_file test_images_file test_labels_file

positional arguments:
  train_images_file  file with the training data
  train_labels_file  file with the training labels
  test_images_file   file with the testing data
  test_labels_file   file with the testing labels

optional arguments:
  -h, --help         show this help message and exit
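The help text above has the shape produced by Python's argparse module. A parser that accepts the same four positional arguments could be sketched as follows. This is a reconstruction from the help output, not the code in mnist_ntuple.py, and `build_parser` is a hypothetical name.

```python
import argparse

def build_parser():
    """Build a parser with the four positional arguments from the help text."""
    parser = argparse.ArgumentParser(prog="mnist_ntuple.py")
    parser.add_argument("train_images_file", help="file with the training data")
    parser.add_argument("train_labels_file", help="file with the training labels")
    parser.add_argument("test_images_file", help="file with the testing data")
    parser.add_argument("test_labels_file", help="file with the testing labels")
    return parser

# Parse an explicit argument list instead of sys.argv for demonstration.
args = build_parser().parse_args([
    "data/train-images-idx3-ubyte",
    "data/train-labels-idx1-ubyte",
    "data/t10k-images-idx3-ubyte",
    "data/t10k-labels-idx1-ubyte",
])
```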

Here's an example run:

$ python3 mnist_ntuple.py data/train-images-idx3-ubyte data/train-labels-idx1-ubyte data/t10k-images-idx3-ubyte data/t10k-labels-idx1-ubyte 
Confusion Matrix:
      0    1    2    3    4    5    6    7    8    9
 0  920    0    6    3    0   41    6    0    3    1
 1    0 1080    4    7    1   14    2    1   24    2
 2    7    3  894   28   10   15   10   12   46    7
 3    0    0    5  921    1   47    0    8   15   13
 4    1    0    7    0  844    6    4    1   13  106
 5    2    2    1   31    4  822    3    2   10   15
 6    8    2   10    0   19   63  845    3    7    1
 7    0    2   22    5    2    1    0  916   19   61
 8    0    0    5   30    5   34    0    3  873   24
 9    4    2    4   15   12    5    1    8   13  945
accuracy: 0.9060
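
The reported accuracy can be checked against the confusion matrix: it is the sum of the diagonal (correct classifications) divided by the sum of all cells. The short sketch below recomputes it from the numbers in the example run and also finds the largest off-diagonal cell. The function names are illustrative. The output does not say whether rows are true labels or predictions, but the accuracy is the same either way.

```python
# Confusion matrix copied from the example run above.
CONFUSION = [
    [920, 0, 6, 3, 0, 41, 6, 0, 3, 1],
    [0, 1080, 4, 7, 1, 14, 2, 1, 24, 2],
    [7, 3, 894, 28, 10, 15, 10, 12, 46, 7],
    [0, 0, 5, 921, 1, 47, 0, 8, 15, 13],
    [1, 0, 7, 0, 844, 6, 4, 1, 13, 106],
    [2, 2, 1, 31, 4, 822, 3, 2, 10, 15],
    [8, 2, 10, 0, 19, 63, 845, 3, 7, 1],
    [0, 2, 22, 5, 2, 1, 0, 916, 19, 61],
    [0, 0, 5, 30, 5, 34, 0, 3, 873, 24],
    [4, 2, 4, 15, 12, 5, 1, 8, 13, 945],
]

def accuracy(matrix):
    """Fraction of samples on the diagonal, i.e. classified correctly."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total

def worst_confusion(matrix):
    """Return (row, col, count) for the largest off-diagonal cell."""
    size = len(matrix)
    cells = ((r, c, matrix[r][c]) for r in range(size) for c in range(size) if r != c)
    return max(cells, key=lambda cell: cell[2])

print(f"accuracy: {accuracy(CONFUSION):.4f}")   # 9060 correct out of 10000
print("largest confusion:", worst_confusion(CONFUSION))
```

The largest off-diagonal cell is in row 4, column 9, so the most common mistake in this run is between the digits 4 and 9.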
