I am attempting to read a binary file in Python. From the dataset page:
The pixels are stored as unsigned chars (1 byte) and take values from 0 to 255
I have tried the following, which prints (0,) rather than a 784,000-element array.
# -*- coding: utf8 -*-
# Processed MNIST dataset (http://cis.jhu.edu/~sachin/digit/digit.html)
import struct
f = open('data/data0', mode='rb')
data = []
print struct.unpack('<i', f.read(4))
How can I read this binary file into either a flat 784,000-element array (28 x 28 pixels x 1,000 samples) or a 28x28x1000 3D array? I have never worked with binary files before, and am quite confused!
Just food for thought: some existing work that reads the MNIST digit images in Python: github.com/sorki/python-mnist/blob/master/mnist/loader.py (B.Mr.W., Jan 6, 2017 at 3:51)
2 Answers
f.read() will get you an immutable array of 784,000 bytes (called a str in Python 2). If you need it to be mutable, you can use the array module and its array type capable of storing various primitives, unsigned bytes (represented by the B code) included:
from array import array
data = array('B')
with open('data/data0', 'rb') as f:
data.fromfile(f, 784000)
This can be sliced as necessary:
EXAMPLE_SIZE = 28 * 28  # each image is 28 x 28 pixels, one byte per pixel
examples = [data[s:s + EXAMPLE_SIZE] for s in range(0, len(data), EXAMPLE_SIZE)]
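To try the slicing without the dataset file at hand, here is a minimal sketch on synthetic bytes. It assumes Python 3, 28 x 28 images as stated in the question, and no header in the file; N_EXAMPLES, SIDE, raw and rows are names made up for illustration.

```python
from array import array

SIDE = 28                    # assumed image width and height, per the question
EXAMPLE_SIZE = SIDE * SIDE   # 784 bytes per image
N_EXAMPLES = 3               # hypothetical small count instead of the real 1000

# Synthetic stand-in for f.read(): byte values cycle through 0..255
raw = bytes(i % 256 for i in range(N_EXAMPLES * EXAMPLE_SIZE))

# 'B' is the type code for unsigned bytes
data = array('B', raw)

# One flat 784-element array per image
examples = [data[s:s + EXAMPLE_SIZE] for s in range(0, len(data), EXAMPLE_SIZE)]

# Split the first image into 28 rows of 28 pixels
rows = [examples[0][r:r + SIDE] for r in range(0, EXAMPLE_SIZE, SIDE)]

print(len(examples), len(examples[0]), len(rows), len(rows[0]))
```

Slicing an array copies the elements, so changing a value in examples[0] leaves data untouched.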
Using NumPy
import numpy as np
from pathlib import Path
data = np.frombuffer(Path('data/data0').read_bytes(), dtype=np.uint8)
images = data.reshape(1000, 28, 28)
first_image = images[0] # shape: (28, 28)
np.frombuffer() interprets raw bytes as a NumPy array with the specified dtype. Since the pixels are unsigned chars (1 byte, 0-255), use dtype=np.uint8.
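To see why the dtype matters, here is a small sketch on synthetic bytes rather than the real file. It assumes a headerless layout of 784 bytes per image; n_images, side, raw, editable and as_int32 are illustrative names only. Reading the same buffer as 4-byte little-endian integers, which is roughly what struct.unpack('<i', ...) does for a single value, groups four pixels into each number.

```python
import numpy as np

# Synthetic stand-in for the file contents: 3 "images" of 28 x 28 bytes
n_images, side = 3, 28
raw = bytes(i % 256 for i in range(n_images * side * side))

# One array element per byte, then reshape into (image, row, column)
pixels = np.frombuffer(raw, dtype=np.uint8)
images = pixels.reshape(n_images, side, side)

# frombuffer on an immutable bytes object gives a read-only view;
# take a copy when the pixels need to be modified
editable = images.copy()
editable[0, 0, 0] = 255

# The same bytes read as little-endian int32: four pixels per value
as_int32 = np.frombuffer(raw, dtype='<i4')

print(images.shape, as_int32.shape)
```

If writing to the array is not needed, the read-only view avoids duplicating the data in memory.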