Firstly, before this question gets marked as a duplicate: I'm aware others have asked similar questions, but there doesn't seem to be a clear explanation. I'm trying to read a binary file into a 2D array (the format is documented well here: http://nsidc.org/data/docs/daac/nsidc0051_gsfc_seaice.gd.html).
The header is a 300 byte array.
So far, I have:
import struct
with open("nt_197912_n07_v1.1_n.bin", mode='rb') as file:
    filecontent = file.read()
x = struct.unpack("iiii", filecontent[:300])
This throws an error about the string argument length.
1 Answer
Reading the Data (Short Answer)
After you have determined the size of the grid (n_rows x n_cols = 448 x 304) from your header (see below), you can simply read the data using numpy.frombuffer.
import numpy as np
#...
#Get data from Numpy buffer
dt = np.dtype(('>u1', (n_rows, n_cols)))
x = np.frombuffer(filecontent[300:], dt) #we know the data starts from idx 300 onwards
#Remove unnecessary dimension that numpy gave us
x = x[0,:,:]
The '>u1' specifies the format of the data, in this case unsigned 1-byte integers. The '>' marks the format as big-endian, although byte order makes no difference for single-byte values.
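To try the read without the real file, here is a minimal sketch written for Python 3. The 4 x 3 grid and the all-zero 300-byte header are stand-ins invented for illustration (the real grid is 448 x 304), and it uses the offset argument of numpy.frombuffer plus reshape instead of the sub-array dtype above.

```python
import numpy as np

# Stand-in sizes for illustration only; the real file uses 448 x 304.
n_rows, n_cols = 4, 3

# Fake file content: a 300-byte header followed by one byte per grid cell.
header_bytes = b"\x00" * 300
payload = bytes(range(n_rows * n_cols))
filecontent = header_bytes + payload

# Skip the header with offset=300 and interpret the rest as unsigned bytes.
grid = np.frombuffer(filecontent, dtype=np.uint8, offset=300)
grid = grid.reshape(n_rows, n_cols)
```

Reshaping fails with a ValueError if the number of bytes after the header does not equal n_rows * n_cols, which is a cheap sanity check on the header values.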
Plotting this with matplotlib.pyplot
import matplotlib.pyplot as plt
#...
plt.imshow(x, extent=[0,3,-3,3], aspect="auto")
plt.show()
The extent= option simply specifies the axis values; you can change these to lat/lon, for example (parsed from your header).
Explanation of Error from .unpack()
From the docs for struct.unpack(fmt, string):
The string must contain exactly the amount of data required by the format (len(string) must equal calcsize(fmt))
You can determine the size specified in the format string (fmt) by looking at the Format Characters section.
Your fmt in struct.unpack("iiii",filecontent[:300]) specifies 4 int types (you can also write 4i instead of iiii for simplicity), each of which has a size of 4 bytes, so it requires a string of length 16.
Your string (filecontent[:300]) has length 300, whilst your fmt asks for a string of length 16, hence the error.
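The mismatch can be checked directly with struct.calcsize. This is a small sketch for Python 3; buf is a made-up 300-byte buffer standing in for filecontent[:300], and the size of 16 assumes a platform where a native int is 4 bytes.

```python
import struct

# Made-up stand-in for filecontent[:300].
buf = b"\x00" * 300

# Number of bytes the format expects (16 where a native int is 4 bytes).
needed = struct.calcsize("iiii")

# Passing 300 bytes to a format that wants `needed` bytes raises struct.error.
try:
    struct.unpack("iiii", buf)
    mismatch_error = None
except struct.error as exc:
    mismatch_error = exc

# Slicing the buffer to exactly `needed` bytes makes the call succeed.
first_four = struct.unpack("iiii", buf[:needed])
```

Whether the header should be read as integers at all is a separate question; the linked document describes it as character strings.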
Example Usage of .unpack()
As an example, reading your supplied document, I extracted the first 21*6 bytes, which have the format:
a 21-element array of 6-byte character strings that contain information such as polar stereographic grid characteristics
With:
x = struct.unpack("6s"*21, filecontent[:126])
This returns a tuple of 21 elements. Note the whitespace padding in some elements to meet the 6-byte requirement.
>>> print x
# ('00255\x00', ' 304\x00', ' 448\x00', '1.799\x00', '39.43\x00', '45.00\x00', '558.4\x00', '154.0\x00', '234.0\x00',
# ' SMMR\x00', '07 cn\x00', ' 336\x00', ' 0000\x00', ' 0034\x00', ' 364\x00', ' 0000\x00', ' 0046\x00', ' 1979\x00',
# ' 336\x00', ' 000\x00', '00250\x00')
Notes:
- The first argument fmt, "6s"*21, is a string with 6s repeated 21 times. Each format character 6s represents one string of 6 bytes (see below); this matches the format specified in your document.
- The number 126 in filecontent[:126] is calculated as 6*21 = 126.
- Note that for the s (string) specifier, the preceding number does not mean to repeat the format character 6 times (as it normally would for other format characters). Instead, it specifies the size of the string. s represents a 1-byte string, whilst 6s represents a 6-byte string.
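The difference between a count on s and a count on other format characters can be seen in a short sketch (Python 3, where the fields come back as bytes). The raw value and the clean helper are made up for illustration, padded the way the header fields appear to be.

```python
import struct

# Two made-up 6-byte fields: space-padded digits ending in a null byte.
raw = b"  304\x00  448\x00"

# "6s" is ONE 6-byte string, so "6s" * 2 yields two fields.
as_strings = struct.unpack("6s" * 2, raw)

# "6c" is SIX 1-byte characters: here the count repeats the format character.
as_chars = struct.unpack("6c", raw[:6])

# Same repeat behaviour for numeric types: "<2h" is two little-endian shorts.
as_shorts = struct.unpack("<2h", b"\x01\x00\x02\x00")

def clean(field):
    """Hypothetical helper: drop null bytes and padding, return text."""
    return field.replace(b"\x00", b"").decode("ascii").strip()

n_cols, n_rows = (int(clean(f)) for f in as_strings)
```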
More Extensive Solution for Header Reading (Long)
Because the layout of the binary data must be specified manually, doing this in source code can be tedious. Consider using a configuration file instead (such as an .ini file).
This function reads the header and stores it in a dictionary, where the structure is given by an .ini file:
# use configparser for Python 3.x
import ConfigParser
import struct

def read_header(data, config_file):
    """
    Read binary data whose structure is specified by an INI file
    """
    with open(config_file) as fd:
        #Init the config class
        conf = ConfigParser.ConfigParser()
        conf.readfp(fd)
        #preallocate dictionary to store data
        header = {}
        #Iterate over the key-value pairs under the
        #'structure' section
        for key in conf.options('structure'):
            #determine the string properties
            start_idx, end_idx = [int(x) for x in conf.get('structure', key).split(',')]
            start_idx -= 1 #remember python is zero indexed!
            strLength = end_idx - start_idx
            #Get the data
            header[key] = struct.unpack("%is" % strLength, data[start_idx:end_idx])
            #Format the data
            header[key] = [x.strip() for x in header[key]]
            header[key] = [x.replace('\x00', '') for x in header[key]]
        #Unmap from list-type
        #use .items() for Python 3.x
        header = {k:v[0] for k, v in header.iteritems()}
    return header
An example .ini file is shown below. Each key is the name to use when storing the data, and each value is a comma-separated pair: the first number is the starting byte and the second is the ending byte (1-based, inclusive). These values were taken from Table 1 in your document.
[structure]
missing_data: 1, 6
n_cols: 7, 12
n_rows: 13, 18
latitude_enclosed: 25, 30
This function can be used as follows:
header = read_header(filecontent, 'headerStructure.ini')
n_cols = int(header['n_cols'])
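For Python 3, the same idea might look like the sketch below. It is not a drop-in replacement for the function above: read_header_py3, INI_TEXT and the sample bytes are all hypothetical names and data made up for illustration, the .ini content is read from a string rather than a file, and the fields are sliced directly instead of going through struct.unpack.

```python
import configparser

# In-memory stand-in for the .ini file (1-based, inclusive byte ranges).
INI_TEXT = """
[structure]
missing_data: 1, 6
n_cols: 7, 12
n_rows: 13, 18
"""

def read_header_py3(data, ini_text):
    """Slice header fields out of `data` using ranges from an INI string."""
    conf = configparser.ConfigParser()
    conf.read_string(ini_text)
    header = {}
    for key, value in conf.items("structure"):
        start, end = (int(v) for v in value.split(","))
        raw = data[start - 1:end]  # convert the 1-based start to 0-based
        # Drop null bytes, decode to text, then trim the space padding.
        header[key] = raw.replace(b"\x00", b"").decode("ascii").strip()
    return header

# Made-up 300-byte header carrying only the first three fields.
sample = (b"00255\x00" + b"  304\x00" + b"  448\x00").ljust(300, b"\x00")
header = read_header_py3(sample, INI_TEXT)
n_rows, n_cols = int(header["n_rows"]), int(header["n_cols"])
```

Removing the null bytes before stripping means trailing padding that sits in front of a null byte is trimmed as well.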
7 Comments
x = struct.unpack("75i", filecontent[:300])?
... int ('i') as opposed to an unsigned int ('u'), as the values range from 0 to 255. I could not mask the null values with Gray correctly, and if you look at the browse comparison image, you can see the differences