Python - read 2d array from binary data

Question 1

I am trying to read a 2D array of floats from a binary file with Python. The files were written in big-endian byte order by a Fortran program (they are intermediate files of the Weather Research and Forecasting model). I already know the dimensions of the array to read (nx and ny), but as a Fortran and IDL programmer I am completely lost as to how to do this in Python. (Later on I want to visualize the array.)

Should I use struct.unpack, numpy.fromfile, or the array module?
Do I have to read a vector first and reshape it afterwards? (I have only seen this option for the NumPy approach.)
How do I define a 2D array with NumPy, and how do I define the dtype to read with big-endian byte ordering?
Is there an issue with array ordering (column-wise or row-wise) to take into account?

Question 2

Do you know whether Fortran saves some meta-information (e.g. the dimensions) along with the values? Or is this just a constant "stream" of nx*ny big-endian numbers?

Question 3

You could try scipy.io.FortranFile (docs.scipy.org/doc/scipy/reference/generated/…). Note the comments in the docstring; whether or not FortranFile will work depends on the Fortran compiler.
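
The compiler dependence mentioned above usually comes down to record markers: many Fortran compilers frame each unformatted sequential record with a length field before and after the payload. The sketch below reads such records by hand with NumPy. It assumes 4-byte big-endian markers and a made-up two-record layout (a header holding nx and ny, then the field); the helper name read_record and the fake in-memory file are purely illustrative, and the actual WRF intermediate layout may differ.

```python
import io
import struct

import numpy as np

MARKER = np.dtype(">i4")  # assumption: 4-byte big-endian record-length marker


def read_record(f):
    """Return the payload bytes of the next marker-framed record."""
    nbytes = int(np.frombuffer(f.read(MARKER.itemsize), dtype=MARKER)[0])
    payload = f.read(nbytes)
    tail = int(np.frombuffer(f.read(MARKER.itemsize), dtype=MARKER)[0])
    if tail != nbytes:
        raise ValueError("record markers disagree; the layout assumption is wrong")
    return payload


# Build a fake two-record file in memory: a header (nx, ny), then the field.
nx, ny = 3, 2
header = struct.pack(">2i", nx, ny)
field = np.arange(nx * ny).astype(">f4").tobytes()
fake = io.BytesIO()
for rec in (header, field):
    marker = struct.pack(">i", len(rec))
    fake.write(marker + rec + marker)
fake.seek(0)

# Read the header record first, then the data record.
dims = tuple(int(d) for d in np.frombuffer(read_record(fake), dtype=">i4"))
slab = np.frombuffer(read_record(fake), dtype=">f4").reshape(dims, order="F")
```

If the leading and trailing markers disagree, the file most likely does not use this framing (or uses a different marker size), which is the situation the FortranFile docstring warns about.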

Question 4

Doesn't WRF create files in NetCDF format? (Perhaps being an "intermediate file" is the issue.)

Question 5

In the Fortran file there is various meta-information before the data field. I can read this information and so know the dimensions nx and ny. But I am still not able to read the array, due to the problems mentioned above.

Question 6

Yes, WRF works with GRIB files as input and NetCDF files as output. But there are also files in a WRF-specific binary format, which act as a kind of gateway when using NetCDF files as input.

user2379410 · Accepted Answer · 2015-09-21 14:23:41Z

Short answers per sub-question:

The array module has no way to declare endianness up front; it reads in native byte order, and you would have to call byteswap() on the result yourself. Between the struct module and NumPy, I think NumPy is easier to use, especially for Fortran-ordered arrays.
As far as the hardware (disk, RAM, etc.) is concerned, all data is one-dimensional, so yes, some reshaping step is always needed to get a 2D view. With numpy.fromfile the reshape is an explicit second step; numpy.memmap takes shape and order arguments and does it for you.
The easiest way to specify endianness in NumPy is a short type string, much like the format strings of the struct module. In NumPy, >f and >f4 mean big-endian single precision, and >d and >f8 mean big-endian double precision.
Your binary file could store the array row by row (C-like) or column by column (Fortran-like). Either way, you have to tell NumPy which one it is to get the data laid out correctly. NumPy makes this easy with the order keyword argument of reshape and memmap (among others).
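
To make the type strings and the order argument concrete, here is a small sketch on six made-up values held in memory rather than in a file. It decodes the same big-endian bytes with struct and with NumPy, then reshapes them both ways; the sizes nx=3, ny=2 are arbitrary.

```python
import struct

import numpy as np

# Six big-endian single-precision values, as they might sit in a file.
raw = struct.pack(">6f", 1.0, 2.0, 3.0, 4.0, 5.0, 6.0)

flat = np.frombuffer(raw, dtype=">f4")  # ">f4": big-endian 4-byte float
same = struct.unpack(">6f", raw)        # struct uses the same ">" prefix

nx, ny = 3, 2
by_columns = flat.reshape((nx, ny), order="F")  # first index varies fastest
by_rows = flat.reshape((nx, ny), order="C")     # last index varies fastest

print(by_columns)
print(by_rows)
```

With order="F" the first column holds 1, 2, 3; with order="C" the first row holds 1, 2. If a plot of the array looks scrambled or transposed, the wrong order (or swapped nx and ny) is the usual suspect.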

All in all, the code could be for example:

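import numpy as np

filename = 'somethingsomething'
with open(filename, 'rb') as f:
    nx, ny = ...  # parse; advance file-pointer to data segment
    # '>f8' = big-endian double precision; use '>f4' for single precision
    data = np.fromfile(f, dtype='>f8', count=nx*ny)

# order='F': consecutive values in the file run down the columns (Fortran layout)
array = np.reshape(data, [nx, ny], order='F')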
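
For a version that can be run end to end, the following sketch first writes a small test file and then reads it back two ways: with numpy.fromfile plus reshape, and with numpy.memmap. Everything about the file is invented for the demonstration: the 8-byte dummy header, the 4x3 size, and the name demo.bin are assumptions, not the WRF intermediate layout.

```python
import os
import tempfile

import numpy as np

nx, ny = 4, 3
header_bytes = 8  # hypothetical size of whatever precedes the data

original = np.arange(nx * ny, dtype=np.float64).reshape((nx, ny))

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.bin")
    with open(path, "wb") as f:
        f.write(b"\x00" * header_bytes)
        # Column-major, big-endian doubles, as a Fortran writer would emit them.
        f.write(original.astype(">f8").tobytes(order="F"))

    # Variant 1: skip the header, read a flat vector, reshape explicitly.
    with open(path, "rb") as f:
        f.seek(header_bytes)
        data = np.fromfile(f, dtype=">f8", count=nx * ny)
    array = data.reshape((nx, ny), order="F")

    # Variant 2: memmap takes offset, shape and order in one call.
    mapped = np.memmap(path, dtype=">f8", mode="r",
                       offset=header_bytes, shape=(nx, ny), order="F")
    same = bool(np.array_equal(array, mapped))
    del mapped  # release the mapping before the temporary directory is removed
```

Both results keep the big-endian dtype; array.astype(np.float64) converts to the machine's native byte order if a later library needs that.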

Many thanks, moarningsun! Your explanations helped me to solve the problem!
