I wrote this code to read a square matrix saved as a binary file (the values can be int, uint, float, etc.) and to pick the value at a given row y and a given column x.

Could this code be made faster at picking this value, i.e. better than 20 seconds?
The matrix has at most 3601 rows and 3601 columns.

import struct

#convert x,y indexes of 2D array into index i of 1D array
#x: index for cols in the 2D
#y: index for rows in the 2D
def to1D(y,x,width):
    i = x + width*y
    return i

def readBinary_as(filename,x,y,file_type,width=3601):
    with open(filename,'rb') as file_data:
        #initialize, how many bytes to be read
        nbByte = 0
        #data type of the file, uint, int, float, etc...
        coding = ''
        if file_type == "signed int":
            nbByte = 2
            coding = '>h' #2B Signed Int - BE
        if file_type == "unsigned int":
            nbByte = 2
            coding = '>H' #2B Unsigned Int - BE
        if file_type == "unsigned byte":
            nbByte = 1
            coding = '>B' #1B Unsigned Byte - BE
        if file_type == "float":
            nbByte = 4
            coding = '>f' #4B float32 - BE

        #index of my value in 1D array
        i = to1D(y,x,width)
        for each_cell in range(0,i):
            file_data.read(nbByte)

        #read and save the picked value
        my_value_pos = file_data.read(nbByte)
        val = struct.unpack(coding,my_value_pos)[0]
        return val

2 Answers

Try using seek() instead of a loop that reads 1 value at a time.

import io
import struct

#convert x,y indexes of 2D array into index i of 1D array
#x: index for cols in the 2D
#y: index for rows in the 2D
def to1D(y,x,width):
    i = x + width*y
    return i

def readBinary_as(filename,x,y,file_type,width=3601):
    with open(filename,'rb') as file_data:
        #initialize, how many bytes to be read
        nbByte = 0
        #data type of the file, uint, int, float, etc...
        coding = ''
        if file_type == "signed int":
            nbByte = 2
            coding = '>h' #2B Signed Int - BE
        if file_type == "unsigned int":
            nbByte = 2
            coding = '>H' #2B Unsigned Int - BE
        if file_type == "unsigned byte":
            nbByte = 1
            coding = '>B' #1B Unsigned Byte - BE
        if file_type == "float":
            nbByte = 4
            coding = '>f' #4B float32 - BE

        #index of my value in 1D array
        i = to1D(y,x,width)
        offset = i * nbByte

        # seek to byte offset of desired data
        file_data.seek(offset)

        #read and save the picked value
        my_value_pos = file_data.read(nbByte)
        val = struct.unpack(coding,my_value_pos)[0]
        return val
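
As a quick check of the seek() approach, here is a minimal sketch that writes a tiny big-endian matrix to a temporary file and reads one cell back with a single seek. The helper name read_cell, the 4 x 3 test matrix and the temporary file are assumptions made for illustration only; they are not part of the code above.

```python
import os
import struct
import tempfile


def read_cell(filename, x, y, width, coding=">h"):
    """Read the value at column x, row y with one seek (hypothetical helper)."""
    size = struct.calcsize(coding)      # bytes per value, e.g. 2 for '>h'
    offset = (x + width * y) * size     # byte offset of the cell, row-major order
    with open(filename, "rb") as file_data:
        file_data.seek(offset)          # jump straight to the cell
        return struct.unpack(coding, file_data.read(size))[0]


# Build a small test matrix (4 columns, 3 rows) holding the values 0..11.
width, height = 4, 3
values = list(range(width * height))
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(struct.pack(">%dh" % len(values), *values))
    path = tmp.name

try:
    result = read_cell(path, x=1, y=2, width=width)  # 1D index is 1 + 4*2 = 9
    print(result)
finally:
    os.remove(path)
```

Only the bytes of the requested value are read here, so the work done in Python no longer grows with the position of the cell in the file.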
Wow! That is what I was looking for, thank you. I will read more about seek(), I like it. Thank you for your answer. I did have to change oddset to offset in the call to seek(), though; I think that was an unintentional typo. – Khaled, Nov 17, 2019 at 22:21

Corrected the typo. – RootTwo, Nov 17, 2019 at 23:08

Python has an official style guide, PEP8, which recommends using lower_case for functions and variables.

There is also a standard for documenting functions, called docstring convention, which is codified in PEP257.

In addition, you could use elif to avoid checking all if conditions once you have already found your file type. Or even better, put them all into a dictionary. Note that this now raises a KeyError for an undefined file type (which is a good thing). If you don't want that, use FILE_TYPES.get(file_type, (0, '')) instead.

I renamed your to1D function to ravel, because that is what this operation is called in numpy.

import struct

FILE_TYPES = {"signed int": (2, '>h'),
              "unsigned int": (2, '>H'),
              "unsigned byte": (1, '>B'),
              "float": (4, '>f')}

def ravel(x, y, width):
    """Convert `x`, `y` indexes of 2D array into index i of a 1D array.
    Assumes that the array is of one consistent `width`.

    x: index for cols in the 2D
    y: index for rows in the 2D
    """
    return x + width * y

def read_binary_as(file_name, x, y, file_type, width=3601):
    """Read the value at position `x`, `y` from array in `file_name`.
    Assumes that all values are of the same `file_type`
    and that each row has the same `width`.
    """
    size, coding = FILE_TYPES[file_type]
    offset = ravel(x, y, width) * size
    with open(file_name, 'rb') as file:  # binary read mode
        file.seek(offset)
        return struct.unpack(coding, file.read(size))[0]
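
The difference between indexing the dictionary and calling .get() can be seen in a short sketch. The type name "double" is a hypothetical unsupported entry chosen for illustration, and the dictionary is repeated so that the snippet runs on its own.

```python
FILE_TYPES = {
    "signed int": (2, ">h"),
    "unsigned int": (2, ">H"),
    "unsigned byte": (1, ">B"),
    "float": (4, ">f"),
}

# Direct indexing fails loudly for a type that is not in the table.
try:
    FILE_TYPES["double"]  # "double" is a made-up, unsupported type name
    raised = False
except KeyError:
    raised = True

# .get() returns the supplied fallback instead of raising.
fallback = FILE_TYPES.get("double", (0, ""))
known = FILE_TYPES.get("float", (0, ""))

print(raised, fallback, known)
```

With the (0, '') fallback the error does not go away, it only moves: struct.unpack('', b'') returns an empty tuple, so the trailing [0] would then raise an IndexError. That is one reason the KeyError is arguably the clearer failure.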

int and float numbers with shape 10 x 10. If I want to get item at location row=5, col=4 (24th item), why should it read 54 bytes, as that would be returned by to1D(5, 4, 10) call?