Speed performance of picking unique value from binary matrix file

Question 1

I wrote this code to read a square matrix saved as a binary file, which can hold int, uint, float, etc., and to pick the value at a given row y and a given column x. Can this code be made to pick the value faster than 20 seconds? The matrix has at most 3601 rows and 3601 columns.

import struct

# convert x,y indexes of 2D array into index i of 1D array
# x: index for cols in the 2D
# y: index for rows in the 2D
def to1D(y, x, width):
    i = x + width*y
    return i

def readBinary_as(filename, x, y, file_type, width=3601):
    with open(filename, 'rb') as file_data:
        # initialize, how many bytes to be read
        nbByte = 0
        # data type of the file, uint, int, float, etc...
        coding = ''
        if file_type == "signed int":
            nbByte = 2
            coding = '>h'  # 2B Signed Int - BE
        if file_type == "unsigned int":
            nbByte = 2
            coding = '>H'  # 2B Unsigned Int - BE
        if file_type == "unsigned byte":
            nbByte = 1
            coding = '>B'  # 1B Unsigned Byte - BE
        if file_type == "float":
            nbByte = 4
            coding = '>f'  # 4B float32 - BE
        # index of my value in 1D array
        i = to1D(y, x, width)
        # skip every value before the target, one read at a time
        for each_cell in range(0, i):
            file_data.read(nbByte)
        # read and save the picked value
        my_value_pos = file_data.read(nbByte)
        val = struct.unpack(coding, my_value_pos)[0]
        return val

Question 2

The title states "picking unique value" - where's uniqueness handled in your implementation?

Question 3

I mean that I don't want to read the whole file, just one value at the given indexes, treating the file as a 2D array (because it is matrix data).

Question 4

I doubt that this logic works correctly. Say we have a 10 x 10 matrix of mixed int and float numbers. If I want the item at row=5, col=4 (24th item), why should it read 54 bytes, which is what the to1D(5, 4, 10) call returns?
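To make the arithmetic in this comment concrete, here is a small sketch of the row-major index calculation. The names flat_index and byte_offset are illustrative and do not appear in the original post. It assumes every value in the file has the same size, so the byte offset is simply the flat index times that size.

```python
def flat_index(row, col, width):
    """Row-major index of (row, col) in a matrix with `width` columns."""
    return col + width * row


def byte_offset(row, col, width, item_size):
    """Byte position of (row, col) when every value takes `item_size` bytes."""
    return flat_index(row, col, width) * item_size


index = flat_index(5, 4, 10)
print(index)                     # 54 (zero-based, so the 55th item)
print(byte_offset(5, 4, 10, 2))  # 108 for 2-byte integers
print(byte_offset(5, 4, 10, 4))  # 216 for 4-byte floats
print(divmod(index, 10))         # (5, 4): back to (row, col)
```

So to1D(5, 4, 10) returns a count of values, not a count of bytes. The number of bytes to skip depends on the size of one value.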

Question 5

The matrix is all int or all float, etc.; there is no mixing at all. I am still a beginner at such things.

Question 6

Sorry, you are right, but this code is not meant for the general case. If you know the NASADEM files for DEMs, I think they work the way I assumed and tested. I don't want to read the same file first as int and then as float. One file is float, another is int, and so on, as stated in the documentation for each file. My code recognizes the type to read from the file extension, and I switch on that with my if statements.

Question 7

Try using seek() instead of a loop that reads 1 value at a time.

import struct

# convert x,y indexes of 2D array into index i of 1D array
# x: index for cols in the 2D
# y: index for rows in the 2D
def to1D(y, x, width):
    i = x + width*y
    return i

def readBinary_as(filename, x, y, file_type, width=3601):
    with open(filename, 'rb') as file_data:
        # initialize, how many bytes to be read
        nbByte = 0
        # data type of the file, uint, int, float, etc...
        coding = ''
        if file_type == "signed int":
            nbByte = 2
            coding = '>h'  # 2B Signed Int - BE
        if file_type == "unsigned int":
            nbByte = 2
            coding = '>H'  # 2B Unsigned Int - BE
        if file_type == "unsigned byte":
            nbByte = 1
            coding = '>B'  # 1B Unsigned Byte - BE
        if file_type == "float":
            nbByte = 4
            coding = '>f'  # 4B float32 - BE
        # index of my value in 1D array
        i = to1D(y, x, width)
        offset = i * nbByte
        # seek to byte offset of desired data
        file_data.seek(offset)
        # read and save the picked value
        my_value_pos = file_data.read(nbByte)
        val = struct.unpack(coding, my_value_pos)[0]
        return val
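As a rough illustration of why this helps: a 3601 x 3601 matrix has 12,967,201 cells, so the original loop can issue millions of read() calls before it reaches the target, while seek() needs a single read. The sketch below compares both approaches on a tiny temporary file. The 3 x 4 matrix, the sample values, and the helper names read_by_loop and read_by_seek are all made up for this example.

```python
import os
import struct
import tempfile

WIDTH = 4                       # hypothetical matrix: 3 rows x 4 columns
VALUES = list(range(100, 112))  # 12 sample values in row-major order
FMT, SIZE = ">h", 2             # big-endian signed 16-bit, as in the question


def read_by_loop(path, y, x):
    """Original approach: read and discard every value before the target."""
    reads = 0
    with open(path, "rb") as f:
        for _ in range(x + WIDTH * y):
            f.read(SIZE)
            reads += 1
        value = struct.unpack(FMT, f.read(SIZE))[0]
    return value, reads + 1


def read_by_seek(path, y, x):
    """Suggested approach: jump to the byte offset, then read once."""
    with open(path, "rb") as f:
        f.seek((x + WIDTH * y) * SIZE)
        return struct.unpack(FMT, f.read(SIZE))[0], 1


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "matrix.bin")
    with open(path, "wb") as f:
        f.write(struct.pack(f">{len(VALUES)}h", *VALUES))
    loop_result = read_by_loop(path, 2, 1)
    seek_result = read_by_seek(path, 2, 1)

print(loop_result)  # (109, 10): nine skipped reads plus the final one
print(seek_result)  # (109, 1)
```

Both functions return the same value. Only the number of read() calls differs, and that difference grows with the position of the target in the file.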

Question 8

Wow! That is what I was looking for, thank you. I will read more about seek(); I like it. One note: I passed offset to seek() instead of oddset, which I think was an unintentional typo in your code.

Question 9

Corrected the typo.

Question 10

Python has an official style guide, PEP8, which recommends using lower_case for functions and variables.

There is also a standard for documenting functions, called docstring convention, which is codified in PEP257.

In addition, you could use elif to avoid checking all if conditions if you have already found your file type. Or even better, put them all into a dictionary. Note that this now raises a KeyError for an undefined file type (which is a good thing). If you don't want that, use FILE_TYPES.get(file_type, (0, '')) instead.
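The difference between the two lookups described above can be sketched as follows. The "double" key is a deliberately undefined file type, used here only for illustration.

```python
FILE_TYPES = {
    "signed int": (2, ">h"),
    "unsigned int": (2, ">H"),
    "unsigned byte": (1, ">B"),
    "float": (4, ">f"),
}

# Plain indexing: a known type returns its (size, format) pair.
size, coding = FILE_TYPES["float"]
print(size, coding)  # 4 >f

# Plain indexing with an unknown type fails loudly and immediately.
try:
    FILE_TYPES["double"]
except KeyError as err:
    print("unknown file type:", err)

# .get() with a default silently falls back to a placeholder pair.
fallback = FILE_TYPES.get("double", (0, ""))
print(fallback)  # (0, '')
```

Note that the (0, '') fallback only postpones the failure: unpacking with an empty format yields an empty tuple, so indexing it with [0] raises an IndexError that is harder to trace back to the unknown file type.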

I renamed your to1D function to ravel, because that is what this operation is called in numpy.

import struct

FILE_TYPES = {"signed int": (2, '>h'),
              "unsigned int": (2, '>H'),
              "unsigned byte": (1, '>B'),
              "float": (4, '>f')}

def ravel(x, y, width):
    """Convert `x`, `y` indexes of 2D array into index i of a 1D array.

    Assumes that the array is of one consistent `width`.
    x: index for cols in the 2D
    y: index for rows in the 2D
    """
    return x + width * y

def read_binary_as(file_name, x, y, file_type, width=3601):
    """Read the value at position `x`, `y` from array in `file_name`.

    Assumes that all values are of the same `file_type`
    and that each row has the same `width`.
    """
    size, coding = FILE_TYPES[file_type]
    offset = ravel(x, y, width) * size
    # 'rb' opens the file for reading in binary mode
    with open(file_name, 'rb') as file:
        file.seek(offset)
        return struct.unpack(coding, file.read(size))[0]
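One way to check a function like this is a round trip: write a small matrix for each file type, then read one cell back. The sketch below does that with a temporary file. The 2 x 3 matrix, the sample values, and the shortened helper name read_value are assumptions made for this example. It also uses struct.calcsize to confirm that the hand-written sizes in the dictionary match the format codes.

```python
import os
import struct
import tempfile

FILE_TYPES = {
    "signed int": (2, ">h"),
    "unsigned int": (2, ">H"),
    "unsigned byte": (1, ">B"),
    "float": (4, ">f"),
}


def read_value(file_name, x, y, file_type, width):
    """Read the value at column `x`, row `y` using a single seek and read."""
    size, coding = FILE_TYPES[file_type]
    with open(file_name, "rb") as file:
        file.seek((x + width * y) * size)
        return struct.unpack(coding, file.read(size))[0]


# Sanity check: each hand-written size agrees with what struct computes.
for size, coding in FILE_TYPES.values():
    assert struct.calcsize(coding) == size

WIDTH = 3  # hypothetical matrix: 2 rows x 3 columns
samples = {
    "signed int": [-3, -2, -1, 0, 1, 2],
    "unsigned int": [0, 1, 2, 65533, 65534, 65535],
    "unsigned byte": [0, 1, 2, 253, 254, 255],
    "float": [0.0, 0.5, 1.5, -2.25, 3.0, 4.75],
}

results = {}
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "matrix.bin")
    for file_type, values in samples.items():
        _, coding = FILE_TYPES[file_type]
        # Pack all values at once, e.g. ">6h" for six big-endian shorts.
        with open(path, "wb") as f:
            f.write(struct.pack(f">{len(values)}{coding[1]}", *values))
        # Column 1, row 1 is flat index 4, the fifth value of each list.
        results[file_type] = read_value(path, 1, 1, file_type, WIDTH)

print(results)
# {'signed int': 1, 'unsigned int': 65534, 'unsigned byte': 254, 'float': 3.0}
```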

Accepted answer (the seek() approach) by RootTwo, 2019-11-17 20:40:16Z

