2
\$\begingroup\$

I wrote this code to read a square matrix saved as a binary file (the values can be int, uint, float, etc.) and to pick the value at a given row y and a given column x. Could this code pick the value faster, in less than 20 seconds? The matrix has at most 3601 rows and 3601 columns.

    import struct

    #convert x,y indexes of 2D array into index i of 1D array
    #x: index for cols in the 2D
    #y: index for rows in the 2D
    def to1D(y,x,width):
        i = x + width*y
        return i

    def readBinary_as(filename,x,y,file_type,width=3601):
        with open(filename,'rb') as file_data:
            #initialize, how many bytes to be read
            nbByte = 0
            #data type of the file, uint, int, float, etc...
            coding = ''
            if file_type == "signed int":
                nbByte = 2
                coding = '>h' #2B Signed Int - BE
            if file_type == "unsigned int":
                nbByte = 2
                coding = '>H' #2B Unsigned Int - BE
            if file_type == "unsigned byte":
                nbByte = 1
                coding = '>B' #1B Unsigned Byte - BE
            if file_type == "float":
                nbByte = 4
                coding = '>f' #4B float32 - BE
            #index of my value in 1D array
            i = to1D(y,x,width)
            for each_cell in range(0,i):
                file_data.read(nbByte)
            #read and save the picked value
            my_value_pos = file_data.read(nbByte)
            val = struct.unpack(coding,my_value_pos)[0]
            return val

AlexV
asked Nov 17, 2019 at 0:16
\$\endgroup\$
  • \$\begingroup\$ The title states "picking unique value" - where's uniqueness handled in your implementation? \$\endgroup\$ Commented Nov 17, 2019 at 6:37
  • \$\begingroup\$ I mean that I don't want to read the whole file, just one value at the given indexes, treating the file as a 2D array (because it is matrix data). \$\endgroup\$ Commented Nov 17, 2019 at 6:44
  • \$\begingroup\$ I have a doubt that this logic would work correctly. Let's say we have a matrix of mixed int and float numbers with shape 10 x 10. If I want to get item at location row=5, col=4 (24th item), why should it read 54 bytes, as that would be returned by to1D(5, 4, 10) call ??? \$\endgroup\$ Commented Nov 17, 2019 at 19:17
  • \$\begingroup\$ The matrix is all int or all float, etc. There is no mixing at all. I am still a beginner at such things. \$\endgroup\$ Commented Nov 17, 2019 at 19:21
  • \$\begingroup\$ Sorry, you are right, but this code is not meant for the general case. If you know the NASADEM files for DEM, I think they work the way I did it and tried it. I don't want to read the same file as int and then as float. Each file has a single type (one file is float, another is int, etc.), as stated in the documentation that comes with each file. My code recognizes the type to read from the file extension, and I switch on that with my if statements. \$\endgroup\$ Commented Nov 17, 2019 at 19:45

2 Answers

3
\$\begingroup\$

Try using seek() instead of a loop that reads 1 value at a time.

    import io
    import struct

    #convert x,y indexes of 2D array into index i of 1D array
    #x: index for cols in the 2D
    #y: index for rows in the 2D
    def to1D(y,x,width):
        i = x + width*y
        return i

    def readBinary_as(filename,x,y,file_type,width=3601):
        with open(filename,'rb') as file_data:
            #initialize, how many bytes to be read
            nbByte = 0
            #data type of the file, uint, int, float, etc...
            coding = ''
            if file_type == "signed int":
                nbByte = 2
                coding = '>h' #2B Signed Int - BE
            if file_type == "unsigned int":
                nbByte = 2
                coding = '>H' #2B Unsigned Int - BE
            if file_type == "unsigned byte":
                nbByte = 1
                coding = '>B' #1B Unsigned Byte - BE
            if file_type == "float":
                nbByte = 4
                coding = '>f' #4B float32 - BE
            #index of my value in 1D array
            i = to1D(y,x,width)
            offset = i * nbByte
            # seek to byte offset of desired data
            file_data.seek(offset)
            #read and save the picked value
            my_value_pos = file_data.read(nbByte)
            val = struct.unpack(coding,my_value_pos)[0]
            return val

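If you want to convince yourself that seeking lands on the same value as the read loop, here is a minimal sketch that checks both on a tiny file. The 4 x 4 matrix, the temporary file and the helper names (`read_by_loop`, `read_by_seek`, `make_test_file`) are assumptions made for this illustration only; they are not part of the code above.

```python
import os
import struct
import tempfile

WIDTH = 4                          # hypothetical small matrix, 4 x 4
CODING = '>h'                      # 2-byte signed int, big-endian
NB_BYTE = struct.calcsize(CODING)  # size in bytes of one value


def read_by_loop(filename, x, y):
    """Loop approach: read and discard every value before the target."""
    with open(filename, 'rb') as file_data:
        for _ in range(x + WIDTH * y):
            file_data.read(NB_BYTE)
        return struct.unpack(CODING, file_data.read(NB_BYTE))[0]


def read_by_seek(filename, x, y):
    """seek() approach: jump straight to the byte offset of the target."""
    with open(filename, 'rb') as file_data:
        file_data.seek((x + WIDTH * y) * NB_BYTE)
        return struct.unpack(CODING, file_data.read(NB_BYTE))[0]


def make_test_file():
    """Write the values 0..15 row by row and return the file path."""
    values = range(WIDTH * WIDTH)
    with tempfile.NamedTemporaryFile(delete=False, suffix='.bin') as tmp:
        tmp.write(struct.pack('>%dh' % (WIDTH * WIDTH), *values))
        return tmp.name


path = make_test_file()
try:
    # Row 2, column 3 holds 3 + 4 * 2 = 11 in this test layout.
    print(read_by_loop(path, x=3, y=2), read_by_seek(path, x=3, y=2))  # 11 11
finally:
    os.remove(path)
```

For the real 3601 x 3601 matrix, the last cell has index 3600 + 3601 * 3600 = 12,967,200, so the loop can issue close to 13 million `read()` calls, while `seek()` needs a single call regardless of position.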
answered Nov 17, 2019 at 20:40
\$\endgroup\$
  • \$\begingroup\$ Woow! that is what I searched for, thank you, I will read more about this seek(), I liked it. Thank you for your answer. But I passed oddset as parameter to seek() instead offset, I think you wrote it wrong without intention \$\endgroup\$ Commented Nov 17, 2019 at 22:21
  • \$\begingroup\$ Corrected the typo. \$\endgroup\$ Commented Nov 17, 2019 at 23:08
2
\$\begingroup\$

Python has an official style-guide, PEP8, which recommends using lower_case for functions and variables.

There is also a standard for documenting functions, called docstring convention, which is codified in PEP257.

In addition, you could use elif to avoid checking all if conditions if you have already found your file type. Or even better, put them all into a dictionary. Note that this now raises a KeyError for an undefined file type (which is a good thing). If you don't want that, use FILE_TYPES.get(file_type, (0, '')) instead.

I renamed your to1D function to ravel, because that is what this operation is called in numpy.

    import struct

    FILE_TYPES = {"signed int": (2, '>h'),
                  "unsigned int": (2, '>H'),
                  "unsigned byte": (1, '>B'),
                  "float": (4, '>f')}

    def ravel(x, y, width):
        """Convert `x`, `y` indexes of 2D array into index i of a 1D array.

        Assumes that the array is of one consistent `width`.

        x: index for cols in the 2D
        y: index for rows in the 2D
        """
        return x + width * y

    def read_binary_as(file_name, x, y, file_type, width=3601):
        """Read the value at position `x`, `y` from array in `file_name`.

        Assumes that all values are of the same `file_type`
        and that each row has the same `width`.
        """
        size, coding = FILE_TYPES[file_type]
        offset = ravel(x, y, width) * size
        with open(file_name, 'rb') as file:
            file.seek(offset)
            return struct.unpack(coding, file.read(size))[0]

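To make the difference between the two lookup styles concrete, here is a small sketch. The helper names `lookup_strict` and `lookup_lenient` and the unknown type `"double"` are made up for this illustration; the table is repeated only so the snippet runs on its own. It also uses `struct.calcsize` to check that each hard-coded size agrees with its format string.

```python
import struct

FILE_TYPES = {"signed int": (2, '>h'),
              "unsigned int": (2, '>H'),
              "unsigned byte": (1, '>B'),
              "float": (4, '>f')}

# Each hard-coded size should match what struct computes for its format.
for name, (size, coding) in FILE_TYPES.items():
    assert size == struct.calcsize(coding), name


def lookup_strict(file_type):
    """Plain indexing: raises KeyError for an unknown file type."""
    return FILE_TYPES[file_type]


def lookup_lenient(file_type):
    """dict.get with a default: returns (0, '') for an unknown file type."""
    return FILE_TYPES.get(file_type, (0, ''))


print(lookup_strict("float"))    # (4, '>f')
print(lookup_lenient("double"))  # (0, '')
try:
    lookup_strict("double")
except KeyError as err:
    print("unknown file type:", err)
```

With the `(0, '')` fallback, `struct.unpack('', b'')` returns an empty tuple, so the trailing `[0]` would fail later with an `IndexError`. The `KeyError` from plain indexing fails earlier and names the offending file type, which is why I consider it the better default.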
answered Nov 18, 2019 at 9:52
\$\endgroup\$
