
I have binary files that need to be efficiently processed. The first 8 bytes correspond to metadata, and all the rest is data. From the first 8 bytes I need the last 4 bytes to determine how to structure the rest of the data.

Since I'm new to Rust, this seemed like a good exercise. The following code compiles and produces results that seem reasonable.

    use std::convert::TryInto;
    use ndarray::Array2;
    use chrono::prelude::*;

    fn four_bytes_to_array(barry: &[u8]) -> &[u8; 4] {
        barry.try_into().expect("slice with incorrect length")
    }

    fn eight_bytes_to_array(barry: &[u8]) -> &[u8; 8] {
        barry.try_into().expect("slice with incorrect length")
    }

    fn bin_file_to_matrix(file_name: &str) -> ndarray::Array2<f64> {
        // Read in file_content
        let mut file_content = std::fs::read(file_name).expect("Could not read file!");

        // The first 4 bytes are some random information, the second 4 bytes are the
        // number of data-points per spectrum
        let nr_dp_per_spectrum = four_bytes_to_array(&file_content[4..8]);
        // We combine the 4 bytes into an unsigned integer
        let nr_dp_per_spectrum = u32::from_be_bytes(*nr_dp_per_spectrum);

        // Calculate how many spectra there are in this file
        let how_many_spectra = file_content.len() as u32/8/(nr_dp_per_spectrum + 1u32);
        // Create a buffer to write the data to
        let dim = ndarray::Dim([how_many_spectra as usize, nr_dp_per_spectrum as usize]);
        let mut data = Array2::<f64>::zeros(dim);
        // Remove the first 8 bytes that contain metadata we have already processed
        file_content.drain(0..8);
        for i in 0..how_many_spectra {
            for j in 0..nr_dp_per_spectrum {
                let idx = ( (nr_dp_per_spectrum+1) * i + j * 8 ) as usize;
                let tmp = eight_bytes_to_array( &file_content[idx..idx+8] );
                let val = f64::from_be_bytes( *tmp );
                data[ndarray::Ix2(i as usize, j as usize)] = val;
            }
        }
        data
    }

    fn main() {
        let start = Utc::now();
        let res = bin_file_to_matrix("./data/example.bin");
        let difference = Utc::now() - start;
        println!("Time:\t {:?}", difference);
    }

Is there a way to speed up the code?

asked Dec 23, 2021 at 14:06

1 Answer


> Is there a way to speed up the code?

The speed is determined only by the nested for loop. Nothing glaring stands out to me. Are you compiling in debug mode (without most optimizations) or in release mode (with optimizations on, e.g. cargo build --release)?
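On the measurement itself, here is a minimal sketch, assuming the goal is simply the wall-clock time of a single call. It swaps `chrono`'s `Utc::now()` for the standard library's monotonic `std::time::Instant`. The names `time_it` and `sum_of_squares` are hypothetical stand-ins for illustration and are not part of the original code.

```rust
use std::time::{Duration, Instant};

/// Runs `f` once and returns its result together with the elapsed wall-clock time.
/// `time_it` is a hypothetical helper, not part of the original code.
fn time_it<T>(f: impl FnOnce() -> T) -> (T, Duration) {
    let start = Instant::now();
    let result = f();
    (result, start.elapsed())
}

/// Stand-in workload (hypothetical); in the real program this would be
/// the call to `bin_file_to_matrix`.
fn sum_of_squares(n: u64) -> u64 {
    (0..n).map(|x| x * x).sum()
}

fn main() {
    let (res, elapsed) = time_it(|| sum_of_squares(1_000));
    // Printing the result keeps the work observable to the optimizer.
    println!("Result: {}\tTime: {:?}", res, elapsed);
}
```

Running this with `cargo run --release` rather than a plain `cargo run` is what makes the number comparable to production performance.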

I cleaned up your code somewhat, mostly to reduce the number of casts, and added a generic wrapper around from_be_bytes that automatically advances the reference to the remaining bytes by the number of bytes read. I don't expect the cleanup itself to have any performance impact. The wrapper also removes the need to calculate the index in the loop, which may help slightly, but read_be does similar work instead, so the effect is probably small.

    use chrono::prelude::*;
    use ndarray::Array2;
    use std::convert::TryInto;

    /// Reads a big-endian value from the front of a byte slice and advances
    /// the slice past the bytes that were consumed.
    trait EndianRead {
        fn read_be(input: &mut &[u8]) -> Self;
    }

    macro_rules! impl_EndianRead_for_nums (( $($num:ident),* ) => {
        $(
            impl EndianRead for $num {
                fn read_be(input: &mut &[u8]) -> Self {
                    // Panics if fewer than size_of::<Self>() bytes remain
                    let (bytes, rest) = input.split_at(std::mem::size_of::<Self>());
                    *input = rest;
                    Self::from_be_bytes(bytes.try_into().unwrap())
                }
            }
        )*
    });
    impl_EndianRead_for_nums!(u32, f64);

    fn bin_file_to_matrix(file_name: &str) -> ndarray::Array2<f64> {
        // Read in file_content
        let file_content = std::fs::read(file_name).expect("Could not read file!");
        // The first 4 bytes are some random information
        let mut byte_content = &file_content[4..];

        // the second 4 bytes are the number of data-points per spectrum
        // We combine the 4 bytes into an unsigned integer
        let nr_dp_per_spectrum = <u32 as EndianRead>::read_be(&mut byte_content) as usize;
        // Calculate how many spectra there are in this file
        let spectrum_size = std::mem::size_of::<f64>() * nr_dp_per_spectrum;
        let how_many_spectra = byte_content.len() / spectrum_size;
        // Create a buffer to write the data to
        let dim = ndarray::Dim([how_many_spectra, nr_dp_per_spectrum]);
        let mut data = Array2::<f64>::zeros(dim);
        for i in 0..how_many_spectra {
            for j in 0..nr_dp_per_spectrum {
                // Each call consumes the next 8 bytes, so no index arithmetic is needed
                let val = <f64 as EndianRead>::read_be(&mut byte_content);
                // i and j are already usize, so no casts are required
                data[[i, j]] = val;
            }
        }
        data
    }

    fn main() {
        let start = Utc::now();
        let _res = bin_file_to_matrix("./data/example.bin");
        let difference = Utc::now() - start;
        println!("Time:\t {:?}", difference);
    }
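
As an alternative to filling the matrix element by element, the payload could be decoded in a single pass. The following is a minimal sketch using only the standard library. It assumes the layout used in the code above (4 ignored bytes, a big-endian `u32` count, then big-endian `f64` values with nothing extra between spectra). The name `decode_spectra` is hypothetical, and whether this is actually faster than the nested loop has not been measured here.

```rust
use std::convert::TryInto;

/// Decodes a buffer laid out as: 4 ignored bytes, a big-endian u32 giving the
/// number of data points per spectrum, then big-endian f64 values.
/// Returns (rows, cols, values in row-major order), or None if the header is
/// missing or reports zero points per spectrum.
/// `decode_spectra` is a hypothetical helper, not part of the original code.
fn decode_spectra(bytes: &[u8]) -> Option<(usize, usize, Vec<f64>)> {
    let header = bytes.get(4..8)?;
    let cols = u32::from_be_bytes(header.try_into().ok()?) as usize;
    if cols == 0 {
        return None;
    }
    let payload = &bytes[8..];
    let rows = payload.len() / (cols * std::mem::size_of::<f64>());
    let values: Vec<f64> = payload
        .chunks_exact(8)
        .take(rows * cols) // drop any trailing partial spectrum
        .map(|chunk| f64::from_be_bytes(chunk.try_into().unwrap()))
        .collect();
    Some((rows, cols, values))
}

fn main() {
    // Build a small in-memory example: 2 points per spectrum, 2 spectra.
    let mut bytes = vec![0u8; 4];
    bytes.extend_from_slice(&2u32.to_be_bytes());
    for v in [1.0f64, 2.0, 3.0, 4.0].iter() {
        bytes.extend_from_slice(&v.to_be_bytes());
    }
    let (rows, cols, values) = decode_spectra(&bytes).expect("valid header");
    println!("{} x {}: {:?}", rows, cols, values);
}
```

The flat row-major `Vec<f64>` can then be handed to `Array2::from_shape_vec((rows, cols), values)`, which takes ownership of the vector instead of zero-initializing an array and overwriting every element.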
answered Dec 26, 2021 at 10:05
  • "I cleaned up your code ..." I suspect there are more observations you could make about the code. Could you explain in the answer what about the code made you clean it up? Commented Dec 26, 2021 at 14:28
  • @pacmaninbw, thanks for your suggestion. It was mostly my impression that there was too much casting going on, so I'll add that to my answer. Commented Dec 26, 2021 at 15:35
