I have a binary file `input.hgt` from the Shuttle Radar Topography Mission. It contains a 3601×3601 matrix, stored as 2-byte big-endian integers. My code converts it to a readable text file `output.csv`.

How can I make it better? Here is my code:
```cpp
#include <iostream>
#include <fstream>
using namespace std;

const int SRTM_SIZE = 3601;
static int height[SRTM_SIZE][SRTM_SIZE] = {{0}, {0}};

int main() {
    ifstream file("/storage/emulated/0/input.hgt", ios::in | ios::binary);
    if (!file) {
        cout << "Error opening file!" << endl;
        return -1;
    }

    unsigned char buffer[2];
    for (int i = 0; i < SRTM_SIZE; ++i) {
        for (int j = 0; j < SRTM_SIZE; ++j) {
            if (!file.read(reinterpret_cast<char*>(buffer), sizeof(buffer))) {
                cout << "Error reading file!" << endl;
                return -1;
            }
            height[i][j] = (buffer[0] << 8) | buffer[1];
        }
    }

    ofstream meinFile;
    meinFile.open("/storage/emulated/0/output.csv");
    for (int x = 0; x < SRTM_SIZE; x++) {      // every row
        for (int y = 0; y < SRTM_SIZE; y++) {  // every column
            meinFile << height[x][y] << ",";
        }
        meinFile << endl;
    }
    meinFile.close();

    cout << "Congratulations!" << endl;
    cout << "Your file has been converted successfully" << endl;
    return 0;
}
```
3 Answers
- Avoid `using namespace std`.

- Assuming `sizeof(int)` is 4, the `height` array takes about 50 MB of space. I understand that by modern standards that is a small amount, but a good citizen still should not request more than is really needed. I recommend stream-like processing: read 2 bytes, convert them, and print them out. This way you only need 2 bytes of storage.

- The program terminates each row with a dangling comma, which RFC 4180 does not permit. Consider instead:

  ```cpp
  for (int x = 0; x < SRTM_SIZE; x++) {
      for (int y = 0; y < SRTM_SIZE - 1; y++) {
          meinFile << height[x][y] << ",";
      }
      meinFile << height[x][SRTM_SIZE - 1] << std::endl;
  }
  ```

- You test for reading errors. Writing may also fail.

- Error detection is important. Error reporting is equally important. Can't open the file — why? Insufficient permission? File not found? Corrupted media? Can't read the file — why? Can't write the file — why? Check out `strerror`, `perror`, or similar facilities.
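A minimal sketch of that kind of error reporting (the `open_or_report` helper is illustrative; it assumes `errno` is set by the failed open, which is common in practice but not guaranteed by the C++ standard):

```cpp
#include <cerrno>
#include <cstring>
#include <fstream>
#include <iostream>

// Try to open a file and report *why* it failed, not just that it failed.
bool open_or_report(std::ifstream& file, const char* path) {
    file.open(path, std::ios::binary);
    if (!file) {
        // errno is typically set by the underlying OS open call.
        std::cerr << path << ": " << std::strerror(errno) << '\n';
        return false;
    }
    return true;
}
```

On a typical system, a missing file would be reported as something like `input.hgt: No such file or directory`.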
Along with the answer given by @vnp:

I'll start off by saying that the indentation makes the code quite hard to read, so I would use an IDE's formatter or another tool that can produce consistent indentation.

**Use `constexpr` over `const` whenever possible**

Anything declared `constexpr` may be evaluated at compile time, reducing the work done at runtime.
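For example (the `static_assert` is only there to demonstrate that the value is usable at compile time):

```cpp
// constexpr guarantees the value is a compile-time constant,
// so the compiler can use it wherever a constant expression is required:
constexpr int SRTM_SIZE = 3601;
static_assert(SRTM_SIZE * SRTM_SIZE == 12967201,
              "matrix size computed at compile time");
```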
**Prefer `std::array<T, SIZE>` over C-style arrays**

`std::array` (in header `<array>`) provides bounds checking and some other useful member functions.
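Applied to the two-byte buffer, it might look like this (`decode_be16` is an illustrative name, not part of the original code):

```cpp
#include <array>

// Decode a big-endian 16-bit value from a two-byte std::array.
// Unlike a C-style array, .at() throws std::out_of_range on a bad index.
int decode_be16(const std::array<unsigned char, 2>& buffer) {
    return (buffer.at(0) << 8) | buffer.at(1);
}
```

For instance, `decode_be16({0x01, 0x02})` yields `0x0102`, i.e. 258.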
**Avoid `reinterpret_cast`**

Its behavior is platform-dependent, and it can cause a lot of headaches. In fact, if you have to use `reinterpret_cast`, you might want to re-examine your code and see whether there is a better solution. In your case, it is not clear why you declared an `unsigned char` array, since you are transferring the bytes into an `int` anyway.

**Note on byte arrangement (endianness)**

Right now, you read the data from the file as big-endian, meaning the most significant byte comes first. There is nothing wrong with that, but if the file specification ever changes, you might want a parameter, or a command-line argument, that specifies how the bytes in the file are arranged. In general, it is a good idea to keep byte order in mind when working with binary files.
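One way to make the byte order an explicit parameter, as suggested (the `Endian` enum and `decode16` are illustrative names):

```cpp
enum class Endian { Big, Little };

// Interpret two raw bytes as a 16-bit value, with the byte order
// passed in rather than hard-coded into the shift expression.
int decode16(const unsigned char bytes[2], Endian order) {
    if (order == Endian::Big)
        return (bytes[0] << 8) | bytes[1];
    return (bytes[1] << 8) | bytes[0];
}
```

The same two bytes `{0x12, 0x34}` decode to `0x1234` as big-endian and `0x3412` as little-endian.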
**Use `std::cerr` instead of `std::cout` for error messages**

While on your system the two streams may point to the same destination, they can be redirected independently by the user. Generally, you want to keep your user-facing output separate from your error/logging output. This becomes even more important in multi-threaded console applications.

**Final thoughts on your approach**

If the end goal is simply to convert a binary file into a CSV, why use so much memory to store the entire binary file? An input stream and an output stream can exist at the same time as long as they don't point to the same path. This means you can write to the CSV file as you are reading the binary file. Here is how it might look:

```cpp
...
char buffer[2];
binary_file.read(buffer, sizeof(buffer));
csv_output << ((static_cast<unsigned char>(buffer[0]) << 8)
               | static_cast<unsigned char>(buffer[1])) << ',';
...
```

(Note the casts to `unsigned char`: with a plain `char` buffer on platforms where `char` is signed, the shift would sign-extend and corrupt values above `0x7FFF`.)
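Filled out into a self-contained sketch (the `convert` helper, its stream names, and the row/column parameters are illustrative, not part of the original code):

```cpp
#include <iostream>
#include <sstream>  // only needed for the usage example below

// Streaming conversion: read two bytes, decode big-endian, emit one CSV field.
// Only a 2-byte buffer is needed, regardless of the matrix size.
bool convert(std::istream& binary_file, std::ostream& csv_output,
             int rows, int cols) {
    char buffer[2];
    for (int r = 0; r < rows; ++r) {
        for (int c = 0; c < cols; ++c) {
            if (!binary_file.read(buffer, sizeof(buffer)))
                return false;  // short read: the input ended early
            int value = (static_cast<unsigned char>(buffer[0]) << 8)
                      | static_cast<unsigned char>(buffer[1]);
            csv_output << value << (c + 1 < cols ? ',' : '\n');
        }
    }
    return static_cast<bool>(csv_output);
}
```

Fed the four bytes `01 02 00 03` via a `std::istringstream` with `rows = 1, cols = 2`, it writes `258,3` followed by a newline — and, as a bonus, it avoids the dangling comma.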
**Avoid hard-coding the file names**

Instead of embedding the file names in the code, you could allow the user to specify the input and output files. Even better, the program doesn't need to know the names of the files at all; it can simply act as a pipe. If you read from standard input and write to standard output, then you don't need `<fstream>`, and you can omit the code that opens the files and handles errors. Obviously, make sure you use `std::cerr` for your error messages, not `std::cout`.

Additionally, by acting as a filter, a failure to open the output stream is reported before you do any of the processing. If you don't act as a filter, it's a good idea to open the output file before you read and process the input, for this reason.
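If the program does keep explicit file names, the open-the-output-first idea might look like this (the `run` helper and its paths are illustrative):

```cpp
#include <cstdlib>
#include <fstream>
#include <iostream>

// Open the output file before touching the input: if the output
// cannot be created, we find out before doing any work.
int run(const char* in_path, const char* out_path) {
    std::ofstream out(out_path);
    if (!out) {
        std::cerr << "cannot open " << out_path << " for writing\n";
        return EXIT_FAILURE;
    }
    std::ifstream in(in_path, std::ios::binary);
    if (!in) {
        std::cerr << "cannot open " << in_path << " for reading\n";
        return EXIT_FAILURE;
    }
    // ... the read/convert/write loop would go here ...
    return EXIT_SUCCESS;
}
```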
**Return a positive error code**

This is more cosmetic, but prefer small positive numbers to indicate errors. Most standard commands exit with status `1` on failure; some use `2`, `3`, etc. if they need to indicate different kinds of failure. Many platforms truncate the exit status — on my current system, your `-1` appears in the shell as `255`. Using small positive numbers keeps what C sees consistent with what your shell sees.

If it helps, `<cstdlib>` provides the convenient macros `EXIT_SUCCESS` and `EXIT_FAILURE` for this purpose.
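The `255` comes from the status being reduced to its low 8 bits; a quick sanity check:

```cpp
#include <cstdlib>

// POSIX shells see only the low 8 bits of a process's exit status,
// so returning -1 from main shows up as 255:
static_assert((-1 & 0xFF) == 255, "exit status truncated to 8 bits");
```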