I have a binary file `input.hgt` from the Shuttle Radar Topography Mission. It contains a 3601×3601 matrix, stored as 2-byte big-endian integers. My code converts it to a readable text file `output.csv`.

How can I make it better? Here is my code:
```cpp
#include <iostream>
#include <fstream>
using namespace std;

const int SRTM_SIZE = 3601;
static int height[SRTM_SIZE][SRTM_SIZE] = {{0}, {0}};

int main() {
    ifstream file("/storage/emulated/0/input.hgt", ios::in | ios::binary);
    if (!file) {
        cout << "Error opening file!" << endl;
        return -1;
    }

    unsigned char buffer[2];
    for (int i = 0; i < SRTM_SIZE; ++i) {
        for (int j = 0; j < SRTM_SIZE; ++j) {
            if (!file.read(reinterpret_cast<char*>(buffer), sizeof(buffer))) {
                cout << "Error reading file!" << endl;
                return -1;
            }
            height[i][j] = (buffer[0] << 8) | buffer[1];
        }
    }

    ofstream meinFile;
    meinFile.open("/storage/emulated/0/output.csv");
    for (int x = 0; x < SRTM_SIZE; x++) {      // every row
        for (int y = 0; y < SRTM_SIZE; y++) {  // every column
            meinFile << height[x][y] << ",";
        }
        meinFile << endl;
    }
    meinFile.close();

    cout << "Congratulations!" << endl;
    cout << "Your file has been converted successfully" << endl;
    return 0;
}
```
3 Answers
- Avoid `using namespace std`.

- Assuming `sizeof(int)` is 4, the `height` array takes about 50 MB of space. I understand that by modern standards that is a small amount, but a good citizen still should not request more than is really needed. I recommend stream-like processing: read 2 bytes, convert them, and print them out. This way you only need 2 bytes of storage.

- The program terminates each row with a dangling comma, which RFC 4180 does not permit. Consider instead:

  ```cpp
  for (int x = 0; x < SRTM_SIZE; x++) {
      for (int y = 0; y < SRTM_SIZE - 1; y++) {
          meinFile << height[x][y] << ",";
      }
      meinFile << height[x][SRTM_SIZE - 1] << std::endl;
  }
  ```

- You test for reading errors. Writing may also fail.

- Error detection is important. Error reporting is equally important. Can't open the file — why? Insufficient permission? File not found? Corrupted media? Can't read the file — why? Can't write the file — why? Check out `strerror`, `perror`, or similar facilities.
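A minimal sketch of that kind of error reporting (the `open_or_report` helper is illustrative; it assumes `errno` is set by the failed open, which is common in practice but not guaranteed by the C++ standard):

```cpp
#include <cerrno>
#include <cstring>
#include <fstream>
#include <iostream>

// Try to open a file and report *why* it failed, not just that it failed.
bool open_or_report(std::ifstream& file, const char* path) {
    file.open(path, std::ios::binary);
    if (!file) {
        // errno is typically set by the underlying OS open call.
        std::cerr << path << ": " << std::strerror(errno) << '\n';
        return false;
    }
    return true;
}
```

On a typical system, a missing file would be reported as something like `input.hgt: No such file or directory`.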
Along with the answer given by @vnp:

I'll start off by saying that the indentation makes the code quite hard to read, so I would use an IDE's formatter or another tool that can produce consistent indentation.

**Use `constexpr` over `const` whenever possible**

Anything declared `constexpr` may be evaluated at compile time, reducing the work done at runtime.
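For example (the `static_assert` is only there to demonstrate that the value is usable at compile time):

```cpp
// constexpr guarantees the value is a compile-time constant,
// so the compiler can use it wherever a constant expression is required:
constexpr int SRTM_SIZE = 3601;
static_assert(SRTM_SIZE * SRTM_SIZE == 12967201,
              "matrix size computed at compile time");
```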
**Prefer `std::array<T, SIZE>` over C-style arrays**

`std::array` (in header `<array>`) provides bounds checking and some other useful member functions.
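Applied to the two-byte buffer, it might look like this (`decode_be16` is an illustrative name, not part of the original code):

```cpp
#include <array>

// Decode a big-endian 16-bit value from a two-byte std::array.
// Unlike a C-style array, .at() throws std::out_of_range on a bad index.
int decode_be16(const std::array<unsigned char, 2>& buffer) {
    return (buffer.at(0) << 8) | buffer.at(1);
}
```

For instance, `decode_be16({0x01, 0x02})` yields `0x0102`, i.e. 258.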
**Avoid `reinterpret_cast`**

Its behavior is platform-dependent, and it can cause a lot of headaches. In fact, if you have to use `reinterpret_cast`, you might want to re-examine your code and see whether there is a better solution. In your case, it is not clear why you declared an `unsigned char` array, since you are transferring the bytes into an `int` anyway.

**Note on byte arrangement (endianness)**

Right now, you read the data from the file as big-endian, meaning the most significant byte comes first. There is nothing wrong with that, but if the file specification ever changes, you might want a parameter, or a command-line argument, that specifies how the bytes in the file are arranged. In general, it is a good idea to keep byte order in mind when working with binary files.
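One way to make the byte order an explicit parameter, as suggested (the `Endian` enum and `decode16` are illustrative names):

```cpp
enum class Endian { Big, Little };

// Interpret two raw bytes as a 16-bit value, with the byte order
// passed in rather than hard-coded into the shift expression.
int decode16(const unsigned char bytes[2], Endian order) {
    if (order == Endian::Big)
        return (bytes[0] << 8) | bytes[1];
    return (bytes[1] << 8) | bytes[0];
}
```

The same two bytes `{0x12, 0x34}` decode to `0x1234` as big-endian and `0x3412` as little-endian.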
**Use `std::cerr` instead of `std::cout` for error messages**

While on your system the two streams may point to the same destination, they can be redirected independently by the user. Generally, you want to keep your user-facing output separate from your error/logging output. This becomes even more important in multi-threaded console applications.

**Final thoughts on your approach**

If the end goal is simply to convert a binary file into a CSV, why use so much memory to store the entire binary file? An input stream and an output stream can exist at the same time as long as they don't point to the same path. This means you can write to the CSV file as you are reading the binary file. Here is how it might look:

```cpp
...
char buffer[2];
binary_file.read(buffer, sizeof(buffer));
csv_output << ((static_cast<unsigned char>(buffer[0]) << 8)
               | static_cast<unsigned char>(buffer[1])) << ',';
...
```

(Note the casts to `unsigned char`: with a plain `char` buffer on platforms where `char` is signed, the shift would sign-extend and corrupt values above `0x7FFF`.)
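Filled out into a self-contained sketch (the `convert` helper, its stream names, and the row/column parameters are illustrative, not part of the original code):

```cpp
#include <iostream>
#include <sstream>  // only needed for the usage example below

// Streaming conversion: read two bytes, decode big-endian, emit one CSV field.
// Only a 2-byte buffer is needed, regardless of the matrix size.
bool convert(std::istream& binary_file, std::ostream& csv_output,
             int rows, int cols) {
    char buffer[2];
    for (int r = 0; r < rows; ++r) {
        for (int c = 0; c < cols; ++c) {
            if (!binary_file.read(buffer, sizeof(buffer)))
                return false;  // short read: the input ended early
            int value = (static_cast<unsigned char>(buffer[0]) << 8)
                      | static_cast<unsigned char>(buffer[1]);
            csv_output << value << (c + 1 < cols ? ',' : '\n');
        }
    }
    return static_cast<bool>(csv_output);
}
```

Fed the four bytes `01 02 00 03` via a `std::istringstream` with `rows = 1, cols = 2`, it writes `258,3` followed by a newline — and, as a bonus, it avoids the dangling comma.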
**Avoid hard-coding the file names**

Instead of embedding the file names in the code, you could allow the user to specify the input and output files. Even better, the program doesn't need to know the names of the files at all; it can simply act as a pipe. If you read from standard input and write to standard output, then you don't need `<fstream>`, and you can omit the code that opens the files and handles errors. Obviously, make sure you use `std::cerr` for your error messages, not `std::cout`.

Additionally, by acting as a filter, a failure to open the output stream is reported before you do any of the processing. If you don't act as a filter, it's a good idea to open the output file before you read and process the input, for this reason.
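If the program does keep explicit file names, the open-the-output-first idea might look like this (the `run` helper and its paths are illustrative):

```cpp
#include <cstdlib>
#include <fstream>
#include <iostream>

// Open the output file before touching the input: if the output
// cannot be created, we find out before doing any work.
int run(const char* in_path, const char* out_path) {
    std::ofstream out(out_path);
    if (!out) {
        std::cerr << "cannot open " << out_path << " for writing\n";
        return EXIT_FAILURE;
    }
    std::ifstream in(in_path, std::ios::binary);
    if (!in) {
        std::cerr << "cannot open " << in_path << " for reading\n";
        return EXIT_FAILURE;
    }
    // ... the read/convert/write loop would go here ...
    return EXIT_SUCCESS;
}
```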
**Return a positive error code**

This is more cosmetic, but prefer small positive numbers to indicate errors. Most standard commands exit with status `1` on failure; some use `2`, `3`, etc. if they need to indicate different kinds of failure. Many platforms truncate the exit status — on my current system, your `-1` appears in the shell as `255`. Using small positive numbers keeps what C sees consistent with what your shell sees.

If it helps, `<cstdlib>` provides the convenient macros `EXIT_SUCCESS` and `EXIT_FAILURE` for this purpose.
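The `255` comes from the status being reduced to its low 8 bits; a quick sanity check:

```cpp
#include <cstdlib>

// POSIX shells see only the low 8 bits of a process's exit status,
// so returning -1 from main shows up as 255:
static_assert((-1 & 0xFF) == 255, "exit status truncated to 8 bits");
```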