I have a lot of data in separate files generated from a simulation program. All data is clear and formatted via a single space to read in 4 columns. I can open and store data from each file one at a time, but that will get so impossible once I hit more than, say, 10 files of data with the same simulation data columns.
I am just trying to find a better way to open files and store the data using an iterative method instead of typing, for example, "open file, read file data, append to file" over 1000 times.
#include <iostream>
#include <sstream>
#include <fstream>
#include <vector>
using namespace std;
int main ()
{
// INITIALIZE
int tmax = 80000;
float time_step[tmax];
float LEx[tmax];
float LEy[tmax];
float LEz[tmax];
float LE[tmax];
float AvgLEx;
float AvgLEy;
float AvgLEz;
float AvgLE;
float Function[tmax];
// FILES
ifstream infile;
infile.open("File_1.prn");
// THIS INFILE is where I want to put a loop to change File_1 to File_2 to File_3 and so on and so on to append data from all those files into ONE single file
if(infile.fail())
{
cout << "error reading file" << endl;
return 1;
}
for (int i = 0; i < tmax+1; ++i)
{
infile >> time_step[i] >> LEx[i] >> LEy[i] >> LEz[i] >> LE[i];
AvgLEx = AvgLEx + LEx[i];
AvgLEy = AvgLEy + LEy[i];
AvgLEz = AvgLEz + LEz[i];
AvgLE = AvgLE + LE[i];
}
AvgLEx = AvgLEx/tmax;
AvgLEy = AvgLEy/tmax;
AvgLEz = AvgLEz/tmax;
AvgLE = AvgLE/tmax;
cout << "AvgLE: " << AvgLE << "\n";
cout << time_step[tmax] << " LEx " << LEx[tmax] << " LEy " << LEy[tmax] << " LEz: " << LEz[tmax] << " LE: " << LE[tmax] << endl;
cout << "\n";
infile.close();
cout << "\nFile read! \n";
// FUNCTIONS
for (int i = 0; i < tmax+1; ++i)
{
Function[i] = LE[i] - AvgLE;
cout << LE[i] << " " << AvgLE << " " << Function[i] << "\n";
}
cout << "\n Function set up \n";
// // CONVOLUTION
float A_t;
float A_dt;
int dt, t;
float max=0;
for (int dt = 0; dt < 1 + tmax/10; ++dt)
{
//dt = 1;
cout << "dt: " << dt << "\n";
//A_dt = 0;
int window = tmax-dt;
for (int t = 0; t < window; ++t)
{
if (t==0)
{
A_t = 0;
cout << "A_t was set to zero \n";
}
A_t+= Function[t] * Function[t + dt];
cout << "A_t is accumulation ...: " << A_t << "\n" ;
}
A_dt = A_t / window;
if (A_dt > max)
{
max = A_dt;
}
// OUTPUT
ofstream AutoCor;
AutoCor.open ("AutoCor_25.prn", ios::app);
AutoCor << dt << " " << A_dt << "\n";
AutoCor.close();
cout << "A_dt: " << A_dt << " " << "window used: " << window << "\n";
}
cout << "max used: " << max << "\n";
// RETURN
return A_dt;
return 0;
}
3 Answers 3
Creating Filenames
If you know the naming format, you can use std::ostringstream
or snprintf
to create a filename:
char filename[40];
for (unsigned int i = 0; i < 100; i++)
{
snprintf(filename, sizeof(filename),
"file_%04d.prn");
std::ifstream infile(filename);
//...
}
Another, platform specific, method is to iterate over each file. One algorithm is:
filename = Get_First_Filename();
- process file
filename = Get_Next_Filename();
if (filename != NULL)
goto to step 2
An array of structures vs. many arrays of variables
Instead of declaring many arrays like this:
float LEx[tmax];
float LEy[tmax];
float LEz[tmax];
float LE[tmax];
You may want to put them into a structure (record) and use an array of the structures:
struct Record
{
float LEx;
float LEy;
float LEz;
float LE;
};
std::vector< Record > data(tmax);
You may find that this change will speed up your program because all the relevant variables are next to each other instead of being separated by tmax
elements.
Use double
not float
In your programs prefer the double
type over float
. The double
is more accurate and most floating point libraries are designed for double
. The float
should be used in constrained environments where memory space is restricted.
Read all data, then process
In your program, you read a record of data then process it:
for (int i = 0; i < tmax+1; ++i)
{
infile >> time_step[i] >> LEx[i] >> LEy[i] >> LEz[i] >> LE[i];
AvgLEx = AvgLEx + LEx[i];
AvgLEy = AvgLEy + LEy[i];
AvgLEz = AvgLEz + LEz[i];
AvgLE = AvgLE + LE[i];
}
Most platforms are most efficient when the reading of data is continuous. This allows the hard drive to keep spinning rather than spinning down and stopping after a read, then spinning up again for the next read (yes, modern drives use caches to smooth this).
A more efficient method is to read all or most of your data, then process it:
struct Input_Record
{
double time_step;
double LEx;
double LEy;
double LEz;
};
std::vector< Input_Record > input_data(tmax);
for (unsigned int i = 0; i < tmax; ++i)
{
infile >> input_data[i].time_step
>> input_data[i].LEx
>> input_data[i].LEy
>> input_data[i].LEz;
}
for (unsigned int i = 0; i < tmax; i++)
{
AvgLEx += input_data[i].LEx;
AvgLEy += input_data[i].LEy;
AvgLEz += input_data[i].LEz;
}
-
\$\begingroup\$ wow. So cool. This is such a clearer way to understand what is going on 'behind the scenes'. The use of structures is MOST definitely going to be helpful for me. \$\endgroup\$Kitt Kat– Kitt Kat2014年07月14日 21:35:26 +00:00Commented Jul 14, 2014 at 21:35
-
\$\begingroup\$ I think that for the file part - I do know the file format because I create it myself in TCL to store the data when I run/call my simulation program. So if I understand what you wrote above correctly in the following -- char filename[40]; for (unsigned int i = 0; i < 100; i++) { snprintf(filename, sizeof(filename), "file_%04d.prn"); std::ifstream infile(filename); //... } .. you are using the "%04d" part to add the '1', '2', '3' ..etc changes to the file names so that it is then possible to loop over them and collect the data from each file? \$\endgroup\$Kitt Kat– Kitt Kat2014年07月14日 21:37:40 +00:00Commented Jul 14, 2014 at 21:37
-
1\$\begingroup\$ Yes, the
snprint
function is used to append numbers to a file name. So the first filename is "file_0000.prn", the second is "file_0001.prn", and so on. \$\endgroup\$Thomas Matthews– Thomas Matthews2014年07月15日 00:43:48 +00:00Commented Jul 15, 2014 at 0:43
As you're new to C++, you should be aware that implementing an entire program in main()
can make things difficult, especially when the code is already complex.
As it appears, you have declare and initialize variables at the top, open the file, perform the operations, then output things. Even with these comments in place, you should still split this up into functions.
Do not maintain a list of variables as it could make maintenance more difficult, such as if variables are no longer being used. Instead. you should only declare or initialize variables as close to their intended use as possible. That way, if you no longer use that variable, you won't have to search for it at the top. This will also make your code easier to follow.
Consider just having
main()
do two things: attempt to open the file and call all the necessary functions afterwards. You can also open the file in an other function, but if you do it inmain()
, then you can easily return from there if the file cannot be opened.Everything else, especially the file operations, should be in separate functions. Also keep in mind that a function should just have one primary purpose, in order to maintain the Single Responsible Principle (SRP). If a function needs to do something else that another function already does, then it should call that function. You should also name functions, in the verb form, based on what they're supposed to do. This will make the function's intent very clear to others.
Example:
void openFile() {}
Miscellaneous:
Try not to use
using namespace std
in global scope as it could cause name-clashing bugs, especially in larger programs. See this post for more info on this.This variable appears to be a constant:
int tmax = 80000;
If so, initialize it with
const
so that it cannot be modified:const int tmax = 80000;
For this approach at checking the file:
infile.open("File_1.prn"); if(infile.fail()) { cout << "error reading file" << endl; return 1; }
you should use
std::ifstream
, check for!infile()
, print the error message tostd::cerr
, and then returnEXIT_FAILURE
:std::ifstream infile("File_1.prn"); if (!infile) { std::cerr << "error reading file"; return EXIT_FAILURE; }
-
\$\begingroup\$ ok I will make those changes now and then repost a clearer code \$\endgroup\$Kitt Kat– Kitt Kat2014年07月14日 19:44:24 +00:00Commented Jul 14, 2014 at 19:44
-
\$\begingroup\$ @Sarah: You may still wait for more answers. This is mostly a general review, and there may be more issues here. I haven't looked at every bit yet. \$\endgroup\$Jamal– Jamal2014年07月14日 19:45:45 +00:00Commented Jul 14, 2014 at 19:45
-
\$\begingroup\$ oh ok! I will check back then in a bit \$\endgroup\$Kitt Kat– Kitt Kat2014年07月14日 19:51:50 +00:00Commented Jul 14, 2014 at 19:51
Your current code looks buggy to me. The first (and most obvious) one that I saw was:
for (int i = 0; i < tmax+1; ++i)
...then using i
as an index into arrays defined as xxx[tmax]
. With tmax
as the size of an array, the allowable indices are 0
through tmax-1
. Trying to read/write xxx[tmax]
and xxx[tmax+1]
leads to undefined behavior.
Right now, the code also blindly assumes that each input file contains a pre-set number of rows of data (but the intent seems to be that it will then produce a file for which that assumption is false).
Instead of defining a (fixed-size) array and basing your assumptions on each file containing a fixed number of items, I'd prefer to use std::vector
, and read exactly as many items as the file contains.
I agree with @Thomas Matthews' advice to create a struct to hold a single record, and create a vector (he said array, but see above) of those records. I'd go one more step though: I'd overload operator>>
(and possibly also operator<<
) for that record type to make reading the records convenient and easy as well.
class record {
float time_step;
float LEx;
float LEy;
float LEz;
float LE;
friend std::istream &operator>>(std::istream &is, record &r) {
return is >> time_step >> r.LEx >> r.LEy >> r.LEz >> r.LE;
}
};
Using this, reading one file of data into a vector of records can look like this:
std::vector<record> records{std::istream_iterator<record>(infile),
std::istream_iterator<record>()};
In your case (reading multiple files into a single vector) it's marginally less convenient, but only marginally so:
std::vector<record> records;
for (std::string const &filename : filenames) {
std::istream infile(filename);
std::copy(std::istream_iterator<record>(infile),
std::istream_iterator<record>(),
std::back_inserter(records));
}
While it's possible to have the program synthesize the file names, it's generally easier to supply them on the command line, possibly including a wildcard, something like:
process File_*.prn
In this case, your program can simply use argc
/argv
that are passed to main to process the file names:
int main(int argc, char **argv) {
std::vector<std::string> filenames(argv+1, argv+argc-1);
On Unix, passing a wildcard on the command line will normally be handled automatically by the shell. On Windows (at least with MSVC) you can link your file with setargv.obj
to get the same effect: cl myfile.cpp setargv.obj
.
Alternatively, you could have your code enumerate the files fitting a specified pattern. If you're going to do this, I'd recommend using the Boost FileSystem library, so you can do the job portably across most major OSes--and better still, it's being used as the basis for work on an addition to the standard library, so the code you write now will probably work with the future standard library with little or no extra work.
Although you didn't ask about them, I'd note that your FUNCTIONS
and CONVOLUTION
sections could be cleaned up quite a bit as well. It looks like std::accumulate
could make computing averages quite a bit cleaner, and your:
A_t+= Function[t] * Function[t + dt];
...looks like it could be implemented rather more cleanly with std::inner_product
:
A_t = std::inner_product(Function.begin(), Function.begin()+window, Function.begin()+dt, 0.0);
(Note that this replaces the entire loop to calculate A_t
, not just the one line quoted above).
I guess I'll stop there (at least for now) since it's not really clear how much you care about cleaning up this part of the code.
-
\$\begingroup\$ No that is great ^^^ !!. Although I was mostly interested (desperate?) to find a solution to the file manipulation .. all these extra comments are so very much appreciated AND useful. I figured there was a lot about C++ syntax that I am clearly not exploiting/using/aware of and that was frustrating to me while I was writing the code initially. So thank you @JerryCoffin \$\endgroup\$Kitt Kat– Kitt Kat2014年07月15日 17:22:31 +00:00Commented Jul 15, 2014 at 17:22
cat
to do this. \$\endgroup\$cat file* > all.file
orcat file{1..3} > file4
\$\endgroup\$