Intro
Yesterday I posted this question. Since then, I've updated my code to incorporate these suggestions. I've also removed the dependence on C++11. Finally, I've made the following changes that get me closer to my overall goal:
- Rather than iterate over different values for `dt` within my script, I now have `dt` specified on the command line. Specifically, an integer passed on the command line corresponds to (1 +) an index into `dt_array`. This lets me process different values of `dt` in parallel using the Sun Grid Engine.
- Rather than use a single value for `I_syn_bar`, I now iterate over 100 values of `I_syn_bar`.
If you read through the current state of my script below, you'll see that I'm writing 100 text files to disk per `dt`. When I set `n_x` to 2 instead of 100, the script is very fast: 6 s on my machine. But when I set `n_x` to 100 and submit the script as a job to the SGE, it takes about an hour to complete, far more than the 50-fold increase in work would suggest (50 × 6 s = 5 min). Hence there seems to be some penalty imposed by the heavy file I/O I'm doing, on top of the general SGE overhead.
My goal now is to change the code so that I still write the data for all 100 values of `I_syn_bar`, but to fewer files. I have a 2D matrix for each value of `I_syn_bar`, so to write data for multiple values of `I_syn_bar` to the same text file, I need a 3D object of some kind (and a strategy for writing this object to file). Another constraint is that these files need to be readable into Python.
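One simple way to meet both constraints is to stack the per-`I_syn_bar` matrices in a single text file, prefixed by one header line giving the 3-D shape. Below is a minimal sketch of that idea; the function name, the header layout, and the use of plain `std::vector` in place of Eigen's `MatrixXd` are my own choices for illustration, not part of the actual script.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

typedef std::vector<std::vector<double> > Matrix;  // stand-in for MatrixXd

// Sketch: write every per-I_syn_bar matrix into ONE text file.  The first
// line holds the 3-D shape "n_x n_rows n_cols"; the remaining lines are the
// matrices' rows, concatenated.  In Python the cube can be recovered with
// numpy.loadtxt(path, skiprows=1, delimiter=",").reshape(n_x, n_rows, n_cols).
bool save_cube(const std::vector<Matrix>& cube, const std::string& path)
{
    std::ofstream os(path.c_str());
    if (!os.is_open())
    {
        return false;
    }
    os.precision(11);
    os << cube.size() << " " << cube[0].size() << " "
       << cube[0][0].size() << "\n";
    for (std::size_t k = 0; k < cube.size(); ++k)
        for (std::size_t i = 0; i < cube[k].size(); ++i)
            for (std::size_t j = 0; j < cube[k][i].size(); ++j)
                os << cube[k][i][j]
                   << (j + 1 == cube[k][i].size() ? "\n" : ",");
    return true;
}
```

This writes one file per `dt` instead of 100, at the cost of a fixed header convention the Python side must know about.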
Code
#include <math.h>
#include <vector>
#include <string>
#include <fstream>
#include <iostream>
#include <iterator>
#include <Eigen/Dense>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>
#include <stdlib.h>
#include <sstream>
using Eigen::MatrixXd;
using Eigen::ArrayXd;
bool save_mat(const MatrixXd& pdata, const std::stringstream& file_path)
{
    std::ofstream os(file_path.str().c_str());
    if (!os.is_open())
    {
        std::cout << "Failure!" << std::endl;
        return false;
    }
    os.precision(11);
    const int n_rows = pdata.rows();
    const int n_cols = pdata.cols();
    for (int i = 0; i < n_rows; i++)
    {
        for (int j = 0; j < n_cols; j++)
        {
            os << pdata(i, j);
            if (j + 1 == n_cols)
            {
                os << std::endl;
            }
            else
            {
                os << ",";
            }
        }
    }
    os.close();
    return true;
}
std::string get_save_file()
{
    std::string dan_dir;
    struct stat statbuf;
    if (stat("/home/daniel", &statbuf) == 0 && S_ISDIR(statbuf.st_mode))
    {
        dan_dir = "/home/daniel/Science";
    }
    else if (stat("/home/dan", &statbuf) == 0 && S_ISDIR(statbuf.st_mode))
    {
        dan_dir = "/home/dan/Science";
    }
    else if (stat("/home/despo", &statbuf) == 0 && S_ISDIR(statbuf.st_mode))
    {
        dan_dir = "/home/despo/dbliss";
    }
    // Note: if none of the known home directories exists, dan_dir stays
    // empty and the open in save_mat will fail.
    std::string save_file = "/dopa_net/results/hansel/test/test_hansel";
    save_file = dan_dir + save_file;
    return save_file;
}
// Difference-of-exponentials synaptic kernel.
double f(const double t, const double tau_1, const double tau_2)
{
    return tau_2 / (tau_1 - tau_2) * (exp(-t / tau_1) - exp(-t / tau_2));
}

// Stagger the initial voltages along the free membrane trajectory
// (T is the free time-to-threshold) so the neurons start out of phase.
ArrayXd set_initial_V(const double tau, const double g_L, const double I_0,
                      const double theta, const double V_L, const int N,
                      const double c)
{
    const double T = -tau * log(1 - g_L / I_0 * (theta - V_L));
    ArrayXd V(N);
    for (int i = 0; i < N; i++)
    {
        V(i) = V_L + I_0 / g_L * (1 - exp(-c * (i - 1) / N * T / tau));
    }
    return V;
}
int main(int argc, char *argv[])
{
    // Reused across the loop iterations below.
    std::stringstream complete_save_file;

    // Declare and initialize constant parameters.
    const int n_x = 100;
    const double x_min = 0; // uA / cm^2.
    const double x_max = 1; // uA / cm^2.
    const double x_step = (x_max - x_min) / (n_x - 1); // uA / cm^2.
    const double tau_1 = 3.0; // ms.
    const double tau_2 = 1.0; // ms.
    const int N = 128;
    const double dt_array[3] = {0.25, 0.1, 0.01}; // ms.
    // atoi handles multi-digit task IDs; the earlier task_id[0] - '0'
    // only worked for a single digit.
    const int task_id_int = atoi(argv[argc - 1]);
    const double dt = dt_array[task_id_int - 1];
    const double tau = 10; // ms.
    const double g_L = 0.1; // mS / cm^2.
    const double I_0 = 2.3; // uA / cm^2.
    const double theta = -40; // mV.
    const double V_L = -60; // mV.
    const double c = 0.5;
    const double C = 1; // uF / cm^2.
    const int sim_t = 10000; // ms.
    // Round rather than truncate: sim_t / dt is 99999.999... for dt = 0.1.
    const int n_t = static_cast<int>(sim_t / dt + 0.5);
    const std::string save_file = get_save_file();

    // Save V for each I_syn_bar, for the dt specified on the command line.
    // Stepping an integer index avoids floating-point drift in the loop
    // bound and guarantees exactly n_x values, including x_max.
    for (int k = 0; k < n_x; k++)
    {
        const double I_syn_bar = x_min + k * x_step;
        MatrixXd V(N, n_t);
        V.col(0) = set_initial_V(tau, g_L, I_0, theta, V_L, N, c);
        double I_syn = 0; // uA / cm^2.
        ArrayXd t_spike_array = ArrayXd::Zero(N);
        // Likewise, step the column index and derive t from it, so i can
        // never run past the n_t columns allocated for V.
        for (int i = 1; i < n_t; i++)
        {
            const double t = i * dt;
            ArrayXd prev_V = V.col(i - 1).array();
            ArrayXd current_V = prev_V + dt * (-g_L * (prev_V - V_L) + I_syn +
                                               I_0) / C;
            V.col(i) = current_V;
            I_syn = 0;
            for (int j = 0; j < N; j++)
            {
                if (current_V(j) > theta)
                {
                    t_spike_array(j) = t;
                    V(j, i) = V_L;
                }
                I_syn += I_syn_bar / N * f(t - t_spike_array(j), tau_1, tau_2);
            }
        }
        complete_save_file << save_file << dt << "_" << I_syn_bar << ".txt";
        save_mat(V, complete_save_file);
        complete_save_file.str("");
        complete_save_file.clear();
    }
    return 0;
}
Timing Information
---------------------------------------------
| n_x | command-line arg | SGE? | Time |
---------------------------------------------
| 2 | 1 | no | 6 s |
---------------------------------------------
| 2 | 1 | yes | 30 s |
---------------------------------------------
| 100 | 1 | no | 10 m 16 s |
---------------------------------------------
| 100 | 1 | yes | 53 m 5 s |
---------------------------------------------
Answer
If you skim the relevant part of the paper this analysis is attempting to replicate (cf. the link in the Round 1 question), you'll see that my code saves an intermediate result on the way to the final result plotted in Figure 1. Figure 1 plots `sigma_N`, whereas I'm saving `V` here.
For each `I_syn_bar`, `V` is a matrix of size (number of neurons) × (number of time points). That is a lot of data to save. `sigma_N` takes up much less space: a single double for each `I_syn_bar`.
I made two fairly small changes to my code:

- I followed the advice in the comments and replaced all appearances of `endl` with `"\n"`.
- I carried the analysis through the computation of `sigma_N` and saved `sigma_N` instead of `V`.

`sigma_N`, in my new code, is defined as follows (within, but at the very end of, the loop over `I_syn_bar`):
// A_N, delta_N, delta, and the V_* temporaries are Eigen arrays/doubles
// declared earlier in main; square is a small double -> double helper.
// Compute A_N, the instantaneous population-mean voltage.
A_N = V.colwise().mean();
// Compute delta_N, the variance of A_N over time.
delta_N = A_N.square().mean() - pow(A_N.mean(), 2);
// Compute delta, each neuron's time-variance averaged over neurons.
V_squared = V.unaryExpr(std::ptr_fun(square));
V_squared_time_mean = V_squared.rowwise().mean();
V_time_mean = V.rowwise().mean();
V_time_mean_squared = V_time_mean.square();
delta = (V_squared_time_mean - V_time_mean_squared).mean();
// Compute sigma_N.
sigma_N_array[j] = delta_N / delta;
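For readers without the rest of `main` at hand, the same `sigma_N` computation can be sketched self-contained with plain loops; `std::vector` stands in for the Eigen types, and the function name is illustrative:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

typedef std::vector<std::vector<double> > Matrix;  // rows: neurons, cols: time

// sigma_N = Var_t[population-mean voltage] / mean-over-neurons Var_t[V_i].
// It is 1 for a fully synchronized network and approaches 0 for an
// asynchronous one.
double compute_sigma_N(const Matrix& V)
{
    const std::size_t N = V.size();
    const std::size_t n_t = V[0].size();

    // A_N(t): instantaneous population mean; accumulate its moments over time.
    double A_sum = 0, A_sq_sum = 0;
    for (std::size_t t = 0; t < n_t; ++t)
    {
        double a = 0;
        for (std::size_t i = 0; i < N; ++i) a += V[i][t];
        a /= N;
        A_sum += a;
        A_sq_sum += a * a;
    }
    const double delta_N = A_sq_sum / n_t - std::pow(A_sum / n_t, 2);

    // delta: time-variance of each neuron's trace, averaged over neurons.
    double delta = 0;
    for (std::size_t i = 0; i < N; ++i)
    {
        double s = 0, sq = 0;
        for (std::size_t t = 0; t < n_t; ++t)
        {
            s += V[i][t];
            sq += V[i][t] * V[i][t];
        }
        delta += sq / n_t - std::pow(s / n_t, 2);
    }
    delta /= N;

    return delta_N / delta;
}
```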
The Crux of this Answer
Let me restate this: in the code in the Round 2 question above, I'm saving 100 matrices of size `N` × `n_t` to disk, where `N` is 128 and `n_t` is 40000 (if `dt` is 0.25). As I mention in the question, all this file-writing makes the code undesirably slow.
My solution here is to carry the analysis a step further. Rather than save all these matrices, I save the variable they will eventually be used to compute: `sigma_N`. Whereas `V` (what was originally saved) is 100 × 128 × 40000, `sigma_N` is 100 × 1.
Although the extra analysis step adds a tiny amount of time, this is far outweighed by the time saved by not having to write all those large matrices.
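The per-`dt` write then shrinks to one tiny file. A minimal sketch of how that write might look (the function name and one-value-per-line layout are my own choices, not taken from my actual code):

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Sketch: dump the 100-element sigma_N array to a single small text file
// per dt, one value per line, so Python can read it back with
// numpy.loadtxt(path).
bool save_sigma_N(const std::vector<double>& sigma_N_array,
                  const std::string& path)
{
    std::ofstream os(path.c_str());
    if (!os.is_open())
    {
        return false;
    }
    os.precision(11);
    for (std::size_t k = 0; k < sigma_N_array.size(); ++k)
    {
        os << sigma_N_array[k] << "\n";  // "\n", not endl: no flush per line
    }
    return true;
}
```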
If you don't understand what `sigma_N` is, please consult the earlier question and the paper linked there, as I said. If you are unwilling to do this, or otherwise don't understand what I'm saying, please ask me to clarify a particular point; I'm happy to do that.
Timing Information
----------------------------------------------
| n_x | command-line arg | SGE? | Time |
----------------------------------------------
| 2 | 1 | no | 0.4 s |
----------------------------------------------
| 2 | 1 | yes | 1.2 s |
----------------------------------------------
| 100 | 1 | no | 41.4 s |
----------------------------------------------
| 100 | 1 | yes | 49.1 s |
----------------------------------------------
| 100 | 1-3 (in parallel) | yes | 18 m 28 s |
----------------------------------------------
Note, with regard to the last line of the table above, that the code is dramatically slower with `3` as the command-line argument (`dt = 0.01`) than with `2` or `1`. This is expected: `n_t = sim_t / dt`, so `dt = 0.01` means 25 times as many time steps as `dt = 0.25`.
Caveat
Everything about the Round 1 question, the Round 1 answer, the Round 2 question, and this answer is great and wonderful when it comes to the factors that influence the speed of a C++ program.
However, it turns out that this program overlooks a number of considerations needed to replicate Figure 1. I've tweaked the code in a few places to handle them. Since those considerations are outside the scope of both this question and the Round 1 question, I won't describe them here.
Comments
- `std::endl` doesn't only cause a single `'\n'` to be inserted; it also causes an explicit flush afterwards. If the latter isn't needed, that's just throwing away your performance.
- Regarding `x_step`: you are using an integer division, resulting in `x_step = 0` with the parameters you have posted. Please make sure that the code you posted is an actual working example.
- `n_x` must be an integer greater than or equal to 2.
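The integer-division pitfall the second comment warns about is easy to demonstrate in isolation (the helper names here are illustrative):

```cpp
#include <cassert>

// With int endpoints, (x_max - x_min) / (n_x - 1) truncates toward zero,
// so for x_min = 0, x_max = 1, n_x = 100 the step is 0 and the I_syn_bar
// loop never advances.
int bad_step_size(int x_min, int x_max, int n_x)
{
    return (x_max - x_min) / (n_x - 1);  // 1 / 99 == 0
}

// Promoting one operand to double before dividing gives the intended step.
double step_size(int x_min, int x_max, int n_x)
{
    return static_cast<double>(x_max - x_min) / (n_x - 1);  // 1/99 ~ 0.0101
}
```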