Intro
Yesterday I posted this question. Since then, I've updated my code to incorporate these suggestions. I've also removed the dependence on C++11. Finally, I've made the following changes that get me closer to my overall goal:
- Rather than iterate over different values for `dt` within my script, I now have `dt` specified on the command line. Specifically, an integer passed on the command line corresponds to (1 +) an index into `dt_array`. This lets me process different values of `dt` in parallel using the Sun Grid Engine.
- Rather than use a single value for `I_syn_bar`, I now iterate over 100 values of `I_syn_bar`.
If you read through the current state of my script below, you'll see that I'm writing 100 text files to disk per `dt`. When I set `n_x` to 2 instead of 100, the script is very fast: 6 s on my machine. But when I set `n_x` to 100 and submit the script as a job to the SGE, it takes about an hour to complete, far more than the 50-fold increase in work would suggest (50 × 6 s = 5 min). Hence there seems to be some penalty imposed by the heavy file I/O I'm doing, on top of the general SGE overhead.
My goal now is to change the code so that I still write the data for all 100 values of `I_syn_bar`, but to fewer files. I have a 2D matrix for each value of `I_syn_bar`, so to write data for multiple values of `I_syn_bar` to the same text file, I need a 3D object of some kind (and a strategy for writing this object to file). Another constraint is that these files need to be readable into Python.
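One simple way to meet both constraints is to stack the per-`I_syn_bar` matrices in a single text file, prefixed by one header line giving the 3-D shape. Below is a minimal sketch of that idea; the function name, the header layout, and the use of plain `std::vector` in place of Eigen's `MatrixXd` are my own choices for illustration, not part of the actual script.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

typedef std::vector<std::vector<double> > Matrix;  // stand-in for MatrixXd

// Sketch: write every per-I_syn_bar matrix into ONE text file.  The first
// line holds the 3-D shape "n_x n_rows n_cols"; the remaining lines are the
// matrices' rows, concatenated.  In Python the cube can be recovered with
// numpy.loadtxt(path, skiprows=1, delimiter=",").reshape(n_x, n_rows, n_cols).
bool save_cube(const std::vector<Matrix>& cube, const std::string& path)
{
    std::ofstream os(path.c_str());
    if (!os.is_open())
    {
        return false;
    }
    os.precision(11);
    os << cube.size() << " " << cube[0].size() << " "
       << cube[0][0].size() << "\n";
    for (std::size_t k = 0; k < cube.size(); ++k)
        for (std::size_t i = 0; i < cube[k].size(); ++i)
            for (std::size_t j = 0; j < cube[k][i].size(); ++j)
                os << cube[k][i][j]
                   << (j + 1 == cube[k][i].size() ? "\n" : ",");
    return true;
}
```

This writes one file per `dt` instead of 100, at the cost of a fixed header convention the Python side must know about.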
Code
#include <math.h>
#include <vector>
#include <string>
#include <fstream>
#include <iostream>
#include <iterator>
#include <Eigen/Dense>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>
#include <stdlib.h>
#include <sstream>
using Eigen::MatrixXd;
using Eigen::ArrayXd;
bool save_mat(const MatrixXd& pdata, const std::stringstream& file_path)
{
    std::ofstream os(file_path.str().c_str());
    if (!os.is_open())
    {
        std::cout << "Failure!" << std::endl;
        return false;
    }
    os.precision(11);
    const int n_rows = pdata.rows();
    const int n_cols = pdata.cols();
    for (int i = 0; i < n_rows; i++)
    {
        for (int j = 0; j < n_cols; j++)
        {
            os << pdata(i, j);
            if (j + 1 == n_cols)
            {
                os << std::endl;
            }
            else
            {
                os << ",";
            }
        }
    }
    os.close();
    return true;
}
std::string get_save_file()
{
    std::string dan_dir;
    struct stat statbuf;
    if (stat("/home/daniel", &statbuf) == 0 && S_ISDIR(statbuf.st_mode))
    {
        dan_dir = "/home/daniel/Science";
    }
    else if (stat("/home/dan", &statbuf) == 0 && S_ISDIR(statbuf.st_mode))
    {
        dan_dir = "/home/dan/Science";
    }
    else if (stat("/home/despo", &statbuf) == 0 && S_ISDIR(statbuf.st_mode))
    {
        dan_dir = "/home/despo/dbliss";
    }
    // Note: if none of the known home directories exists, dan_dir stays
    // empty and the open in save_mat will fail.
    std::string save_file = "/dopa_net/results/hansel/test/test_hansel";
    save_file = dan_dir + save_file;
    return save_file;
}
// Difference-of-exponentials synaptic kernel.
double f(const double t, const double tau_1, const double tau_2)
{
    return tau_2 / (tau_1 - tau_2) * (exp(-t / tau_1) - exp(-t / tau_2));
}

// Stagger the initial voltages along the free membrane trajectory
// (T is the free time-to-threshold) so the neurons start out of phase.
ArrayXd set_initial_V(const double tau, const double g_L, const double I_0,
                      const double theta, const double V_L, const int N,
                      const double c)
{
    const double T = -tau * log(1 - g_L / I_0 * (theta - V_L));
    ArrayXd V(N);
    for (int i = 0; i < N; i++)
    {
        V(i) = V_L + I_0 / g_L * (1 - exp(-c * (i - 1) / N * T / tau));
    }
    return V;
}
int main(int argc, char *argv[])
{
    // Reused across the loop iterations below.
    std::stringstream complete_save_file;

    // Declare and initialize constant parameters.
    const int n_x = 100;
    const double x_min = 0; // uA / cm^2.
    const double x_max = 1; // uA / cm^2.
    const double x_step = (x_max - x_min) / (n_x - 1); // uA / cm^2.
    const double tau_1 = 3.0; // ms.
    const double tau_2 = 1.0; // ms.
    const int N = 128;
    const double dt_array[3] = {0.25, 0.1, 0.01}; // ms.
    // atoi handles multi-digit task IDs; the earlier task_id[0] - '0'
    // only worked for a single digit.
    const int task_id_int = atoi(argv[argc - 1]);
    const double dt = dt_array[task_id_int - 1];
    const double tau = 10; // ms.
    const double g_L = 0.1; // mS / cm^2.
    const double I_0 = 2.3; // uA / cm^2.
    const double theta = -40; // mV.
    const double V_L = -60; // mV.
    const double c = 0.5;
    const double C = 1; // uF / cm^2.
    const int sim_t = 10000; // ms.
    // Round rather than truncate: sim_t / dt is 99999.999... for dt = 0.1.
    const int n_t = static_cast<int>(sim_t / dt + 0.5);
    const std::string save_file = get_save_file();

    // Save V for each I_syn_bar, for the dt specified on the command line.
    // Stepping an integer index avoids floating-point drift in the loop
    // bound and guarantees exactly n_x values, including x_max.
    for (int k = 0; k < n_x; k++)
    {
        const double I_syn_bar = x_min + k * x_step;
        MatrixXd V(N, n_t);
        V.col(0) = set_initial_V(tau, g_L, I_0, theta, V_L, N, c);
        double I_syn = 0; // uA / cm^2.
        ArrayXd t_spike_array = ArrayXd::Zero(N);
        // Likewise, step the column index and derive t from it, so i can
        // never run past the n_t columns allocated for V.
        for (int i = 1; i < n_t; i++)
        {
            const double t = i * dt;
            ArrayXd prev_V = V.col(i - 1).array();
            ArrayXd current_V = prev_V + dt * (-g_L * (prev_V - V_L) + I_syn +
                                               I_0) / C;
            V.col(i) = current_V;
            I_syn = 0;
            for (int j = 0; j < N; j++)
            {
                if (current_V(j) > theta)
                {
                    t_spike_array(j) = t;
                    V(j, i) = V_L;
                }
                I_syn += I_syn_bar / N * f(t - t_spike_array(j), tau_1, tau_2);
            }
        }
        complete_save_file << save_file << dt << "_" << I_syn_bar << ".txt";
        save_mat(V, complete_save_file);
        complete_save_file.str("");
        complete_save_file.clear();
    }
    return 0;
}
Timing Information
---------------------------------------------
| n_x | command-line arg | SGE? | Time |
---------------------------------------------
| 2 | 1 | no | 6 s |
---------------------------------------------
| 2 | 1 | yes | 30 s |
---------------------------------------------
| 100 | 1 | no | 10 m 16 s |
---------------------------------------------
| 100 | 1 | yes | 53 m 5 s |
---------------------------------------------
Answer
If you skim the relevant part of the paper this analysis is attempting to replicate (cf. the link in the Round 1 question), you'll see that my code saves an intermediate result on the way to the final result plotted in Figure 1. Figure 1 plots `sigma_N`, whereas I'm saving `V` here.
For each `I_syn_bar`, `V` is a matrix of size (number of neurons) × (number of time points). That is a lot of data to save. `sigma_N` takes up much less space: a single double for each `I_syn_bar`.
I made two fairly small changes to my code:

- I followed the advice in the comments and replaced all appearances of `endl` with `"\n"`.
- I carried the analysis through the computation of `sigma_N` and saved `sigma_N` instead of `V`.

`sigma_N`, in my new code, is defined as follows (within, but at the very end of, the loop over `I_syn_bar`):
// A_N, delta_N, delta, and the V_* temporaries are Eigen arrays/doubles
// declared earlier in main; square is a small double -> double helper.
// Compute A_N, the instantaneous population-mean voltage.
A_N = V.colwise().mean();
// Compute delta_N, the variance of A_N over time.
delta_N = A_N.square().mean() - pow(A_N.mean(), 2);
// Compute delta, each neuron's time-variance averaged over neurons.
V_squared = V.unaryExpr(std::ptr_fun(square));
V_squared_time_mean = V_squared.rowwise().mean();
V_time_mean = V.rowwise().mean();
V_time_mean_squared = V_time_mean.square();
delta = (V_squared_time_mean - V_time_mean_squared).mean();
// Compute sigma_N.
sigma_N_array[j] = delta_N / delta;
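For readers without the rest of `main` at hand, the same `sigma_N` computation can be sketched self-contained with plain loops; `std::vector` stands in for the Eigen types, and the function name is illustrative:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

typedef std::vector<std::vector<double> > Matrix;  // rows: neurons, cols: time

// sigma_N = Var_t[population-mean voltage] / mean-over-neurons Var_t[V_i].
// It is 1 for a fully synchronized network and approaches 0 for an
// asynchronous one.
double compute_sigma_N(const Matrix& V)
{
    const std::size_t N = V.size();
    const std::size_t n_t = V[0].size();

    // A_N(t): instantaneous population mean; accumulate its moments over time.
    double A_sum = 0, A_sq_sum = 0;
    for (std::size_t t = 0; t < n_t; ++t)
    {
        double a = 0;
        for (std::size_t i = 0; i < N; ++i) a += V[i][t];
        a /= N;
        A_sum += a;
        A_sq_sum += a * a;
    }
    const double delta_N = A_sq_sum / n_t - std::pow(A_sum / n_t, 2);

    // delta: time-variance of each neuron's trace, averaged over neurons.
    double delta = 0;
    for (std::size_t i = 0; i < N; ++i)
    {
        double s = 0, sq = 0;
        for (std::size_t t = 0; t < n_t; ++t)
        {
            s += V[i][t];
            sq += V[i][t] * V[i][t];
        }
        delta += sq / n_t - std::pow(s / n_t, 2);
    }
    delta /= N;

    return delta_N / delta;
}
```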
The Crux of this Answer
Let me restate this: in the code in the Round 2 question above, I'm saving 100 matrices of size `N` × `n_t` to disk, where `N` is 128 and `n_t` is 40000 (if `dt` is 0.25). As I mention in the question, all this file-writing makes the code undesirably slow.
My solution here is to carry the analysis a step further. Rather than save all these matrices, I save the variable they will eventually be used to compute: `sigma_N`. Whereas `V` (what was originally saved) is 100 × 128 × 40000, `sigma_N` is 100 × 1.
Although the extra analysis step adds a tiny amount of time, this is far outweighed by the time saved by not having to write all those large matrices.
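The per-`dt` write then shrinks to one tiny file. A minimal sketch of how that write might look (the function name and one-value-per-line layout are my own choices, not taken from my actual code):

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Sketch: dump the 100-element sigma_N array to a single small text file
// per dt, one value per line, so Python can read it back with
// numpy.loadtxt(path).
bool save_sigma_N(const std::vector<double>& sigma_N_array,
                  const std::string& path)
{
    std::ofstream os(path.c_str());
    if (!os.is_open())
    {
        return false;
    }
    os.precision(11);
    for (std::size_t k = 0; k < sigma_N_array.size(); ++k)
    {
        os << sigma_N_array[k] << "\n";  // "\n", not endl: no flush per line
    }
    return true;
}
```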
If you don't understand what `sigma_N` is, please consult the earlier question and the paper linked there, as I said. If you are unwilling to do this, or otherwise don't understand what I'm saying, please ask me to clarify a particular point; I'm happy to do that.
Timing Information
----------------------------------------------
| n_x | command-line arg | SGE? | Time |
----------------------------------------------
| 2 | 1 | no | 0.4 s |
----------------------------------------------
| 2 | 1 | yes | 1.2 s |
----------------------------------------------
| 100 | 1 | no | 41.4 s |
----------------------------------------------
| 100 | 1 | yes | 49.1 s |
----------------------------------------------
| 100 | 1-3 (in parallel) | yes | 18 m 28 s |
----------------------------------------------
Note, with regard to the last line of the table above, that the code is dramatically slower with `3` as the command-line argument (`dt = 0.01`) than with `2` or `1`. This is expected: `n_t = sim_t / dt`, so `dt = 0.01` means 25 times as many time steps as `dt = 0.25`.
Caveat
Everything about the Round 1 question, the Round 1 answer, the Round 2 question, and this answer is great and wonderful when it comes to the factors that influence the speed of a C++ program.
However, it turns out that this program overlooks a number of considerations needed to replicate Figure 1. I've tweaked the code in a few places to handle them. Since those considerations are outside the scope of both this question and the Round 1 question, I won't describe them here.
Comments
- `std::endl` doesn't only cause a single `'\n'` to be inserted; it also causes an explicit flush afterwards. If the latter isn't needed, that's just throwing away your performance.
- Regarding `x_step`: you are using an integer division, resulting in `x_step = 0` with the parameters you have posted. Please make sure that the code you posted is an actual working example.
- `n_x` must be an integer greater than or equal to 2.
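The integer-division pitfall the second comment warns about is easy to demonstrate in isolation (the helper names here are illustrative):

```cpp
#include <cassert>

// With int endpoints, (x_max - x_min) / (n_x - 1) truncates toward zero,
// so for x_min = 0, x_max = 1, n_x = 100 the step is 0 and the I_syn_bar
// loop never advances.
int bad_step_size(int x_min, int x_max, int n_x)
{
    return (x_max - x_min) / (n_x - 1);  // 1 / 99 == 0
}

// Promoting one operand to double before dividing gives the intended step.
double step_size(int x_min, int x_max, int n_x)
{
    return static_cast<double>(x_max - x_min) / (n_x - 1);  // 1/99 ~ 0.0101
}
```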