I'm currently working on a project1 which will result in millions of FFTs being solved time and time again. I have a finished prototype in Python (using numpy, scipy, and the pyfftw3 package), but there are significant bottlenecks in calculation speed. I have been writing a GPU version using arrayfire, but there's still a large amount of number crunching which can't be done in parallel. I'm hoping that translating the project to C++, either the entire thing or just the compute-intensive parts, will provide an extra boost over Python where the GPU isn't appropriate.
As a starting point, I have written a very basic C++ script which defines a Gaussian curve, takes the FFT using the FFTW3 library, and plots the input curve with its FFT using gnuplot. As this is my first script in C++ (neglecting 'hello world') I'd like to make sure I'm not wandering severely off track when it comes to optimizing efficiency and speed. I haven't benchmarked this yet, but was hoping that someone with more experience in using these libraries and the language could scan over it and point out any howlers.
The code:
#include "gnuplot-iostream.h"
#include <iostream>
#include <vector>
#include <cmath>
#include <fftw3.h>
#include <complex>
typedef std::complex<double> doubleComp;
int main()
{
Gnuplot gp;
doubleComp i;
i = -1;
i = sqrt(i);
int gridOrder = 9;
int nPoints = pow(2, gridOrder);
int halfPoints = pow(2, gridOrder-1);
float tau = 18;
float inTemp, outTemp;
fftw_complex in[nPoints], out[nPoints], pArr1[nPoints], pArr2[nPoints];
fftw_plan p;
for (int idx=0; idx<nPoints; idx++) {
in[idx][0] = exp(-pow((idx-halfPoints)/tau, 2));
in[idx][1] = 0;
}
p = fftw_plan_dft_1d(nPoints, pArr1, pArr2, FFTW_FORWARD, FFTW_MEASURE);
fftw_execute_dft(p, in, out);
for (int idx=0; idx<nPoints; idx++) {
out[idx][0] *= (1/double(nPoints));
out[idx][1] *= (1/double(nPoints));
}
std::vector<std::pair<double, double> >inPlot;
std::vector<std::pair<double, double> >outPlot;
for (int idx=0; idx<nPoints; idx++) {
inTemp = std::abs(in[idx][0] + i*in[idx][1]);
// Deal with fftshifts needed for visuals:
if (idx > halfPoints)
outTemp = std::abs(out[idx-halfPoints][0] +
i*out[idx-halfPoints][1]);
if (idx < halfPoints)
outTemp = std::abs(out[idx+halfPoints][0] +
i*out[idx+halfPoints][1]);
inPlot.push_back(std::make_pair(idx-halfPoints, inTemp));
outPlot.push_back(std::make_pair(idx-halfPoints, outTemp));
}
gp << "set xlabel 'Grid points' font 'Helvetica, 12'\n" <<
"set ylabel 'Intensity' font 'Helvetica, 12'" << std::endl;
gp << "set tics font 'Helvetica, 10'" << std::endl;
gp << "set title 'Gaussian pulse example' font 'Helvetica, 13'" << std::endl;
gp << "set grid xtics lt 0 lw 1 lc 'gray'\n" <<
"set grid ytics lt 0 lw 2 lc 'gray'" << std::endl;
gp << "set term wxt '0'" << std::endl;
gp << "plot '-' lt 1 lw 2 linecolor 'blue' with line title 'Time domain'\n";
gp.send1d(inPlot);
gp << "set term wxt '1'" << std::endl;
gp << "plot '-' lt 11 lw 2 with line title"
"'Spectral domain'\n";
gp.send1d(outPlot);
fftw_destroy_plan(p);
return 0;
}
To compile (Ubuntu 16.04):
g++ -O3 GaussianPulseDefinition.cpp -o GaussianPulseDefinition -lboost_system -lboost_iostreams -lboost_filesystem -lfftw3 -lm
1 For those who like to know the application: The project is basically a coupled PDE solver (with Qt GUI) aimed at simulating the nonlinear dynamics of optical pulse propagation in optical fibre, fibre laser systems, optical amplifiers, etc.
-
2\$\begingroup\$ This is quite a step up from "Hello world"! \$\endgroup\$200_success– 200_success2017年02月07日 12:59:53 +00:00Commented Feb 7, 2017 at 12:59
-
\$\begingroup\$ Ha! Lucky enough to have come across an excellent tutorial which I've been reading religiously. Also, the packages I'm using here have nearly direct counterparts in python and MATLAB, which I use daily. Even so, this would take 15mins for me in python, but took >2 weeks in c++... \$\endgroup\$user129626– user1296262017年02月07日 14:07:36 +00:00Commented Feb 7, 2017 at 14:07
2 Answers 2
Firstly, does the program give correct results?
If so, how do you know?
Without going into extensive unit testing, checking your results against a Matlab prototype/your Python implementation is probably good enough for now.
I can't comment on the Gnuplot functions as I don't have it installed, also but some general C++ tips are...
std::vector
s are your friend! Try use them instead of C-style arrays. They 'know' their size and accessing out-of-bounds values are a major problem with C-style arrays and definitely for fiddly calculations like FFT. (You can usemy_vector.data()
to pass it into C-style functions).- Reduce reliance on a 'global'
nPoints
constant by using the vectors.size()
member. (Unfortunately this outputs an unsigned integersize_t
so there are some casts in the code to silence the compiler warnings). This is so that ifnPoints
goes out of sync with the size of the array/vector, you're not going to access an out-of-bounds value. - Always try to define a variable's value as soon as you declare it. Undefined variables are a major problem as C++ doesn't (usually) auto-define say, a double to zero. (
double x;
would refer to some garbage value in memory). - Always try to define variables when you need them i.e. as late as possible (see code for examples)
- Try get into the habit of spitting out little functions right from the get-go, instead of one big block of code. It will break the problem up and be easier to debug. Also this will limit some variables scope so its easier to see what parts of the program change what variables.
- For FFTs probably best to stick to
double
s all round for the extra precision (removed thefloat
s) - Use C++11 if you can and turn up the compiler warning level
g++ -std=c++11 -Wall ...
Hopefully a bit to get you started!
I whole heartedly recommend this book if you're coming from Python, it's a short in length but really comprehensive and does a much better job than I could at explaining how to write good C++, and by the man himself - Stroustrup's "A Tour Of C++"
//#include "gnuplot-iostream.h"
#include <iostream>
#include <vector>
#include <cmath>
#include <fftw3.h>
#include <complex>
#include <cassert>
using doublePairs = std::vector<std::pair<double, double> >; // using (instead of typedef) for long named vector of pairs
void makeGaussian(std::vector<fftw_complex>& v, double tau) { // pass-by-reference indicates v is going to be altered
for (size_t idx=0; idx<v.size(); idx++) { // in.size() so we don't go out of range
const int halfPoints = v.size() / 2;
v[idx][0] = exp(-pow((idx-halfPoints)/tau, 2));
v[idx][1] = 0;
}
}
void performFft(std::vector<fftw_complex>& in,
std::vector<fftw_complex>& out) {
fftw_plan p;
assert(in.size() == out.size()); // test same sizes
const int nPoints = static_cast<int> (in.size());
std::vector<fftw_complex> pArr1 (nPoints);
std::vector<fftw_complex> pArr2 (nPoints);
p = fftw_plan_dft_1d(nPoints, pArr1.data(), pArr2.data(), FFTW_FORWARD, FFTW_MEASURE);
fftw_execute_dft(p, in.data(), out.data()); // access raw pointers to vectors for c-style function call
fftw_destroy_plan(p);
for (auto& n : out) { // nicer c++11 range based for loop
n[0] *= (1/static_cast<double> (nPoints)); // static_cast<T> is preferred to c-style casts
n[1] *= (1/static_cast<double> (nPoints));
}
}
void makePlotPairArrays(const std::vector<fftw_complex>& in,
const std::vector<fftw_complex>& out,
doublePairs& inPlot,
doublePairs& outPlot) {
assert(in.size() == out.size()); // test same sizes
const std::complex<double> i {std::sqrt(-1)}; // typedef not necessary? (it's only used once)
const int halfPoints = in.size() / 2;
for (size_t idx=0; idx<in.size(); idx++) {
// these Temps should be doubles?? since it's a pair of doubles not floats?
// would lead to less precise FFTs
double inTemp = std::abs(in[idx][0] + i*in[idx][1]); // keep these local to the loop scope
// Deal with fftshifts needed for visuals:
double outTemp {0};
const int idx_ = static_cast<int> (idx); // for signed/unsigned compiler warnings
if (idx_ > halfPoints) // what about when idx == halfPoints ???
outTemp = std::abs(out[idx-halfPoints][0] + // try make this an if/else statement so
i*out[idx-halfPoints][1]); // at least one branch is always touched
if (idx_ < halfPoints)
outTemp = std::abs(out[idx+halfPoints][0] +
i*out[idx+halfPoints][1]);
inPlot.push_back(std::make_pair(idx-halfPoints, inTemp));
outPlot.push_back(std::make_pair(idx-halfPoints, outTemp));
}
}
int main()
{
constexpr int gridOrder = 9;
constexpr int nPoints = std::pow(2, gridOrder);
constexpr double tau = 18;
// 'halfPoints' not needed here by declaring in function's scope
// std::pow(2, gridOrder-1); BTW pow is heavy why not nPoints / 2?
// our fft arrays
std::vector<fftw_complex> in (nPoints);
std::vector<fftw_complex> out (nPoints);
makeGaussian(in, tau);
performFft(in, out);
doublePairs inPlot;
doublePairs outPlot;
makePlotPairArrays (in, out, inPlot, outPlot);
// ... do the GnuPlotting
return 0;
}
-
\$\begingroup\$ Great! Thanks a lot for taking the time to go through this. I've not had enough time to fully digest your edits (or even test the code you've given), but will be doing over the next couple of days. \$\endgroup\$user129626– user1296262017年02月07日 19:31:54 +00:00Commented Feb 7, 2017 at 19:31
-
\$\begingroup\$ No problem! I meant to spend 15 mins but ended up being over an hour! I hope someone else can add some more info on the FFTW usage (I do DSP but haven't used that library because of licensing). TBH Python is very nice for plotting (and you're comfortable with it) so maybe you could write your FFT stuff as C++ functions and hook into them with Boost.Python? Or I presume you've already tried pypi.python.org/pypi/pyFFTW ? \$\endgroup\$ternonlerwen– ternonlerwen2017年02月07日 22:56:10 +00:00Commented Feb 7, 2017 at 22:56
-
\$\begingroup\$ No! I hadn't tried boost.python! Will definitely look into it. My other plan was to just save everything to file then use matplotlib, but would much prefer to use the convenience of something like gnuplot if it works out. I've tried the python fftw library extensively and it's fantastic, just experimenting mainly to see if c++ will give me any speed benefits, and also just out of curiosity :-D thanks for spending the time going through it. \$\endgroup\$user129626– user1296262017年02月07日 23:15:40 +00:00Commented Feb 7, 2017 at 23:15
-
\$\begingroup\$ One thing I'm not sure about: Why is it recommended that variables are declared when needed (as late as possible)? From a readability perspective I personally prefer to have declarations in one block early on, at the start of each function and organised by type. Is there a particular reason why this is considered bad practise? \$\endgroup\$user129626– user1296262017年02月22日 10:39:51 +00:00Commented Feb 22, 2017 at 10:39
-
1\$\begingroup\$ @FeeJF it's probably okay to declare constants at the head of a scope (use
const
or C++11constexpr
). Especially if these are sort of 'parameters' for you to tweak things as you're testing. But the reason is for errors/debugging. It's good to declare in as small a scope as possible. Declaring in a small scope means that only things in that scope can change the variable. If you must have variables in larger scopes then declaring as late as possible means only things after that declaration can change the variable. \$\endgroup\$ternonlerwen– ternonlerwen2017年02月25日 09:26:53 +00:00Commented Feb 25, 2017 at 9:26
Using the excellent suggestions from ternonlerwen this code is hugely improved. Here are some minor tweaks that I have made to ternonlerwen's version:
const int idx_ = static_cast<int> (idx)
missing frommakeGaussian()
, which is now added. Usingidx-halfPoints
, withidx
having typesize_t
results in unexpected values ashalfPoints
is an integer constant.- I have separated the transform plan creation, execution, and plan destruction. The plan only needs to be calculated once for vectors of a given size, so there is no need to create and destroy a plan with every call to
FFTW
if more than one FFT is needed for a given data set (as is the case for my application). - For me, defining the imaginary unit
i
requires two lines. Simply havingstd::complex<double> i = std::sqrt(-1);
returnsnan
. idx
andidx_
were mixed up in the fftshift part ofmakePlotPairArrays()
, causing a similar problem as was seen in point 1. I have also made the suggested changes to the part ofmakePlotPairArrays()
which does the fftshift, as I had overlooked theidx=halfPoints
case in my first post.- I have passed
nPoints
as an argument to functions where it is needed, rather than recalculating it each time. - Additional code for saving the plots has been included at the bottom.
And I think that's pretty much it.
Code:
#include "gnuplot-iostream.h"
#include <iostream>
#include <vector>
#include <cmath>
#include <fftw3.h>
#include <complex>
#include <cassert>
using doublePairs = std::vector<std::pair<double, double> >;
void makeGaussian(std::vector<fftw_complex>& v,
double tau) {
for (size_t idx=0; idx<v.size(); idx++) {
const int halfPoints = v.size() / 2;
const int idx_ = static_cast<int> (idx);
v[idx][0] = exp(-pow((idx_-halfPoints)/tau, 2));
v[idx][1] = 0;
}
}
void planForwardFFT(fftw_plan& p,
int nPoints) {
std::vector<fftw_complex> pArr1 (nPoints);
std::vector<fftw_complex> pArr2 (nPoints);
p = fftw_plan_dft_1d(nPoints, pArr1.data(), pArr2.data(),
FFTW_FORWARD, FFTW_MEASURE);
}
void performFFT(std::vector<fftw_complex>& in,
std::vector<fftw_complex>& out,
fftw_plan p,
int nPoints) {
assert(in.size() == out.size());
fftw_execute_dft(p, in.data(), out.data());
for (auto& n : out) {
n[0] *= (1/static_cast<double> (nPoints));
n[1] *= (1/static_cast<double> (nPoints));
}
}
void makePlotPairArrays(const std::vector<fftw_complex>& in,
const std::vector<fftw_complex>& out,
doublePairs& inPlot,
doublePairs& outPlot) {
assert(in.size() == out.size());
std::complex<double> i=-1;
i = sqrt(i);
const int halfPoints = in.size() / 2;
for (size_t idx=0; idx<in.size(); idx++) {
double inTemp = std::abs(in[idx][0] + i*in[idx][1]);
/*
Deal with fftshifts needed for visuals:
The method used here may not be the fastest -- Time-domain
approach is often implemented.
*/
double outTemp {0};
const int idx_ = static_cast<int> (idx);
if (idx_ >= halfPoints)
outTemp = std::abs(out[idx_-halfPoints][0] +
i*out[idx_-halfPoints][1]);
else
outTemp = std::abs(out[idx_+halfPoints][0] +
i*out[idx_+halfPoints][1]);
inPlot.push_back(std::make_pair(idx_-halfPoints, inTemp));
outPlot.push_back(std::make_pair(idx_-halfPoints, outTemp));
}
}
int main() {
Gnuplot gp;
constexpr int gridOrder = 9;
constexpr int nPoints = std::pow(2, gridOrder);
constexpr double tau = 18;
std::vector<fftw_complex> in (nPoints);
std::vector<fftw_complex> out (nPoints);
makeGaussian(in, tau);
fftw_plan p;
planForwardFFT(p, nPoints);
performFFT(in, out, p, nPoints);
doublePairs inPlot;
doublePairs outPlot;
makePlotPairArrays (in, out, inPlot, outPlot);
// ... do the GnuPlotting
gp << "set xlabel 'Grid points' font 'Helvetica, 12'\n" <<
"set ylabel 'Intensity' font 'Helvetica, 12'" << std::endl;
gp << "set tics font 'Helvetica, 10'" << std::endl;
gp << "set title 'Gaussian pulse example' font 'Helvetica, 13'" << std::endl;
gp << "set grid xtics lt 0 lw 1 lc 'gray'\n" <<
"set grid ytics lt 0 lw 2 lc 'gray'" << std::endl;
gp << "set term wxt '0'" << std::endl;
gp << "plot '-' lt 1 lw 2 linecolor 'blue' with line title 'Time domain'\n";
gp.send1d(inPlot);
gp << "set term wxt '1'" << std::endl;
gp << "plot '-' lt 11 lw 2 with line title"
"'Spectral domain'\n";
gp.send1d(outPlot);
fftw_destroy_plan(p);
return 0;
}
Compiling:
g++ -std=c++11 -Wall -O3 GaussianPulseDefinition_rev1.cpp -o GaussianPulseDefinition_rev1 -lboost_system -lboost_iostreams -lboost_filesystem -lfftw3 -lm
Saving the plots:
Replace the lines under // ... do the GnuPlotting
with the following (unix only, I think):
gp << "set xlabel 'Grid points' font 'Helvetica, 12'\n" <<
"set ylabel 'Intensity' font 'Helvetica, 12'" << std::endl;
gp << "set tics font 'Helvetica, 10'" << std::endl;
gp << "set title 'Gaussian pulse example' font 'Helvetica, 13'" << std::endl;
gp << "set grid xtics lt 0 lw 1 lc 'gray'\n" <<
"set grid ytics lt 0 lw 2 lc 'gray'" << std::endl;
gp << "set term png" << std::endl;
gp << "set output 'Time.png'" << std::endl;
gp << "plot '-' lt 1 lw 2 linecolor 'blue' with line title 'Time domain'\n";
gp.send1d(inPlot);
gp << "set term x11" << std::endl;
gp << "set term png" << std::endl;
gp << "set output 'Spectrum.png'" << std::endl;
gp << "plot '-' lt 11 lw 2 with line title"
"'Spectral domain'\n";
gp.send1d(outPlot);
gp << "set term x11" << std::endl;
This will save two .png files of the plots (Time.png and Spectrum.png).
-
\$\begingroup\$ Looks great! And do you know how it's doing for speed vs. python? Without getting into benchmarking, a very crude quick test could just be a loop of performing 1000 FFTs in both C++ and Python, + a stopwatch. \$\endgroup\$ternonlerwen– ternonlerwen2017年02月25日 09:37:38 +00:00Commented Feb 25, 2017 at 9:37
-
\$\begingroup\$ Not had chance to benchmark it properly yet, but will put the results here when I finally do. \$\endgroup\$user129626– user1296262017年03月01日 09:12:26 +00:00Commented Mar 1, 2017 at 9:12
-
1\$\begingroup\$ There's a couple of ways to define
i
: the easiest is probablystd::complex<double> i{0, 1};
, but you couldusing namespace std::literals::complex_literals;
and then just write1.0i
where you need it. The reason thatstd::sqrt(-1)
returns a NaN is that the argument isint
, which promotes todouble
- you'd needstd::sqrt(std::complex<double>(-1))
. \$\endgroup\$Toby Speight– Toby Speight2017年07月25日 10:47:57 +00:00Commented Jul 25, 2017 at 10:47