This is a very simple FFT. I am wondering what I can do to make it faster and more memory-efficient from the programming side (better data types, and maybe some tricks like unrolling loops or using the preprocessor, if that is useful here), rather than by using a more efficient mathematical algorithm. Obviously I would appreciate advice on best practices as well.
#include <stdio.h>
#include <vector>
#include <iostream>
#include <complex>
#include <cmath>
#include <algorithm>
#define N 1048576
#define PI 3.14159265358979323846
/*
Creating the table of all N'th roots of unity.
We use the notation omega_k = e^(-2 pi i k / N).
*/
template<typename U>
std::vector< std::complex<U> > rootsOfUnityCalculator() {
    std::vector< std::complex<U> > table;
    for (size_t k = 0; k < N; k++) {
        std::complex<U> kthRootOfUnity(std::cos(-2.0 * PI * k / N), std::sin(-2.0 * PI * k / N));
        table.push_back(kthRootOfUnity);
    }
    return table;
}
/*
Fast Fourier transform, T is the precision level, so float or double.
table is a look up table of the roots of unity. Overwrites the input.
For now only works for N a power of 2.
*/
template<typename T>
void FFT(std::complex<T>* input, const std::vector< std::complex<T> >& table, size_t n) {
    if (n % 2 == 0) {
        // Split up the input in even and odd components
        std::complex<T>* evenComponents = new std::complex<T>[n/2];
        std::complex<T>* oddComponents = new std::complex<T>[n/2];
        for (size_t k = 0; k < n/2; k++) {
            evenComponents[k] = input[2 * k];
            oddComponents[k] = input[2 * k + 1];
        }
        // Transform the even and odd input
        FFT(evenComponents, table, n/2);
        FFT(oddComponents, table, n/2);
        // Use the algorithm from Danielson and Lanczos
        for (size_t k = 0; k < n/2; k++) {
            std::complex<T> plusMinus = table[N / n * k] * oddComponents[k]; // omega_n^k = (omega_N^(N/n))^k = omega_N^(Nk/n)
            input[k] = evenComponents[k] + plusMinus;
            input[k + n/2] = evenComponents[k] - plusMinus;
        }
        delete[] evenComponents;
        delete[] oddComponents;
    } else {
        // The Fourier transform on one element does not do anything, so
        // nothing needed here.
    }
}
int main() {
    std::complex<double>* input = new std::complex<double>[N];
    for (size_t k = 0; k < N; k++) {
        input[k] = k;
    }
    const std::vector< std::complex<double> > table = rootsOfUnityCalculator<double>();
    // Overwrites the input with its Fourier transform
    FFT<double>(input, table, N);
    delete[] input;
    return 0;
}
Comments:
- "You might also like the jjj.de/fxt/#fxtbook" – user541686, Mar 2, 2021
- "'What I can do to make this faster and more memory efficient from the programming side' -- Truly fast FFTs are several orders of magnitude faster than naive ones, and the techniques to get that performance are a better fit for a textbook than a StackExchange answer. Look at the Halide FFT or FFTW to see what goes into this." – Alex Reinking, Mar 3, 2021
3 Answers
Avoid excessive memory usage
At each step of the recursion, you allocate two arrays that together are as large as the input. That means that your algorithm uses \$\mathcal{O}(N \log N)\$ space. It should be possible to rewrite your code so you only use \$\mathcal{O}(N)\$ space without substantially changing the algorithm (see the sketch at the end of this answer). But this also brings me to:
In-place vs. out-of-place algorithms
While the function signature makes it looks like you are doing an in-place FFT transform, you are actually doing an out-of-place transform (using extra memory), and then copying the result back into the input array. However, there are also real in-place FFT algorithms that, apart from a few variables, do not need any extra memory, and by their nature overwrite the input array. This means they use only \$\mathcal{O}(1)\$ memory. The drawback of the in-place algorithms is that the transformed data is not in the order you expect; you might have to use a bit-reversed index to access the data. However, for some applications this doesn't matter at all, for example if you perform convolutions using FFT.
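For illustration, a rough sketch of what such an in-place transform can look like: an iterative radix-2, decimation-in-time version (the name fft_in_place is made up, and the twiddles are computed on the fly rather than looked up in a table). The initial bit-reversal permutation here puts the result in natural order; a decimation-in-frequency variant can drop that reordering and leave the output bit-reversed, which is the convolution-friendly case mentioned above.
#include <cmath>
#include <complex>
#include <cstddef>
#include <utility>
#include <vector>
// Iterative radix-2 FFT, decimation in time. Overwrites `a` with its DFT.
// Requires a.size() to be a power of two. Uses O(1) extra memory.
template <typename T>
void fft_in_place(std::vector<std::complex<T>>& a)
{
    const std::size_t n = a.size();
    // Bit-reversal permutation: swap each element with the one at its
    // bit-reversed index (each pair is swapped only once).
    for (std::size_t i = 1, j = 0; i < n; ++i) {
        std::size_t bit = n >> 1;
        for (; j & bit; bit >>= 1) {
            j ^= bit;
        }
        j ^= bit;
        if (i < j) {
            std::swap(a[i], a[j]);
        }
    }
    // Butterfly passes: merge transforms of length len/2 into length len.
    const T pi = std::acos(T(-1));
    for (std::size_t len = 2; len <= n; len <<= 1) {
        const std::complex<T> w_len = std::polar(T(1), T(-2) * pi / T(len));
        for (std::size_t start = 0; start < n; start += len) {
            std::complex<T> w(1);
            for (std::size_t k = 0; k < len / 2; ++k) {
                const auto even = a[start + k];
                const auto odd  = a[start + k + len / 2] * w;
                a[start + k]           = even + odd;
                a[start + k + len / 2] = even - odd;
                w *= w_len;
            }
        }
    }
}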
If you do use an out-of-place algorithm then I suggest you make this explicit, and either return a std::vector with the results, or add a parameter to the function that tells it where to store the results.
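And on the memory point above: a rough sketch of the same recursive structure where every level shares one scratch buffer allocated up front, so the extra memory is a single N-element array rather than fresh allocations at each level. The helper name FFT_rec and the "ping-pong" argument order are only illustrative; the twiddle table is the one from the question.
#include <complex>
#include <cstddef>
#include <vector>
// Recursive helper: transforms `data` of length n, using `scratch` of the
// same length as working space. The two buffers swap roles as we recurse.
template<typename T>
void FFT_rec(std::complex<T>* data, std::complex<T>* scratch,
             const std::vector< std::complex<T> >& table, std::size_t n)
{
    if (n <= 1) {
        return;
    }
    // Deinterleave: even samples into the first half of scratch, odd into the second.
    for (std::size_t k = 0; k < n / 2; k++) {
        scratch[k]         = data[2 * k];
        scratch[k + n / 2] = data[2 * k + 1];
    }
    // Recurse with the buffer roles swapped; `data` may be clobbered as
    // working space because the combine step below overwrites it anyway.
    FFT_rec(scratch,         data,         table, n / 2);
    FFT_rec(scratch + n / 2, data + n / 2, table, n / 2);
    // Danielson–Lanczos combine, writing the result back into `data`.
    for (std::size_t k = 0; k < n / 2; k++) {
        const std::complex<T> plusMinus = table[table.size() / n * k] * scratch[k + n / 2];
        data[k]         = scratch[k] + plusMinus;
        data[k + n / 2] = scratch[k] - plusMinus;
    }
}
template<typename T>
void FFT(std::complex<T>* input, const std::vector< std::complex<T> >& table, std::size_t n)
{
    std::vector< std::complex<T> > scratch(n);   // the only allocation
    FFT_rec(input, scratch.data(), table, n);
}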
Comments:
- "It's ironic that the in-place algorithms may leave the output in an order you do not expect -- i.e. not in its place!" – JosephDoggie, Mar 3, 2021
Prefer C++ headers (such as <cstdio>) to the C compatibility headers (<stdio.h>). The C++ headers define identifiers in the std namespace, where we want them.
It seems that this header isn't even used here, so we can omit it completely.
std::size_t is misspelt throughout as size_t. This commonly happens when writing on a platform that declares it in the global namespace as well as in std, but that's not a portable assumption.
We only use PI as a double, so we can declare it as a constant rather than as a macro:
constexpr double pi = 3.14159265358979323846;
Or to make it more obviously correct:
const double pi = std::acos(-1.0);
We can avoid naked new[] and delete[] using std::vector. If we use std::vector::reserve before adding any elements, there should be no performance impact, only improved code safety. Using a vector saves having to pass the array size explicitly, too.
A helpful technique to determine whether a number is a power of two is that bitwise-and of the number and its arithmetic predecessor is zero for powers of two:
constexpr bool is_power_of_two(unsigned n)
{
    return n && ((n & (n - 1)) == 0);
}
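Being constexpr, the helper can even be checked at compile time; a small usage example (the values are arbitrary):
static_assert(is_power_of_two(1024u));
static_assert(!is_power_of_two(1000u) && !is_power_of_two(0u));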
A single value of N is probably not very useful in general. We could make it a template parameter, or a simple function argument, to make our code useful for other sizes of input.
Modified version
I split FFT() into a simple public function that ensures arguments are valid, and the internal recursive implementation. I've used a requires clause to constrain the complex type we accept (integer types aren't useful here), but that can be omitted for plain C++17.
#include <cmath>
#include <complex>
#include <concepts>
#include <cstddef>
#include <stdexcept>
#include <vector>
namespace internal
{
    /*
    Creating the table of all N'th roots of unity.
    We use the notation ω_k = e^(-2πik/n).
    */
    template<typename U>
    constexpr auto rootsOfUnity(std::size_t n)
    {
        const auto pi = std::acos(U{-1.0});
        std::vector<std::complex<U>> table;
        table.reserve(n);
        for (std::size_t i = 0; i < n; ++i) {
            table.emplace_back(std::cos(-2.0 * pi * i / n), std::sin(-2.0 * pi * i / n));
        }
        return table;
    }
    template<typename V> // V is a vector of complex
    void FFT(V& input, const V& table, std::size_t total_size)
    {
        if (input.size() <= 1) {
            // The Fourier transform on one element does not do
            // anything, so nothing needed here.
            return;
        }
        const auto n2 = input.size() / 2;
        // Split up the input in even and odd components
        V evenComponents; evenComponents.reserve(n2);
        V oddComponents; oddComponents.reserve(n2);
        for (std::size_t i = 0; i < input.size(); ) {
            evenComponents.push_back(input[i++]);
            oddComponents.push_back(input[i++]);
        }
        // Transform the even and odd input
        FFT(evenComponents, table, total_size);
        FFT(oddComponents, table, total_size);
        // Use the algorithm from Danielson and Lanczos
        for (std::size_t i = 0; i < n2; ++i) {
            // ω_n^i = (ω_N^(N/n))^i = ω_N^(Ni/n), where n = input.size()
            auto const plusMinus = table[total_size / input.size() * i] * oddComponents[i];
            input[i] = evenComponents[i] + plusMinus;
            input[i + n2] = evenComponents[i] - plusMinus;
        }
    }
}
/*
Fast Fourier transform
T is the precision level: float, double, long double.
Overwrites the input.
Only works for input sizes that are a power of 2.
*/
template<typename T>
    requires std::floating_point<T>
void FFT(std::vector<std::complex<T>>& input)
{
    if (input.size() == 0) {
        throw std::invalid_argument("input is empty");
    }
    if (input.size() & (input.size() - 1)) {
        throw std::invalid_argument("input length is not a power of two");
    }
    auto const table = internal::rootsOfUnity<T>(input.size());
    internal::FFT(input, table, input.size());
}
// Test program
#include <iostream>
int main()
{
    static constexpr std::size_t n = 1048576;
    std::vector<std::complex<double>> input;
    for (std::size_t i = 0; i < n; ++i) {
        input.emplace_back(i);
    }
    // Overwrite input with its Fourier transform
    FFT<double>(input);
    // use the result, to avoid optimising out
    std::cout << input[0] << '\n';
}
Comments:
- "Could you add some speed comparisons?" – kelalaka, Mar 1, 2021
- "My preference for pi has been to initialise it using std::acos(-1.0) since -1.0 is exactly representable and obviously correct compared to a long constant that could have a mistake hiding at a digit I don't have memorised." – Flexo, Mar 1, 2021
- "@kelalaka, the speed is identical to the original, to within 1 standard deviation of my measurements. I was focusing on robustness; see other answer for performance improvements." – Toby Speight, Mar 1, 2021
- "@TobySpeight Is there a reason for throw new <exception>? Would it not simplify exception handling to omit the new?" – radioflash, Mar 2, 2021
- "C++20 finally has std::numbers::pi so we don't need to define it ourselves." – 0x5453, Mar 3, 2021
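For reference, a two-line sketch of what that last suggestion looks like in C++20:
#include <numbers>
constexpr double pi = std::numbers::pi;   // no macro and no acos() call needed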
Use unique_ptr instead of new[]/delete[]
std::complex<double>* input = new std::complex<double>[N];
...
delete[] input;
These lines can be replaced by appropriate use of std::unique_ptr:
auto input_storage = std::make_unique<std::complex<double>[]>(N); // #include <memory>
std::complex<double>* input = input_storage.get();
The memory is automatically reclaimed once input_storage goes out of scope.
I originally suggested unique_ptr here because it's the moral equivalent of your code (has identical run-time behavior), but there are other options that might be more suitable for your situation (compared side by side in the sketch after this list):
- std::vector<T> (referenced in another answer):
  - More flexible than unique_ptr<T[]>, since you can resize and push_back.
  - Keeps track of its own length and capacity, unlike unique_ptr<T[]>, where you need to store the length as a separate variable. Hence, less error-prone, but more memory footprint (not likely to matter unless you have lots of vectors).
- std::array<T, N>: This is similar to a C-style array T[N], but offers an API more consistent with other C++ containers. It shares some of the same limitations as C-style arrays, though:
  - A crucial limitation is that the length has to be known at compile time (constexpr or static const), whereas both unique_ptr<T[]> and vector accept lengths at run time.
  - A large array<T, N> allocated on the stack can overflow the stack, so you generally want to keep N rather small, or use a smart pointer to std::array.
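To make the trade-offs concrete, here is a minimal sketch declaring the same 1048576-element buffer with each of the three options (variable and function names are just for illustration):
#include <array>
#include <complex>
#include <cstddef>
#include <memory>
#include <vector>
constexpr std::size_t n = 1048576;
void storage_options()
{
    // unique_ptr<T[]>: heap allocation, freed automatically, but the length
    // must be remembered separately (here, the constant n).
    auto owned = std::make_unique<std::complex<double>[]>(n);
    // vector<T>: heap allocation that knows its own size and can grow;
    // slightly larger footprint (pointer + size + capacity).
    std::vector<std::complex<double>> vec(n);
    // array<T, N>: size fixed at compile time; this one is far too big for
    // the stack, so put it behind a smart pointer.
    auto arr = std::make_unique<std::array<std::complex<double>, n>>();
}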