Mandelbrot image generator and viewer

Question 1

This is a C++ program which outputs a Mandelbrot fractal image in a graphics window and lets the user zoom and pan around the image, generating a new image each time the user does so. I'm using SFML for the graphics part.

Because it generates a new image after every zoom or pan, it is pretty slow, especially with a 1000x600 image (it can take up to half a second for the new image to load).

#include <SFML/Graphics.hpp>
#include <cmath>
const int IMAGE_WIDTH = 1000;
const int IMAGE_HEIGHT = 600;
double zoom = 0.004; // allow the user to zoom in and out...
double offsetX = -0.7; // and move around
double offsetY = 0.0;
const int MAX = 127; // maximum number of iterations for mandelbrot()
 // don't increase MAX or the colouring will look strange
int mandelbrot(double, double, int);
sf::Color getColor(int);
int main() {
 sf::RenderWindow window(sf::VideoMode(IMAGE_WIDTH, IMAGE_HEIGHT), "Mandelbrot");
 window.setFramerateLimit(30);
 sf::Image image;
 image.create(IMAGE_WIDTH, IMAGE_HEIGHT, sf::Color(0, 0, 0));
 sf::Texture texture;
 sf::Sprite sprite;
 bool stateChanged = true; // track whether the image needs to be regenerated
 while (window.isOpen()) {
 sf::Event event;
 while (window.pollEvent(event)) {
 switch (event.type) {
 case sf::Event::Closed:
 window.close();
 break;
 case sf::Event::KeyPressed:
 stateChanged = true; // image needs to be recreated when the user changes zoom or offset
 switch (event.key.code) {
 case sf::Keyboard::Escape:
 window.close();
 break;
 case sf::Keyboard::Equal:
 zoom *= 0.9;
 break;
 case sf::Keyboard::Dash:
 zoom /= 0.9;
 break;
 case sf::Keyboard::W:
 offsetY -= 40 * zoom;
 break;
 case sf::Keyboard::S:
 offsetY += 40 * zoom;
 break;
 case sf::Keyboard::A:
 offsetX -= 40 * zoom;
 break;
 case sf::Keyboard::D:
 offsetX += 40 * zoom;
 break;
 default: break;
 }
 default:
 break;
 }
 }
 if (stateChanged) { // only generate a new image if something has changed, to avoid unnecessary lag
 for (int x = 0; x < IMAGE_WIDTH; x++) {
 for (int y = 0; y < IMAGE_HEIGHT; y++) {
 // convert x and y to the appropriate complex number
 double real = (x - IMAGE_WIDTH / 2.0) * zoom + offsetX;
 double imag = (y - IMAGE_HEIGHT / 2.0) * zoom + offsetY;
 int value = mandelbrot(real, imag, MAX);
 image.setPixel(x, y, getColor(value));
 }
 }
 texture.loadFromImage(image);
 sprite.setTexture(texture);
 }
 window.clear();
 window.draw(sprite);
 window.display();
 stateChanged = false;
 }
 return 0;
}
int mandelbrot(double startReal, double startImag, int maximum) {
 int counter = 0;
 double zReal = startReal;
 double zImag = startImag;
 double nextRe;
 while (pow(zReal, 2.0) + pow(zImag, 2.0) <= 4.0 && counter <= maximum) {
 nextRe = pow(zReal, 2.0) - pow(zImag, 2.0) + startReal;
 zImag = 2.0 * zReal * zImag + startImag;
 zReal = nextRe;
 if (zReal == startReal && zImag == startImag) { // a repetition indicates that the point is in the Mandelbrot set
 return -1; // points in the Mandelbrot set are represented by a return value of -1
 }
 counter += 1;
 }
 if (counter >= maximum) {
 return -1; // -1 is used here to indicate that the point lies within the Mandelbrot set
 } else {
 return counter; // returning the number of iterations allows for colouring
 }
}
sf::Color getColor(int iterations) {
 int r, g, b;
 if (iterations == -1) {
 r = 0;
 g = 0;
 b = 0;
 } else if (iterations == 0) {
 r = 255;
 g = 0;
 b = 0;
 } else {
 // colour gradient: Red -> Blue -> Green -> Red -> Black
 // corresponding values: 0 -> 16 -> 32 -> 64 -> 127 (or -1)
 if (iterations < 16) {
 r = 16 * (16 - iterations);
 g = 0;
 b = 16 * iterations - 1;
 } else if (iterations < 32) {
 r = 0;
 g = 16 * (iterations - 16);
 b = 16 * (32 - iterations) - 1;
 } else if (iterations < 64) {
 r = 8 * (iterations - 32);
 g = 8 * (64 - iterations) - 1;
 b = 0;
 } else { // range is 64 - 127
 r = 255 - (iterations - 64) * 4;
 g = 0;
 b = 0;
 }
 }
 return sf::Color(r, g, b);
}

The window immediately after it is opened:

The window on opening

The window after zooming in (using the + key) and panning upwards (using the W key):

The window after zooming in and panning up

Question 2

What are your review goals? Speed? Readability?

Question 3

@Nobody I'm more familiar with Java and Python, so I'm interested in making my code more consistent with C++ style. Readability is also concern for me. Speed is less of a priority for me, as I think I have a rough idea of where to improve.

Question 4

You call window.clear(), window.draw() and window.display() after executing window.close(), do you really want to do that?

Question 5

Half a second o_O thinks wistfully back in the times it took a full 5 minutes. How much we have progressed..

Question 6

Also instead of pixel-wise manipulation and calling a function, is there some option to save an image buffer and then replace the whole image?

Question 7

I see some things that may help you improve your code.

Move loop invariants outside the loop

To maximize performance, moving loop invariants outside the loop often helps. This is an optimization that many compilers can make on their own, but it helps to write it explicitly. In particular, the nested loop within main where the image is recalculated. Right now the loop looks like this:

for (int x = 0; x < IMAGE_WIDTH; x++) {
 for (int y = 0; y < IMAGE_HEIGHT; y++) {
 // convert x and y to the appropriate complex number
 double real = (x - IMAGE_WIDTH / 2.0) * zoom + offsetX;
 double imag = (y - IMAGE_HEIGHT / 2.0) * zoom + offsetY;
 int value = mandelbrot(real, imag, MAX);
 image.setPixel(x, y, getColor(value));
 }
}

However, with a little bit of algebra, we see that the real and imag calculations could be rewritten as

double real = x * zoom - IMAGE_WIDTH / 2.0 * zoom + offsetX;
double imag = y * zoom - IMAGE_HEIGHT / 2.0 * zoom + offsetY;

Since only x and y are actually changing within the loop, it's actually simple to replace all of that with this:

double real = 0 * zoom - IMAGE_WIDTH / 2.0 * zoom + offsetX;
double imagstart = 0 * zoom - IMAGE_HEIGHT / 2.0 * zoom + offsetY;
for (int x = 0; x < IMAGE_WIDTH; x++, real += zoom) {
 double imag = imagstart;
 for (int y = 0; y < IMAGE_HEIGHT; y++, imag += zoom) {
 // convert x and y to the appropriate complex number
 int value = mandelbrot(real, imag, MAX);
 image.setPixel(x, y, getColor(value));
 }
}

Avoid special cases

The value of -1 is used to represent points within the Mandelbrot set, but then this value is explicitly checked when assigning a color and is equivalent to MAX. Why not simply assign the value of MAX and avoid the special case?

Prefer simple lookup to code

The getColor() function gets a single number in the range of 0 to MAX (if the previous suggestion is taken) and returns a color. Why not replace this with a table lookup? Compute the table values once (either at compile-time or run-time) and then use the table. At the beginning of main you could have this:

std::array<sf::Color, MAX+1> colors;
for (int i=0; i <= MAX; ++i) {
 colors[i] = getColor(i);
}

Then the loop would use this:

image.setPixel(x, y, colors[value]);

Don't use `==` for floating point numbers

Using == with floating point numbers is generally not a good idea because if they differ only by some small amount (say 1e-99), they are not equal. In fact, if you modify the code to count the number of times that statement evaluates to true, I think you will find that it never does and is therefore simply wasting time.

For more depth about floating point issues, I'd recommend the excellent article "What every computer scientist should know about floating-point arithmetic" by David Goldberg.

Optimize loops for early bailout

The mandelbrot loop can be rewritten more simply like this:

int mandelbrot(double startReal, double startImag) {
 double zReal = startReal;
 double zImag = startImag;
 for (int counter = 0; counter < MAX; ++counter) {
 double r2 = zReal * zReal;
 double i2 = zImag * zImag;
 if (r2 + i2 > 4.0) {
 return counter;
 }
 zImag = 2.0 * zReal * zImag + startImag;
 zReal = r2 - i2 + startReal;
 }
 return MAX;
}

Several changes were made here. First, since the original third parameter maximum was always passed in with the value of MAX, it was simply eliminated from the parameter list and the value MAX substituted directly.

Next, the loop is now a for loop instead of a while loop which makes the structure more obvious.

Next, the r2 and i2 values are precomputed, making the code a little shorter and simpler (and probably more efficient).

Lastly, the early bailout is done within the loop. Only if we get to the end of the loop does the value MAX get returned.

Omit work not needed

The window.clear() call near the end of main is not needed since the following line writes the entire window anyway. That line can simply be eliminated.

Eliminate global variables

All but MAX can be made local to within main.

Eliminate framerate limit

Unless there is a reason to do so, you might find that eliminating the call to window.setFramerateLimit or setting the value to 0 allows the program to be more responsive. After I applied all of the suggestions mentioned above, I found that the framerate limit was the thing preventing further speed increases on my machine.

Don't update the screen if not needed

Right now any key (even those that have no effect, cause the screen to recalculate. This is easily fixed by adding stateChanged = false; to the default case in the key event handling switch statement.

Use C++11 threads

One can fairly painlessly parallelize this code. I have changed the end of main so that it now looks like this:

 if (stateChanged) { 
 updateImage(zoom, offsetX, offsetY, image);
 texture.loadFromImage(image);
 sprite.setTexture(texture);
 stateChanged = false;
 }
 window.draw(sprite);
 window.display();
 }
}

As you can see, the nested loops have been replaced by a call to updateImage. It now looks like this:

void updateImage(double zoom, double offsetX, double offsetY, sf::Image& image) 
{
 std::vector<std::thread> threads;
 for (int i = 0; i < IMAGE_HEIGHT; i += 100) {
 threads.push_back(std::thread(updateImageSlice, zoom, offsetX, offsetY, 
 std::ref(image), i, i+100));
 }
 for (auto &t : threads) {
 t.join();
 }
}

The updateImageSlice does exactly what it claims to do and looks like this:

void updateImageSlice(double zoom, double offsetX, double offsetY, sf::Image& image, int minY, int maxY) 
{
 double real = 0 * zoom - IMAGE_WIDTH / 2.0 * zoom + offsetX;
 double imagstart = minY * zoom - IMAGE_HEIGHT / 2.0 * zoom + offsetY;
 for (int x = 0; x < IMAGE_WIDTH; x++, real += zoom) {
 double imag = imagstart;
 for (int y = minY; y < maxY; y++, imag += zoom) {
 int value = mandelbrot(real, imag);
 image.setPixel(x, y, colors[value]);
 }
 }
}

With this, things are noticably faster and look good until I get to a zoom factor of around 2e-16 or smaller, where there is noticable banding due to the previously mentioned floating point issues. A more careful reworking of the calculations could eliminate or defer that, but I didn't take the time to do so.

Also I should point out that sharing a single reference to the image variable is not generally the way to handle shared threads because it leads to at least the potential of a data race and corruption. However, as I guessed, the structure of the image object is such that the threads do not seem to contend for the same memory locations.

Revised version

Per request in the comments, here's the entire thing. This actually goes a few steps farther in that it turns most of the program into a Mandelbrot object and also scales the number of threads per std::thread::hardware_concurrency. It requires a C++11 compliant compiler.

#include <SFML/Graphics.hpp>
#include <array>
#include <vector>
#include <thread>
static constexpr int IMAGE_WIDTH = 1000;
static constexpr int IMAGE_HEIGHT = 600;
class Mandelbrot {
public:
 Mandelbrot();
 void updateImage(double zoom, double offsetX, double offsetY, sf::Image& image) const; 
private:
 static const int MAX = 127; // maximum number of iterations for mandelbrot()
 // don't increase MAX or the colouring will look strange
 std::array<sf::Color, MAX+1> colors;
 int mandelbrot(double startReal, double startImag) const;
 sf::Color getColor(int iterations) const;
 void updateImageSlice(double zoom, double offsetX, double offsetY, sf::Image& image, int minY, int maxY) const;
};
Mandelbrot::Mandelbrot() {
 for (int i=0; i <= MAX; ++i) {
 colors[i] = getColor(i);
 }
}
int Mandelbrot::mandelbrot(double startReal, double startImag) const {
 double zReal = startReal;
 double zImag = startImag;
 for (int counter = 0; counter < MAX; ++counter) {
 double r2 = zReal * zReal;
 double i2 = zImag * zImag;
 if (r2 + i2 > 4.0) {
 return counter;
 }
 zImag = 2.0 * zReal * zImag + startImag;
 zReal = r2 - i2 + startReal;
 }
 return MAX;
}
sf::Color Mandelbrot::getColor(int iterations) const {
 int r, g, b;
 // colour gradient: Red -> Blue -> Green -> Red -> Black
 // corresponding values: 0 -> 16 -> 32 -> 64 -> 127 (or -1)
 if (iterations < 16) {
 r = 16 * (16 - iterations);
 g = 0;
 b = 16 * iterations - 1;
 } else if (iterations < 32) {
 r = 0;
 g = 16 * (iterations - 16);
 b = 16 * (32 - iterations) - 1;
 } else if (iterations < 64) {
 r = 8 * (iterations - 32);
 g = 8 * (64 - iterations) - 1;
 b = 0;
 } else { // range is 64 - 127
 r = 255 - (iterations - 64) * 4;
 g = 0;
 b = 0;
 }
 return sf::Color(r, g, b);
}
void Mandelbrot::updateImageSlice(double zoom, double offsetX, double offsetY, sf::Image& image, int minY, int maxY) const
{
 double real = 0 * zoom - IMAGE_WIDTH / 2.0 * zoom + offsetX;
 double imagstart = minY * zoom - IMAGE_HEIGHT / 2.0 * zoom + offsetY;
 for (int x = 0; x < IMAGE_WIDTH; x++, real += zoom) {
 double imag = imagstart;
 for (int y = minY; y < maxY; y++, imag += zoom) {
 int value = mandelbrot(real, imag);
 image.setPixel(x, y, colors[value]);
 }
 }
}
void Mandelbrot::updateImage(double zoom, double offsetX, double offsetY, sf::Image& image) const
{
 const int STEP = IMAGE_HEIGHT/std::thread::hardware_concurrency();
 std::vector<std::thread> threads;
 for (int i = 0; i < IMAGE_HEIGHT; i += STEP) {
 threads.push_back(std::thread(&Mandelbrot::updateImageSlice, *this, zoom, offsetX, offsetY, std::ref(image), i, std::min(i+STEP, IMAGE_HEIGHT)));
 }
 for (auto &t : threads) {
 t.join();
 }
}
int main() {
 double offsetX = -0.7; // and move around
 double offsetY = 0.0;
 double zoom = 0.004; // allow the user to zoom in and out...
 Mandelbrot mb;
 sf::RenderWindow window(sf::VideoMode(IMAGE_WIDTH, IMAGE_HEIGHT), "Mandelbrot");
 window.setFramerateLimit(0);
 sf::Image image;
 image.create(IMAGE_WIDTH, IMAGE_HEIGHT, sf::Color(0, 0, 0));
 sf::Texture texture;
 sf::Sprite sprite;
 bool stateChanged = true; // track whether the image needs to be regenerated
 while (window.isOpen()) {
 sf::Event event;
 while (window.pollEvent(event)) {
 switch (event.type) {
 case sf::Event::Closed:
 window.close();
 break;
 case sf::Event::KeyPressed:
 stateChanged = true; // image needs to be recreated when the user changes zoom or offset
 switch (event.key.code) {
 case sf::Keyboard::Escape:
 window.close();
 break;
 case sf::Keyboard::Equal:
 zoom *= 0.9;
 break;
 case sf::Keyboard::Dash:
 zoom /= 0.9;
 break;
 case sf::Keyboard::W:
 offsetY -= 40 * zoom;
 break;
 case sf::Keyboard::S:
 offsetY += 40 * zoom;
 break;
 case sf::Keyboard::A:
 offsetX -= 40 * zoom;
 break;
 case sf::Keyboard::D:
 offsetX += 40 * zoom;
 break;
 default: 
 stateChanged = false;
 break;
 }
 default:
 break;
 }
 }
 if (stateChanged) { 
 mb.updateImage(zoom, offsetX, offsetY, image);
 texture.loadFromImage(image);
 sprite.setTexture(texture);
 stateChanged = false;
 }
 window.draw(sprite);
 window.display();
 }
}

Question 8

Did you measure that multithreaded implementation? The overhead of starting one thread per column of the image might be too fine granular. Furthermore, it is very likely that starting IMAGE_WIDTH threads is too much for usual CPUs which will result in unecessary many context switches.

Question 9

Yes I measured it and it's faster. I think you've misread the code. It's not one thread per column of the image which would be 1000 threads, but rather one thread per 100 horizontal lines which is 6 threads for this code.

Question 10

Yes, I missed the += 100. Yet that is a magic number and does not scale automatically with different image sizes/number of cores.

Question 11

Could you give a full listing of your final code? I would like to run measurements on it and I have difficulties recreating it.

Question 12

@Nobody: Sure! I look forward to your timing results. I posted the full code to my answer.

Question 13

Namings

Unlike in Python, all-caps names are usually left to macros (which you should avoid, by the way) in C++; for example: M_PI, CHAR_BIT and so on.

Moden Language Features

There are several places where you can take advantage of modern C++ features:

constexpr: pure constants and magic numbers should be declared as constexpr.
```
static constexpr int image_width = 1000;
static constexpr int image_height = 600;
static constexpr int max_iterations = 127;
```
Constants could also be collected in a namespace (e.g. namespace constants) or under enums.
Uniform initialization: you could use it to simplify some parts by not writing explicit types. In getColor, prefer:
```
sf::Color getColor(int iterations) 
{
 int r, g, b;
 // ...
 return {r, g, b};
}
```
The same holds for sf::RenderWindow/sf::Color:
```
sf::RenderWindow window { {IMAGE_WIDTH, IMAGE_HEIGHT}, "Mandelbrot"};
```

Performance

A minor optimization for getColor would try to reduce conditional assignments by giving r, g, b default values and so changing them only if necessary:

sf::Color getColor(int iterations) {
 int r = g = b = 0;
 if (iterations == -1) {
 // nothing to do
 } else if (iterations == 0) {
 r = 255;
 // g = 0;
 // b = 0;
 } else {
 // colour gradient: Red -> Blue -> Green -> Red -> Black
 // corresponding values: 0 -> 16 -> 32 -> 64 -> 127 (or -1)
 if (iterations < 16) {
 r = 16 * (16 - iterations);
 // g = 0;
 b = 16 * iterations - 1;
 } else if (iterations < 32) {
 // r = 0;
 g = 16 * (iterations - 16);
 b = 16 * (32 - iterations) - 1;
 } else if (iterations < 64) {
 r = 8 * (iterations - 32);
 g = 8 * (64 - iterations) - 1;
 // b = 0;
 } else { // range is 64 - 127
 r = 255 - (iterations - 64) * 4;
 // g = 0;
 // b = 0;
 }
 }

Other Details

Globals should generally be avoided;
The maximum mandelbrot could default to MAX i.e.
int mandelbrot(double startReal, double startImag, int maximum = ::MAX)
zoom should have a maximum level and therefore you should ensure it's not broken (double-s are signed).

Question 14

Could you elaborate on constexpr? Why is it preferred over const?

Question 15

@jacwah const tells the compiler such value won't change throughout program's life; it doesn't however say when such constant will be assigned. It's perfectly fine if it's done at run-time. constexpr does not allow this: values have to strictly be constant expressions and the compiler shall ensure that. For more info, see here.

Question 16

@jacwah This "lack" can also be misleading. See here for an example.

Question 17

@jacwah You'd therefore prefer it over const if you're sure about the data you have. It's also required in some contexts (e.g. templates and meta-programming in general). It might also help produce faster code.

Question 18

I'm going to comment mostly on performance aspects of your code here. Stylistic parts of your code should be improved upon by someone more conversant in C++ than I am.

Magic Numbers

You have factored out some constants, but when you calculate zooming, 0.9 and 40 are "magic numbers", numbers with no explanation as to what they do, and those whose usage entails updating values at multiple locations when you change them. Maybe you would like to make them constants and call them ZOOM_DOWN_FACTOR and ZOOM_UP_FACTOR respectively.

Performance

I realize compiling with -O3 and -ffastmath might make your compiler turn those pow() calls into plain multiplications, which are faster, so compile for release with those options. To compensate for floating-point inaccuracies, you should factor in a small Epsilon (maximum allowed deviation) value of about 10^-15. Try changing your bailout condition to <= 4.0 + 10^-15.
One pointer for faster panning: factor out your image generation code into a different method from main(), which accepts parameters defining the range of pixels of the image which should be subjected to Mandelbrot calculation. Call this function on every pan event with the range of pixels which have been invalidated due to the pan, get this image with some blank pixels and merge it with the old image to get the panned image. Should make panning orders of magnitude faster.

Miscellaneous

counter >= maximum is unnecessary. An == will do just fine here.
Suggestion: Use return instead of break after executing window.close() in the relevant case statements. Should avoid a potential memory leak. (Thanks @pacmaninbw for bringing this to my notice).

Minor useful tips from a (reasonably) experienced fractal programmer:

_{(me, of course. Look here, especially as you are familiar with Java)}

Implement a suitable form of color smoothing. Linear interpolation should help. Its a bit bit expensive on performance, but it vastly improves image quality. Exponential smoothing should help provide a suitable interpolation value.

This is the formula: expiter += exp(-cabs(z)-0.5/(cabs(zold - z))).

Here, exp(x) is the standard exponential function e^x, your math library has it. zold and z are complex numbers, zold is the value of z from the previous iteration. cabs(z) returns the square of the modulus of z (basically the sum of the squares of the real and imaginary components of z). expiter is the fractional or smooth iteration count.

For each RGB component, apply linear interpolation as follows:

value = toValue * expiter + fromValue * ( 1 - expiter). toValue and fromValue are basically the values of the said RGB component for iterations and iterations + 1

Question 19

+1 for -O3. I remember when -O3 outdid my hand optimizations on a raytracer by an order of magnitude when I started learning C++ and tried it for the first time.

Question 20

Thank you, this is all very helpful! Colour smoothing sounds good, so I'll look into it soon.

Question 21

Use library functionality

Mandelbrot images are calculated with complex numbers. C++ offers the <complex> header which contains a template for generating complex numbers out of floats or other numeric types.

Initialize at definition

Many compilers can give you warnings about code paths with uninitialized variables but in general it might be good to zero initialize your r, g, and b values to zero at the beginning of the getColor function. This also allows to remove all later zero assignments in the ifs.

Cascading `if`s

Often it is a bad idea to cascade ifs in the way you have done. Since your code uses equal spacing in multiples of 16 you could instead use a switch:

if(iterations < 0) {
} else {
 switch(iterations / 16) {
 case 0: // deal with 0-15
 break;
 case 1: // deal with 16-31
 break;
 case 2: // deal with 32-47
 /* no break! */
 case 3: // deal with 48-63
 break;
 default: // deal with iterations >= 64
 }
}

Parallelize

The results of different pixels are independent of each other. This screams for parallelization. It might suffice to use OpenMP's parallel for.

Misc

use ++counter instead of counter += 1; for incrementing (likely to be optimized by compiler)

Question 22

Thanks for all your help. In particular, using a switch in the way you suggested makes the getColorfunction quite a lot neater!

Question 23

@monopole: I would consider this a hack instead of idiomatic and it was intended as a speed optimization due to reduced branching.

Question 24

Don't need to use OpenMP any more . C++ comes with its own parallel library. Just use async()

Question 25

You should probably use pow(x,2) (rather than 2.0) see if the compiler optimizes that.

Question 26

@LokiAstari: While I agree that there are possibilities in standard C++ that allow for parallelization, an OpenMP parallel for would be far easier to write in this case.

Question 27

I have to start with saying that this is a really cool and well made project! If you would like to improve the performance of your code, I recommend offloading work to GPU shaders. mandelbrot easily fits into the description of a fragment shader. You can use OpenGL through SFML to achieve this, see this and this.

Question 28

Naming, minor but useful

int mandelbrot - this function returns number of iterations, name mandelbrot does not tell anything about its purpose.

int counter = 0; - this variable holds the number of iteration. Name numIterations would be better because it explains much more

Question 29

I see you already have got many answers about good code practices like naming conventions and other good things. Here's some optimization help for computational algorithms for modern CPUs in general.

Replace conditionals (if, while, switch-case) inside loops with arithmetics whenever it is possible. This is important to not risk destroy instruction pipelining in modern processors.

Examples in code: if's and whiles inside mandelbrot and getColor functions.

Optimization settings for your compiler. Allow it to optimize, try and unroll loops if it can, help it pick good assembler instructions by letting it know what CPU you have. Allow it to inline function calls if the function calls are in loops. Maybe use fast instructions and lower precision (for example float instead of double) if reduced precision is OK. I suspect you can use fast instructions without any problems on this particular program.

Examples in code: allowing the compiler to do inlining can avoid branching on the calls to the mandelbrot and getColor functions (which are inside the loops).

Memory planning and structuring. Plan the data structures and calculations to utilize the RAM caches as much as you can. Difference between missing and hitting caches can easily affect performance tens to hundreds of times.

Edward Edward 67.2k4 gold badges120 silver badges284 bronze badges · Answer 1 · 2016-03-31 15:26:11Z

I see some things that may help you improve your code.

Move loop invariants outside the loop

To maximize performance, moving loop invariants outside the loop often helps. This is an optimization that many compilers can make on their own, but it helps to write it explicitly. In particular, the nested loop within main where the image is recalculated. Right now the loop looks like this:

for (int x = 0; x < IMAGE_WIDTH; x++) {
 for (int y = 0; y < IMAGE_HEIGHT; y++) {
 // convert x and y to the appropriate complex number
 double real = (x - IMAGE_WIDTH / 2.0) * zoom + offsetX;
 double imag = (y - IMAGE_HEIGHT / 2.0) * zoom + offsetY;
 int value = mandelbrot(real, imag, MAX);
 image.setPixel(x, y, getColor(value));
 }
}

However, with a little bit of algebra, we see that the real and imag calculations could be rewritten as

double real = x * zoom - IMAGE_WIDTH / 2.0 * zoom + offsetX;
double imag = y * zoom - IMAGE_HEIGHT / 2.0 * zoom + offsetY;

Since only x and y are actually changing within the loop, it's actually simple to replace all of that with this:

double real = 0 * zoom - IMAGE_WIDTH / 2.0 * zoom + offsetX;
double imagstart = 0 * zoom - IMAGE_HEIGHT / 2.0 * zoom + offsetY;
for (int x = 0; x < IMAGE_WIDTH; x++, real += zoom) {
 double imag = imagstart;
 for (int y = 0; y < IMAGE_HEIGHT; y++, imag += zoom) {
 // convert x and y to the appropriate complex number
 int value = mandelbrot(real, imag, MAX);
 image.setPixel(x, y, getColor(value));
 }
}

Avoid special cases

The value of -1 is used to represent points within the Mandelbrot set, but then this value is explicitly checked when assigning a color and is equivalent to MAX. Why not simply assign the value of MAX and avoid the special case?

Prefer simple lookup to code

The getColor() function gets a single number in the range of 0 to MAX (if the previous suggestion is taken) and returns a color. Why not replace this with a table lookup? Compute the table values once (either at compile-time or run-time) and then use the table. At the beginning of main you could have this:

std::array<sf::Color, MAX+1> colors;
for (int i=0; i <= MAX; ++i) {
 colors[i] = getColor(i);
}

Then the loop would use this:

image.setPixel(x, y, colors[value]);

Don't use `==` for floating point numbers

Using == with floating point numbers is generally not a good idea because if they differ only by some small amount (say 1e-99), they are not equal. In fact, if you modify the code to count the number of times that statement evaluates to true, I think you will find that it never does and is therefore simply wasting time.

For more depth about floating point issues, I'd recommend the excellent article "What every computer scientist should know about floating-point arithmetic" by David Goldberg.

Optimize loops for early bailout

The mandelbrot loop can be rewritten more simply like this:

int mandelbrot(double startReal, double startImag) {
 double zReal = startReal;
 double zImag = startImag;
 for (int counter = 0; counter < MAX; ++counter) {
 double r2 = zReal * zReal;
 double i2 = zImag * zImag;
 if (r2 + i2 > 4.0) {
 return counter;
 }
 zImag = 2.0 * zReal * zImag + startImag;
 zReal = r2 - i2 + startReal;
 }
 return MAX;
}

Several changes were made here. First, since the original third parameter maximum was always passed in with the value of MAX, it was simply eliminated from the parameter list and the value MAX substituted directly.

Next, the loop is now a for loop instead of a while loop which makes the structure more obvious.

Next, the r2 and i2 values are precomputed, making the code a little shorter and simpler (and probably more efficient).

Lastly, the early bailout is done within the loop. Only if we get to the end of the loop does the value MAX get returned.

Omit work not needed

The window.clear() call near the end of main is not needed since the following line writes the entire window anyway. That line can simply be eliminated.

Eliminate global variables

All but MAX can be made local to within main.

Eliminate framerate limit

Unless there is a reason to do so, you might find that eliminating the call to window.setFramerateLimit or setting the value to 0 allows the program to be more responsive. After I applied all of the suggestions mentioned above, I found that the framerate limit was the thing preventing further speed increases on my machine.

Don't update the screen if not needed

Right now any key (even those that have no effect, cause the screen to recalculate. This is easily fixed by adding stateChanged = false; to the default case in the key event handling switch statement.

Use C++11 threads

One can fairly painlessly parallelize this code. I have changed the end of main so that it now looks like this:

 if (stateChanged) { 
 updateImage(zoom, offsetX, offsetY, image);
 texture.loadFromImage(image);
 sprite.setTexture(texture);
 stateChanged = false;
 }
 window.draw(sprite);
 window.display();
 }
}

As you can see, the nested loops have been replaced by a call to updateImage. It now looks like this:

void updateImage(double zoom, double offsetX, double offsetY, sf::Image& image) 
{
 std::vector<std::thread> threads;
 for (int i = 0; i < IMAGE_HEIGHT; i += 100) {
 threads.push_back(std::thread(updateImageSlice, zoom, offsetX, offsetY, 
 std::ref(image), i, i+100));
 }
 for (auto &t : threads) {
 t.join();
 }
}

The updateImageSlice does exactly what it claims to do and looks like this:

void updateImageSlice(double zoom, double offsetX, double offsetY, sf::Image& image, int minY, int maxY) 
{
 double real = 0 * zoom - IMAGE_WIDTH / 2.0 * zoom + offsetX;
 double imagstart = minY * zoom - IMAGE_HEIGHT / 2.0 * zoom + offsetY;
 for (int x = 0; x < IMAGE_WIDTH; x++, real += zoom) {
 double imag = imagstart;
 for (int y = minY; y < maxY; y++, imag += zoom) {
 int value = mandelbrot(real, imag);
 image.setPixel(x, y, colors[value]);
 }
 }
}

With this, things are noticably faster and look good until I get to a zoom factor of around 2e-16 or smaller, where there is noticable banding due to the previously mentioned floating point issues. A more careful reworking of the calculations could eliminate or defer that, but I didn't take the time to do so.

Also I should point out that sharing a single reference to the image variable is not generally the way to handle shared threads because it leads to at least the potential of a data race and corruption. However, as I guessed, the structure of the image object is such that the threads do not seem to contend for the same memory locations.

Revised version

Per request in the comments, here's the entire thing. This actually goes a few steps farther in that it turns most of the program into a Mandelbrot object and also scales the number of threads per std::thread::hardware_concurrency. It requires a C++11 compliant compiler.

#include <SFML/Graphics.hpp>
#include <array>
#include <vector>
#include <thread>
static constexpr int IMAGE_WIDTH = 1000;
static constexpr int IMAGE_HEIGHT = 600;
class Mandelbrot {
public:
 Mandelbrot();
 void updateImage(double zoom, double offsetX, double offsetY, sf::Image& image) const; 
private:
 static const int MAX = 127; // maximum number of iterations for mandelbrot()
 // don't increase MAX or the colouring will look strange
 std::array<sf::Color, MAX+1> colors;
 int mandelbrot(double startReal, double startImag) const;
 sf::Color getColor(int iterations) const;
 void updateImageSlice(double zoom, double offsetX, double offsetY, sf::Image& image, int minY, int maxY) const;
};
Mandelbrot::Mandelbrot() {
 for (int i=0; i <= MAX; ++i) {
 colors[i] = getColor(i);
 }
}
int Mandelbrot::mandelbrot(double startReal, double startImag) const {
 double zReal = startReal;
 double zImag = startImag;
 for (int counter = 0; counter < MAX; ++counter) {
 double r2 = zReal * zReal;
 double i2 = zImag * zImag;
 if (r2 + i2 > 4.0) {
 return counter;
 }
 zImag = 2.0 * zReal * zImag + startImag;
 zReal = r2 - i2 + startReal;
 }
 return MAX;
}
sf::Color Mandelbrot::getColor(int iterations) const {
 int r, g, b;
 // colour gradient: Red -> Blue -> Green -> Red -> Black
 // corresponding values: 0 -> 16 -> 32 -> 64 -> 127 (or -1)
 if (iterations < 16) {
 r = 16 * (16 - iterations);
 g = 0;
 b = 16 * iterations - 1;
 } else if (iterations < 32) {
 r = 0;
 g = 16 * (iterations - 16);
 b = 16 * (32 - iterations) - 1;
 } else if (iterations < 64) {
 r = 8 * (iterations - 32);
 g = 8 * (64 - iterations) - 1;
 b = 0;
 } else { // range is 64 - 127
 r = 255 - (iterations - 64) * 4;
 g = 0;
 b = 0;
 }
 return sf::Color(r, g, b);
}
void Mandelbrot::updateImageSlice(double zoom, double offsetX, double offsetY, sf::Image& image, int minY, int maxY) const
{
 double real = 0 * zoom - IMAGE_WIDTH / 2.0 * zoom + offsetX;
 double imagstart = minY * zoom - IMAGE_HEIGHT / 2.0 * zoom + offsetY;
 for (int x = 0; x < IMAGE_WIDTH; x++, real += zoom) {
 double imag = imagstart;
 for (int y = minY; y < maxY; y++, imag += zoom) {
 int value = mandelbrot(real, imag);
 image.setPixel(x, y, colors[value]);
 }
 }
}
void Mandelbrot::updateImage(double zoom, double offsetX, double offsetY, sf::Image& image) const
{
 const int STEP = IMAGE_HEIGHT/std::thread::hardware_concurrency();
 std::vector<std::thread> threads;
 for (int i = 0; i < IMAGE_HEIGHT; i += STEP) {
 threads.push_back(std::thread(&Mandelbrot::updateImageSlice, *this, zoom, offsetX, offsetY, std::ref(image), i, std::min(i+STEP, IMAGE_HEIGHT)));
 }
 for (auto &t : threads) {
 t.join();
 }
}
int main() {
 double offsetX = -0.7; // and move around
 double offsetY = 0.0;
 double zoom = 0.004; // allow the user to zoom in and out...
 Mandelbrot mb;
 sf::RenderWindow window(sf::VideoMode(IMAGE_WIDTH, IMAGE_HEIGHT), "Mandelbrot");
 window.setFramerateLimit(0);
 sf::Image image;
 image.create(IMAGE_WIDTH, IMAGE_HEIGHT, sf::Color(0, 0, 0));
 sf::Texture texture;
 sf::Sprite sprite;
 bool stateChanged = true; // track whether the image needs to be regenerated
 while (window.isOpen()) {
 sf::Event event;
 while (window.pollEvent(event)) {
 switch (event.type) {
 case sf::Event::Closed:
 window.close();
 break;
 case sf::Event::KeyPressed:
 stateChanged = true; // image needs to be recreated when the user changes zoom or offset
 switch (event.key.code) {
 case sf::Keyboard::Escape:
 window.close();
 break;
 case sf::Keyboard::Equal:
 zoom *= 0.9;
 break;
 case sf::Keyboard::Dash:
 zoom /= 0.9;
 break;
 case sf::Keyboard::W:
 offsetY -= 40 * zoom;
 break;
 case sf::Keyboard::S:
 offsetY += 40 * zoom;
 break;
 case sf::Keyboard::A:
 offsetX -= 40 * zoom;
 break;
 case sf::Keyboard::D:
 offsetX += 40 * zoom;
 break;
 default: 
 stateChanged = false;
 break;
 }
 default:
 break;
 }
 }
 if (stateChanged) { 
 mb.updateImage(zoom, offsetX, offsetY, image);
 texture.loadFromImage(image);
 sprite.setTexture(texture);
 stateChanged = false;
 }
 window.draw(sprite);
 window.display();
 }
}

Did you measure that multithreaded implementation? The overhead of starting one thread per column of the image might be too fine granular. Furthermore, it is very likely that starting IMAGE_WIDTH threads is too much for usual CPUs which will result in unecessary many context switches.
Yes I measured it and it's faster. I think you've misread the code. It's not one thread per column of the image which would be 1000 threads, but rather one thread per 100 horizontal lines which is 6 threads for this code.
Yes, I missed the += 100. Yet that is a magic number and does not scale automatically with different image sizes/number of cores.
Could you give a full listing of your final code? I would like to run measurements on it and I have difficulties recreating it.
@Nobody: Sure! I look forward to your timing results. I posted the full code to my answer.

edmz edmz 1,0437 silver badges16 bronze badges · Answer 2 · 2016-03-31 13:58:12Z

Namings

Unlike in Python, all-caps names are usually left to macros (which you should avoid, by the way) in C++; for example: M_PI, CHAR_BIT and so on.

Moden Language Features

There are several places where you can take advantage of modern C++ features:

constexpr: pure constants and magic numbers should be declared as constexpr.
```
static constexpr int image_width = 1000;
static constexpr int image_height = 600;
static constexpr int max_iterations = 127;
```
Constants could also be collected in a namespace (e.g. namespace constants) or under enums.
Uniform initialization: you could use it to simplify some parts by not writing explicit types. In getColor, prefer:
```
sf::Color getColor(int iterations) 
{
 int r, g, b;
 // ...
 return {r, g, b};
}
```
The same holds for sf::RenderWindow/sf::Color:
```
sf::RenderWindow window { {IMAGE_WIDTH, IMAGE_HEIGHT}, "Mandelbrot"};
```

Performance

A minor optimization for getColor would try to reduce conditional assignments by giving r, g, b default values and so changing them only if necessary:

sf::Color getColor(int iterations) {
 int r = g = b = 0;
 if (iterations == -1) {
 // nothing to do
 } else if (iterations == 0) {
 r = 255;
 // g = 0;
 // b = 0;
 } else {
 // colour gradient: Red -> Blue -> Green -> Red -> Black
 // corresponding values: 0 -> 16 -> 32 -> 64 -> 127 (or -1)
 if (iterations < 16) {
 r = 16 * (16 - iterations);
 // g = 0;
 b = 16 * iterations - 1;
 } else if (iterations < 32) {
 // r = 0;
 g = 16 * (iterations - 16);
 b = 16 * (32 - iterations) - 1;
 } else if (iterations < 64) {
 r = 8 * (iterations - 32);
 g = 8 * (64 - iterations) - 1;
 // b = 0;
 } else { // range is 64 - 127
 r = 255 - (iterations - 64) * 4;
 // g = 0;
 // b = 0;
 }
 }

Other Details

Globals should generally be avoided;
The maximum mandelbrot could default to MAX i.e.
int mandelbrot(double startReal, double startImag, int maximum = ::MAX)
zoom should have a maximum level and therefore you should ensure it's not broken (double-s are signed).

Could you elaborate on constexpr? Why is it preferred over const?
@jacwah const tells the compiler such value won't change throughout program's life; it doesn't however say when such constant will be assigned. It's perfectly fine if it's done at run-time. constexpr does not allow this: values have to strictly be constant expressions and the compiler shall ensure that. For more info, see here.
@jacwah This "lack" can also be misleading. See here for an example.
@jacwah You'd therefore prefer it over const if you're sure about the data you have. It's also required in some contexts (e.g. templates and meta-programming in general). It might also help produce faster code.

Tamoghna Chowdhury Tamoghna Chowdhury 2,33110 silver badges22 bronze badges · Answer 3 · 2016-03-31 12:56:43Z

I'm going to comment mostly on performance aspects of your code here. Stylistic parts of your code should be improved upon by someone more conversant in C++ than I am.

Magic Numbers

You have factored out some constants, but when you calculate zooming, 0.9 and 40 are "magic numbers", numbers with no explanation as to what they do, and those whose usage entails updating values at multiple locations when you change them. Maybe you would like to make them constants and call them ZOOM_DOWN_FACTOR and ZOOM_UP_FACTOR respectively.

Performance

I realize compiling with -O3 and -ffastmath might make your compiler turn those pow() calls into plain multiplications, which are faster, so compile for release with those options. To compensate for floating-point inaccuracies, you should factor in a small Epsilon (maximum allowed deviation) value of about 10^-15. Try changing your bailout condition to <= 4.0 + 10^-15.
One pointer for faster panning: factor out your image generation code into a different method from main(), which accepts parameters defining the range of pixels of the image which should be subjected to Mandelbrot calculation. Call this function on every pan event with the range of pixels which have been invalidated due to the pan, get this image with some blank pixels and merge it with the old image to get the panned image. Should make panning orders of magnitude faster.

Miscellaneous

counter >= maximum is unnecessary. An == will do just fine here.
Suggestion: Use return instead of break after executing window.close() in the relevant case statements. Should avoid a potential memory leak. (Thanks @pacmaninbw for bringing this to my notice).

Minor useful tips from a (reasonably) experienced fractal programmer:

_{(me, of course. Look here, especially as you are familiar with Java)}

Implement a suitable form of color smoothing. Linear interpolation should help. Its a bit bit expensive on performance, but it vastly improves image quality. Exponential smoothing should help provide a suitable interpolation value.

This is the formula: expiter += exp(-cabs(z)-0.5/(cabs(zold - z))).

Here, exp(x) is the standard exponential function e^x, your math library has it. zold and z are complex numbers, zold is the value of z from the previous iteration. cabs(z) returns the square of the modulus of z (basically the sum of the squares of the real and imaginary components of z). expiter is the fractional or smooth iteration count.

For each RGB component, apply linear interpolation as follows:

value = toValue * expiter + fromValue * ( 1 - expiter). toValue and fromValue are basically the values of the said RGB component for iterations and iterations + 1

+1 for -O3. I remember when -O3 outdid my hand optimizations on a raytracer by an order of magnitude when I started learning C++ and tried it for the first time.
Thank you, this is all very helpful! Colour smoothing sounds good, so I'll look into it soon.

score 10 · Answer 4 · 2016-03-31 12:59:28Z

Use library functionality

Mandelbrot images are calculated with complex numbers. C++ offers the <complex> header which contains a template for generating complex numbers out of floats or other numeric types.

Initialize at definition

Many compilers can give you warnings about code paths with uninitialized variables but in general it might be good to zero initialize your r, g, and b values to zero at the beginning of the getColor function. This also allows to remove all later zero assignments in the ifs.

Cascading `if`s

Often it is a bad idea to cascade ifs in the way you have done. Since your code uses equal spacing in multiples of 16 you could instead use a switch:

if(iterations < 0) {
} else {
 switch(iterations / 16) {
 case 0: // deal with 0-15
 break;
 case 1: // deal with 16-31
 break;
 case 2: // deal with 32-47
 /* no break! */
 case 3: // deal with 48-63
 break;
 default: // deal with iterations >= 64
 }
}

Parallelize

The results of different pixels are independent of each other. This screams for parallelization. It might suffice to use OpenMP's parallel for.

Misc

use ++counter instead of counter += 1; for incrementing (likely to be optimized by compiler)

Thanks for all your help. In particular, using a switch in the way you suggested makes the getColorfunction quite a lot neater!
@monopole: I would consider this a hack instead of idiomatic and it was intended as a speed optimization due to reduced branching.
Don't need to use OpenMP any more . C++ comes with its own parallel library. Just use async()
You should probably use pow(x,2) (rather than 2.0) see if the compiler optimizes that.
@LokiAstari: While I agree that there are possibilities in standard C++ that allow for parallelization, an OpenMP parallel for would be far easier to write in this case.

jacwah jacwah 2,69118 silver badges42 bronze badges · Answer 5 · 2016-03-31 13:05:09Z

I have to start with saying that this is a really cool and well made project! If you would like to improve the performance of your code, I recommend offloading work to GPU shaders. mandelbrot easily fits into the description of a fragment shader. You can use OpenGL through SFML to achieve this, see this and this.

score 3 · Answer 6 · 2016-04-01 08:14:41Z

Naming, minor but useful

int mandelbrot - this function returns number of iterations, name mandelbrot does not tell anything about its purpose.

int counter = 0; - this variable holds the number of iteration. Name numIterations would be better because it explains much more

mathreadler mathreadler 2362 silver badges9 bronze badges · Answer 7 · 2016-04-02 16:32:18Z

I see you already have got many answers about good code practices like naming conventions and other good things. Here's some optimization help for computational algorithms for modern CPUs in general.

Replace conditionals (if, while, switch-case) inside loops with arithmetics whenever it is possible. This is important to not risk destroy instruction pipelining in modern processors.

Examples in code: if's and whiles inside mandelbrot and getColor functions.

Optimization settings for your compiler. Allow it to optimize, try and unroll loops if it can, help it pick good assembler instructions by letting it know what CPU you have. Allow it to inline function calls if the function calls are in loops. Maybe use fast instructions and lower precision (for example float instead of double) if reduced precision is OK. I suspect you can use fast instructions without any problems on this particular program.

Examples in code: allowing the compiler to do inlining can avoid branching on the calls to the mandelbrot and getColor functions (which are inside the loops).

Memory planning and structuring. Plan the data structures and calculations to utilize the RAM caches as much as you can. Difference between missing and hitting caches can easily affect performance tens to hundreds of times.

Stack Exchange Network

Mandelbrot image generator and viewer

7 Answers 7

Move loop invariants outside the loop

Avoid special cases

Prefer simple lookup to code

Don't use `==` for floating point numbers

Optimize loops for early bailout

Omit work not needed

Eliminate global variables

Eliminate framerate limit

Don't update the screen if not needed

Use C++11 threads

Revised version

Namings

Moden Language Features

Performance

Other Details

Magic Numbers

Performance

Miscellaneous

Minor useful tips from a (reasonably) experienced fractal programmer:

Use library functionality

Initialize at definition

Cascading `if`s

Parallelize

Misc

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Hot Network Questions

Mandelbrot image generator and viewer

7 Answers 7

Move loop invariants outside the loop

Avoid special cases

Prefer simple lookup to code

Don't use == for floating point numbers

Optimize loops for early bailout

Omit work not needed

Eliminate global variables

Eliminate framerate limit

Don't update the screen if not needed

Use C++11 threads

Revised version

Namings

Moden Language Features

Performance

Other Details

Magic Numbers

Performance

Miscellaneous

Minor useful tips from a (reasonably) experienced fractal programmer:

Use library functionality

Initialize at definition

Cascading ifs

Parallelize

Misc

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Related

Hot Network Questions

Don't use `==` for floating point numbers

Cascading `if`s