Improving performance of raycaster application

Question 1

I decided to make a raycaster app to try to understand how raycasting really works. So far so good: I have it up and running. If you download the source and run make, it should produce a test.exe. I am developing on Ubuntu using Windows Subsystem for Linux 2 (WSL2). I have made changes to the tutorial's code and have tried to write it from scratch while following the tutorial's algorithm. Performance is okay, but at 900x900 on my laptop the frame rate is about 40 fps, which seems low; I'd expect to be clearing 100 fps easily. I cannot figure out where the bottlenecks are at this point.

RayCaster1.cpp

#include "include/SDLWrapper.h"
#include <cmath>    // std::cos, std::sin, std::abs(double)
#include <cstdint>  // uint32_t
#include <iostream>
#include <vector>
#define WINDOW_WIDTH 900
#define mapWidth 24
#define mapHeight 24
//export DISPLAY=$(cat /etc/resolv.conf | grep nameserver | awk '{print $2}'):0
//run vcxsrv in windows "XLaunch"
//make sure to check access control checkbox and uncheck the opengl option
//./test.exe
int worldMap[mapWidth][mapHeight]=
{
 {1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1},
 {1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1},
 {1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1},
 {1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1},
 {1,0,0,0,0,0,2,2,2,2,2,0,0,0,0,3,0,3,0,3,0,0,0,1},
 {1,0,0,0,0,0,2,0,0,0,2,0,0,0,0,0,0,0,0,0,0,0,0,1},
 {1,0,0,0,0,0,2,0,0,0,2,0,0,0,0,3,0,0,0,3,0,0,0,1},
 {1,0,0,0,0,0,2,0,0,0,2,0,0,0,0,0,0,0,0,0,0,0,0,1},
 {1,0,0,0,0,0,2,2,0,2,2,0,0,0,0,3,0,3,0,3,0,0,0,1},
 {1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1},
 {1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1},
 {1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1},
 {1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1},
 {1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1},
 {1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1},
 {1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1},
 {1,4,4,4,4,4,4,4,4,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1},
 {1,4,0,4,0,0,0,0,4,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1},
 {1,4,0,0,0,0,5,0,4,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1},
 {1,4,0,4,0,0,0,0,4,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1},
 {1,4,0,4,4,4,4,4,4,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1},
 {1,4,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1},
 {1,4,4,4,4,4,4,4,4,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1},
 {1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1}
};
uint32_t* rasterPixels = new uint32_t[WINDOW_WIDTH*WINDOW_WIDTH]; 
double posX = 22, posY = 12; //x and y start position
double dirX = -1, dirY = 0; //initial direction vector
double planeX = 0, planeY = 0.66; //the 2d raycaster version of camera plane
double moveSpeed; //the constant value is in squares/second
double rotSpeed; //the constant value is in radians/second
uint32_t red = 16711680;
uint32_t dark_red = 8388608;
uint32_t green = 65280;
uint32_t dark_green = 32768;
uint32_t blue = 255;
uint32_t dark_blue = 128;
uint32_t white = 16777215;
uint32_t gray = 8421504;
uint32_t yellow = 16776960;
uint32_t dark_yellow = 8421376;
void initRasterPixels(){
    //Fill the raster with pixel information
    for(int i = 0; i < WINDOW_WIDTH; i++) {
        for(int j = 0; j < WINDOW_WIDTH; j++){
            rasterPixels[i+(WINDOW_WIDTH*j)] = 0;
        }
    }
}
void addVerticalLineOfPixels(int row, uint32_t* rasterPixels, uint32_t color, int start, int end) {
    for(int i = 0; i < WINDOW_WIDTH; i++) {
        if(i >= start && i <= end) {
            rasterPixels[(i*WINDOW_WIDTH)+row] = color;
        } else {
            rasterPixels[(i*WINDOW_WIDTH)+row] = 0;
        }
    }
}
uint32_t* generateRaster(KeyboardState keyboardState, double frameTime){
    if (keyboardState.keyUp)
    {
        if(worldMap[int(posX + dirX * moveSpeed)][int(posY)] == false) posX += dirX * moveSpeed;
        if(worldMap[int(posX)][int(posY + dirY * moveSpeed)] == false) posY += dirY * moveSpeed;
    }
    //move backwards if no wall behind you
    if (keyboardState.keyDown)
    {
        if(worldMap[int(posX - dirX * moveSpeed)][int(posY)] == false) posX -= dirX * moveSpeed;
        if(worldMap[int(posX)][int(posY - dirY * moveSpeed)] == false) posY -= dirY * moveSpeed;
    }
    //rotate to the right
    if (keyboardState.keyRight)
    {
        //both camera direction and camera plane must be rotated
        double oldDirX = dirX;
        dirX = dirX * cos(-rotSpeed) - dirY * sin(-rotSpeed);
        dirY = oldDirX * sin(-rotSpeed) + dirY * cos(-rotSpeed);
        double oldPlaneX = planeX;
        planeX = planeX * cos(-rotSpeed) - planeY * sin(-rotSpeed);
        planeY = oldPlaneX * sin(-rotSpeed) + planeY * cos(-rotSpeed);
    }
    //rotate to the left
    if (keyboardState.keyLeft)
    {
        //both camera direction and camera plane must be rotated
        double oldDirX = dirX;
        dirX = dirX * cos(rotSpeed) - dirY * sin(rotSpeed);
        dirY = oldDirX * sin(rotSpeed) + dirY * cos(rotSpeed);
        double oldPlaneX = planeX;
        planeX = planeX * cos(rotSpeed) - planeY * sin(rotSpeed);
        planeY = oldPlaneX * sin(rotSpeed) + planeY * cos(rotSpeed);
    }
    //main for loop for calculating wall heights
    for(int x = 0; x < WINDOW_WIDTH; x++) {
        //calculate ray position and direction
        double cameraX = 2 * x / double(WINDOW_WIDTH) - 1; //x-coordinate in camera space, normalized, -1 is left, 1 is right, 0 is center
        double rayDirX = dirX + planeX * cameraX;
        double rayDirY = dirY + planeY * cameraX;
        //which box of the map we are in
        int mapX = int(posX);
        int mapY = int(posY);
        //length of ray from current position to next x or y-side
        double sideDistX;
        double sideDistY;
        //length of ray from one x or y-side to next x or y-side
        //std::abs from <cmath>: the C abs(int) overload would truncate these doubles
        double deltaDistX = std::abs(1 / rayDirX);
        double deltaDistY = std::abs(1 / rayDirY);
        double perpWallDist;
        //what direction to step in x or y-direction (either +1 or -1)
        int stepX;
        int stepY;
        int hit = 0;
        int side; //0 is x-side hit, 1 is y-side hit
        //calculate step and initial sideDist
        if (rayDirX < 0)
        {
            stepX = -1;
            sideDistX = (posX - mapX) * deltaDistX;
        }
        else
        {
            stepX = 1;
            sideDistX = (mapX + 1.0 - posX) * deltaDistX;
        }
        if (rayDirY < 0)
        {
            stepY = -1;
            sideDistY = (posY - mapY) * deltaDistY;
        }
        else
        {
            stepY = 1;
            sideDistY = (mapY + 1.0 - posY) * deltaDistY;
        }
        //perform DDA
        while (hit == 0)
        {
            //jump to next map square, OR in x-direction, OR in y-direction
            if (sideDistX < sideDistY)
            {
                sideDistX += deltaDistX;
                mapX += stepX;
                side = 0;
            }
            else
            {
                sideDistY += deltaDistY;
                mapY += stepY;
                side = 1;
            }
            //Check if ray has hit a wall
            if (worldMap[mapX][mapY] > 0) hit = 1;
        }
        //Calculate distance projected on camera direction (Euclidean distance will give fisheye effect!)
        if (side == 0) perpWallDist = (mapX - posX + (1 - stepX) / 2) / rayDirX;
        else perpWallDist = (mapY - posY + (1 - stepY) / 2) / rayDirY;
        //Calculate height of line to draw on screen
        int lineHeight = (int)(WINDOW_WIDTH / perpWallDist);
        //calculate lowest and highest pixel to fill in current stripe
        int drawStart = -lineHeight / 2 + WINDOW_WIDTH / 2;
        if(drawStart < 0)drawStart = 0;
        int drawEnd = lineHeight / 2 + WINDOW_WIDTH / 2;
        if(drawEnd >= WINDOW_WIDTH)drawEnd = WINDOW_WIDTH - 1;
        //choose wall color
        uint32_t color;
        //give x and y sides different brightness
        if (side == 1) {
            switch(worldMap[mapX][mapY])
            {
                case 1: color = dark_red; break; //red
                case 2: color = dark_green; break; //green
                case 3: color = dark_blue; break; //blue
                case 4: color = gray; break; //white
                default: color = dark_yellow; break; //yellow
            }
        } else {
            switch(worldMap[mapX][mapY])
            {
                case 1: color = red; break; //red
                case 2: color = green; break; //green
                case 3: color = blue; break; //blue
                case 4: color = white; break; //white
                default: color = yellow; break; //yellow
            }
        }
        //draw the pixels of the stripe as a vertical line
        uint32_t column[WINDOW_WIDTH];
        addVerticalLineOfPixels(x, rasterPixels, color, drawStart, drawEnd);
    }
    //speed modifiers
    moveSpeed = frameTime * 5.0;
    rotSpeed = frameTime * 3.0;
    return rasterPixels;
}
//Need a separate method here to handle input. The SDLWrapper will send inkeys and keypressed to be operated on here
//and have access to all the file scope vars like pos, worldMap, dir, plane
int main(int /*argc*/, char* /*argv*/[])
{
    SDLWrapper sdlWrapper;
    initRasterPixels();
    return sdlWrapper.setupSDLRenderer(WINDOW_WIDTH, generateRaster);
}

SDLWrapper.cpp

#include <iostream>
#include <SDL2/SDL.h>
#include "include/SDLWrapper.h"
#include <SDL2/SDL_ttf.h>
#include <sstream>
#include <time.h>
using namespace std;
#define TICK_INTERVAL 4
static Uint32 next_time;
double currTime = 0; //time of current frame
double oldTime = 0; //time of previous frame
double frameTime; //frameTime is the time this frame has taken, in seconds
TTF_Font* sans_font;
////////////////////////////////////////////////////////////////////////////////
//KEYBOARD FUNCTIONS////////////////////////////////////////////////////////////
////////////////////////////////////////////////////////////////////////////////
bool SDLWrapper::keyDown(int key) //this checks if the key is held down, returns true all the time until the key is up
{
    return (inkeys[key] != 0);
}
bool SDLWrapper::keyPressed(int key) //this checks if the key is *just* pressed, returns true only once until the key is up again
{
    if(keypressed.find(key) == keypressed.end()) keypressed[key] = false;
    if(inkeys[key])
    {
        if(keypressed[key] == false)
        {
            keypressed[key] = true;
            return true;
        }
    }
    else keypressed[key] = false;
    return false;
}
///// End of KEYBOARD FUNCTIONS ////
int SDLWrapper::setupSDLRenderer(int WINDOW_WIDTH, uint32_t* (*generateRaster)(KeyboardState, double)){
    SDL_Event event;
    SDL_Renderer *renderer;
    SDL_Window *window;
    bool run = true;
    KeyboardState keyboardState;
    uint32_t color = 0;
    printf("setupSDLRenderer start");
    SDL_Init(SDL_INIT_VIDEO | SDL_INIT_TIMER);
    window = SDL_CreateWindow("test", 0, 0, WINDOW_WIDTH, WINDOW_WIDTH, SDL_WINDOW_OPENGL);
    renderer = SDL_CreateRenderer(window, -1, SDL_RENDERER_ACCELERATED | SDL_RENDERER_PRESENTVSYNC);
    // https://gamedev.stackexchange.com/questions/157604/how-to-get-access-to-framebuffer-as-a-uint32-t-in-sdl2
    SDL_Texture* framebuffer = SDL_CreateTexture(renderer, SDL_PIXELFORMAT_ARGB8888, SDL_TEXTUREACCESS_STREAMING, WINDOW_WIDTH, WINDOW_WIDTH);
    uint32_t* pixels = new uint32_t[WINDOW_WIDTH*WINDOW_WIDTH];
    if(TTF_Init()==-1) {
        printf("TTF_Init: %s\n", TTF_GetError());
        exit(2);
    }
    sans_font = TTF_OpenFont("font/Sans.ttf", 12); //this opens a font style and sets a size
    if(!sans_font) {
        printf("TTF_OpenFont: %s\n", TTF_GetError());
    }
    SDL_Rect message_rect; //create a rect
    message_rect.x = 0; //controls the rect's x coordinate
    message_rect.y = 0; // controls the rect's y coordinate
    message_rect.w = 20; // controls the width of the rect
    message_rect.h = 20; // controls the height of the rect
    SDL_Texture* message;
    next_time = SDL_GetTicks() + TICK_INTERVAL;
    while (run) {
        //Start of future render function
        SDL_RenderClear(renderer);
        rasterPixels = (*generateRaster)(keyboardState, frameTime);
        SDL_UpdateTexture(framebuffer, NULL, rasterPixels, WINDOW_WIDTH * sizeof(uint32_t));
        SDL_RenderCopy(renderer, framebuffer, NULL, NULL);
        // printf("time_left: %d", time_left());
        SDL_Delay(time_left()); //remove this to allow for more than 60 iterations a second
        next_time = SDL_GetTicks() + TICK_INTERVAL;
        //timing for input and FPS counter
        oldTime = currTime;
        currTime = SDL_GetTicks();
        frameTime = (currTime - oldTime) * 0.001; //frameTime is the time this frame has taken, in seconds
        SDL_Color White = {255, 255, 255}; // this is the color in rgb format, maxing out all would give you the color white, and it will be your text's color
        std::stringstream ss;
        ss << (int)(1.0 / frameTime);
        const std::string fpsText = ss.str(); //keep the string alive: ss.str().c_str() would point into a destroyed temporary
        SDL_Surface* surfaceMessage = TTF_RenderText_Solid(sans_font, fpsText.c_str(), White); // as TTF_RenderText_Solid could only be used on SDL_Surface then you have to create the surface first
        message = SDL_CreateTextureFromSurface(renderer, surfaceMessage); //now you can convert it into a texture
        //Now since it's a texture, you have to put RenderCopy in your game loop area, the area where the whole code executes
        SDL_RenderCopy(renderer, message, NULL, &message_rect); //you put the renderer's name first, the Message, the crop size(you can ignore this if you don't want to dabble with cropping), and the rect which is the size and coordinate of your texture
        SDL_RenderPresent(renderer);
        //Free the surface and texture every frame, otherwise both leak once per frame
        SDL_FreeSurface(surfaceMessage);
        SDL_DestroyTexture(message);
        while (SDL_PollEvent(&event)) {
            keyboardState.reset();
            //Start of handleInput function
            if (event.type == SDL_QUIT) {
                run = false;
            }
            inkeys = SDL_GetKeyboardState(NULL);
            if (keyDown(SDL_SCANCODE_UP) || keyDown(SDL_SCANCODE_W)) {
                keyboardState.keyUp = true;
            } else if (keyDown(SDL_SCANCODE_DOWN) || keyDown(SDL_SCANCODE_S)) {
                keyboardState.keyDown = true;
            } else if (keyDown(SDL_SCANCODE_LEFT) || keyDown(SDL_SCANCODE_A)) {
                keyboardState.keyLeft = true;
            } else if (keyDown(SDL_SCANCODE_RIGHT) || keyDown(SDL_SCANCODE_D)) {
                keyboardState.keyRight = true;
            }
            //End of handleInput function
        }
    }
    SDL_DestroyRenderer(renderer);
    SDL_DestroyWindow(window);
    SDL_Quit();
    return EXIT_SUCCESS;
}
uint32_t SDLWrapper::peek(int x, int y, int WINDOW_WIDTH){
    return rasterPixels[x+(WINDOW_WIDTH*y)];
}
int SDLWrapper::time_left(void)
{
    Uint32 now;
    now = SDL_GetTicks();
    //rendered too slow, don't wait
    if(next_time < now) {
        return 0;
    }
    else {
        // rendered too fast, wait until the next tick amount
        return next_time - now;
    }
}
void KeyboardState::reset(){
    keyUp = false;
    keyDown = false;
    keyLeft = false;
    keyRight = false;
}

user673679 · Answer 1 · 2020-11-08 13:35:58Z

I suggest figuring out how to use a CPU profiler for your particular platform (Visual Studio has a nice one, but there are simpler ones like "Very Sleepy" that give a quick overview).

These sample your code as it runs (make sure you profile an optimized build, not a debug build) and report in detail which lines of code take up the most time.
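If a sampling profiler isn't available in your environment, a rough first pass is to time suspect sections by hand. The sketch below is a hypothetical helper, `averageMs` (not part of SDL or the original code), that calls a function a given number of times and returns the average time per call using `std::chrono::steady_clock`. Unlike a profiler, it can only tell you how long the wrapped section takes, not which line inside it is slow.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper (not part of SDL or the original code): calls fn()
// `iterations` times and returns the average wall-clock time per call in
// milliseconds. steady_clock is used because it never goes backwards.
template <typename Fn>
double averageMs(Fn&& fn, int iterations) {
    assert(iterations > 0);
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i) {
        fn();
    }
    const auto stop = std::chrono::steady_clock::now();
    const std::chrono::duration<double, std::milli> elapsed = stop - start;
    return elapsed.count() / iterations;
}
```

In the render loop you could, for example, wrap the existing call as `averageMs([&] { rasterPixels = (*generateRaster)(keyboardState, frameTime); }, 1)` and log the result next to the FPS counter.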

I put together a quick test project with your code, and ran it under Very Sleepy, which provides output like the following:

Name                  Exclusive  Inclusive  %Exclusive  %Inclusive  Module  Source File                                                             Source Line  Address
generateRaster        13.94s     13.94s     77.24%      77.24%      App     D:\Projects\SDLProject\Code\App\main.cpp                                332          0x7ff6550b14ea
memcpy                0.98s      0.98s      5.43%       5.43%       App     d:\A01\_work\6\s\src\vctools\crt\vcruntime\src\string\amd64\memcpy.asm  436          0x7ff65516f120
[00007FF934D6EFD0]    0.01s      0.01s      0.04%       0.04%       App                                                                             0            0x7ff934d6efd0
D3D_UpdateTextureRep  0.01s      0.99s      0.04%       5.50%       App     D:\Projects\SDLProject\Code\SDL\src\render\direct3d\SDL_render_d3d.c    760          0x7ff6550c7b6c

So a quick glance shows that more than 75% of the time spent running the program was in the generateRaster function.

It also gives a line-by-line overview of where the time is spent (some lines don't show up because of optimizations).

[Image: profiler hot lines]

We can see that addVerticalLineOfPixels() is the culprit.

void addVerticalLineOfPixels(int row, uint32_t* rasterPixels, uint32_t color, int start, int end) {
    for(int i = 0; i < WINDOW_WIDTH; i++) {
        if(i >= start && i <= end) {
            rasterPixels[(i*WINDOW_WIDTH)+row] = color;
        } else {
            rasterPixels[(i*WINDOW_WIDTH)+row] = 0;
        }
    }
}

This isn't very surprising: the function is called once per column and writes the whole column, so every frame it touches every single pixel of the 900x900 screen texture (810,000 writes).

We might try to reduce the number of pixels we have to touch by clearing the texture some other way (e.g. blitting to it on the GPU instead). This means we'd only have to manually set the pixels between start and end to color.
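A minimal CPU-side sketch of that idea: clear the whole buffer once per frame with one contiguous `std::fill`, then write only the `[start, end]` span of each stripe. `clearRaster` and `drawStripe` are hypothetical names, and the sketch keeps the original layout (pixel (x, y) at `y * width + x`), so the stripe writes themselves are still strided; it only removes the redundant background writes.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helpers, assuming the original row-major layout where
// pixel (x, y) lives at buffer[y * width + x].

// One contiguous pass over the whole buffer: sequential and cache-friendly.
void clearRaster(std::vector<std::uint32_t>& buffer) {
    std::fill(buffer.begin(), buffer.end(), 0u);
}

// Writes only the wall pixels of column x (rows start..end inclusive);
// the background was already cleared, so nothing else is touched.
void drawStripe(std::vector<std::uint32_t>& buffer, int width, int x,
                std::uint32_t color, int start, int end) {
    const int height = static_cast<int>(buffer.size()) / width;
    assert(0 <= x && x < width);
    assert(0 <= start && start <= end && end < height);
    for (int y = start; y <= end; ++y) {
        buffer[static_cast<std::size_t>(y) * width + x] = color;
    }
}
```

Per frame this becomes one `clearRaster` call followed by one `drawStripe` per column, so the per-column cost is proportional to the wall height instead of the full screen height.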

However, even if we did that, there's another problem which is described at the very bottom of the linked tutorial:

Raycasting works with vertical stripes, but the screen buffer in memory is laid out with horizontal scanlines. So drawing vertical stripes is bad for memory locality for caching (it is in fact a worst case scenario), and the loss of good caching may hurt the speed more than some of the 3D computations on modern machines. It may be possible to program this with better caching behavior (e.g. processing multiple stripes at once, using a cache-oblivious transpose algorithm, or having a 90 degree rotated raycaster), but for simplicity the rest of this tutorial ignores this caching issue.

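In other words, rasterPixels[(i*WINDOW_WIDTH)+row] = ... jumps across a large amount of memory: consecutive writes are WINDOW_WIDTH pixels apart (3600 bytes at 900 pixels wide with 4-byte pixels), so each write lands on a different cache line and only one pixel out of every WINDOW_WIDTH is set per step. We repeat this skipping for every column in the texture.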

Instead, we would like to do rasterPixels[row * WINDOW_WIDTH + i] = ..., so we set every pixel in a contiguous block.
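To make the difference concrete: both functions below write every pixel of a square `width x width` buffer exactly once and leave it in the same state; only the order differs. `fillColumnOrder` mimics the original access pattern, `fillRowOrder` the contiguous one. Both names are hypothetical, and how large the speed gap is depends on the machine, so it is worth timing them yourself.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Strided order, as in the original addVerticalLineOfPixels: the inner loop
// walks down one column, so consecutive writes are `width` elements apart.
void fillColumnOrder(std::vector<std::uint32_t>& buf, int width, std::uint32_t value) {
    assert(buf.size() == static_cast<std::size_t>(width) * width);
    for (int col = 0; col < width; ++col)
        for (int i = 0; i < width; ++i)
            buf[static_cast<std::size_t>(i) * width + col] = value;
}

// Sequential order: the inner loop touches adjacent addresses, so every
// cache line that is loaded is fully used before moving on.
void fillRowOrder(std::vector<std::uint32_t>& buf, int width, std::uint32_t value) {
    assert(buf.size() == static_cast<std::size_t>(width) * width);
    for (int row = 0; row < width; ++row)
        for (int i = 0; i < width; ++i)
            buf[static_cast<std::size_t>(row) * width + i] = value;
}
```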

It's actually a simple change:

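void addVerticalLineOfPixels(int row, uint32_t* rP, uint32_t color, int start, int end) {
    // needs #include <algorithm> for std::fill_n
    // "row" is the screen column x, now stored as one contiguous row of the buffer
    uint32_t* line = rP + (row * WINDOW_WIDTH);
    std::fill_n(line, start, 0u);                                // [0, start)
    std::fill_n(line + start, (end - start + 1), color);         // [start, end]
    std::fill_n(line + end + 1, (WINDOW_WIDTH - end - 1), 0u);   // (end, WINDOW_WIDTH)
}

(We use std::fill_n for neatness and performance: we're effectively doing 3 loops over the index ranges [0, start), [start, end] and (end, WINDOW_WIDTH). As in the original loop, end is inclusive, so the middle range holds end - start + 1 pixels.)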

Then we can replace the texture blitting function SDL_RenderCopy with SDL_RenderCopyEx, which lets us rotate the texture while blitting:

SDL_RenderCopyEx(renderer, framebuffer, NULL, NULL, -90.0, NULL, SDL_RendererFlip::SDL_FLIP_NONE);
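One subtlety: the new buffer is the frame transposed (screen column x is stored as texture row x), and a transpose is not the same as a rotation. SDL's angle rotates clockwise, so -90.0 is a counter-clockwise quarter turn. The sketch below models that on the CPU, assuming a square N x N image and that output pixel (x, y) of a counter-clockwise turn comes from input pixel (N-1-y, x); `transposeImage` and `rotateCounterClockwise` are hypothetical helpers, not SDL functions. The result is the original frame mirrored top to bottom, which is invisible here only because each flat-coloured wall stripe is (nearly) symmetric about the horizon; a textured renderer would also need a flip.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Image = std::vector<std::uint32_t>;  // square n x n, row-major: (x, y) at [y * n + x]

// What the fill_n version of addVerticalLineOfPixels produces: frame column x
// stored contiguously as texture row x.
Image transposeImage(const Image& frame, int n) {
    assert(frame.size() == static_cast<std::size_t>(n) * n);
    Image tex(frame.size());
    for (int y = 0; y < n; ++y)
        for (int x = 0; x < n; ++x)
            tex[static_cast<std::size_t>(x) * n + y] = frame[static_cast<std::size_t>(y) * n + x];
    return tex;
}

// Model of a 90-degree counter-clockwise turn in screen coordinates (y down):
// output pixel (x, y) is taken from input pixel (n - 1 - y, x).
Image rotateCounterClockwise(const Image& img, int n) {
    assert(img.size() == static_cast<std::size_t>(n) * n);
    Image out(img.size());
    for (int y = 0; y < n; ++y)
        for (int x = 0; x < n; ++x)
            out[static_cast<std::size_t>(y) * n + x] =
                img[static_cast<std::size_t>(x) * n + (n - 1 - y)];
    return out;
}
```

Composing the two gives `shown(x, y) == frame(x, n - 1 - y)`, i.e. a vertical mirror rather than the original image.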

Measuring these changes with the profiler shows that generateRaster now takes up <2.5% of the program run time.
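If you'd rather keep a normal row-major texture (for example to avoid the mirroring a rotated transpose introduces), the tutorial's other suggestion is to render stripes contiguously into a column-major scratch buffer and then transpose it in small square tiles, so that both the reads and the writes stay within a few cache lines at a time. `blockedTranspose` is a hypothetical helper; this is a plain blocked transpose rather than a true cache-oblivious one, and the tile size of 32 is an untuned assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper. src is column-major: pixel (x, y) at src[x * height + y].
// dst is row-major: pixel (x, y) at dst[y * width + x].
// Copying tile x tile blocks keeps both access patterns local.
void blockedTranspose(const std::vector<std::uint32_t>& src,
                      std::vector<std::uint32_t>& dst,
                      int width, int height, int tile = 32) {
    assert(src.size() == static_cast<std::size_t>(width) * height);
    assert(dst.size() == src.size());
    assert(tile > 0);
    for (int x0 = 0; x0 < width; x0 += tile) {
        for (int y0 = 0; y0 < height; y0 += tile) {
            const int xEnd = std::min(x0 + tile, width);
            const int yEnd = std::min(y0 + tile, height);
            for (int y = y0; y < yEnd; ++y) {
                for (int x = x0; x < xEnd; ++x) {
                    dst[static_cast<std::size_t>(y) * width + x] =
                        src[static_cast<std::size_t>(x) * height + y];
                }
            }
        }
    }
}
```

The transposed result can then be uploaded with the plain SDL_RenderCopy call, at the cost of one extra full-frame copy per frame.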

:)

smuggledPancakes · Answer 2 · 2020-06-10 17:44:40Z

I figured out how to cross-compile for Windows using the MinGW compiler. When I run this app natively on Windows, it runs much faster: well over 200 fps on my laptop at 900x900, instead of 40-60 fps. I believe that if I were running this natively on Linux rather than under WSL2, I would never have had performance concerns.
