Tutorial I have been following
I decided to write a raycaster app to understand how raycasting really works. So far so good: I have it up and running. If you download the source and run make, it should produce a test.exe. I am developing on Ubuntu using Windows Subsystem for Linux 2 (WSL2). I coded it from scratch while following the tutorial's algorithm, making some changes along the way. Performance is okay, but at 900x900 on my laptop the frame rate is about 40fps, which seems low. I cannot figure out where the bottlenecks are at this point. I'd expect to be clearing 100fps easily.
RayCaster1.cpp
#include "include/SDLWrapper.h"
#include <iostream>
#include <vector>
#define WINDOW_WIDTH 900
#define mapWidth 24
#define mapHeight 24
//export DISPLAY=$(cat /etc/resolv.conf | grep nameserver | awk '{print $2}'):0
//run vcxsrv in windows "XLaunch"
//make sure to check access control checkbox and uncheck the opengl option
//./test.exe
int worldMap[mapWidth][mapHeight]=
{
{1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1},
{1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1},
{1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1},
{1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1},
{1,0,0,0,0,0,2,2,2,2,2,0,0,0,0,3,0,3,0,3,0,0,0,1},
{1,0,0,0,0,0,2,0,0,0,2,0,0,0,0,0,0,0,0,0,0,0,0,1},
{1,0,0,0,0,0,2,0,0,0,2,0,0,0,0,3,0,0,0,3,0,0,0,1},
{1,0,0,0,0,0,2,0,0,0,2,0,0,0,0,0,0,0,0,0,0,0,0,1},
{1,0,0,0,0,0,2,2,0,2,2,0,0,0,0,3,0,3,0,3,0,0,0,1},
{1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1},
{1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1},
{1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1},
{1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1},
{1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1},
{1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1},
{1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1},
{1,4,4,4,4,4,4,4,4,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1},
{1,4,0,4,0,0,0,0,4,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1},
{1,4,0,0,0,0,5,0,4,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1},
{1,4,0,4,0,0,0,0,4,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1},
{1,4,0,4,4,4,4,4,4,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1},
{1,4,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1},
{1,4,4,4,4,4,4,4,4,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1},
{1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1}
};
uint32_t* rasterPixels = new uint32_t[WINDOW_WIDTH*WINDOW_WIDTH];
double posX = 22, posY = 12; //x and y start position
double dirX = -1, dirY = 0; //initial direction vector
double planeX = 0, planeY = 0.66; //the 2d raycaster version of camera plane
double moveSpeed; //the constant value is in squares/second
double rotSpeed; //the constant value is in radians/second
uint32_t red = 16711680;
uint32_t dark_red = 8388608;
uint32_t green = 65280;
uint32_t dark_green = 32768;
uint32_t blue = 255;
uint32_t dark_blue = 128;
uint32_t white = 16777215;
uint32_t gray = 8421504;
uint32_t yellow = 16776960;
uint32_t dark_yellow = 8421376;
void initRasterPixels(){
//Fill the raster with pixel information
for(int i = 0; i < WINDOW_WIDTH; i++) {
for(int j = 0; j < WINDOW_WIDTH; j++){
rasterPixels[i+(WINDOW_WIDTH*j)] = 0;
}
}
}
void addVerticalLineOfPixels(int row, uint32_t* rasterPixels, uint32_t color, int start, int end) {
for(int i = 0; i < WINDOW_WIDTH; i++) {
if(i >= start && i <= end) {
rasterPixels[(i*WINDOW_WIDTH)+row] = color;
} else {
rasterPixels[(i*WINDOW_WIDTH)+row] = 0;
}
}
}
uint32_t* generateRaster(KeyboardState keyboardState, double frameTime){
if (keyboardState.keyUp)
{
if(worldMap[int(posX + dirX * moveSpeed)][int(posY)] == false) posX += dirX * moveSpeed;
if(worldMap[int(posX)][int(posY + dirY * moveSpeed)] == false) posY += dirY * moveSpeed;
}
//move backwards if no wall behind you
if (keyboardState.keyDown)
{
if(worldMap[int(posX - dirX * moveSpeed)][int(posY)] == false) posX -= dirX * moveSpeed;
if(worldMap[int(posX)][int(posY - dirY * moveSpeed)] == false) posY -= dirY * moveSpeed;
}
//rotate to the right
if (keyboardState.keyRight)
{
//both camera direction and camera plane must be rotated
double oldDirX = dirX;
dirX = dirX * cos(-rotSpeed) - dirY * sin(-rotSpeed);
dirY = oldDirX * sin(-rotSpeed) + dirY * cos(-rotSpeed);
double oldPlaneX = planeX;
planeX = planeX * cos(-rotSpeed) - planeY * sin(-rotSpeed);
planeY = oldPlaneX * sin(-rotSpeed) + planeY * cos(-rotSpeed);
}
//rotate to the left
if (keyboardState.keyLeft)
{
//both camera direction and camera plane must be rotated
double oldDirX = dirX;
dirX = dirX * cos(rotSpeed) - dirY * sin(rotSpeed);
dirY = oldDirX * sin(rotSpeed) + dirY * cos(rotSpeed);
double oldPlaneX = planeX;
planeX = planeX * cos(rotSpeed) - planeY * sin(rotSpeed);
planeY = oldPlaneX * sin(rotSpeed) + planeY * cos(rotSpeed);
}
//main for loop for calculating wall heights
for(int x = 0; x < WINDOW_WIDTH; x++) {
//calculate ray position and direction
double cameraX = 2 * x / double(WINDOW_WIDTH) - 1; //x-coordinate in camera space, normalized, -1 is left, 1 is right, 0 is center
double rayDirX = dirX + planeX * cameraX;
double rayDirY = dirY + planeY * cameraX;
//which box of the map we are in
int mapX = int(posX);
int mapY = int(posY);
//length of ray from current position to next x or y-side
double sideDistX;
double sideDistY;
//length of ray from one x or y-side to next x or y-side
double deltaDistX = abs(1 / rayDirX);
double deltaDistY = abs(1 / rayDirY);
double perpWallDist;
//what direction to step in x or y-direction (either +1 or -1)
int stepX;
int stepY;
int hit = 0;
int side; //0 is x-side hit, 1 is y-side hit
//calculate step and initial sideDist
if (rayDirX < 0)
{
stepX = -1;
sideDistX = (posX - mapX) * deltaDistX;
}
else
{
stepX = 1;
sideDistX = (mapX + 1.0 - posX) * deltaDistX;
}
if (rayDirY < 0)
{
stepY = -1;
sideDistY = (posY - mapY) * deltaDistY;
}
else
{
stepY = 1;
sideDistY = (mapY + 1.0 - posY) * deltaDistY;
}
//perform DDA
while (hit == 0)
{
//jump to next map square, OR in x-direction, OR in y-direction
if (sideDistX < sideDistY)
{
sideDistX += deltaDistX;
mapX += stepX;
side = 0;
}
else
{
sideDistY += deltaDistY;
mapY += stepY;
side = 1;
}
//Check if ray has hit a wall
if (worldMap[mapX][mapY] > 0) hit = 1;
}
//Calculate distance projected on camera direction (Euclidean distance will give fisheye effect!)
if (side == 0) perpWallDist = (mapX - posX + (1 - stepX) / 2) / rayDirX;
else perpWallDist = (mapY - posY + (1 - stepY) / 2) / rayDirY;
//Calculate height of line to draw on screen
int lineHeight = (int)(WINDOW_WIDTH / perpWallDist);
//calculate lowest and highest pixel to fill in current stripe
int drawStart = -lineHeight / 2 + WINDOW_WIDTH / 2;
if(drawStart < 0)drawStart = 0;
int drawEnd = lineHeight / 2 + WINDOW_WIDTH / 2;
if(drawEnd >= WINDOW_WIDTH)drawEnd = WINDOW_WIDTH - 1;
//choose wall color
uint32_t color;
//give x and y sides different brightness
if (side == 1) {
switch(worldMap[mapX][mapY])
{
case 1: color = dark_red; break; //red
case 2: color = dark_green; break; //green
case 3: color = dark_blue; break; //blue
case 4: color = gray; break; //white
default: color = dark_yellow; break; //yellow
}
} else {
switch(worldMap[mapX][mapY])
{
case 1: color = red; break; //red
case 2: color = green; break; //green
case 3: color = blue; break; //blue
case 4: color = white; break; //white
default: color = yellow; break; //yellow
}
}
//draw the pixels of the stripe as a vertical line
uint32_t column[WINDOW_WIDTH];
addVerticalLineOfPixels(x, rasterPixels, color, drawStart, drawEnd);
}
//speed modifiers
moveSpeed = frameTime * 5.0;
rotSpeed = frameTime * 3.0;
return rasterPixels;
}
//Need a separate method here to handle input. The SDLWrapper will send inkeys and keypressed to be operated on here
//and have access to all the file scope vars like pos, worldMap, dir, plane
int main(int /*argc*/, char */*argv*/[])
{
SDLWrapper sdlWrapper;
initRasterPixels();
sdlWrapper.setupSDLRenderer(WINDOW_WIDTH, generateRaster);
};
SDLWrapper.cpp
#include <iostream>
#include <SDL2/SDL.h>
#include "include/SDLWrapper.h"
#include <SDL2/SDL_ttf.h>
#include <sstream>
#include <time.h>
using namespace std;
#define TICK_INTERVAL 4
static Uint32 next_time;
double currTime = 0; //time of current frame
double oldTime = 0; //time of previous frame
double frameTime; //frameTime is the time this frame has taken, in seconds
TTF_Font* sans_font;
////////////////////////////////////////////////////////////////////////////////
//KEYBOARD FUNCTIONS////////////////////////////////////////////////////////////
////////////////////////////////////////////////////////////////////////////////
bool SDLWrapper::keyDown(int key) //this checks if the key is held down, returns true all the time until the key is up
{
return (inkeys[key] != 0);
}
bool SDLWrapper::keyPressed(int key) //this checks if the key is *just* pressed, returns true only once until the key is up again
{
if(keypressed.find(key) == keypressed.end()) keypressed[key] = false;
if(inkeys[key])
{
if(keypressed[key] == false)
{
keypressed[key] = true;
return true;
}
}
else keypressed[key] = false;
return false;
}
///// End of KEYBOARD FUNCTIONS ////
int SDLWrapper::setupSDLRenderer(int WINDOW_WIDTH, uint32_t* (*generateRaster)(KeyboardState, double)){
SDL_Event event;
SDL_Renderer *renderer;
SDL_Window *window;
bool run = true;
KeyboardState keyboardState;
uint32_t color = 0;
printf("setupSDLRenderer start");
SDL_Init(SDL_INIT_VIDEO | SDL_INIT_TIMER);
window = SDL_CreateWindow("test", 0, 0, WINDOW_WIDTH, WINDOW_WIDTH, SDL_WINDOW_OPENGL);
renderer = SDL_CreateRenderer(window, -1, SDL_RENDERER_ACCELERATED | SDL_RENDERER_PRESENTVSYNC);
// https://gamedev.stackexchange.com/questions/157604/how-to-get-access-to-framebuffer-as-a-uint32-t-in-sdl2
SDL_Texture* framebuffer = SDL_CreateTexture(renderer, SDL_PIXELFORMAT_ARGB8888, SDL_TEXTUREACCESS_STREAMING, WINDOW_WIDTH, WINDOW_WIDTH);
uint32_t* pixels = new uint32_t[WINDOW_WIDTH*WINDOW_WIDTH];
if(TTF_Init()==-1) {
printf("TTF_Init: %s\n", TTF_GetError());
exit(2);
}
sans_font = TTF_OpenFont("font/Sans.ttf", 12); //this opens a font style and sets a size
if(!sans_font) {
printf("TTF_OpenFont: %s\n", TTF_GetError());
}
SDL_Rect message_rect; //create a rect
message_rect.x = 0; //controls the rect's x coordinate
message_rect.y = 0; // controls the rect's y coordinate
message_rect.w = 20; // controls the width of the rect
message_rect.h = 20; // controls the height of the rect
SDL_Texture* message;
next_time = SDL_GetTicks() + TICK_INTERVAL;
while (run) {
//Start of future render function
SDL_RenderClear(renderer);
rasterPixels = (*generateRaster)(keyboardState, frameTime);
SDL_UpdateTexture(framebuffer , NULL, rasterPixels, WINDOW_WIDTH * sizeof (uint32_t));
SDL_RenderCopy(renderer, framebuffer , NULL, NULL);
// printf("time_left: %d", time_left());
SDL_Delay(time_left()); //remove this to allow for more than 60 iterations a second
next_time = SDL_GetTicks() + TICK_INTERVAL;
//timing for input and FPS counter
oldTime = currTime;
currTime = SDL_GetTicks();
frameTime = (currTime - oldTime) * 0.001; //frameTime is the time this frame has taken, in seconds
SDL_Color White = {255, 255, 255}; // this is the color in rgb format, maxing out all would give you the color white, and it will be your text's color
std::stringstream ss;
ss << (int)(1.0 / frameTime);
const char* str = ss.str().c_str();
SDL_Surface* surfaceMessage = TTF_RenderText_Solid(sans_font, str, White); // as TTF_RenderText_Solid could only be used on SDL_Surface then you have to create the surface first
message = SDL_CreateTextureFromSurface(renderer, surfaceMessage); //now you can convert it into a texture
//Now since it's a texture, you have to put RenderCopy in your game loop area, the area where the whole code executes
SDL_RenderCopy(renderer, message, NULL, &message_rect); //you put the renderer's name first, the Message, the crop size(you can ignore this if you don't want to dabble with cropping), and the rect which is the size and coordinate of your texture
//Don't forget to free your surface and texture
SDL_RenderPresent(renderer);
while (SDL_PollEvent(&event)) {
keyboardState.reset();
//Start of handleInput function
if (event.type == SDL_QUIT) {
run = false;
}
inkeys = SDL_GetKeyboardState(NULL);
if (keyDown(SDL_SCANCODE_UP) || keyDown(SDL_SCANCODE_W)) {
keyboardState.keyUp = true;
} else if (keyDown(SDL_SCANCODE_DOWN) || keyDown(SDL_SCANCODE_S)) {
keyboardState.keyDown = true;
} else if (keyDown(SDL_SCANCODE_LEFT) || keyDown(SDL_SCANCODE_A)) {
keyboardState.keyLeft = true;
} else if (keyDown(SDL_SCANCODE_RIGHT) || keyDown(SDL_SCANCODE_D)) {
keyboardState.keyRight = true;
}
//End of handleInput function
}
}
SDL_DestroyRenderer(renderer);
SDL_DestroyWindow(window);
SDL_Quit();
return EXIT_SUCCESS;
}
uint32_t SDLWrapper::peek(int x, int y, int WINDOW_WIDTH){
return rasterPixels[x+(WINDOW_WIDTH*y)];
}
int SDLWrapper::time_left(void)
{
Uint32 now;
now = SDL_GetTicks();
//rendered too slow, don't wait
if(next_time < now) {
return 0;
}
else {
// rendered too fast, wait until the next tick amount
return next_time - now;
}
}
void KeyboardState::reset(){
keyUp = false;
keyDown = false;
keyLeft = false;
keyRight = false;
}
2 Answers
I suggest figuring out how to use a CPU profiler for your particular platform (Visual Studio has a nice one, but there are simpler ones like "Very Sleepy" that give a quick overview).
These will sample your code as it runs (make sure you run an optimized build, not a debug build), and provide detailed feedback as to which line of code takes up the most time.
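If no sampling profiler is at hand, a rough first impression can also come from timing suspect sections by hand. The following is only a minimal sketch of that idea, not a substitute for a profiler: `ScopedTimer` and `busyWork` are hypothetical names invented for this illustration and are not part of the original project.

```cpp
#include <chrono>

// Hypothetical helper (not from the original project): adds the wall-clock
// time spent inside a scope to a caller-supplied counter, in seconds.
class ScopedTimer {
public:
    explicit ScopedTimer(double& totalSeconds)
        : total_(totalSeconds), start_(std::chrono::steady_clock::now()) {}

    ~ScopedTimer() {
        const std::chrono::duration<double> elapsed =
            std::chrono::steady_clock::now() - start_;
        total_ += elapsed.count();
    }

private:
    double& total_;
    std::chrono::steady_clock::time_point start_;
};

// Stand-in workload for the sketch: sum of squares of 0..n-1.
unsigned long long busyWork(int n) {
    unsigned long long sum = 0;
    for (int i = 0; i < n; ++i) {
        sum += static_cast<unsigned long long>(i) * i;
    }
    return sum;
}
```

To use it, declare a `double` accumulator, wrap the call of interest in a block such as `{ ScopedTimer t(total); busyWork(1000); }`, and compare the accumulated totals of different sections after a few hundred frames. As with a profiler, the numbers are only meaningful for an optimized build.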
I put together a quick test project with your code, and ran it under Very Sleepy, which provides output like the following:
Name                  Exclusive  Inclusive  %Exclusive  %Inclusive  Module  Source File                                                             Source Line  Address
generateRaster        13.94s     13.94s     77.24%      77.24%      App     D:\Projects\SDLProject\Code\App\main.cpp                                332          0x7ff6550b14ea
memcpy                0.98s      0.98s      5.43%       5.43%       App     d:\A01\_work\6\s\src\vctools\crt\vcruntime\src\string\amd64\memcpy.asm  436          0x7ff65516f120
[00007FF934D6EFD0]    0.01s      0.01s      0.04%       0.04%       App                                                                             0            0x7ff934d6efd0
D3D_UpdateTextureRep  0.01s      0.99s      0.04%       5.50%       App     D:\Projects\SDLProject\Code\SDL\src\render\direct3d\SDL_render_d3d.c    760          0x7ff6550c7b6c
So a quick glance shows that more than 75% of the time spent running the program was in the generateRaster function.
It also gives a line by line overview of where the time is spent. (Some lines don't really show up due to optimizations).
We can see that addVerticalLineOfPixels() is the culprit.
void addVerticalLineOfPixels(int row, uint32_t* rasterPixels, uint32_t color, int start, int end) {
    for(int i = 0; i < WINDOW_WIDTH; i++) {
        if(i >= start && i <= end) {
            rasterPixels[(i*WINDOW_WIDTH)+row] = color;
        } else {
            rasterPixels[(i*WINDOW_WIDTH)+row] = 0;
        }
    }
}
This isn't very surprising, since the function effectively touches every single pixel on our screen texture.
We might try to reduce the number of pixels we have to touch by clearing the texture some other way (e.g. blitting to it on the GPU instead). This means we'd only have to manually set the pixels between start and end to color.
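As a rough illustration of "clear once, then write only the wall span", here is a minimal sketch. It swaps in a plain CPU-side `std::fill` for the clear rather than the GPU blit mentioned above, and `clearFrame` and `drawWallSpan` are hypothetical names, not functions from the original code. The buffer is assumed to be square (`width * width`) and row-major, like `rasterPixels`.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: clear the whole frame in one contiguous pass.
void clearFrame(std::vector<uint32_t>& pixels) {
    std::fill(pixels.begin(), pixels.end(), 0u);
}

// Hypothetical helper: write only the wall pixels of screen column x,
// rows start..end inclusive (matching the original loop's bounds).
// Pixels outside the span are left as clearFrame set them.
void drawWallSpan(std::vector<uint32_t>& pixels, int width, int x,
                  uint32_t color, int start, int end) {
    for (int y = start; y <= end; ++y) {
        pixels[static_cast<std::size_t>(y) * width + x] = color;
    }
}
```

Per frame, one would call `clearFrame` once and then `drawWallSpan` once per column. The writes inside `drawWallSpan` are still strided by `width`, so this reduces the number of pixels touched without changing the access pattern.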
However, even if we did that, there's another problem which is described at the very bottom of the linked tutorial:
Raycasting works with vertical stripes, but the screen buffer in memory is laid out with horizontal scanlines. So drawing vertical stripes is bad for memory locality for caching (it is in fact a worst case scenario), and the loss of good caching may hurt the speed more than some of the 3D computations on modern machines. It may be possible to program this with better caching behavior (e.g. processing multiple stripes at once, using a cache-oblivious transpose algorithm, or having a 90 degree rotated raycaster), but for simplicity the rest of this tutorial ignores this caching issue.
In other words, using rasterPixels[(i*WINDOW_WIDTH)+row] = ... skips across a large amount of memory, and only sets a single pixel in every WINDOW_WIDTH. We repeat this skipping for every column in the texture.
Instead, we would like to do rasterPixels[row * WINDOW_WIDTH + i] = ..., so we set every pixel in a contiguous block.
It's actually a simple change:
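// Requires #include <algorithm> for std::fill_n.
// Each stripe ("row") now occupies one contiguous block of WINDOW_WIDTH pixels.
void addVerticalLineOfPixels(int row, uint32_t* rP, uint32_t color, int start, int end) {
    uint32_t* stripe = rP + (row * WINDOW_WIDTH);
    std::fill_n(stripe, start, 0);                     // [0, start): background
    std::fill_n(stripe + start, end - start, color);   // [start, end): wall
    std::fill_n(stripe + end, WINDOW_WIDTH - end, 0);  // [end, WINDOW_WIDTH): background
}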
(We use std::fill_n for neatness and performance - we're effectively doing 3 loops over the index ranges [0, start), [start, end) and [end, WINDOW_WIDTH)).
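To make the difference between the two layouts concrete, the sketch below fills the same stripe both ways on a small square buffer. `fillColumnStrided` and `fillStripeContiguous` are hypothetical names for this illustration. Both use the half-open span [start, end) of the std::fill_n version, which differs from the original loop's inclusive `end` by one pixel, and both assume `0 <= start <= end <= width`.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Screen layout: pixel (x, y) lives at index y * width + x, so consecutive
// writes within one column are `width` elements apart (strided access).
void fillColumnStrided(std::vector<uint32_t>& buf, int width, int x,
                       uint32_t color, int start, int end) {
    for (int y = 0; y < width; ++y) {
        const bool inWall = (y >= start && y < end);
        buf[static_cast<std::size_t>(y) * width + x] = inWall ? color : 0u;
    }
}

// Transposed layout: stripe x occupies the contiguous index range
// [x * width, (x + 1) * width), so all writes are sequential in memory.
void fillStripeContiguous(std::vector<uint32_t>& buf, int width, int x,
                          uint32_t color, int start, int end) {
    uint32_t* stripe = buf.data() + static_cast<std::size_t>(x) * width;
    std::fill_n(stripe, start, uint32_t{0});            // [0, start)
    std::fill_n(stripe + start, end - start, color);    // [start, end)
    std::fill_n(stripe + end, width - end, uint32_t{0}); // [end, width)
}
```

After filling every stripe, the contiguous buffer holds the transpose of the screen-layout buffer: element `x * width + y` of one equals element `y * width + x` of the other. The image therefore has to be turned back when it is presented.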
Then we can replace the texture blitting function SDL_RenderCopy with SDL_RenderCopyEx, which lets us rotate the texture while blitting:
SDL_RenderCopyEx(renderer, framebuffer, NULL, NULL, -90.0, NULL, SDL_RendererFlip::SDL_FLIP_NONE);
Measuring these changes with the profiler shows that generateRaster now takes up <2.5% of the program run time.
:)
I figured out how to cross-compile to Windows using the MinGW compiler. When I run this app natively in Windows, it runs much faster: well over 200fps on my laptop at 900x900 instead of 40-60fps. I believe that if I had been running this natively on Linux rather than under WSL2, I would never have had performance concerns.