I've created a Game of Life implementation in JavaScript with the goal of making it as fast as possible. I'm satisfied with the rendering (see picture below), but the next-state calculation is really slow and I'm out of ideas on how to speed it up further.
Current state of the implementation
Rendering
Screenshot of the game
I can get 700FPS+ when rendering a total population of 6,986,628
I achieved this by using regl for rendering and moving the calculation of the visible cells to a separate thread (a dedicated web worker). I don't think this part needs any more optimization, except maybe the way I calculate the visible cells.
The way I calculate visible cells
onmessage = function (e) {
    var visibleCells = [];
    for (const x of e.data.grid.keys()) {
        if (x < -(e.data.offsets.X + 1)) continue; //Continue until we reach the visible part
        if (x > -e.data.offsets.X + e.data.width) break; //Stop after leaving the visible part
        for (const y of e.data.grid.get(x).keys()) {
            if (y < e.data.offsets.Y - 1) continue;
            if (y > e.data.offsets.Y + e.data.height) break;
            visibleCells.push([x, y]);
        }
    }
    this.postMessage({ result: visibleCells });
}
Representing the "universe"
I had some ideas on how to represent the Life universe, but I stuck with the last option as it turned out to be the best performing. (Note that this implementation does not restrict the space, so it is an infinite grid.)
1.1 Using 2D array as cellState = grid[x][y];
Since we are dealing with an infinite grid this can't be used
1.2 Using 2D array as grid[[x,y],[x1,y2],...]
Storing only the living cells' coordinates. This has the problem of possible duplicates. I also ran some tests on jsbench.me, and it turned out that this is slower than the 2nd way (the next one).
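Roughly, this approach looked something like the sketch below (not the exact code I benchmarked); the linear scans needed to avoid duplicates are what make it slow:

//Defining grid
var grid = [];
//Creating a cell at (x;y) - requires a linear scan to avoid duplicates
if (!grid.some(function (c) { return c[0] == x && c[1] == y; })) grid.push([x, y]);
//Killing a cell at (x;y) - another linear scan
var i = grid.findIndex(function (c) { return c[0] == x && c[1] == y; });
if (i != -1) grid.splice(i, 1);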
2. Using an object
Setting an object's properties to create the illusion of a 2D array. This somewhat worked, but it had the overhead of converting ints to strings and vice versa, because object property keys are strings.
//Defining grid
var grid = {};
//Creating a cell at (x;y)
if (grid[x] == undefined) grid[x] = {};
grid[x][y] = null;
//Killing a cell at (x;y)
delete grid[x][y];
if (Object.keys(grid[x]).length == 0) delete grid[x];
3. Using Maps and Sets (current)
This way I can use integers as indexes and don't have to deal with the possibility of duplicate cells.
//Defining grid
var grid = new Map();
//Creating a cell at (x;y)
if (!grid.has(x)) grid.set(x, new Set());
grid.get(x).add(y);
//Killing a cell at (x;y)
grid.get(x).delete(y);
if (grid.get(x).size == 0) grid.delete(x);
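For completeness, checking whether a cell is alive with this representation is the same test the next-state worker below uses:

//Checking if a cell is alive at (x;y)
var alive = grid.has(x) && grid.get(x).has(y);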
The next state calculation
This is why I'm writing this question. I don't know how to further improve performance here.
The code for calculating the next state
onmessage = function (e) {
    var newGrid = new Map();
    var sketch = new Map();
    var start = performance.now();
    for (var x of e.data.grid.keys()) {
        var col1 = x - 1, col3 = x + 1;
        if (!sketch.has(col1)) sketch.set(col1, new Set());
        if (!sketch.has(x)) sketch.set(x, new Set());
        if (!sketch.has(col3)) sketch.set(col3, new Set());
        for (var y of e.data.grid.get(x).keys()) {
            var row1 = y - 1, row3 = y + 1;
            sketch.get(col1).add(row1);
            sketch.get(col1).add(y);
            sketch.get(col1).add(row3);
            sketch.get(x).add(row1);
            sketch.get(x).add(row3);
            sketch.get(col3).add(row1);
            sketch.get(col3).add(y);
            sketch.get(col3).add(row3);
        }
    }
    for (var x of sketch.keys()) {
        for (var y of sketch.get(x).keys()) {
            //Count neighbours
            var c = 0;
            var col1 = x - 1, col3 = x + 1;
            var row1 = y - 1, row3 = y + 1;
            if (e.data.grid.has(col1)) {
                //1st col
                var col = e.data.grid.get(col1);
                c += col.has(row1);
                c += col.has(y);
                c += col.has(row3);
            }
            if (e.data.grid.has(x)) {
                //2nd col
                var col = e.data.grid.get(x);
                c += col.has(row1);
                c += col.has(row3);
            }
            if (e.data.grid.has(col3)) {
                //3rd col
                var col = e.data.grid.get(col3);
                c += col.has(row1);
                c += col.has(y);
                c += col.has(row3);
            }
            if (c == 3) { //If a cell has 3 neighbours it will live
                if (!newGrid.has(x)) newGrid.set(x, new Set());
                newGrid.get(x).add(y);
                continue;
            }
            //but if it has 2 neighbours it can only survive, not be born, so check if the cell was alive
            if (c == 2 && (e.data.grid.has(x) && e.data.grid.get(x).has(y))) {
                if (!newGrid.has(x)) newGrid.set(x, new Set());
                newGrid.get(x).add(y);
            }
        }
    }
    postMessage({ result: newGrid, timeDelta: performance.now() - start });
}
When the worker receives the initial grid it creates two new grids: sketch, which will contain potential new cells (as of writing this I just noticed that I don't add (x;y) to this grid, only the neighbouring ones, and it still works)1, and newGrid, which will contain the final result. This way I only loop through the cells that might change state.
1 Turns out it was a lucky error left over after many refactorings. It works because a cell's neighbours are going to add it to the list, and if it has no neighbours it will die anyway.
Current performance
+--------------------------+-----------+--------+------+
| Population               | 6,986,628 | 64,691 | 3    |
+--------------------------+-----------+--------+------+
| Iteration time (ms/i)    | 23925     | 212    | 0.16 |
+--------------------------+-----------+--------+------+
| FPS (all cells visible)  | 900+      | 70     | 60   |
+--------------------------+-----------+--------+------+
Before you ask: I don't know why the FPS is higher when more cells are rendered, but if you know, please write it down in a comment.
Attempts to optimize
Splitting the work across (CPU cores - 2) workers
This was unusable: one iteration took minutes to compute on a ~700K population. I think this is because the grid is copied to each worker, so the overhead was much larger than with only one worker.
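Roughly, the split looked something like the sketch below (not my exact code; the worker file name and the extra columns field are placeholders, and the worker script would need a small change to process only its assigned columns):

//Rough sketch of the split: CPUcores-2 workers, each asked to process a slice of the columns.
//Note that postMessage still structured-clones the entire grid into every worker,
//which is where I think the overhead comes from.
var workerCount = Math.max(1, navigator.hardwareConcurrency - 2);
var workers = [];
for (var i = 0; i < workerCount; i++) workers.push(new Worker('nextState.js'));

function stepParallel(grid, onDone) {
    var columns = Array.from(grid.keys());
    var chunk = Math.ceil(columns.length / workerCount);
    var newGrid = new Map();
    var pending = workerCount;
    workers.forEach(function (worker, i) {
        worker.onmessage = function (e) {
            //Merge this worker's partial result into the combined grid
            for (var entry of e.data.result) {
                if (!newGrid.has(entry[0])) newGrid.set(entry[0], new Set());
                for (var y of entry[1]) newGrid.get(entry[0]).add(y);
            }
            if (--pending == 0) onDone(newGrid);
        };
        worker.postMessage({
            grid: grid, //the full grid is copied to every worker
            columns: columns.slice(i * chunk, (i + 1) * chunk) //its share of the work
        });
    });
}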
2 Answers
Loop Performance
Using for...of loops makes for great readability, but they can be costly when it comes to performance because they use iterators internally1. As this post explains, reversed for loops can provide the best performance. This more recent article also compares for loops with for...in and the functional-programming-style forEach loops.
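To make the comparison concrete, here is the same traversal written all three ways (a toy example only; benchmark against your real data before committing to one style):

// stand-in data for something like Array.from(someSet)
const ys = [2, 3, 5, 8, 13];
let sum = 0;

// for...of: very readable, but drives an iterator object under the hood
for (const y of ys) sum += y;

// forEach: functional style, one callback invocation per element
ys.forEach((y) => { sum += y; });

// reversed for loop: plain index arithmetic, often the fastest of the three
for (let i = ys.length - 1; i >= 0; i--) sum += ys[i];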
ES6 keywords
When writing modern JavaScript there is little use for var. Best practices today call for defaulting to const - even for arrays where items are only pushed in via the push method - and, if and only if re-assignment is deemed necessary, then use let - e.g. in those for loops.
Equality comparisons
A good habit and recommendation of many style guides is to use strict equality operators (i.e. === and !==). The problem with loose comparisons is that they have so many weird rules one would need to memorize in order to be confident in their proper usage.
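Putting these suggestions together, the neighbour-counting step from the question's worker could be restructured roughly like this (a sketch, not benchmarked; willBeAlive is a hypothetical helper name, and grid is the Map-of-Sets universe from the question):

// Counts the live neighbours of (x, y) and applies the Life rules,
// using const/let, strict equality, and indexed loops instead of for...of.
function willBeAlive(grid, x, y) {
    let count = 0;
    const cols = [x - 1, x, x + 1];
    const rows = [y - 1, y, y + 1];
    for (let i = cols.length - 1; i >= 0; i--) {
        const col = grid.get(cols[i]);
        if (col === undefined) continue;
        for (let j = rows.length - 1; j >= 0; j--) {
            if (cols[i] === x && rows[j] === y) continue; // skip the cell itself
            if (col.has(rows[j])) count++;
        }
    }
    const alive = grid.has(x) && grid.get(x).has(y);
    return count === 3 || (count === 2 && alive);
}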
- Never thought about the performance difference of different loop types. After I refactor the code I will make an edit with the performance difference. – Noel Nemeth, Sep 19, 2020 at 16:03
- Please do not update the code in your question to incorporate feedback from answers; doing so goes against the Question + Answer style of Code Review. This is not a forum where you should keep the most updated version in your question. Please see "What should I do when someone answers my question?" as well as what you may and may not do after receiving answers. – Sep 19, 2020 at 16:15
- I meant to add the performance results to the "Attempts to optimize" section. – Noel Nemeth, Sep 19, 2020 at 16:21
- Okay, I just read through the two posts, but now I'm unsure whether to append the new performance results or not. I thought it would be helpful for others to see how much of a difference it can make. – Noel Nemeth, Sep 19, 2020 at 16:30
- @GirkovArpa Good call - I added a reference to an article posted within the last three months. – Sep 19, 2020 at 21:53
You can greatly speed up this task if you have the GPU process it instead of the CPU, since it's the type of task that can be broken up into lots of little, independent chunks.
I didn't want to figure out how to work with the extremely low-level WebGL libraries (for browsers), so I'm just using the gpu.js npm package to help out (it works in browsers and Node). The following function will take a matrix as input, compute the matrix for the next generation, and return the computed matrix. The actual contents of the createKernel function are not executed directly; instead, its source code is read by gpu.js and translated into GPU instructions (therefore, only a subset of JavaScript syntax is allowed inside it).
const { GPU } = require('gpu.js');

const REGION_SIZE = 16;

const nextGeneration = new GPU().createKernel(function (matrix) {
    const x = this.thread.x;
    const y = this.thread.y;
    const previousValue = matrix[y][x];
    const neighbourCount = (
        matrix[y-1][x-1] + matrix[y-1][x  ] + matrix[y-1][x+1] +
        matrix[y  ][x-1] +                    matrix[y  ][x+1] +
        matrix[y+1][x-1] + matrix[y+1][x  ] + matrix[y+1][x+1]
    );
    if (previousValue === 1) {
        return neighbourCount === 2 || neighbourCount === 3 ? 1 : 0;
    } else {
        return neighbourCount === 3 ? 1 : 0;
    }
}).setOutput([REGION_SIZE, REGION_SIZE]);
Example usage:
const GENERATIONS = 5;

const createBlankMatrix = size => [...Array(size)].map(() => Array(size).fill(0));

function createPopulatedMatrix({ size, startingValues }) {
    const matrix = createBlankMatrix(size);
    for (const [x, y] of startingValues) {
        matrix[y][x] = 1;
    }
    return matrix;
}

const startingMatrix = createPopulatedMatrix({
    size: REGION_SIZE,
    startingValues: [
        [4, 4], [4, 5], [4, 6],
        [5, 5], [5, 7], [6, 7],
    ],
});

let matrix = startingMatrix;
for (let i = 0; i < GENERATIONS; ++i) {
    matrix = nextGeneration(matrix);
}
console.table(matrix);
The following stats compare running the GPU-powered version against the original version. I gave both the same 469,607 random starting values and let them run for 10 generations. I used a grid-size of 2000 for the GPU-powered function.
Original version: 4.858 seconds
GPU version: 1.918 seconds
I didn't spend too long trying to optimize the performance of the function, so I'm sure there's room for improvement there. Another obvious drawback is that the GPU-powered function uses a fixed-size matrix. You mentioned you wanted to allow this simulation to run over an arbitrarily large surface. This is still possible, but it would require breaking up the populated surface into multiple, slightly overlapping regions, feeding the regions into the nextGeneration() function, and copying data between regions to keep their overlapping borders in sync.
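For what it's worth, here is a rough sketch of that tiling idea, reusing createBlankMatrix and nextGeneration from above. The stepTiled helper and the 1-cell halo layout are illustrative assumptions, not something I benchmarked:

// Steps one generation of a large matrix by feeding REGION_SIZE x REGION_SIZE
// tiles to the GPU kernel. Each tile carries a 1-cell border ("halo") copied
// from its neighbours; only the interior of each tile is written back.
function stepTiled(matrix) {
    const size = matrix.length;
    const inner = REGION_SIZE - 2; // interior cells per tile; the outer ring is the halo
    const result = createBlankMatrix(size);
    for (let ty = 0; ty < size; ty += inner) {
        for (let tx = 0; tx < size; tx += inner) {
            // Build one padded tile from the big matrix (out-of-range cells stay 0)
            const tile = createBlankMatrix(REGION_SIZE);
            for (let y = 0; y < REGION_SIZE; y++) {
                for (let x = 0; x < REGION_SIZE; x++) {
                    const sy = ty + y - 1, sx = tx + x - 1;
                    if (sy >= 0 && sy < size && sx >= 0 && sx < size) tile[y][x] = matrix[sy][sx];
                }
            }
            const next = nextGeneration(tile);
            // Copy back only the interior; the halo belongs to neighbouring tiles
            for (let y = 1; y < REGION_SIZE - 1; y++) {
                for (let x = 1; x < REGION_SIZE - 1; x++) {
                    const sy = ty + y - 1, sx = tx + x - 1;
                    if (sy < size && sx < size) result[sy][sx] = next[y][x];
                }
            }
        }
    }
    return result;
}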
Hopefully, this gives you an idea of how you can continue speeding it up if you're willing to put in that kind of effort.
- I'll look into this, currently I have issues with docker (this is the reason why the page is down) and I don't have much time now, but I'll try my best to share my findings asap. – Noel Nemeth, Jan 23, 2021 at 17:58
- How often is the onmessage function getting executed? Do you have control over that frequency?