Optimize iterative pixel blending function

Question 1

Summary

I've created a function that blends tiles on a 2D hexagonal grid with their neighboring tiles based on an image that serves as a "blend mask". It runs, and isn't too intensive, but I feel it can be better / faster. Drawing pixel-by-pixel is very intensive even for the 108x122 (Relatively small) images I'm using. With so many iterations per loop (13,176 in this case) per tile, even a small optimization could make a significant improvement.

I'll outline my process below, and I welcome all feedback and suggestions for optimization. If there's a way to achieve this without pixel-by-pixel updates I'd be intrigued to hear about it. I've noticed that context.drawImage() seems to draw much faster than pixel-by-pixel updates and I'm not sure how that's achieved. Maybe I could utilize a similar process?

The Blend Mask

Blend Mask

This is the (actual size) blend mask I've devised to control the most basic blending technique. There are six color ranges, each representing which tile to pull blend a texture from, and at which opacity:

0<Red<127: Top Right
0<Green<127: Mid Right
0<Blue<127: Bot Right
128<Red<255: Bot Left
128<Green<255: Mid Left
128<Blue<255: Top Left

Opacity is determined by the modulo of 127 into the corresponding R, G or B value. For example, a pixel defined as RGB(200, 0, 100) would mix the bottom left neighbor's texture at 200%127 = 73 opacity and the bottom right neighbor's texture at 100%127 = 100 opacity. You'd find such a pixel close to the bottom of this particular blend map.

I would like to continue using my blend mask method regardless of optimization capabilities due to the fact that I can alter this blend mask at any time, not only modifying the current shape but also adding more natural / chaotic looking blends to enhance aesthetics.

Example Blend

Grass Tile Sand Tile Dirt Tile

Example Blend

Relevant Code

My blend function. This is where over 90% of the actual processing takes place (Specifically in the pixel loop for(var i = 0; i < mask.width*mask.height*4; i+=4)), so this is where I need to make the optimizations. All other code is provided solely to help understand the function's process.

function blend(mask, row, col) {
 blender.drawImage(mask, 0, 0, mask.width, mask.height);
 var imageData = blender.getImageData(0, 0, mask.width, mask.height);
 var data = imageData.data;
 var tileType = [
 /*Top Right:*/ (row%2 == 0) ? getTile(row-1, col+1) : getTile(row-1, col+0),
 /*Mid Right:*/ getTile(row+0, col+1),
 /*Bot Right:*/ (row%2 == 0) ? getTile(row+1, col+1) : getTile(row+1, col+0),
 /*Bot Left: */ (row%2 == 0) ? getTile(row+1, col+0) : getTile(row+1, col-1),
 /*Mid Left: */ getTile(row+0, col-1),
 /*Top Left: */ (row%2 == 0) ? getTile(row-1, col+0) : getTile(row-1, col-1),
 ];
 // This is where over 90% of the processing occurs
 for(var i = 0; i < mask.width*mask.height*4; i+=4) {
 var colors = new Array();
 var alphaSum = 0;
 for(var side = 0; side < tileType.length; side++) {
 var semi = 127 * Math.floor(side/3);
 var index = i+side%3;
 if(data[index] > semi && data[index] <= semi+128) {
 if(tileType[side] != null) {
 alphaSum += data[index] - semi;
 colors.push({
 r: images[tileType[side]].data[i+0],
 g: images[tileType[side]].data[i+1],
 b: images[tileType[side]].data[i+2],
 a: data[index] - semi
 });
 }
 }
 }
 if(colors.length > 1) {
 var colorSum = { r:0, g:0, b:0 };
 for(var index = 0; index < colors.length; index++) {
 var alphaRatio = colors[index].a / alphaSum;
 colorSum.r += colors[index].r * alphaRatio;
 colorSum.g += colors[index].g * alphaRatio;
 colorSum.b += colors[index].b * alphaRatio;
 }
 data[i+0] = colorSum.r;
 data[i+1] = colorSum.g;
 data[i+2] = colorSum.b;
 data[i+3] = 255 - 255/((alphaSum+127)/127);
 } else if (colors.length > 0) {
 data[i+0] = colors[0].r;
 data[i+1] = colors[0].g;
 data[i+2] = colors[0].b;
 data[i+3] = 255 - 255/((colors[0].a+127)/127);
 } else {
 data[i+3] = 0;
 }
 } 
 blender.putImageData(imageData, 0, 0);
}

My draw function, which calls my blend function once per tile:

function draw() {
 var start = Date.now();
 for(var row = 0; row < tiles.length; row++) {
 for(var col = 0; col < tiles[row].length; col++) {
 var tileType = tiles[row][col];
 var indent = (row%2) ? 0.0 : 0.5 ;
 var y = (images[tileType].height+4) * row * 0.75;
 var x = images[tileType].width * (col+indent);
 buffer.drawImage(images[tileType], 0, 0);
 blend(blendMask, row, col);
 ctx.drawImage(bCanvas, x, y, images[tileType].width, images[tileType].height);
 ctx.drawImage(blCanvas, x, y, images[tileType].width, images[tileType].height);
 }
 }
 console.log("Draw executed in " + (Date.now() - start) + "ms"); 
}

Additional Code

Declaring canvases and contexts:

canvas = $("#blendCanvas")[0];
ctx = canvas.getContext("2d");
bCanvas = $("#buffer")[0];
buffer = bCanvas.getContext("2d");
blCanvas = $("#blender")[0];
blender = blCanvas.getContext("2d");

My tile images array (Pseudo-code):

var images = new Array();
var imageURLs = [
 "images/gameboard/grasshex.png",
 "images/gameboard/sandhex.png",
 "images/gameboard/dirthex.png"
];
for(var i = 0; i < imageURLs.length; i++) {
 images.push(new Image());
 images[i].src = imageURLs[i];
 $(images[i]).load(function () {
 // Draw image to a buffer canvas
 this.data = buffer.getImageData().data;
}

Defining example tile grid (As seen above):

var tiles = [
 [0, 1],
 [2, 1, 2],
 [1, 0]
];

Working Example

And of course, a working example: https://dl.dropbox.com/u/7377788/Other/Blend2/avblend.html

Thanks!

I hope this isn't too long-winded, and I'm looking forward to seeing if anybody has any suggestions for optimization. Thanks for reading!

Question 2

Excellent, well-formatted, and interesting question. +1! I hope you'll get some helpful answers.

Question 3

The dropbox link is not working for me.

Question 4

First of all, very nice and interesting question!

Second, drawImage is fast because it's native, compiled code - there's no interpreted JavaScript going on. And since it's native, it can utilize the computer in ways browser-based JavaScript simply can't. For instance, drawImage might be multi-threaded, and/or use the GPU to do the drawing. It's basically written "closer to the metal" than you can hope to achieve in JavaScript.

But! There are some things you can do with the javascript you have.

I messed around with it a little, and got a big improvement by doing some very, very simple things. In fact, I didn't change any of your logic at all.

In order to test, I changed the checkbox's click-handler to run the draw method 100 times in a row, and log the cumulative time for those 100 runs rather than log for each run. This was basically to "smooth" the stats by having more samples.

Now, the first thing I noticed about your code is that you declare variables inside the loops, meaning things get redeclared again and again. I can't say how much this affects performance, because different JS runtimes have different optimizations for such things. But in any case, it's just good practice to declare all local variables at the top of a functions.

The other thing I did, which I think had the largest impact, was simply caching certain values that were otherwise being calculated and re-calculated on each loop. I.e. I went from this:

for(var i = 0; i < mask.width*mask.height*4; i+=4) {
 ...
 for(var side = 0; side < tileType.length; side++) {

to this:

var i,
 l = mask.width * mask.height * 4, // calculate this once!
 side,
 types = tileType.length; // .length can be cached too (see: http://jsperf.com/caching-array-length)
for(i = 0 ; i < l ; i += 4) {
 ...
 for(side = 0 ; side < types ; side++) {

Of course, I moved every other var declaration to the top of the blend function as well. I also put colors.length in a variable, and used it for both the loop and the if-else if branching

Overall, by doing these basic things (and only in blend() - didn't touch anything else), I got the following results:

Original blend (x100): ~2300ms
Revised blend (x100): ~1200ms

(this was in Chrome/Mac)

So I basically shaved ~10ms per iteration (or 1 second for 100 iterations) off by just doing some basic code-cleanup.

Now, I'm sure there's more stuff to find in this vein. And there may well be even more to find, by doing actual refactoring - again, I didn't actually refactor anything at all. And I didn't even do basic clean-up on the rest of the code.

Here's what I ended up with:

function blend(mask, row, col) {
 "use strict";
 var imageData,
 data,
 oddRow = row % 2,
 tileType = [
 oddRow ? getTile(row-1, col+1) : getTile(row-1, col), /*Top Right:*/
 getTile(row, col+1), /*Mid Right:*/
 oddRow ? getTile(row+1, col+1) : getTile(row+1, col), /*Bot Right:*/
 oddRow ? getTile(row+1, col) : getTile(row+1, col-1), /*Bot Left: */
 getTile(row, col-1), /*Mid Left: */
 oddRow ? getTile(row-1, col) : getTile(row-1, col-1), /*Top Left: */
 ],
 i,
 l = mask.width*mask.height*4,
 side,
 types = tileType.length,
 colors,
 alphaSum,
 alphaRatio,
 colorSum,
 semi,
 index,
 colorsLength;
 blender.drawImage(mask, 0, 0, mask.width, mask.height);
 imageData = blender.getImageData(0, 0, mask.width, mask.height);
 data = imageData.data;
 for(i = 0 ; i < l ; i += 4) {
 colors = [];
 alphaSum = 0;
 for(side = 0; side < types; side++) {
 semi = 127 * Math.floor(side/3);
 index = i+side%3;
 if(data[index] > semi && data[index] <= semi+128) {
 if(tileType[side] != null) {
 alphaSum += data[index] - semi;
 colors.push({
 r: images[tileType[side]].data[i+0],
 g: images[tileType[side]].data[i+1],
 b: images[tileType[side]].data[i+2],
 a: data[index] - semi
 });
 }
 }
 }
 colorsLength = colors.length;
 if(colorsLength > 1) {
 colorSum = { r:0, g:0, b:0 };
 for(index = 0; index < colorsLength; index++) {
 alphaRatio = colors[index].a / alphaSum;
 colorSum.r += colors[index].r * alphaRatio;
 colorSum.g += colors[index].g * alphaRatio;
 colorSum.b += colors[index].b * alphaRatio;
 }
 data[i+0] = colorSum.r;
 data[i+1] = colorSum.g;
 data[i+2] = colorSum.b;
 data[i+3] = 255 - 255/((alphaSum+127)/127);
 counts[0]++;
 } else if (colorsLength > 0) {
 data[i+0] = colors[0].r;
 data[i+1] = colors[0].g;
 data[i+2] = colors[0].b;
 data[i+3] = 255 - 255/((colors[0].a+127)/127);
 counts[1]++;
 } else {
 data[i+3] = 0;
 counts[2]++;
 }
 } 
 blender.putImageData(imageData, 0, 0);
}

Question 5

You know what: I tried moving the variable declarations out of the loop but it seemed to make a neglible difference so I left them in favor of readability. However, I can't believe I didn't consider moving the calculation out of the for loop. I guess it's been so long since I've done anything this CPU intensive that I completely forgot this condition was recalculated on every iteration and took it for granted that it would only calculate it once. Thank you! Your answer was very detailed and well-written.

Question 6

@rrowland You're welcome. And yeah, the readability suffers. Under different circumstances, I'd advocate splitting things into several functions, which would of course make it easier on the eyes, but it'd make expression caching harder and (I suspect) introduce some heavy calling-overhead. I'm afraid you'll have to accept some long-winded functions and reduced readability in the interest of performance. However you can cache some things (like mask.width*mask.height*4) even earlier than at the top of the function, which would help.

Flambino Flambino 33.3k2 gold badges46 silver badges90 bronze badges · Accepted Answer · 2012-11-05 01:57:44Z

First of all, very nice and interesting question!

Second, drawImage is fast because it's native, compiled code - there's no interpreted JavaScript going on. And since it's native, it can utilize the computer in ways browser-based JavaScript simply can't. For instance, drawImage might be multi-threaded, and/or use the GPU to do the drawing. It's basically written "closer to the metal" than you can hope to achieve in JavaScript.

But! There are some things you can do with the javascript you have.

I messed around with it a little, and got a big improvement by doing some very, very simple things. In fact, I didn't change any of your logic at all.

In order to test, I changed the checkbox's click-handler to run the draw method 100 times in a row, and log the cumulative time for those 100 runs rather than log for each run. This was basically to "smooth" the stats by having more samples.

Now, the first thing I noticed about your code is that you declare variables inside the loops, meaning things get redeclared again and again. I can't say how much this affects performance, because different JS runtimes have different optimizations for such things. But in any case, it's just good practice to declare all local variables at the top of a functions.

The other thing I did, which I think had the largest impact, was simply caching certain values that were otherwise being calculated and re-calculated on each loop. I.e. I went from this:

for(var i = 0; i < mask.width*mask.height*4; i+=4) {
 ...
 for(var side = 0; side < tileType.length; side++) {

to this:

var i,
 l = mask.width * mask.height * 4, // calculate this once!
 side,
 types = tileType.length; // .length can be cached too (see: http://jsperf.com/caching-array-length)
for(i = 0 ; i < l ; i += 4) {
 ...
 for(side = 0 ; side < types ; side++) {

Of course, I moved every other var declaration to the top of the blend function as well. I also put colors.length in a variable, and used it for both the loop and the if-else if branching

Overall, by doing these basic things (and only in blend() - didn't touch anything else), I got the following results:

Original blend (x100): ~2300ms
Revised blend (x100): ~1200ms

(this was in Chrome/Mac)

So I basically shaved ~10ms per iteration (or 1 second for 100 iterations) off by just doing some basic code-cleanup.

Now, I'm sure there's more stuff to find in this vein. And there may well be even more to find, by doing actual refactoring - again, I didn't actually refactor anything at all. And I didn't even do basic clean-up on the rest of the code.

Here's what I ended up with:

function blend(mask, row, col) {
 "use strict";
 var imageData,
 data,
 oddRow = row % 2,
 tileType = [
 oddRow ? getTile(row-1, col+1) : getTile(row-1, col), /*Top Right:*/
 getTile(row, col+1), /*Mid Right:*/
 oddRow ? getTile(row+1, col+1) : getTile(row+1, col), /*Bot Right:*/
 oddRow ? getTile(row+1, col) : getTile(row+1, col-1), /*Bot Left: */
 getTile(row, col-1), /*Mid Left: */
 oddRow ? getTile(row-1, col) : getTile(row-1, col-1), /*Top Left: */
 ],
 i,
 l = mask.width*mask.height*4,
 side,
 types = tileType.length,
 colors,
 alphaSum,
 alphaRatio,
 colorSum,
 semi,
 index,
 colorsLength;
 blender.drawImage(mask, 0, 0, mask.width, mask.height);
 imageData = blender.getImageData(0, 0, mask.width, mask.height);
 data = imageData.data;
 for(i = 0 ; i < l ; i += 4) {
 colors = [];
 alphaSum = 0;
 for(side = 0; side < types; side++) {
 semi = 127 * Math.floor(side/3);
 index = i+side%3;
 if(data[index] > semi && data[index] <= semi+128) {
 if(tileType[side] != null) {
 alphaSum += data[index] - semi;
 colors.push({
 r: images[tileType[side]].data[i+0],
 g: images[tileType[side]].data[i+1],
 b: images[tileType[side]].data[i+2],
 a: data[index] - semi
 });
 }
 }
 }
 colorsLength = colors.length;
 if(colorsLength > 1) {
 colorSum = { r:0, g:0, b:0 };
 for(index = 0; index < colorsLength; index++) {
 alphaRatio = colors[index].a / alphaSum;
 colorSum.r += colors[index].r * alphaRatio;
 colorSum.g += colors[index].g * alphaRatio;
 colorSum.b += colors[index].b * alphaRatio;
 }
 data[i+0] = colorSum.r;
 data[i+1] = colorSum.g;
 data[i+2] = colorSum.b;
 data[i+3] = 255 - 255/((alphaSum+127)/127);
 counts[0]++;
 } else if (colorsLength > 0) {
 data[i+0] = colors[0].r;
 data[i+1] = colors[0].g;
 data[i+2] = colors[0].b;
 data[i+3] = 255 - 255/((colors[0].a+127)/127);
 counts[1]++;
 } else {
 data[i+3] = 0;
 counts[2]++;
 }
 } 
 blender.putImageData(imageData, 0, 0);
}

You know what: I tried moving the variable declarations out of the loop but it seemed to make a neglible difference so I left them in favor of readability. However, I can't believe I didn't consider moving the calculation out of the for loop. I guess it's been so long since I've done anything this CPU intensive that I completely forgot this condition was recalculated on every iteration and took it for granted that it would only calculate it once. Thank you! Your answer was very detailed and well-written.
@rrowland You're welcome. And yeah, the readability suffers. Under different circumstances, I'd advocate splitting things into several functions, which would of course make it easier on the eyes, but it'd make expression caching harder and (I suspect) introduce some heavy calling-overhead. I'm afraid you'll have to accept some long-winded functions and reduced readability in the interest of performance. However you can cache some things (like mask.width*mask.height*4) even earlier than at the top of the function, which would help.

Stack Exchange Network

Optimize iterative pixel blending function

Summary

The Blend Mask

Example Blend

Relevant Code

Additional Code

Working Example

Thanks!

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Optimize iterative pixel blending function

Summary

The Blend Mask

Example Blend

Relevant Code

Additional Code

Working Example

Thanks!

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions