Function to rotate bitmap image

Question 1

Presenting you an implementation of the "shear and transpose" method of rotating a bitmap image. It works.. but it works slow, which is fully understandable. It is understandable due to the evident lack of optimization. Notwithstanding it doesn't interact with the VGA or any library for that matter.

#include <stdio.h>
#include <stdlib.h>
#include <math.h>
/* bmp offsets */
#define OFFSET_OF_ID (0x0)
#define OFFSET_OF_SIZE (0x2)
#define OFFSET_OF_PIXELS (0xA)
#define OFFSET_OF_NDIB (0xE)
#define OFFSET_OF_WIDTH (0x12)
#define OFFSET_OF_HEIGHT (0x16)
#define OFFSET_OF_BPP (0x1C)
#define OFFSET_OF_NRAW (0x22)
/* bmp is especially dedicated for windows. defining windows-specific types: */
typedef unsigned char byte, *buffer;
typedef unsigned short word;
typedef unsigned long dword;
typedef unsigned long long ddword;
typedef unsigned char pixel[3];
/* bitmapS[nBitmap][offset] */
buffer *bitmapS;
/* Helper function */
ddword bmp_find_xy (dword width, dword xp, dword yp);
buffer
bmp_rotate
(dword nBitmap, float angle)
{
 const dword src_width = *( (dword*)&bitmapS[nBitmap][OFFSET_OF_WIDTH]);
 const dword src_height = *( (dword*)&bitmapS[nBitmap][OFFSET_OF_HEIGHT]);
 const dword src_nraw = *( (dword*)&bitmapS[nBitmap][OFFSET_OF_NRAW]);
 const dword src_pixels = *( (dword*)&bitmapS[nBitmap][OFFSET_OF_PIXELS]);
 buffer dest = malloc( src_pixels + src_nraw );
 float midX, midY;
 float deltaX, deltaY;
 dword rotX, rotY;
 dword i, j;
 midX = src_width / 2.0f;
 midY = src_height / 2.0f;
 memcpy(dest, bitmapS[nBitmap], src_pixels);
 for(i = 0; i < src_width; i++)
 for(j = 0; j < src_height; j++)
 {
 deltaX = i - midX;
 deltaY = j - midY;
 rotX = (dword)(midX + deltaX * sin(angle) + deltaY * cos(angle));
 rotY = (dword)(midY + deltaX * cos(angle) - deltaY * sin(angle));
 if(rotX >= 0 && rotX < src_width && rotY >= 0 && rotY < src_height)
 {
 ddword dest_offset = bmp_find_xy(src_width, i, j);
 ddword src_offset = bmp_find_xy(src_width, rotX, rotY);
 deltaX = i + midX;
 deltaY = j + midY;
 memcpy(&dest[dest_offset], &bitmapS[nBitmap][src_offset], sizeof(pixel);
 }
 }
 return dest;
}
/* Definition of helper function */
ddword
bmp_find_xy 
(dword width, dword xp, dword yp)
{
 dword w = width;
 const dword channels = sizeof(pixel);
 const dword bpp = 8;
 const dword single = (channels * bpp) / 7;
 const dword offset = sizeof(BMP) + sizeof(DIB);
 dword rowsize = w * single;
 dword pixAddress;
 if (rowsize % 4 != 0) rowsize += 4 - (rowsize % 4);
 pixAddress = offset + yp * rowsize + xp * single;
 return pixAddress;
}

Question 2

Code compilation issues

The code didn't compile cleanly for me.

There was a missing right paren on this line:

 memcpy(&dest[dest_offset], &bitmapS[nBitmap][src_offset], sizeof(pixel);

I got a warning that memset was used without a prototype. I added #include <string.h> to fix that.
The code was missing the BMP and DIB structures. I substituted 14 for sizeof(BMP) and 40 for sizeof(DIB) to compensate.

Code style/formatting

Keeping in mind that style and formatting are somewhat subjective, here are some issues I spotted:

I didn't like the use of a nested for loop at the same indentation. It confused me a little bit:
```
for(i = 0; i < src_width; i++)
for(j = 0; j < src_height; j++)
```
I didn't like the "Windows-ization" of the types. Personally, I don't like seeing ddword, dword, and word because I really don't have any idea what those are. I prefer to just #include <stdint.h> and use uint64_t, uint32_t, and uint15_t. Plus, the stdint types are safer to use. Your dword type, for example, is presumably supposed to be 32 bits wide, but you typedef'd it to unsigned long. On a 64-bit machine, unsigned long is likely to be 64 bits wide.
Currently, bmp_rotate() takes an index as an argument, and then indexes into a global array of bitmaps. The code would be more reusable if the function took one bitmap as an argument, like this:
```
buffer bmp_rotate(const buffer bitmap, float angle);
```
Then it could be used from anywhere without being entangled with the global variable.
There is dead code in the inner loop. This code doesn't do anything:
```
 deltaX = i + midX;
 deltaY = j + midY;
```

Bugs in the code

The rotation is wrong. For example, if you pass in 0 for the angle, you should wind up with the same bitmap. Instead, you get a bitmap that is rotated 90 degrees to the right and mirror imaged. To fix it, I changed your rotation computation to this:
```
rotX = (dword)(midX + deltaX * cos(angle) + deltaY * sin(angle));
rotY = (dword)(midY - deltaX * sin(angle) + deltaY * cos(angle));
```
There are uninitialized pixels in the rotated bitmap. The destination bitmap is created like this:
```
buffer dest = malloc( src_pixels + src_nraw );
memcpy(dest, bitmapS[nBitmap], src_pixels);
```
This allocates a bitmap of the same size of the original and copies the header information from the original to the new bitmap. However, the pixels in the new bitmap are all uninitialized. Later, when doing the rotation, not all pixels in the new bitmap are set (because some are not mapped to any original pixel). So those pixels end up with a random value. You can fix this by simply using calloc() to allocate the new bitmap.
The offset of the pixels may not be right. Currently, the offset from the start of the bmp data to the first pixel is calculated like this:
```
const dword offset = sizeof(BMP) + sizeof(DIB);
```
I'm not an expert on bmp files, but I believe that the DIB part is a variable length part. So to get the right offset, you should be using the value you call src_pixels as your offset instead.

Speed issues covered up by the compiler

There are a couple of things that I spotted that you could have done better speedwise, but that the compiler (gcc -O4) seemed to fix for you.

Instead of using sin(angle) and cos(angle) in your inner loop, you could compute those once at the start of the function, and then use the precomputed versions later on:
```
float sin_angle = sin(angle);
float cos_angle = cos(angle);
...
rotX = (dword)(midX + deltaX * cos_angle + deltaY * sin_angle);
rotY = (dword)(midY - deltaX * sin_angle + deltaY * cos_angle);
```
Apparently, gcc is smart enough to know that sin() and cos() are pure functions, so it was able to optimize away the calls in the inner loop. Still, it's good practice to do this kind of optimization yourself. If instead of sin() and cos() you were calling your own function, then the compiler might not know the optimization was possible.
In bmp_find_xy(), all of the lines up to the computation of pixAddress are the same for a given sized bitmap. So you could do an optimization where you compute offset, rowsize, and single just once, and then use those precomputed values in the inner loop. Just as with #1, gcc was smart enough to do all that already.

Speed issues not covered by the compiler

The way that you order your nested loop, on every iteration, you move from one destination pixel to the one below it. If you swapped the two loops, you would instead move from one pixel to the one to its right. This is a better ordering, because it is more cache friendly. In fact, when I swapped the two loops, I was able to get a 5% speedup on a 45 degree rotation.

Oftentimes with loops, you can often simplify the calculation within an inner loop if you compute an initial value and then increment the value instead of recomputing the whole thing. For example, let's look at a sample loop:

for (i=0;i<n;i++) {
 offset = i * ROW_WIDTH + startOffset;
 do_something(offset);
}

You can simplify this to:

offset = 0 * ROW_WIDTH + startOffset;
for (i=0;i<n;i++) {
 do_something(offset);
 offset += ROW_WIDTH;
}

Using this technique, I simplified your main loop to this:

for(j = 0; j < src_height; j++) {
 ddword dest_offset = src_pixels + j * rowsize;
 float deltaY = j - midY;
 float deltaX = 0 - midX;
 float x = midX + deltaX * cos_angle + deltaY * sin_angle;
 float y = midY - deltaX * sin_angle + deltaY * cos_angle;
 for(i = 0; i < src_width; i++) {
 dword rotX = x;
 dword rotY = y;
 if(rotX >= 0 && rotX < src_width && rotY >= 0 && rotY < src_height)
 {
 ddword src_offset = src_pixels + rotY * rowsize + rotX * single;
 memcpy(&dest[dest_offset], &bitmap[src_offset], sizeof(pixel));
 }
 x += cos_angle;
 y -= sin_angle;
 dest_offset += single;
 }
}

Notice how the inner loop has replaced three complicated expressions with three simple additions. This change resulted in a 140% speedup (it was 2.4x faster)!

Memcpy isn't so great for only 3 bytes. I replaced your call to memcpy with some inlined code:

 // Old code:
 // memcpy(&dest[dest_offset], &bitmap[src_offset], sizeof(pixel));
 // New code:
 dest[dest_offset+0] = bitmap[src_offset+0];
 dest[dest_offset+1] = bitmap[src_offset+1];
 dest[dest_offset+2] = bitmap[src_offset+2];

This resulted in a 9% speedup.

Question 3

Thanks.. It truly increased FPS! Using your simplified loop the image being rotated is getting top/bottom cropped in some angles and I need to try understanding why.

Question 4

Fixed. Your review is more than perfect.

Question 5

@Edenia I'm not sure about the cropping but I noticed that the conversion of rotX and rotY are being done by rounding down always. You could round to nearest by adding + 0.5f to where x and y are initialized. Like this: float x = midX + deltaX * cos_angle + deltaY * sin_angle + 0.5f;. I'm not sure if that will help anything though.

Question 6

No I fixed it. It was nothing. Also yes I also encountered the near rounding problem. Also it may be a bit strange but memcpy works faster for me for my c89.

Question 7

By the way did you also spotted that by rotating, the source image is copied in slightly different locations I am not sure if fixing this will be easy, because I mostly understand the ways invented by me. Also clueless.

Question 8

You chose to iterate source pixels. This is a standard mistake leading to loss of quality. The problem is twofold: some source pixel may land on the same destination pixel, and some destination pixels may have no preimage.

Instead you need to iterate over destination pixels. Find the preimage (floating point), and interpolate the known pixels around it.

Question 9

It will be a bit hard to set up code working with iteration through destination pixels. This will also change dimensions and I have bigger problems atm though, so I can't say a thing.

Question 10

Actually it is iterating destination pixels, read it again.

Question 11

I read but I don't trust my eyes.. .. and I don't know what you mean, I don't understand a word you said.

JS1 JS1 28.9k3 gold badges41 silver badges83 bronze badges · Answer 1 · 2015-04-10 19:08:23Z

Code compilation issues

The code didn't compile cleanly for me.

There was a missing right paren on this line:

 memcpy(&dest[dest_offset], &bitmapS[nBitmap][src_offset], sizeof(pixel);

I got a warning that memset was used without a prototype. I added #include <string.h> to fix that.
The code was missing the BMP and DIB structures. I substituted 14 for sizeof(BMP) and 40 for sizeof(DIB) to compensate.

Code style/formatting

Keeping in mind that style and formatting are somewhat subjective, here are some issues I spotted:

I didn't like the use of a nested for loop at the same indentation. It confused me a little bit:
```
for(i = 0; i < src_width; i++)
for(j = 0; j < src_height; j++)
```
I didn't like the "Windows-ization" of the types. Personally, I don't like seeing ddword, dword, and word because I really don't have any idea what those are. I prefer to just #include <stdint.h> and use uint64_t, uint32_t, and uint15_t. Plus, the stdint types are safer to use. Your dword type, for example, is presumably supposed to be 32 bits wide, but you typedef'd it to unsigned long. On a 64-bit machine, unsigned long is likely to be 64 bits wide.
Currently, bmp_rotate() takes an index as an argument, and then indexes into a global array of bitmaps. The code would be more reusable if the function took one bitmap as an argument, like this:
```
buffer bmp_rotate(const buffer bitmap, float angle);
```
Then it could be used from anywhere without being entangled with the global variable.
There is dead code in the inner loop. This code doesn't do anything:
```
 deltaX = i + midX;
 deltaY = j + midY;
```

Bugs in the code

The rotation is wrong. For example, if you pass in 0 for the angle, you should wind up with the same bitmap. Instead, you get a bitmap that is rotated 90 degrees to the right and mirror imaged. To fix it, I changed your rotation computation to this:
```
rotX = (dword)(midX + deltaX * cos(angle) + deltaY * sin(angle));
rotY = (dword)(midY - deltaX * sin(angle) + deltaY * cos(angle));
```
There are uninitialized pixels in the rotated bitmap. The destination bitmap is created like this:
```
buffer dest = malloc( src_pixels + src_nraw );
memcpy(dest, bitmapS[nBitmap], src_pixels);
```
This allocates a bitmap of the same size of the original and copies the header information from the original to the new bitmap. However, the pixels in the new bitmap are all uninitialized. Later, when doing the rotation, not all pixels in the new bitmap are set (because some are not mapped to any original pixel). So those pixels end up with a random value. You can fix this by simply using calloc() to allocate the new bitmap.
The offset of the pixels may not be right. Currently, the offset from the start of the bmp data to the first pixel is calculated like this:
```
const dword offset = sizeof(BMP) + sizeof(DIB);
```
I'm not an expert on bmp files, but I believe that the DIB part is a variable length part. So to get the right offset, you should be using the value you call src_pixels as your offset instead.

Speed issues covered up by the compiler

There are a couple of things that I spotted that you could have done better speedwise, but that the compiler (gcc -O4) seemed to fix for you.

Instead of using sin(angle) and cos(angle) in your inner loop, you could compute those once at the start of the function, and then use the precomputed versions later on:
```
float sin_angle = sin(angle);
float cos_angle = cos(angle);
...
rotX = (dword)(midX + deltaX * cos_angle + deltaY * sin_angle);
rotY = (dword)(midY - deltaX * sin_angle + deltaY * cos_angle);
```
Apparently, gcc is smart enough to know that sin() and cos() are pure functions, so it was able to optimize away the calls in the inner loop. Still, it's good practice to do this kind of optimization yourself. If instead of sin() and cos() you were calling your own function, then the compiler might not know the optimization was possible.
In bmp_find_xy(), all of the lines up to the computation of pixAddress are the same for a given sized bitmap. So you could do an optimization where you compute offset, rowsize, and single just once, and then use those precomputed values in the inner loop. Just as with #1, gcc was smart enough to do all that already.

Speed issues not covered by the compiler

The way that you order your nested loop, on every iteration, you move from one destination pixel to the one below it. If you swapped the two loops, you would instead move from one pixel to the one to its right. This is a better ordering, because it is more cache friendly. In fact, when I swapped the two loops, I was able to get a 5% speedup on a 45 degree rotation.

Oftentimes with loops, you can often simplify the calculation within an inner loop if you compute an initial value and then increment the value instead of recomputing the whole thing. For example, let's look at a sample loop:

for (i=0;i<n;i++) {
 offset = i * ROW_WIDTH + startOffset;
 do_something(offset);
}

You can simplify this to:

offset = 0 * ROW_WIDTH + startOffset;
for (i=0;i<n;i++) {
 do_something(offset);
 offset += ROW_WIDTH;
}

Using this technique, I simplified your main loop to this:

for(j = 0; j < src_height; j++) {
 ddword dest_offset = src_pixels + j * rowsize;
 float deltaY = j - midY;
 float deltaX = 0 - midX;
 float x = midX + deltaX * cos_angle + deltaY * sin_angle;
 float y = midY - deltaX * sin_angle + deltaY * cos_angle;
 for(i = 0; i < src_width; i++) {
 dword rotX = x;
 dword rotY = y;
 if(rotX >= 0 && rotX < src_width && rotY >= 0 && rotY < src_height)
 {
 ddword src_offset = src_pixels + rotY * rowsize + rotX * single;
 memcpy(&dest[dest_offset], &bitmap[src_offset], sizeof(pixel));
 }
 x += cos_angle;
 y -= sin_angle;
 dest_offset += single;
 }
}

Notice how the inner loop has replaced three complicated expressions with three simple additions. This change resulted in a 140% speedup (it was 2.4x faster)!

Memcpy isn't so great for only 3 bytes. I replaced your call to memcpy with some inlined code:

 // Old code:
 // memcpy(&dest[dest_offset], &bitmap[src_offset], sizeof(pixel));
 // New code:
 dest[dest_offset+0] = bitmap[src_offset+0];
 dest[dest_offset+1] = bitmap[src_offset+1];
 dest[dest_offset+2] = bitmap[src_offset+2];

This resulted in a 9% speedup.

Thanks.. It truly increased FPS! Using your simplified loop the image being rotated is getting top/bottom cropped in some angles and I need to try understanding why.
@Edenia I'm not sure about the cropping but I noticed that the conversion of rotX and rotY are being done by rounding down always. You could round to nearest by adding + 0.5f to where x and y are initialized. Like this: float x = midX + deltaX * cos_angle + deltaY * sin_angle + 0.5f;. I'm not sure if that will help anything though.
No I fixed it. It was nothing. Also yes I also encountered the near rounding problem. Also it may be a bit strange but memcpy works faster for me for my c89.
By the way did you also spotted that by rotating, the source image is copied in slightly different locations I am not sure if fixing this will be easy, because I mostly understand the ways invented by me. Also clueless.

vnp vnp 58.7k4 gold badges55 silver badges144 bronze badges · Answer 2 · 2015-04-11 06:19:38Z

0

\$\begingroup\$

You chose to iterate source pixels. This is a standard mistake leading to loss of quality. The problem is twofold: some source pixel may land on the same destination pixel, and some destination pixels may have no preimage.

Instead you need to iterate over destination pixels. Find the preimage (floating point), and interpolate the known pixels around it.

Share

answered Apr 11, 2015 at 6:19

vnp's user avatar

vnp vnp

58.7k4 gold badges55 silver badges144 bronze badges

\$\endgroup\$

3

\$\begingroup\$ It will be a bit hard to set up code working with iteration through destination pixels. This will also change dimensions and I have bigger problems atm though, so I can't say a thing. \$\endgroup\$

Edenia
– Edenia

2015年04月11日 06:23:27 +00:00
Commented Apr 11, 2015 at 6:23
\$\begingroup\$ Actually it is iterating destination pixels, read it again. \$\endgroup\$

JS1
– JS1

2015年04月11日 06:55:10 +00:00
Commented Apr 11, 2015 at 6:55
\$\begingroup\$ I read but I don't trust my eyes.. .. and I don't know what you mean, I don't understand a word you said. \$\endgroup\$

Edenia
– Edenia

2015年04月12日 04:57:30 +00:00
Commented Apr 12, 2015 at 4:57

Add a comment |

Stack Exchange Network

Function to rotate bitmap image

2 Answers 2

Code compilation issues

Code style/formatting

Bugs in the code

Speed issues covered up by the compiler

Speed issues not covered by the compiler

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Function to rotate bitmap image

2 Answers 2

Code compilation issues

Code style/formatting

Bugs in the code

Speed issues covered up by the compiler

Speed issues not covered by the compiler

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions