Presenting you an implementation of the "shear and transpose" method of rotating a bitmap image. It works.. but it works slow, which is fully understandable. It is understandable due to the evident lack of optimization. Notwithstanding it doesn't interact with the VGA or any library for that matter.
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
/* bmp offsets */
#define OFFSET_OF_ID (0x0)
#define OFFSET_OF_SIZE (0x2)
#define OFFSET_OF_PIXELS (0xA)
#define OFFSET_OF_NDIB (0xE)
#define OFFSET_OF_WIDTH (0x12)
#define OFFSET_OF_HEIGHT (0x16)
#define OFFSET_OF_BPP (0x1C)
#define OFFSET_OF_NRAW (0x22)
/* bmp is especially dedicated for windows. defining windows-specific types: */
typedef unsigned char byte, *buffer;
typedef unsigned short word;
typedef unsigned long dword;
typedef unsigned long long ddword;
typedef unsigned char pixel[3];
/* bitmapS[nBitmap][offset] */
buffer *bitmapS;
/* Helper function */
ddword bmp_find_xy (dword width, dword xp, dword yp);
buffer
bmp_rotate
(dword nBitmap, float angle)
{
const dword src_width = *( (dword*)&bitmapS[nBitmap][OFFSET_OF_WIDTH]);
const dword src_height = *( (dword*)&bitmapS[nBitmap][OFFSET_OF_HEIGHT]);
const dword src_nraw = *( (dword*)&bitmapS[nBitmap][OFFSET_OF_NRAW]);
const dword src_pixels = *( (dword*)&bitmapS[nBitmap][OFFSET_OF_PIXELS]);
buffer dest = malloc( src_pixels + src_nraw );
float midX, midY;
float deltaX, deltaY;
dword rotX, rotY;
dword i, j;
midX = src_width / 2.0f;
midY = src_height / 2.0f;
memcpy(dest, bitmapS[nBitmap], src_pixels);
for(i = 0; i < src_width; i++)
for(j = 0; j < src_height; j++)
{
deltaX = i - midX;
deltaY = j - midY;
rotX = (dword)(midX + deltaX * sin(angle) + deltaY * cos(angle));
rotY = (dword)(midY + deltaX * cos(angle) - deltaY * sin(angle));
if(rotX >= 0 && rotX < src_width && rotY >= 0 && rotY < src_height)
{
ddword dest_offset = bmp_find_xy(src_width, i, j);
ddword src_offset = bmp_find_xy(src_width, rotX, rotY);
deltaX = i + midX;
deltaY = j + midY;
memcpy(&dest[dest_offset], &bitmapS[nBitmap][src_offset], sizeof(pixel);
}
}
return dest;
}
/* Definition of helper function */
ddword
bmp_find_xy
(dword width, dword xp, dword yp)
{
dword w = width;
const dword channels = sizeof(pixel);
const dword bpp = 8;
const dword single = (channels * bpp) / 7;
const dword offset = sizeof(BMP) + sizeof(DIB);
dword rowsize = w * single;
dword pixAddress;
if (rowsize % 4 != 0) rowsize += 4 - (rowsize % 4);
pixAddress = offset + yp * rowsize + xp * single;
return pixAddress;
}
2 Answers 2
Code compilation issues
The code didn't compile cleanly for me.
There was a missing right paren on this line:
memcpy(&dest[dest_offset], &bitmapS[nBitmap][src_offset], sizeof(pixel);
I got a warning that
memset
was used without a prototype. I added#include <string.h>
to fix that.The code was missing the
BMP
andDIB
structures. I substituted14
forsizeof(BMP)
and40
forsizeof(DIB)
to compensate.
Code style/formatting
Keeping in mind that style and formatting are somewhat subjective, here are some issues I spotted:
I didn't like the use of a nested for loop at the same indentation. It confused me a little bit:
for(i = 0; i < src_width; i++) for(j = 0; j < src_height; j++)
I didn't like the "Windows-ization" of the types. Personally, I don't like seeing
ddword
,dword
, andword
because I really don't have any idea what those are. I prefer to just#include <stdint.h>
and useuint64_t
,uint32_t
, anduint15_t
. Plus, the stdint types are safer to use. Yourdword
type, for example, is presumably supposed to be 32 bits wide, but you typedef'd it tounsigned long
. On a 64-bit machine,unsigned long
is likely to be 64 bits wide.Currently,
bmp_rotate()
takes an index as an argument, and then indexes into a global array of bitmaps. The code would be more reusable if the function took one bitmap as an argument, like this:buffer bmp_rotate(const buffer bitmap, float angle);
Then it could be used from anywhere without being entangled with the global variable.
There is dead code in the inner loop. This code doesn't do anything:
deltaX = i + midX; deltaY = j + midY;
Bugs in the code
The rotation is wrong. For example, if you pass in 0 for the angle, you should wind up with the same bitmap. Instead, you get a bitmap that is rotated 90 degrees to the right and mirror imaged. To fix it, I changed your rotation computation to this:
rotX = (dword)(midX + deltaX * cos(angle) + deltaY * sin(angle)); rotY = (dword)(midY - deltaX * sin(angle) + deltaY * cos(angle));
There are uninitialized pixels in the rotated bitmap. The destination bitmap is created like this:
buffer dest = malloc( src_pixels + src_nraw ); memcpy(dest, bitmapS[nBitmap], src_pixels);
This allocates a bitmap of the same size of the original and copies the header information from the original to the new bitmap. However, the pixels in the new bitmap are all uninitialized. Later, when doing the rotation, not all pixels in the new bitmap are set (because some are not mapped to any original pixel). So those pixels end up with a random value. You can fix this by simply using
calloc()
to allocate the new bitmap.The offset of the pixels may not be right. Currently, the offset from the start of the bmp data to the first pixel is calculated like this:
const dword offset = sizeof(BMP) + sizeof(DIB);
I'm not an expert on bmp files, but I believe that the
DIB
part is a variable length part. So to get the right offset, you should be using the value you callsrc_pixels
as your offset instead.
Speed issues covered up by the compiler
There are a couple of things that I spotted that you could have done better speedwise, but that the compiler (gcc -O4
) seemed to fix for you.
Instead of using
sin(angle)
andcos(angle)
in your inner loop, you could compute those once at the start of the function, and then use the precomputed versions later on:float sin_angle = sin(angle); float cos_angle = cos(angle); ... rotX = (dword)(midX + deltaX * cos_angle + deltaY * sin_angle); rotY = (dword)(midY - deltaX * sin_angle + deltaY * cos_angle);
Apparently, gcc is smart enough to know that
sin()
andcos()
are pure functions, so it was able to optimize away the calls in the inner loop. Still, it's good practice to do this kind of optimization yourself. If instead ofsin()
andcos()
you were calling your own function, then the compiler might not know the optimization was possible.In
bmp_find_xy()
, all of the lines up to the computation ofpixAddress
are the same for a given sized bitmap. So you could do an optimization where you computeoffset
,rowsize
, andsingle
just once, and then use those precomputed values in the inner loop. Just as with #1, gcc was smart enough to do all that already.
Speed issues not covered by the compiler
The way that you order your nested loop, on every iteration, you move from one destination pixel to the one below it. If you swapped the two loops, you would instead move from one pixel to the one to its right. This is a better ordering, because it is more cache friendly. In fact, when I swapped the two loops, I was able to get a 5% speedup on a 45 degree rotation.
Oftentimes with loops, you can often simplify the calculation within an inner loop if you compute an initial value and then increment the value instead of recomputing the whole thing. For example, let's look at a sample loop:
for (i=0;i<n;i++) { offset = i * ROW_WIDTH + startOffset; do_something(offset); }
You can simplify this to:
offset = 0 * ROW_WIDTH + startOffset; for (i=0;i<n;i++) { do_something(offset); offset += ROW_WIDTH; }
Using this technique, I simplified your main loop to this:
for(j = 0; j < src_height; j++) { ddword dest_offset = src_pixels + j * rowsize; float deltaY = j - midY; float deltaX = 0 - midX; float x = midX + deltaX * cos_angle + deltaY * sin_angle; float y = midY - deltaX * sin_angle + deltaY * cos_angle; for(i = 0; i < src_width; i++) { dword rotX = x; dword rotY = y; if(rotX >= 0 && rotX < src_width && rotY >= 0 && rotY < src_height) { ddword src_offset = src_pixels + rotY * rowsize + rotX * single; memcpy(&dest[dest_offset], &bitmap[src_offset], sizeof(pixel)); } x += cos_angle; y -= sin_angle; dest_offset += single; } }
Notice how the inner loop has replaced three complicated expressions with three simple additions. This change resulted in a 140% speedup (it was 2.4x faster)!
Memcpy isn't so great for only 3 bytes. I replaced your call to
memcpy
with some inlined code:// Old code: // memcpy(&dest[dest_offset], &bitmap[src_offset], sizeof(pixel)); // New code: dest[dest_offset+0] = bitmap[src_offset+0]; dest[dest_offset+1] = bitmap[src_offset+1]; dest[dest_offset+2] = bitmap[src_offset+2];
This resulted in a 9% speedup.
-
\$\begingroup\$ Thanks.. It truly increased FPS! Using your simplified loop the image being rotated is getting top/bottom cropped in some angles and I need to try understanding why. \$\endgroup\$Edenia– Edenia2015年04月10日 20:54:18 +00:00Commented Apr 10, 2015 at 20:54
-
\$\begingroup\$ Fixed. Your review is more than perfect. \$\endgroup\$Edenia– Edenia2015年04月10日 20:59:54 +00:00Commented Apr 10, 2015 at 20:59
-
\$\begingroup\$ @Edenia I'm not sure about the cropping but I noticed that the conversion of rotX and rotY are being done by rounding down always. You could round to nearest by adding
+ 0.5f
to wherex
andy
are initialized. Like this:float x = midX + deltaX * cos_angle + deltaY * sin_angle + 0.5f;
. I'm not sure if that will help anything though. \$\endgroup\$JS1– JS12015年04月10日 21:05:38 +00:00Commented Apr 10, 2015 at 21:05 -
\$\begingroup\$ No I fixed it. It was nothing. Also yes I also encountered the near rounding problem. Also it may be a bit strange but
memcpy
works faster for me for my c89. \$\endgroup\$Edenia– Edenia2015年04月10日 21:12:36 +00:00Commented Apr 10, 2015 at 21:12 -
\$\begingroup\$ By the way did you also spotted that by rotating, the source image is copied in slightly different locations I am not sure if fixing this will be easy, because I mostly understand the ways invented by me. Also clueless. \$\endgroup\$Edenia– Edenia2015年04月10日 21:20:50 +00:00Commented Apr 10, 2015 at 21:20
You chose to iterate source pixels. This is a standard mistake leading to loss of quality. The problem is twofold: some source pixel may land on the same destination pixel, and some destination pixels may have no preimage.
Instead you need to iterate over destination pixels. Find the preimage (floating point), and interpolate the known pixels around it.
-
\$\begingroup\$ It will be a bit hard to set up code working with iteration through destination pixels. This will also change dimensions and I have bigger problems atm though, so I can't say a thing. \$\endgroup\$Edenia– Edenia2015年04月11日 06:23:27 +00:00Commented Apr 11, 2015 at 6:23
-
\$\begingroup\$ Actually it is iterating destination pixels, read it again. \$\endgroup\$JS1– JS12015年04月11日 06:55:10 +00:00Commented Apr 11, 2015 at 6:55
-
\$\begingroup\$ I read but I don't trust my eyes.. .. and I don't know what you mean, I don't understand a word you said. \$\endgroup\$Edenia– Edenia2015年04月12日 04:57:30 +00:00Commented Apr 12, 2015 at 4:57