edited Jun 10, 2020 at 13:24

I can see several ways to improve the speed of that loop. While I'm all for using the GPU, there are a bunch of easier, more practical things to try before going that far.

`memcpy`

On most platforms, the C standard library `memcpy()` function is highly optimized, and it will usually be much faster than copying a single byte at a time. If you have the space for it, it might be faster to allocate a second buffer and use `memcpy()` to copy each line from the original image into its inverted Y position in the new image. Then run through the new image and invert the colors. Profiling will tell you if it's faster or not.
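Here is a minimal sketch of that two-pass idea. The function name `flip_and_invert` and its parameters are my own invention, and I'm assuming tightly packed rows of `stride` bytes in which every byte is a color channel (so no alpha channel that should be left alone).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: copies each row of `src` into its mirrored Y position
// in a new buffer, then inverts every byte of the new buffer.
// Assumes rows are tightly packed, `stride` bytes each.
std::vector<std::uint8_t> flip_and_invert(const std::vector<std::uint8_t>& src,
                                          std::size_t stride,
                                          std::size_t height) {
    assert(src.size() == stride * height);
    if (src.empty()) return {};

    std::vector<std::uint8_t> dst(src.size());

    // Pass 1: one memcpy per row, source row y lands in row (height - 1 - y).
    for (std::size_t y = 0; y < height; ++y) {
        std::memcpy(dst.data() + (height - 1 - y) * stride,
                    src.data() + y * stride,
                    stride);
    }

    // Pass 2: invert the colors in the new image.
    for (std::uint8_t& b : dst) {
        b = static_cast<std::uint8_t>(255 - b);
    }
    return dst;
}
```

Whether this beats the original loop depends on image size and cache behavior, so it is worth timing both on real data.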

Work on more at once

Heck, even using a 32-bit or 64-bit data type (careful at the ends!) for doing your copies would be a win. You move 4 to 8 times as many bytes per instruction.
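As a rough sketch of what that could look like for an in-place vertical flip, the code below swaps two rows 8 bytes at a time and finishes the leftover bytes one by one. The names `swap_rows_wide` and `flip_in_place` are hypothetical, and I'm again assuming tightly packed rows. The fixed-size `memcpy` calls are there to avoid alignment and aliasing trouble; compilers commonly turn them into plain 64-bit loads and stores, but check the generated code if it matters.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <utility>
#include <vector>

// Hypothetical helper: swaps n bytes between two non-overlapping rows,
// moving 8 bytes per step and handling the tail one byte at a time.
void swap_rows_wide(std::uint8_t* a, std::uint8_t* b, std::size_t n) {
    std::size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        std::uint64_t wa, wb;
        std::memcpy(&wa, a + i, 8);
        std::memcpy(&wb, b + i, 8);
        std::memcpy(a + i, &wb, 8);
        std::memcpy(b + i, &wa, 8);
    }
    // "Careful at the ends": fewer than 8 bytes remain here.
    for (; i < n; ++i) {
        std::swap(a[i], b[i]);
    }
}

// Flips the image vertically in place by swapping row y with its mirror row.
void flip_in_place(std::vector<std::uint8_t>& img,
                   std::size_t stride,
                   std::size_t height) {
    assert(img.size() == stride * height);
    for (std::size_t y = 0; y < height / 2; ++y) {
        swap_rows_wide(img.data() + y * stride,
                       img.data() + (height - 1 - y) * stride,
                       stride);
    }
}
```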

You can also work on more than one byte at a time when inverting the colors by using SIMD instructions. (On Intel architectures this would be SSE or AVX instructions.)
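For the SIMD route, a minimal sketch with SSE2 intrinsics might look like the following. The function name `invert_bytes` is made up, and I'm assuming that inverting a channel means `255 - value`, which for a byte is the same as flipping all of its bits. On compilers that don't define `__SSE2__`, the code simply falls back to the scalar loop.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#if defined(__SSE2__)
#include <emmintrin.h>  // SSE2 intrinsics
#endif

// Hypothetical helper: inverts n bytes in place, 16 at a time where SSE2
// is available, then finishes any remaining bytes one at a time.
void invert_bytes(std::uint8_t* data, std::size_t n) {
    assert(data != nullptr || n == 0);
    std::size_t i = 0;
#if defined(__SSE2__)
    const __m128i ones = _mm_set1_epi8(-1);  // every bit set
    for (; i + 16 <= n; i += 16) {
        // Unaligned load/store, so the buffer needs no special alignment.
        __m128i v = _mm_loadu_si128(reinterpret_cast<const __m128i*>(data + i));
        v = _mm_xor_si128(v, ones);          // x ^ 0xFF == 255 - x per byte
        _mm_storeu_si128(reinterpret_cast<__m128i*>(data + i), v);
    }
#endif
    for (; i < n; ++i) {
        data[i] = static_cast<std::uint8_t>(~data[i]);
    }
}
```

An optimizing compiler may already vectorize the plain loop on its own, so compare against that before committing to hand-written intrinsics.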

Parallel processing

Depending on the size of your images, it might make sense to write some multithreaded code to process several sections of the image at the same time. For example, if you have 4 cores in your machine, you could have each one work on a quarter of the image. (This is obviously easier for the color inversion than the Y inversion.)
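A minimal sketch of the color-inversion half using `std::thread`, splitting the image into horizontal bands of rows. The name `invert_parallel` and the band-splitting scheme are my own assumptions, not something from the original code, and for small images the thread start-up cost can easily outweigh the gain.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical helper: inverts every byte of the image, giving each worker
// thread its own contiguous band of rows so no two threads touch the same byte.
void invert_parallel(std::vector<std::uint8_t>& img,
                     std::size_t stride,
                     std::size_t height,
                     unsigned workers) {
    assert(img.size() == stride * height);
    workers = std::max(1u, workers);  // hardware_concurrency() may report 0

    // Round up so that all rows are covered even when height % workers != 0.
    const std::size_t rows_per_worker = (height + workers - 1) / workers;

    std::vector<std::thread> pool;
    for (unsigned w = 0; w < workers; ++w) {
        const std::size_t first = std::min(height, w * rows_per_worker);
        const std::size_t last = std::min(height, first + rows_per_worker);
        if (first == last) break;  // no rows left to hand out

        pool.emplace_back([&img, stride, first, last] {
            for (std::size_t i = first * stride; i < last * stride; ++i) {
                img[i] = static_cast<std::uint8_t>(255 - img[i]);
            }
        });
    }
    for (std::thread& t : pool) {
        t.join();
    }
}
```

You could call it as `invert_parallel(img, stride, height, std::thread::hardware_concurrency());`. The Y inversion is harder to split up because every swap touches two rows, one from the top half and one from the bottom half.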

Don't Do Any Work

The fastest way to get work done is to already have it done. Why are your images inverted (both in color and in space) in the first place? Is there any way you could create them correctly so you don't have to flip them and make a negative? If so, that's probably the right thing to do, but I'm sure there are other considerations.


answered Feb 21, 2017 at 5:15

user1118321
