|
|
|
Created:
13 years, 10 months ago by guanqun Modified:
13 years, 10 months ago CC:
skia-review_googlegroups.com Base URL:
http://skia.googlecode.com/svn/trunk Visibility:
Public. |
make S4444_opaque_D32_nofilter_DX faster with SSE2 support
On an Atom based netbook, before this patch:
[skia.git]$ ./out/Release/bench -config 8888 -repeat 100 -match repeatTile_4444
skia bench: alpha=0xFF antialias=1 filter=0 rotate=0 scale=0 clip=0 dither=default strokeWidth=none scalar=float system=UNIX
running bench [640 480] repeatTile_4444 8888: cmsecs = 113.83
With this patch:
[skia.git]$ ./out/Release/bench -config 8888 -repeat 100 -match repeatTile_4444
skia bench: alpha=0xFF antialias=1 filter=0 rotate=0 scale=0 clip=0 dither=default strokeWidth=none scalar=float system=UNIX
running bench [640 480] repeatTile_4444 8888: cmsecs = 82.71
So this is about 27% performance boost.
I've also tested with gm tool and the result stays intact.
BUG=
TEST=
Patch Set 1 #
Total comments: 4
Patch Set 2 : comment fixes #
Total comments: 7
Patch Set 3 : add the blank line #
Total messages: 7
|
guanqun
Please review. Thanks!
|
13 years, 10 months ago (2012年03月08日 07:36:04 UTC) #1 | |||||||||||||||||||||||||||||||||||
Please review. Thanks!
We'll take a look. Is there an actual end-user use case for this? We weren't aware of anybody rendering 4444 with Skia, and had already started planning to deprecate that code path.
http://codereview.appspot.com/5790043/diff/1/src/opts/SkBitmapProcState_opts_... File src/opts/SkBitmapProcState_opts_SSE2.cpp (right): http://codereview.appspot.com/5790043/diff/1/src/opts/SkBitmapProcState_opts_... src/opts/SkBitmapProcState_opts_SSE2.cpp:643: * So this optimization is specific for such config. Unfortunately, the grammar isn't clear enough here for me to understand this comment. http://codereview.appspot.com/5790043/diff/1/src/opts/SkBitmapProcState_opts_... src/opts/SkBitmapProcState_opts_SSE2.cpp:684: void S4444_opaque_D32_nofilter_DX_ARGB_SSE2(const SkBitmapProcState& s, Nit: Admittedly the naming shorthand for our procs is nowhere defined, but D32 and ARGB is redundant.
I happened to notice this slowness when I was running bench on the netbook. So I started optimizing it. It's good to hear this would be deprecated. Then maybe I'll try to port this optimization to other configurations such as S32 and see if it has some improvements. And the comment is fixed to make it clear. Thanks! http://codereview.appspot.com/5790043/diff/1/src/opts/SkBitmapProcState_opts_... File src/opts/SkBitmapProcState_opts_SSE2.cpp (right): http://codereview.appspot.com/5790043/diff/1/src/opts/SkBitmapProcState_opts_... src/opts/SkBitmapProcState_opts_SSE2.cpp:643: * So this optimization is specific for such config. On 2012年03月08日 19:40:18, TomH wrote: > Unfortunately, the grammar isn't clear enough here for me to understand this > comment. Sorry I make you confused... What I'm trying to say is this optimization only works when: 1) the machine is little endian. 2) SK_A32_SHIFT is 24, SK_R32_SHIFT is 16, SK_G32_SHIFT is 8, SK_B32_SHIFT is 0. http://codereview.appspot.com/5790043/diff/1/src/opts/SkBitmapProcState_opts_... src/opts/SkBitmapProcState_opts_SSE2.cpp:684: void S4444_opaque_D32_nofilter_DX_ARGB_SSE2(const SkBitmapProcState& s, On 2012年03月08日 19:40:18, TomH wrote: > Nit: Admittedly the naming shorthand for our procs is nowhere defined, but D32 > and ARGB is redundant. I'll rename this to S444_opaque_nofilter_DX_SSE2.
http://codereview.appspot.com/5790043/diff/6001/src/opts/SkBitmapProcState_op... File src/opts/SkBitmapProcState_opts_SSE2.cpp (right): http://codereview.appspot.com/5790043/diff/6001/src/opts/SkBitmapProcState_op... src/opts/SkBitmapProcState_opts_SSE2.cpp:637: namespace { why do we need the namespace http://codereview.appspot.com/5790043/diff/6001/src/opts/SkBitmapProcState_op... src/opts/SkBitmapProcState_opts_SSE2.cpp:658: __m128i R = _mm_slli_epi32(_mm_and_si128(n, maskR), 4); can we use this to cover the ARGB and ABGR config on IA? #if SK_R32_SHIFT == 24 __m128i R = _mm_slli_epi32(_mm_and_si128(n, maskR), 4); #else __m128i R = _mm_srli_epi32(_mm_and_si128(n, maskR), 12); http://codereview.appspot.com/5790043/diff/6001/src/opts/SkBitmapProcState_op... src/opts/SkBitmapProcState_opts_SSE2.cpp:732: } a new blank line needed
http://codereview.appspot.com/5790043/diff/6001/src/opts/SkBitmapProcState_op... File src/opts/SkBitmapProcState_opts_SSE2.cpp (right): http://codereview.appspot.com/5790043/diff/6001/src/opts/SkBitmapProcState_op... src/opts/SkBitmapProcState_opts_SSE2.cpp:637: namespace { On 2012年03月09日 04:38:37, Jin Yang wrote: > why do we need the namespace I want to put all the auxiliary functions in the anonymous namespace just as the way it's done in SkBitmapProcState_opts_SSE3.cpp. http://codereview.appspot.com/5790043/diff/6001/src/opts/SkBitmapProcState_op... src/opts/SkBitmapProcState_opts_SSE2.cpp:658: __m128i R = _mm_slli_epi32(_mm_and_si128(n, maskR), 4); On 2012年03月09日 04:38:37, Jin Yang wrote: > can we use this to cover the ARGB and ABGR config on IA? > #if SK_R32_SHIFT == 24 > __m128i R = _mm_slli_epi32(_mm_and_si128(n, maskR), 4); > #else > __m128i R = _mm_srli_epi32(_mm_and_si128(n, maskR), 12); Hmm, if it's not 24, then it's POSSIBLE to be set to other values (though in theory we might only have ARGB and ABGR configs), therefore the code has to be coded like below: #if SK_R32_SHIFT == 16 __m128i R = _mm_slli_epi32(_mm_and_si128(n, maskR), 4); #elif SK_R32_SHIFT == 0 __m128i R = _mm_srli_epi32(_mm_and_si128(n, maskR), 12); #else // general stuff or just err out? #endif The above code looks like mess, and I would rather focus one optimization of one specific config. http://codereview.appspot.com/5790043/diff/6001/src/opts/SkBitmapProcState_op... src/opts/SkBitmapProcState_opts_SSE2.cpp:732: } On 2012年03月09日 04:38:37, Jin Yang wrote: > a new blank line needed Done.
http://codereview.appspot.com/5790043/diff/6001/src/opts/SkBitmapProcState_op... File src/opts/SkBitmapProcState_opts_SSE2.cpp (right): http://codereview.appspot.com/5790043/diff/6001/src/opts/SkBitmapProcState_op... src/opts/SkBitmapProcState_opts_SSE2.cpp:637: namespace { On 2012年03月09日 05:05:07, guanqun wrote: > On 2012年03月09日 04:38:37, Jin Yang wrote: > > why do we need the namespace > > I want to put all the auxiliary functions in the anonymous namespace just as the > way it's done in SkBitmapProcState_opts_SSE3.cpp. that file used the namespace as a tricky to convince gcc to inline the template function. I think you do not need the tricky there.