Keyboard Shortcuts

File
u :up to issue
m :publish + mail comments
M :edit review message
j / k :jump to file after / before current file
J / K :jump to next file with a comment after / before current file
Side-by-side diff
i :toggle intra-line diffs
e :expand all comments
c :collapse all comments
s :toggle showing all comments
n / p :next / previous diff chunk or comment
N / P :next / previous comment
<Up> / <Down> :next / previous line
<Enter> :respond to / edit current comment
d :mark current comment as done
Issue
u :up to list of issues
m :publish + mail comments
j / k :jump to patch after / before current patch
o / <Enter> :open current patch in side-by-side view
i :open current patch in unified diff view
Issue List
j / k :jump to issue after / before current issue
o / <Enter> :open current issue
# : close issue
Comment/message editing
<Ctrl> + s or <Ctrl> + Enter :save comment
<Esc> :cancel edit
Rietveld Code Review Tool
Help | Bug tracker | Discussion group | Source code | Sign in
(74)
Issues Repositories Search
Open Issues | Closed Issues | All Issues | Sign in with your Google Account to create issues and add comments

Issue 5790043: make S4444_opaque_D32_nofilter_DX faster with SSE2 support

Can't Edit
Can't Publish+Mail
Start Review
Created:
13 years, 10 months ago by guanqun
Modified:
13 years, 10 months ago
Reviewers:
Jin A.Yang, TomH
CC:
skia-review_googlegroups.com
Base URL:
http://skia.googlecode.com/svn/trunk
Visibility:
Public.
make S4444_opaque_D32_nofilter_DX faster with SSE2 support On an Atom based netbook, before this patch: [skia.git]$ ./out/Release/bench -config 8888 -repeat 100 -match repeatTile_4444 skia bench: alpha=0xFF antialias=1 filter=0 rotate=0 scale=0 clip=0 dither=default strokeWidth=none scalar=float system=UNIX running bench [640 480] repeatTile_4444 8888: cmsecs = 113.83 With this patch: [skia.git]$ ./out/Release/bench -config 8888 -repeat 100 -match repeatTile_4444 skia bench: alpha=0xFF antialias=1 filter=0 rotate=0 scale=0 clip=0 dither=default strokeWidth=none scalar=float system=UNIX running bench [640 480] repeatTile_4444 8888: cmsecs = 82.71 So this is about 27% performance boost. I've also tested with gm tool and the result stays intact. BUG= TEST=

Patch Set 1 #

Total comments: 4

Patch Set 2 : comment fixes #

Total comments: 7

Patch Set 3 : add the blank line #

Created: 13 years, 10 months ago
Download [raw] [tar.bz2]
Unified diffs Side-by-side diffs Delta from patch set Stats (+117 lines, -0 lines) Patch
src/core/SkBitmapProcState.h View 1 1 chunk +2 lines, -0 lines 0 comments Download
src/opts/SkBitmapProcState_opts_SSE2.h View 1 1 chunk +3 lines, -0 lines 0 comments Download
src/opts/SkBitmapProcState_opts_SSE2.cpp View 1 2 2 chunks +106 lines, -0 lines 0 comments Download
src/opts/opts_check_SSE2.cpp View 1 1 chunk +6 lines, -0 lines 0 comments Download
Total messages: 7
|
guanqun
Please review. Thanks!
13 years, 10 months ago (2012年03月08日 07:36:04 UTC) #1
Please review. Thanks!
Sign in to reply to this message.
TomH
We'll take a look. Is there an actual end-user use case for this? We weren't ...
13 years, 10 months ago (2012年03月08日 14:01:02 UTC) #2
We'll take a look. Is there an actual end-user use case for this? We weren't
aware of anybody rendering 4444 with Skia, and had already started planning to
deprecate that code path.
Sign in to reply to this message.
TomH
http://codereview.appspot.com/5790043/diff/1/src/opts/SkBitmapProcState_opts_SSE2.cpp File src/opts/SkBitmapProcState_opts_SSE2.cpp (right): http://codereview.appspot.com/5790043/diff/1/src/opts/SkBitmapProcState_opts_SSE2.cpp#newcode643 src/opts/SkBitmapProcState_opts_SSE2.cpp:643: * So this optimization is specific for such config. ...
13 years, 10 months ago (2012年03月08日 19:40:18 UTC) #3
http://codereview.appspot.com/5790043/diff/1/src/opts/SkBitmapProcState_opts_...
File src/opts/SkBitmapProcState_opts_SSE2.cpp (right):
http://codereview.appspot.com/5790043/diff/1/src/opts/SkBitmapProcState_opts_...
src/opts/SkBitmapProcState_opts_SSE2.cpp:643: * So this optimization is specific
for such config.
Unfortunately, the grammar isn't clear enough here for me to understand this
comment.
http://codereview.appspot.com/5790043/diff/1/src/opts/SkBitmapProcState_opts_...
src/opts/SkBitmapProcState_opts_SSE2.cpp:684: void
S4444_opaque_D32_nofilter_DX_ARGB_SSE2(const SkBitmapProcState& s,
Nit: Admittedly the naming shorthand for our procs is nowhere defined, but D32
and ARGB is redundant.
Sign in to reply to this message.
guanqun
I happened to notice this slowness when I was running bench on the netbook. So ...
13 years, 10 months ago (2012年03月09日 01:33:34 UTC) #4
I happened to notice this slowness when I was running bench on the netbook. So I
started optimizing it.
It's good to hear this would be deprecated. Then maybe I'll try to port this
optimization to other configurations such as S32 and see if it has some
improvements.
And the comment is fixed to make it clear. Thanks!
http://codereview.appspot.com/5790043/diff/1/src/opts/SkBitmapProcState_opts_...
File src/opts/SkBitmapProcState_opts_SSE2.cpp (right):
http://codereview.appspot.com/5790043/diff/1/src/opts/SkBitmapProcState_opts_...
src/opts/SkBitmapProcState_opts_SSE2.cpp:643: * So this optimization is specific
for such config.
On 2012年03月08日 19:40:18, TomH wrote:
> Unfortunately, the grammar isn't clear enough here for me to understand this
> comment.
Sorry I make you confused... What I'm trying to say is this optimization only
works when: 1) the machine is little endian. 2) SK_A32_SHIFT is 24, SK_R32_SHIFT
is 16, SK_G32_SHIFT is 8, SK_B32_SHIFT is 0.
http://codereview.appspot.com/5790043/diff/1/src/opts/SkBitmapProcState_opts_...
src/opts/SkBitmapProcState_opts_SSE2.cpp:684: void
S4444_opaque_D32_nofilter_DX_ARGB_SSE2(const SkBitmapProcState& s,
On 2012年03月08日 19:40:18, TomH wrote:
> Nit: Admittedly the naming shorthand for our procs is nowhere defined, but D32
> and ARGB is redundant.
I'll rename this to S444_opaque_nofilter_DX_SSE2.
Sign in to reply to this message.
Jin A.Yang
http://codereview.appspot.com/5790043/diff/6001/src/opts/SkBitmapProcState_opts_SSE2.cpp File src/opts/SkBitmapProcState_opts_SSE2.cpp (right): http://codereview.appspot.com/5790043/diff/6001/src/opts/SkBitmapProcState_opts_SSE2.cpp#newcode637 src/opts/SkBitmapProcState_opts_SSE2.cpp:637: namespace { why do we need the namespace http://codereview.appspot.com/5790043/diff/6001/src/opts/SkBitmapProcState_opts_SSE2.cpp#newcode658 ...
13 years, 10 months ago (2012年03月09日 04:38:36 UTC) #5
http://codereview.appspot.com/5790043/diff/6001/src/opts/SkBitmapProcState_op...
File src/opts/SkBitmapProcState_opts_SSE2.cpp (right):
http://codereview.appspot.com/5790043/diff/6001/src/opts/SkBitmapProcState_op...
src/opts/SkBitmapProcState_opts_SSE2.cpp:637: namespace {
why do we need the namespace
http://codereview.appspot.com/5790043/diff/6001/src/opts/SkBitmapProcState_op...
src/opts/SkBitmapProcState_opts_SSE2.cpp:658: __m128i R =
_mm_slli_epi32(_mm_and_si128(n, maskR), 4);
can we use this to cover the ARGB and ABGR config on IA?
#if SK_R32_SHIFT == 24
__m128i R = _mm_slli_epi32(_mm_and_si128(n, maskR), 4);
#else
__m128i R = _mm_srli_epi32(_mm_and_si128(n, maskR), 12);
http://codereview.appspot.com/5790043/diff/6001/src/opts/SkBitmapProcState_op...
src/opts/SkBitmapProcState_opts_SSE2.cpp:732: }
a new blank line needed
Sign in to reply to this message.
guanqun
http://codereview.appspot.com/5790043/diff/6001/src/opts/SkBitmapProcState_opts_SSE2.cpp File src/opts/SkBitmapProcState_opts_SSE2.cpp (right): http://codereview.appspot.com/5790043/diff/6001/src/opts/SkBitmapProcState_opts_SSE2.cpp#newcode637 src/opts/SkBitmapProcState_opts_SSE2.cpp:637: namespace { On 2012年03月09日 04:38:37, Jin Yang wrote: > ...
13 years, 10 months ago (2012年03月09日 05:05:06 UTC) #6
http://codereview.appspot.com/5790043/diff/6001/src/opts/SkBitmapProcState_op...
File src/opts/SkBitmapProcState_opts_SSE2.cpp (right):
http://codereview.appspot.com/5790043/diff/6001/src/opts/SkBitmapProcState_op...
src/opts/SkBitmapProcState_opts_SSE2.cpp:637: namespace {
On 2012年03月09日 04:38:37, Jin Yang wrote:
> why do we need the namespace
I want to put all the auxiliary functions in the anonymous namespace just as the
way it's done in SkBitmapProcState_opts_SSE3.cpp.
http://codereview.appspot.com/5790043/diff/6001/src/opts/SkBitmapProcState_op...
src/opts/SkBitmapProcState_opts_SSE2.cpp:658: __m128i R =
_mm_slli_epi32(_mm_and_si128(n, maskR), 4);
On 2012年03月09日 04:38:37, Jin Yang wrote:
> can we use this to cover the ARGB and ABGR config on IA?
> #if SK_R32_SHIFT == 24
> __m128i R = _mm_slli_epi32(_mm_and_si128(n, maskR), 4);
> #else
> __m128i R = _mm_srli_epi32(_mm_and_si128(n, maskR), 12);
Hmm, if it's not 24, then it's POSSIBLE to be set to other values (though in
theory we might only have ARGB and ABGR configs), therefore the code has to be
coded like below:
#if SK_R32_SHIFT == 16
__m128i R = _mm_slli_epi32(_mm_and_si128(n, maskR), 4);
#elif SK_R32_SHIFT == 0
__m128i R = _mm_srli_epi32(_mm_and_si128(n, maskR), 12);
#else
 // general stuff or just err out?
#endif
The above code looks like mess, and I would rather focus one optimization of one
specific config.
http://codereview.appspot.com/5790043/diff/6001/src/opts/SkBitmapProcState_op...
src/opts/SkBitmapProcState_opts_SSE2.cpp:732: }
On 2012年03月09日 04:38:37, Jin Yang wrote:
> a new blank line needed
Done.
Sign in to reply to this message.
Jin A.Yang
http://codereview.appspot.com/5790043/diff/6001/src/opts/SkBitmapProcState_opts_SSE2.cpp File src/opts/SkBitmapProcState_opts_SSE2.cpp (right): http://codereview.appspot.com/5790043/diff/6001/src/opts/SkBitmapProcState_opts_SSE2.cpp#newcode637 src/opts/SkBitmapProcState_opts_SSE2.cpp:637: namespace { On 2012年03月09日 05:05:07, guanqun wrote: > On ...
13 years, 10 months ago (2012年03月09日 08:37:32 UTC) #7
http://codereview.appspot.com/5790043/diff/6001/src/opts/SkBitmapProcState_op...
File src/opts/SkBitmapProcState_opts_SSE2.cpp (right):
http://codereview.appspot.com/5790043/diff/6001/src/opts/SkBitmapProcState_op...
src/opts/SkBitmapProcState_opts_SSE2.cpp:637: namespace {
On 2012年03月09日 05:05:07, guanqun wrote:
> On 2012年03月09日 04:38:37, Jin Yang wrote:
> > why do we need the namespace
> 
> I want to put all the auxiliary functions in the anonymous namespace just as
the
> way it's done in SkBitmapProcState_opts_SSE3.cpp.
that file used the namespace as a tricky to convince gcc to inline the template
function. I think you do not need the tricky there.
Sign in to reply to this message.
|
Powered by Google App Engine
This is Rietveld f62528b

AltStyle によって変換されたページ (->オリジナル) /