Expand every bit into a byte

Question 1

I have the following code:

function TSliverHelper.SlowNorth: TSlice;
var
 i: integer;
begin
 // Add pixels 0,1,2
 // This means expanding every bit into a byte
 // Or rather every byte into an int64;
 for i:= 0 to 7 do begin
 Result.Data8[i]:= TSuperSlice.Lookup012[Self.bytes[i]];
 end;
end;

This uses a straight forward lookup table, but obviously LUT's are slow and clobber the cache. This takes about 2860 millisecs for 100.000.000 items.

The following approach is a bit faster (1797 MS, or 37% faster):

function TSliverHelper.North: TSlice;
const
 SliverToSliceMask: array[0..7] of byte = (01,ドル02,ドル04,ドル08,ドル10,ドル20,ドル40,ドル80ドル);
asm
 //RCX = @Self (a pointer to an Int64)
 //RDX = @Result (a pointer to an array[0..63] of byte)
 movq xmm0,[rcx] //Get the sliver
 mov r9,8040201008040201ドル
 movq xmm15,r9 //[rip+SliverToSliceMask] //Get the mask
 movlhps xmm15,xmm15 //extend it
 mov r8,0101010101010101ドル //Shuffle mask
 movq xmm14,r8 //00 00 00 00 00 00 00 00 01 01 01 01 01 01 01 01
 pslldq xmm14,8 //01 01 01 01 01 01 01 01 00 00 00 00 00 00 00 00
 movdqa xmm1,xmm0 //make a copy of the sliver
 //bytes 0,1
 pshufb xmm1,xmm14 //copy the first two bytes across
 pand xmm1,xmm15 //Mask off the relevant bits
 pcmpeqb xmm1,xmm15 //Expand a bit into a byte
 movdqu [rdx],xmm1
 //bytes 2,3
 psrldq xmm0,2 //shift in the next two bytes
 movdqa xmm2,xmm0
 pshufb xmm2,xmm14 //copy the next two bytes across
 pand xmm2,xmm15 //Mask off the relevant bits
 pcmpeqb xmm2,xmm15 //Expand a bit into a byte
 movdqu [rdx+16],xmm2
 //bytes 4,5
 psrldq xmm0,2 //shift in the next two bytes
 movdqa xmm3,xmm0
 pshufb xmm3,xmm14 //copy the next two bytes across
 pand xmm3,xmm15 //Mask off the relevant bits
 pcmpeqb xmm3,xmm15 //Expand a bit into a byte
 movdqu [rdx+32],xmm3
 //bytes 6,7
 psrldq xmm0,2 //shift in the next two bytes
 movdqa xmm4,xmm0
 pshufb xmm4,xmm14 //copy the final two bytes across
 pand xmm4,xmm15 //Mask off the relevant bits
 pcmpeqb xmm4,xmm15 //Expand a bit into a byte
 //Store the data
 movdqu [rdx+48],xmm4
end;

However, that is a lot of code. I'm hoping there's a way to do with less processing that's going to work faster. The way the code works (in prose) is simple.
First we clone the input byte 8 times. Next the bit is masked off using the 01,02,04... mask and an AND operation. Finally this randomish bit is expanded into a byte using the compare-equal-to-mask (pcmpeqb).

The opposite operation is a simple PMSKMOVB.

I can use AVX1 code, but not AVX2.

Question 2

Don't you have any other options than Pascal (if I'm right)?

Question 3

It's in assembly. The Pascal code is just a wrapper, any language will do.

Question 4

Can't try it for now, but I assume you can reach the ASM's performance, shorter, with plain C.

Question 5

Use a multiplication to perform several shifts in a single instruction.

Trim the input to seven bits to avoid overlap in the second step.
Shift by 0, 7, 14, 21, 28, 35, 42 bits and aggregate the results in a 64-bit integer.
Keep only bits 0, 8, 16, 24, 32, 40, 48.
Handle the 8th bit of the input separately. Shift by 49, then add it to the others.

Example code in C#

ulong Expand(byte b)
{
 ulong shift = 0x0000040810204081ul; // bits set: 0, 7, 14, 21, 28, 35, 42
 ulong mask = 0x0001010101010101ul; // bits set: 0, 8, 16, 24, 32, 40, 48
 return (ulong)(b & 127) * shift & mask | (ulong)(b & 128) << 49;
}

Question 6

Nice trick for byte to qword.

Rainer P. Rainer P. 3,08512 silver badges18 bronze badges · Answer 1 · 2018-11-12 13:21:30Z

Use a multiplication to perform several shifts in a single instruction.

Trim the input to seven bits to avoid overlap in the second step.
Shift by 0, 7, 14, 21, 28, 35, 42 bits and aggregate the results in a 64-bit integer.
Keep only bits 0, 8, 16, 24, 32, 40, 48.
Handle the 8th bit of the input separately. Shift by 49, then add it to the others.

Example code in C#

ulong Expand(byte b)
{
 ulong shift = 0x0000040810204081ul; // bits set: 0, 7, 14, 21, 28, 35, 42
 ulong mask = 0x0001010101010101ul; // bits set: 0, 8, 16, 24, 32, 40, 48
 return (ulong)(b & 127) * shift & mask | (ulong)(b & 128) << 49;
}

\$\begingroup\$ Nice trick for byte to qword. \$\endgroup\$

W. Chang
– W. Chang

2018年11月27日 03:32:05 +00:00
Commented Nov 27, 2018 at 3:32

Stack Exchange Network

Expand every bit into a byte

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Expand every bit into a byte

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions