I'm using a supplied library which uses a number of (non-sequential) outputs to latch a multiplexer. The code looks (in part) like this:
digitalWrite(_S0, (chan & 1));
digitalWrite(_S1, (chan & 3)>>1);
digitalWrite(_S2, (chan & 7)>>2);
digitalWrite(_S3, (chan & 15)>>3);
Those masks seem a little odd, but it would work. More natural to me would be:
digitalWrite(_S0, (chan & 1));
digitalWrite(_S1, (chan & 2)>>1);
digitalWrite(_S2, (chan & 4)>>2);
digitalWrite(_S3, (chan & 8)>>3);
But overall, would it be more efficient to do this?
digitalWrite(_S0, (chan & 1));
digitalWrite(_S1, (bool)(chan & 2));
digitalWrite(_S2, (bool)(chan & 4));
digitalWrite(_S3, (bool)(chan & 8));
...or is shifting already pretty efficient?
I've actually re-written the code to build a mask and use PORTx to latch it all at once, so this is really an academic question, but I'm still curious.
I tried compiling your three snippets with avr-gcc 4.9.2 at the -Os optimization level (standard with Arduino) but without -flto. The results were:
The first snippet generated inefficient code: the ands and shifts were translated into assembly quite literally, the shifts were done on 16 bits even though chan was declared uint8_t, and the last shift was even implemented as a loop roughly equivalent to uint16_t tmp1 = chan & 15; uint8_t tmp2 = 3; do tmp1 >>= 1; while (--tmp2);
The second and third snippets were translated identically and efficiently, using the bst (bit store) and bld (bit load) instructions to copy the relevant bit of chan into the second argument of the digitalWrite() call.
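To make the bst/bld idiom concrete, here is a small host-side model of what that instruction pair does. It is only a sketch: the TFlag struct and the extract_bit() helper are hypothetical names made up for illustration, and the sequence modelled (bst, clear the destination, bld into bit 0) is the one visible in the disassembly quoted in the other answer, not something taken from the AVR toolchain itself.

```cpp
#include <cstdint>

// Hypothetical stand-in for the T flag of the AVR status register.
struct TFlag { bool t = false; };

// bst Rr, b : copy bit b of register Rr into the T flag.
inline void bst(TFlag &sreg, uint8_t reg, uint8_t bit) {
    sreg.t = (reg >> bit) & 1u;
}

// bld Rd, b : copy the T flag into bit b of register Rd,
// leaving the other bits of Rd unchanged.
inline void bld(const TFlag &sreg, uint8_t &reg, uint8_t bit) {
    if (sreg.t) reg |= uint8_t(1u << bit);
    else        reg &= uint8_t(~(1u << bit));
}

// Models "bst chan, n / eor r22, r22 / bld r22, 0":
// the result is 1 if bit n of chan is set, 0 otherwise.
inline uint8_t extract_bit(uint8_t chan, uint8_t n) {
    TFlag sreg;
    uint8_t r22;
    bst(sreg, chan, n);   // T <- bit n of chan
    r22 = 0;              // eor r22, r22
    bld(sreg, r22, 0);    // bit 0 of r22 <- T
    return r22;
}
```

Whatever the bit position, the sequence costs the same three instructions, which is why it beats a chain of shifts for bits 2 and 3.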
Personally I would just write
digitalWrite(_S0, chan & 1);
digitalWrite(_S1, chan & 2);
digitalWrite(_S2, chan & 4);
digitalWrite(_S3, chan & 8);
as digitalWrite() expects an integer as its second argument, and it interprets it just like the (bool) cast does. If you want to be sure to always call digitalWrite() with either 0 (LOW) or 1 (HIGH), then use either your second or third form. The first form seems formally equivalent to the second, but since it is not a usual C idiom, the compiler could not catch the optimization opportunity.
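The claim that the three forms compute the same value can be checked exhaustively on a PC. The sketch below is only an illustration: form1(), form2(), form3() and forms_agree() are hypothetical helper names, and each one generalizes the corresponding snippet from the question to an arbitrary bit index n.

```cpp
#include <cstdint>

// First form: cumulative masks 1, 3, 7, 15 followed by a shift.
inline int form1(uint8_t chan, int n) {
    return (chan & ((1 << (n + 1)) - 1)) >> n;
}

// Second form: single-bit masks 1, 2, 4, 8 followed by a shift.
inline int form2(uint8_t chan, int n) {
    return (chan & (1 << n)) >> n;
}

// Third form: single-bit mask converted to bool (0 or 1).
inline int form3(uint8_t chan, int n) {
    return (bool)(chan & (1 << n));
}

// Returns true if the three forms agree for every 8-bit chan and n = 0..3.
inline bool forms_agree() {
    for (int chan = 0; chan < 256; ++chan) {
        for (int n = 0; n < 4; ++n) {
            int a = form1(uint8_t(chan), n);
            int b = form2(uint8_t(chan), n);
            int c = form3(uint8_t(chan), n);
            if (a != b || b != c) return false;
        }
    }
    return true;
}
```

This only confirms that the results are identical; it says nothing about the code the compiler generates for each form, which is the actual point of the comparison above.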
- Thanks for the analysis. I can see how the (bool) cast would be redundant in this case. Also very interesting that the compiler knows 'idioms'. – Jim Mack, Sep 7, 2017 at 16:17
If you look at the assembly code the compiler produces, you can see that both forms compile to exactly the same thing when chan is a constant:
void setup() {
....
digitalWrite(_S1, (chan & 2)>>1);
282: 84 e0 ldi r24, 0x04 ; 4
284: 0e 94 6b 00 call 0xd6 ; 0xd6 <digitalWrite.constprop.0>
....
digitalWrite(_S1, (bool)(chan & 2));
29a: 84 e0 ldi r24, 0x04 ; 4
29c: 0e 94 6b 00 call 0xd6 ; 0xd6 <digitalWrite.constprop.0>
However, using a variable produces different results:
void loop() {
digitalWrite(_S0, (chan & 1));
2a2: c0 91 00 01 lds r28, 0x0100 ; 0x800100 <__data_start>
2a6: d0 91 01 01 lds r29, 0x0101 ; 0x800101 <__data_start+0x1>
2aa: 6c 2f mov r22, r28
2ac: 61 70 andi r22, 0x01 ; 1
2ae: 83 e0 ldi r24, 0x03 ; 3
2b0: 0e 94 76 00 call 0xec ; 0xec <digitalWrite>
digitalWrite(_S1, (chan & 2)>>1);
2b4: 6c 2f mov r22, r28
2b6: 66 95 lsr r22
2b8: 61 70 andi r22, 0x01 ; 1
2ba: 84 e0 ldi r24, 0x04 ; 4
2bc: 0e 94 76 00 call 0xec ; 0xec <digitalWrite>
digitalWrite(_S2, (chan & 4)>>2);
2c0: c2 fb bst r28, 2
2c2: 66 27 eor r22, r22
2c4: 60 f9 bld r22, 0
2c6: 85 e0 ldi r24, 0x05 ; 5
2c8: 0e 94 76 00 call 0xec ; 0xec <digitalWrite>
digitalWrite(_S3, (chan & 8)>>3);
2cc: c3 fb bst r28, 3
2ce: 66 27 eor r22, r22
2d0: 60 f9 bld r22, 0
2d2: 86 e0 ldi r24, 0x06 ; 6
2d4: 0e 94 76 00 call 0xec ; 0xec <digitalWrite>
....
digitalWrite(_S0, (chan & 1));
2a2: c0 91 00 01 lds r28, 0x0100 ; 0x800100 <__data_start>
2a6: d0 91 01 01 lds r29, 0x0101 ; 0x800101 <__data_start+0x1>
2aa: 6c 2f mov r22, r28
2ac: 61 70 andi r22, 0x01 ; 1
2ae: 83 e0 ldi r24, 0x03 ; 3
2b0: 0e 94 76 00 call 0xec ; 0xec <digitalWrite>
digitalWrite(_S1, (bool)(chan & 2));
2b4: be 01 movw r22, r28
2b6: 76 95 lsr r23
2b8: 67 95 ror r22
2ba: 61 70 andi r22, 0x01 ; 1
2bc: 84 e0 ldi r24, 0x04 ; 4
2be: 0e 94 76 00 call 0xec ; 0xec <digitalWrite>
digitalWrite(_S2, (bool)(chan & 4));
2c2: be 01 movw r22, r28
2c4: 76 95 lsr r23
2c6: 67 95 ror r22
2c8: 76 95 lsr r23
2ca: 67 95 ror r22
2cc: 61 70 andi r22, 0x01 ; 1
2ce: 85 e0 ldi r24, 0x05 ; 5
2d0: 0e 94 76 00 call 0xec ; 0xec <digitalWrite>
digitalWrite(_S3, (bool)(chan & 8));
2d4: be 01 movw r22, r28
2d6: 23 e0 ldi r18, 0x03 ; 3
2d8: 76 95 lsr r23
2da: 67 95 ror r22
2dc: 2a 95 dec r18
2de: e1 f7 brne .-8 ; 0x2d8 <main+0xc0>
2e0: 61 70 andi r22, 0x01 ; 1
2e2: 86 e0 ldi r24, 0x06 ; 6
2e4: 0e 94 76 00 call 0xec ; 0xec <digitalWrite>
- Your test is not really significant. The compiler noticed that you were always calling digitalWrite() with 0 (i.e. LOW) as its second argument. It then generated a specialized variant of digitalWrite() that sets an output to LOW and it translated your code into something like digitalWriteLow(3); digitalWriteLow(4); digitalWriteLow(5); digitalWriteLow(6);. In order for the test to be significant, you should make sure chan is not a compile-time constant. Alternatively, compile without the -flto flag to force the compiler into creating a non-specialized translation of your code. – Edgar Bonet, Sep 7, 2017 at 15:43
- Sorry, I should have specified that chan is a passed-in variable. Using a constant chan would make this trivial, as you say. – Jim Mack, Sep 7, 2017 at 16:08
- @EdgarBonet Fair assessment. I did make it a const. Will update... – 001, Sep 7, 2017 at 16:13
- Your updated test is flawed again. The compiler computed chan & 2 while translating (chan & 2)>>1, and it stored this intermediate result in the r11:r10 register pair. Then, when translating (bool)(chan & 2), it used the fact that chan & 2 was already in the register file, which made that translation shorter. – Edgar Bonet, Sep 7, 2017 at 16:58
- @EdgarBonet Thanks for your input. The answer turned out to be more complex than I initially assumed. – 001, Sep 7, 2017 at 18:46
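Both pitfalls raised in the comments (constant propagation, and one expression reusing an intermediate result left over from another) can probably be avoided by isolating each expression in its own function and feeding it a value the compiler cannot know. The following is a sketch under those assumptions: bit1_shift() and bit1_bool() are hypothetical names, the initial value 5 is arbitrary, and __attribute__((noinline)) is a GCC extension (avr-gcc is a GCC port).

```cpp
#include <cstdint>

// volatile forces a real load at every use, so the compiler cannot
// treat chan as a compile-time constant. The value 5 is arbitrary.
volatile uint8_t chan = 5;

// Each expression lives in its own non-inlined function, so its
// generated code can be inspected in isolation in the disassembly.
__attribute__((noinline)) uint8_t bit1_shift(uint8_t c) {
    return (c & 2) >> 1;
}

__attribute__((noinline)) uint8_t bit1_bool(uint8_t c) {
    return (bool)(c & 2);
}
```

Calling bit1_shift(chan) and bit1_bool(chan) from loop() and then comparing the bodies of the two functions in the disassembly should give a fairer comparison than interleaving the expressions in one function.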
The digitalWrite calls are the slowest part here. Each one takes about 50 instruction cycles or more (if I remember correctly), so a few cycles more or less won't make much of a difference.
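If the call overhead matters, the approach mentioned at the end of the question (build a mask and write the port once) avoids digitalWrite() altogether. Below is a minimal sketch of the mask-building step, assuming all four select lines sit on the same 8-bit port. The bit positions in S_BITS are made up for illustration, and select_bits() is a hypothetical helper, not part of the library from the question.

```cpp
#include <cstdint>

// Hypothetical, non-sequential bit positions of S0..S3 within one port.
constexpr uint8_t S_BITS[4] = {2, 4, 5, 7};

// Mask covering all four select bits (0xB4 for the positions above).
constexpr uint8_t S_MASK = (1u << 2) | (1u << 4) | (1u << 5) | (1u << 7);

// Returns the new port value: the four select bits are replaced by the
// low four bits of chan, and every other pin keeps its current state.
inline uint8_t select_bits(uint8_t port, uint8_t chan) {
    uint8_t out = port & uint8_t(~S_MASK);
    for (uint8_t n = 0; n < 4; ++n) {
        if (chan & (1u << n)) out |= uint8_t(1u << S_BITS[n]);
    }
    return out;
}
```

On an AVR board the result would be applied with a single write such as PORTD = select_bits(PORTD, chan); (PORTD being an assumption about which port the pins are on), so all four lines change at the same time instead of one after another.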