Considering the design limitations of the hardware (clock speed, memory, etc.), is there a better way to write IF statements more efficiently in C (e.g. running on an Arduino), more specifically for the following piece of embedded software?
if (A >= 476 && A <= 524) A = 500;
else if (A > 524 && A <= 860) A = (A * 1.4) - 760;
else if (A > 136 && A < 476) A = (A * 1.4) - 685;
else if (A > 860) A = 1000;
else if (A <= 136) A = 100;
A is passed by value from a sub-routine, and that value is then transformed into a different value according to which of a set of ranges it falls in. It has to run as fast as possible!
UPDATE!
A is a variable of integer type, i.e. int.
SUMMARY:
I summarized here the options I learned from you:
a) improve structure
- put the comparison for the most probable inputs first;
- eliminate && operators.
b) eliminate floating point calculation
c) create look up tables
d) assign more adequate variable type
e) write code in abbreviated form (*= , += , -= )
a) Indeed, I already put the comparison if (A >= 476 && A <= 524) first, because I expect most of the values passed by the sub-routine to fall within that range, so most of the time the program doesn't evaluate the remaining if statements.
b) I reviewed the floating point calculation and consider it a bad choice for this piece of code. It can definitely be avoided by using integer division (a sketch follows below).
c) The lookup table seems a good idea, but not for this application, because the program has at least three more of such clustered comparisons; memory might become a problem.
d) The variable was declared as int although it never really becomes negative. Unsigned int is what I assume is better, but I would say it doesn't make the code faster.
e) Writing the code in abbreviated form (*=, +=, -=) also seems like an improvement idea; I just don't know for sure how different compilers interpret it or whether it really saves much time.
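For b), a minimal sketch of what the integer version might look like (the function name transform is only a placeholder; it assumes A is a 16-bit int and that the small rounding difference of 7/5 versus 1.4 is acceptable):
int transform(int A)
{
    if (A >= 476 && A <= 524) return 500;
    else if (A > 524 && A <= 860) return (A * 7) / 5 - 760;  /* roughly A * 1.4 - 760 */
    else if (A > 136 && A < 476) return (A * 7) / 5 - 685;   /* roughly A * 1.4 - 685 */
    else if (A > 860) return 1000;
    else return 100;                                          /* A <= 136 */
}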
Thank you.
- You can very well re-structure your code to look more logical. If you've checked for A not being smaller than 476, you can drop half of your conditions instantly. (Marcus Müller, Aug 23, 2016)
- If you are worried about speed on an Arduino, don't use floating point (especially double precision). All of the rest of the optimizations, if any, pale by comparison. (old_timer, Aug 23, 2016)
- You can help perhaps by re-ordering, but the compilers are pretty good at figuring things out as well. Look at the compiler output and adjust, look and adjust; every so often time it, because fewer instructions doesn't necessarily mean faster. (old_timer, Aug 23, 2016)
- If you were truly after speed, a lookup table would apply here; your numbers are all even and the largest is 860, so a 430-deep lookup table, and you could then do a jump table with its output. Wait, you have a greater-than-860 case, so I don't know how high A can go and therefore how deep the table needs to be. (old_timer, Aug 23, 2016)
- "It has to run as fast as possible!" Why? (Bruce Abbott, Aug 23, 2016)
4 Answers
The floating point is the killer here. It appears to double the amount of code generated, not that a larger number of instructions is necessarily slower.
It has to make a number of calls to what I assume are float-to-int and int-to-float conversion routines. Interestingly, changing from double (1.4) to single float (1.4F) didn't change the overall result, although again the code is making a lot of calls (I assume to float conversions or operations).
So even the times-7 has to be synthesized:
unsigned int fun ( unsigned int A )
{
A = A*7;
return(A);
}
00000000 <fun>:
0: 28 2f mov r18, r24
2: 39 2f mov r19, r25
4: 22 0f add r18, r18
6: 33 1f adc r19, r19
8: 22 0f add r18, r18
a: 33 1f adc r19, r19
c: 22 0f add r18, r18
e: 33 1f adc r19, r19
10: 42 2f mov r20, r18
12: 53 2f mov r21, r19
14: 48 1b sub r20, r24
16: 59 0b sbc r21, r25
18: 84 2f mov r24, r20
1a: 95 2f mov r25, r21
1c: 08 95 ret
So one might try this as an alternative.
unsigned int fun ( unsigned int A )
{
A = (A<<2)+(A<<1)+A;
return(A);
}
00000000 <fun>:
0: 28 2f mov r18, r24
2: 39 2f mov r19, r25
4: 22 0f add r18, r18
6: 33 1f adc r19, r19
8: 22 0f add r18, r18
a: 33 1f adc r19, r19
c: 48 2f mov r20, r24
e: 59 2f mov r21, r25
10: 44 0f add r20, r20
12: 55 1f adc r21, r21
14: 24 0f add r18, r20
16: 35 1f adc r19, r21
18: 82 0f add r24, r18
1a: 93 1f adc r25, r19
1c: 08 95 ret
Same number of instructions. To see this more easily, with 8-bit values:
unsigned char fun ( unsigned char A )
{
A = A * 7;
return(A);
}
00000000 <fun>:
0: 98 2f mov r25, r24
2: 99 0f add r25, r25
4: 99 0f add r25, r25
6: 99 0f add r25, r25
8: 98 1b sub r25, r24
a: 89 2f mov r24, r25
c: 08 95 ret
And using shifting and adding:
00000000 <fun>:
0: 98 2f mov r25, r24
2: 99 0f add r25, r25
4: 99 0f add r25, r25
6: 28 2f mov r18, r24
8: 22 0f add r18, r18
a: 92 0f add r25, r18
c: 89 0f add r24, r25
e: 08 95 ret
Duh, right: A*7 = (A<<3) - A;
00000000 <fun>:
0: 98 2f mov r25, r24 r25 = A
2: 99 0f add r25, r25 r25 = A + A = 2A
4: 99 0f add r25, r25 r25 = 2A + 2A = 4A
6: 99 0f add r25, r25 r25 = 4A + 4A = 8A
8: 98 1b sub r25, r24 r25 = 8A - A = 7A
a: 89 2f mov r24, r25 return 7A
c: 08 95 ret
The (A<<2)+(A<<1)+A source gives this:
00000000 <fun>:
0: 98 2f mov r25, r24 r25 = A
2: 99 0f add r25, r25 r25 = A + A = 2A
4: 99 0f add r25, r25 r25 = 2A + 2A = 4A
6: 28 2f mov r18, r24 r18 = A
8: 22 0f add r18, r18 r18 = A + A = 2A
a: 92 0f add r25, r18 r25 = 4A + 2A = 6A
c: 89 0f add r24, r25 return (6A + A)
e: 08 95 ret
But the compiler missed a possible optimization:
mov r25, r24 r25 = A
add r25, r25 r25 = A + A = 2A
mov r18, r25 r18 = 2A
add r25, r25 r25 = 2A + 2A = 4A
add r25, r18 r25 = 4A + 2A = 6A
add r24, r25 return (6A + A)
ret
Same number of instructions at least; ideally they execute in the same time. Looking at these trip points:
if ( A <= 136 ) A= 100;
else if ( A < 476 ) A= (A * 1.4) - 685;
else if ( A <= 524 ) A= 500;
else if ( A <= 860 ) A= (A * 1.4) - 760;
else A= 1000;
A is obviously not an 8-bit number; it is at least 16 bits. If it is a double or a float, I give up, game over. Assuming 16 bits...
Here are the constants in hex.
136   0x088
476   0x1DC
524   0x20C
860   0x35C
Looking at this
unsigned int fun ( unsigned int A )
{
if ( A <= 136 ) A= 1;
else if ( A < 476 ) A=2;
else if ( A <= 524 ) A=3;
else A= 4;
return(A);
}
00000000 <fun>:
0: 89 38 cpi r24, 0x89 ; 137
2: 91 05 cpc r25, r1
4: 00 f0 brcs .+0 ; 0x6 <fun+0x6>
6: 8c 3d cpi r24, 0xDC ; 220
8: 21 e0 ldi r18, 0x01 ; 1
a: 92 07 cpc r25, r18
c: 00 f4 brcc .+0 ; 0xe <fun+0xe>
e: 82 e0 ldi r24, 0x02 ; 2
10: 90 e0 ldi r25, 0x00 ; 0
12: 08 95 ret
14: 8d 30 cpi r24, 0x0D ; 13
16: 92 40 sbci r25, 0x02 ; 2
18: 00 f4 brcc .+0 ; 0x1a <fun+0x1a>
1a: 83 e0 ldi r24, 0x03 ; 3
1c: 90 e0 ldi r25, 0x00 ; 0
1e: 08 95 ret
20: 81 e0 ldi r24, 0x01 ; 1
22: 90 e0 ldi r25, 0x00 ; 0
24: 08 95 ret
26: 84 e0 ldi r24, 0x04 ; 4
28: 90 e0 ldi r25, 0x00 ; 0
2a: 08 95 ret
Does this help?
unsigned int fun ( unsigned int A )
{
unsigned char B = A>>2;
if ( B <= (136>>2) ) A= 1;
else if ( B < (476>>2) ) A=2;
else if ( B <= (524>>2) ) A=3;
else A= 4;
return(A);
}
00000000 <fun>:
0: 96 95 lsr r25
2: 87 95 ror r24
4: 96 95 lsr r25
6: 87 95 ror r24
8: 83 32 cpi r24, 0x23 ; 35
a: 00 f0 brcs .+0 ; 0xc <fun+0xc>
c: 87 37 cpi r24, 0x77 ; 119
e: 00 f4 brcc .+0 ; 0x10 <fun+0x10>
10: 82 e0 ldi r24, 0x02 ; 2
12: 90 e0 ldi r25, 0x00 ; 0
14: 08 95 ret
16: 84 38 cpi r24, 0x84 ; 132
18: 00 f4 brcc .+0 ; 0x1a <fun+0x1a>
1a: 83 e0 ldi r24, 0x03 ; 3
1c: 90 e0 ldi r25, 0x00 ; 0
1e: 08 95 ret
20: 81 e0 ldi r24, 0x01 ; 1
22: 90 e0 ldi r25, 0x00 ; 0
24: 08 95 ret
26: 84 e0 ldi r24, 0x04 ; 4
28: 90 e0 ldi r25, 0x00 ; 0
2a: 08 95 ret
ehh, man this instruction set is painful.
So, looking at this:
unsigned int fun ( unsigned int A )
{
if ( A <= 136 ) A= 100;
else if ( A < 476 ) A= (A * 7);
else if ( A <= 524 ) A= 500;
else if ( A <= 860 ) A= (A * 7);
else A= 1000;
return(A);
}
The compiler is smart enough to lump the A*7's together.
With this, though:
unsigned int fun ( unsigned int A )
{
if ( A <= 136 ) A= 100;
else if ( A < 476 ) A= (A * 7) - 685;
else if ( A <= 524 ) A= 500;
else if ( A <= 860 ) A= (A * 7) - 760;
else A= 1000;
return(A);
}
It does not combine the A*7s. Note the 16-bit A*7 looks like this:
30: 28 2f mov r18, r24
32: 39 2f mov r19, r25
34: 22 0f add r18, r18
36: 33 1f adc r19, r19
38: 22 0f add r18, r18
3a: 33 1f adc r19, r19
3c: 22 0f add r18, r18
3e: 33 1f adc r19, r19
40: 42 2f mov r20, r18
42: 53 2f mov r21, r19
44: 48 1b sub r20, r24
46: 59 0b sbc r21, r25
48: 84 2f mov r24, r20
4a: 95 2f mov r25, r21
4c: 08 95 ret
For ease of reading and coding, here are the comparisons in order:
if ( A > 860 ) A= 1000;
else if ( A > 524 ) A= (A * 1.4) - 760;
else if ( A >= 476 ) A= 500;
else if ( A > 136 ) A= (A * 1.4) - 685;
else A= 100;
if ( A <= 136 ) A= 100;
else if ( A < 476 ) A= (A * 1.4) - 685;
else if ( A <= 524 ) A= 500;
else if ( A <= 860 ) A= (A * 1.4) - 760;
else A= 1000;
That might shave some instructions for the worst-case path. But if your data happens to fall in a bell curve of some sort, with most of the samples say between 500 and 600, then you want those ranges compared first and let the others fall through. It may or may not be worth arranging the two ranges that end with A*1.4 as the last two, so that before those compares you do a single B = A * 1.4, do the compare for one of the ranges, and then subtract either A = B - 685 or A = B - 760. That saves some code space, but not necessarily any performance: with an if-then-else tree like this, re-ordering is not going to save you much. Some paths get shorter, some get longer; you may add or save a few instructions here and there, but it is still a pretty long path for this code on this architecture.
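A sketch of that shared-B idea, using the integer 7/5 form instead of 1.4 (transform is a placeholder name; it assumes a 16-bit int A and puts the most probable range first):
int transform(int A)
{
    if (A >= 476 && A <= 524) return 500;    /* most probable range first */
    if (A > 860) return 1000;
    if (A <= 136) return 100;

    int B = (A * 7) / 5;                     /* computed once for both remaining ranges */
    return (A > 524) ? (B - 760) : (B - 685);
}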
I didn't do a full link against the library, but looking at readelf:
11: 00000000 0 NOTYPE GLOBAL DEFAULT UND __floatunsisf
12: 00000000 0 NOTYPE GLOBAL DEFAULT UND __mulsf3
13: 00000000 0 NOTYPE GLOBAL DEFAULT UND __subsf3
14: 00000000 0 NOTYPE GLOBAL DEFAULT UND __fixunssfsi
Those floating point library routines are going to kill you. And that answers something: even though the C standard says that A * 1.4 is a double operation, the compiler is clearly doing single-precision float, ignoring the standard (which is why 1.4 versus 1.4F made no difference).
Changing the A * 1.4 to A * 7 / 5 produces the long string of adds for the times-7 and does the divide with a gcc library routine:
12: 00000000 0 NOTYPE GLOBAL DEFAULT UND __udivmodhi4
That is going to save on overall code space and should save on execution time compared with having to go through the generic floating point libraries.
Man, I didn't realize you were using signed numbers.
else if ( A > 524 ) A= (A * 7 / 5) - 760;
else if ( A > 136 ) A= (A * 7 / 5) - 685;
Both can produce negative results; I assume the value coming in is positive and the value going out is signed?
Okay, as I expected, this actually gets us somewhere:
unsigned int fun ( unsigned int A )
{
unsigned char B;
B = (A>>8)&0xFF;
if(B==0) A = 7;
if(B==1) A = 18;
if(B==2) A = 307;
if(B==3) A = 211;
return(A);
}
0: 99 23 and r25, r25
2: 01 f0 breq .+0 ; 0x4 <fun+0x4>
4: 91 30 cpi r25, 0x01 ; 1
6: 01 f4 brne .+0 ; 0x8 <fun+0x8>
8: 82 e1 ldi r24, 0x12 ; 18
a: 90 e0 ldi r25, 0x00 ; 0
c: 08 95 ret
We know that it is using one register for the upper half and one for the lower half, so the shift by 8, with or without the AND (included for completeness), simply tells the compiler to use the upper 8-bit register.
Now I am assuming an unsigned value coming in and a signed value going out, since you don't really have anything in the negative range coming in except for the everything-less-than-or-equal-to-136 case.
If you make a table of all the answers, it comes out to 679 unique answers; if only you could limit it to that.
Ugh, nothing is nice about these numbers.
If your input is unsigned, which I suspect it is not, you could look at the upper byte using the B = A >> 8 trick and, if it is greater than 4, return 1000. That only really saves you one 16-bit compare becoming an 8-bit compare, given this instruction set and the nature of your numbers.
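A sketch of that early-out, reading the test as "high byte at least 4" (i.e. A >= 1024, which is certainly above 860); the integer 7/5 math and the signed return are assumptions carried over from above:
int fun(unsigned int A)
{
    unsigned char B = (unsigned char)(A >> 8);   /* upper byte only: one 8-bit compare */
    if (B >= 4) return 1000;                     /* A >= 1024, so certainly above 860 */
    /* values 861..1023 still fall through to the normal 16-bit compares */
    if (A > 860) return 1000;
    else if (A > 524) return (int)((A * 7) / 5) - 760;
    else if (A >= 476) return 500;
    else if (A > 136) return (int)((A * 7) / 5) - 685;
    else return 100;
}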
The next optimization is to get rid of the floating point and replace it with (A * 7) / 5. That is the number one thing you can do to make this code faster.
If there are ranges of values you expect more than others, put those ranges first, so the code hops through fewer compares for the more common values. If it is an even distribution, then either of jonk's if-then-else trees makes it easier to read and might save an instruction here and there. The compiler may still move the tree around to suit its needs or an optimization it thinks it sees.
The killers are the two ranges that have the math. If you are willing to burn 960 bytes of program space, you can pre-build lookup tables for them:
extern unsigned int LUTONE[];
extern unsigned int LUTTWO[];
unsigned int fun ( unsigned int A )
{
if ( A <= 136 ) A= 100;
else if ( A < 476 ) A= LUTONE[A-137];
else if ( A <= 524 ) A= 500;
else if ( A <= 860 ) A= LUTTWO[A-525];
else A= 1000;
return(A);
}
The lookups look something like this
30: 88 0f add r24, r24
32: 99 1f adc r25, r25
34: e8 2f mov r30, r24
36: f9 2f mov r31, r25
38: e0 50 subi r30, 0x00 ; 0
3a: f0 40 sbci r31, 0x00 ; 0
3c: 80 81 ld r24, Z
3e: 91 81 ldd r25, Z+1 ; 0x01
40: 08 95 ret
42: 88 0f add r24, r24
44: 99 1f adc r25, r25
46: e8 2f mov r30, r24
48: f9 2f mov r31, r25
4a: e0 50 subi r30, 0x00 ; 0
4c: f0 40 sbci r31, 0x00 ; 0
4e: 80 81 ld r24, Z
50: 91 81 ldd r25, Z+1 ; 0x01
which is not much more math than the A*7. The loads on a platform like this depend on the speed of the flash relative to the clock; it is possibly a single cycle per load, if not then probably two. But the instruction fetches, plus the code to do the divide by 5 with a generic 16-bit division library, should be way, way more costly speed-wise. Resource-wise, the float libraries are probably smaller than the 900-some bytes of tables; maybe, you can try it and find out. If you are doing float elsewhere then you are burning that flash anyway, but that doesn't excuse using float on an AVR, especially in code you want to go fast, so still do the integer math if you don't use a lookup table.
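For completeness, the tables do not have to be typed in by hand; a small host-side program (not Arduino code, and purely a sketch based on the ranges and the 7/5 scaling used above, with signed entries because the low end of each range goes negative) could print the initialisers:
#include <stdio.h>

int main(void)
{
    int a;
    printf("const int LUTONE[] = { /* A = 137..475 */\n");
    for (a = 137; a <= 475; a++)
        printf("%d,%c", (a * 7) / 5 - 685, ((a - 137) % 10 == 9) ? '\n' : ' ');
    printf("\n};\n\nconst int LUTTWO[] = { /* A = 525..860 */\n");
    for (a = 525; a <= 860; a++)
        printf("%d,%c", (a * 7) / 5 - 760, ((a - 525) % 10 == 9) ? '\n' : ' ');
    printf("\n};\n");
    return 0;
}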
Short answer: get rid of the A * 1.4 and replace it with (A * 7) / 5. If you know the data is going to be heavy in certain ranges more than others, put those ranges first for performance; otherwise make it more readable with one of jonk's if-then-else trees, smallest range to largest or largest to smallest. You might save an instruction or two on the deeper cases.
- @tozheneznayu (A * 7)/5 is more expensive than a table look-up from flash. It is common in Arduino projects to have quite a lot of flash available, so the tables may fit. (gbulmer, Aug 24, 2016)
- @tozheneznayu You can and should do some experiments: build with float and build with integer math; does the binary (not the ELF file, but the actual flash image) change size much? Create a test using a timer that calls your function with a value in each of the ranges many (thousands of) times, time the duration, divide by the number of calls, and get a rough idea of how fast or slow it is. Rearrange the if/then/else tree, or switch between fixed point and floating point math, and see how that changes. (old_timer, Aug 24, 2016)
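A minimal sketch of the kind of timing test described in that comment (Arduino code; timeIt, the call count, and the function pointer are placeholders, and the loop overhead is included, so compare results only against each other):
void timeIt(int (*f)(unsigned int), unsigned int testValue)
{
    const unsigned long N = 10000UL;
    volatile int sink = 0;               /* stops the optimiser removing the calls */
    unsigned long start = micros();
    for (unsigned long i = 0; i < N; i++)
        sink = f(testValue);
    unsigned long elapsed = micros() - start;
    (void)sink;
    Serial.print(testValue);
    Serial.print(" -> ");
    Serial.print((float)elapsed / N);
    Serial.println(" us per call");
}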
- Each time one looks at assembler like this, one should think: "Why on earth am I not using a 32-bit MCU?" (Lundin, Aug 30, 2016)
- Instead of (A * 7)/5 one may want to try (A * 14)/10. Last time I checked, gcc did an amazing job on an integer divide-by-10, probably because that's a much-used operation. (JimmyB, Oct 5, 2016)
- It's the same with the divide by a power of two taken out. If they can divide by 10 well, they can divide by 5 even better, more accurately if you are using the multiply-by-a-fifth trick: shift, then multiply by 1/5 instead of multiplying by 1/10. If doing long division, then maybe the 10 is faster, sure; have to think about it. (old_timer, Oct 6, 2016)
Ordering of the conditions is the key here. This way the conditions "accumulate" from top to bottom, and each subsequent else if is executed only if all of the previous conditions are false. So you get the (not) AND operation for free.
if (A <= 136)
A = 100;
else if (A < 476)
A = (A * 1.4) - 685;
else if (A <= 524)
A = 500;
else if (A <= 860)
A = (A * 1.4) - 760;
else
A = 1000;
But as a side note: I would not worry here about code optimization, as it is insignificant in this case, but about making the code readable.
- Donald Knuth: premature optimization is the root of all evil. (user16222, Aug 23, 2016)
- This ain't premature optimization. (robert bristow-johnson, Aug 23, 2016)
- @robertbristow-johnson Any optimization that's not proven to be necessary is premature. Also, the cost of the floating point math dwarfs the cost of the flow control, so it's pretty much pointless (not quite the same as premature, but still). (marcelm, Aug 24, 2016)
- @marcelm, this code refactoring is not premature; it should be done immediately. Call it "optimization" if you want, but clean code is always better than shitty code. (robert bristow-johnson, Aug 24, 2016)
- If the motivation is to have clean, readable code, yes. But if the motivation is performance (as the OP said it was), I still maintain it qualifies as useless and/or premature. (marcelm, Aug 24, 2016)
Marcus pretty much tells you what to consider. There are two symmetrical but otherwise identical ways to improve the code.
(I don't think C compilers currently do the optimization needed on their own, though I've seen research papers on compiler theory that would take care of your situation... I just don't know of any compilers that implement the research.)
These are:
if ( A <= 136 ) A= 100;
else if ( A < 476 ) A= (A * 1.4) - 685;
else if ( A <= 524 ) A= 500;
else if ( A <= 860 ) A= (A * 1.4) - 760;
else A= 1000;
and,
if ( A > 860 ) A= 1000;
else if ( A > 524 ) A= (A * 1.4) - 760;
else if ( A >= 476 ) A= 500;
else if ( A > 136 ) A= (A * 1.4) - 685;
else A= 100;
Fewer explicit comparisons by ordering things. You get to choose what makes more sense to read.
You may, or may not, find that a compiler will do something different with:
A *= 1.4, A -= 760;
than it does with,
A= (A * 1.4) - 760;
You could experiment. I've seen dumb things done either way and smart things done either way, too. So what the heck.
Because of the multiplication by 1.4, the compiler is going to promote things to floating point and back again if your variable A is an integer. You would be MUCH BETTER off working this out using integer methods. In fact, if A is declared as an integer type of some kind, then it is the floating point conversions and operations that are killing you. Not the logic. You seriously need to reconsider that multiplication, if so.
So... tell me: is A an integer? If so, I'll edit my answer and give you a better approach, one that is much, much faster.
EDIT: Just as a sampler (and this is by no means all that can be done), if A is an integer, your two floating point calculations can be replaced by:
- \$A = \left(1.4 \cdot A \right) - 760 \rightarrow \frac{7\cdot A - 3800}{5}\$
- \$A = \left(1.4 \cdot A \right) - 685 \rightarrow \frac{7\cdot A - 3425}{5}\$
Both of these examples make the assumption that you can actually compute the numerator within the integer range, of course. But as your calculations are already ranged well, I am pretty sure that 16-bit integer values will succeed quite well with the above transformations. (Rounding questions do come up at this point, but that is a very minor detail to resolve and it doesn't materially change the above.)
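In C, dropped into the ordered tree, that might look like this (a sketch only; it assumes 16-bit int arithmetic is enough, which holds since 7*A stays in range for every value that reaches the math):
if ( A <= 136 ) A= 100;
else if ( A < 476 ) A= (7 * A - 3425) / 5;   /* integer form of (A * 1.4) - 685 */
else if ( A <= 524 ) A= 500;
else if ( A <= 860 ) A= (7 * A - 3800) / 5;  /* integer form of (A * 1.4) - 760 */
else A= 1000;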
But even that isn't the final word. More improvements can be found, as well.
- I think you got the 685 and the 760 swapped in both of your ordered code blocks above. (robert bristow-johnson, Aug 23, 2016)
- @robertbristow-johnson: Thanks. I probably did. Will fix. (jonk, Aug 23, 2016)
On a typical low-end MCU like the AVR, even up to a low-end Cortex-M0, flash and SRAM are similar in speed and can be accessed in one or two cycles; there is no cache or SDRAM latency to complicate the system.
In these MCUs, if you really want that to go as fast as practical, and have expensive operations that can be pre-calculated at compile time, then calculate everything that can be calculated at compile time, store the values in an array, and use A as the index of the array.
A float multiply is very expensive, so doing that at compile time will have an enormous impact on run time. Even in this specific case, where * 1.4 can be replaced by * 7 / 5, it is still expensive compared to array indexing. Of course, in the general case, it may be difficult to replace a float calculation with a cheaper integer calculation, so using look-up tables is always worth considering.
In this case, the code will use only two if statements to protect the array indexing, for example:
#include <avr/pgmspace.h>
#define calc_137_475(a) ((a * 1.4) - 685) // more expensive than array indexing
#define calc_525_860(a) ((a * 1.4) - 760) // more expensive than array indexing
const PROGMEM int16_t results[] = {
100, 100, 100, // ... 137 of these
calc_137_475(137), calc_137_475(138), calc_137_475(139), // ... etc
500, 500, 500, // .. etc
calc_525_860(525), calc_525_860(526), calc_525_860(527), // ... the other values ...
};
void setup() {
// put your setup code here, to run once:
}
void loop() {
// other stuff
int A = analogRead(A1); // for example
if (A>860) {
A = 1000;
} else if (A >= 0) {
A = pgm_read_word_near(results + A);
}
}
This will consume about 1722 bytes of flash, and none of the MCU's precious RAM. It will be quicker for all cases that involve any of those calculations. Actual gains depend on the complexity of the calculations and on the distribution of initial values of A.
Further, if A could be guaranteed to be positive by making A unsigned, one if could be replaced with if (A <= 136) A = 100, which would increase the ratio of calculated to constant values. The value assigned to A there is constant, and hence gets no benefit from pre-calculation.
Edit:
void loop() {
// other stuff
unsigned int A = analogRead(A1); // note *unsigned*
if (A>860) {
A = 1000;
} else if (A <= 136) {
A = 100;
} else {
A = pgm_read_word_near(results + A);
}
}
So this has a better ratio of cached-calculation to constant values than the 'all cached' version. However, it still uses the same amount of flash to avoid using a more complex calculation, and hence more instructions, for the array offset.
The expense of using an array offset calculation for the constant in if (A >= 476 && A <= 524) A = 500; is offset by removing two tests and replacing them with the address calculation, so it is comparable.
On an AVR, a byte load from program memory is one cycle slower than loading from SRAM. So if the table were small enough, and there were enough spare SRAM, using SRAM instead of flash would give a tiny improvement on table access. However, an SRAM table has to be initialised by copying from program flash, so there is a cost to be balanced.
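If a table really were small enough, copying it from flash into SRAM once at start-up might look roughly like this (a sketch only; the table contents here are placeholders, and memcpy_P is the avr-libc helper for copying out of program memory):
#include <avr/pgmspace.h>

/* placeholder table; the real one would hold the pre-calculated results */
static const int16_t results_flash[] PROGMEM = { 100, 100, 100, 100 };
static int16_t results_ram[sizeof(results_flash) / sizeof(results_flash[0])];

void copyTableToRam(void)
{
    /* one-time start-up cost, paid back by slightly faster table reads */
    memcpy_P(results_ram, results_flash, sizeof(results_ram));
}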
Side Note: Unless this code is almost everything in the 'inner loop', it would be better to measure the whole 'inner loop' rather than focus on a single step.
Typically, programmers are very poor at identifying bottlenecks without detailed measurement. Worse, it is easy to waste time on optimisation: this part might not be the bottleneck, or there may be no bottleneck at all. Worse still, optimised code can become quite fragile, and small changes in the required behaviour can make the optimisations wrong.
So it is always worth getting all of the program working correctly before attempting any optimisations, basing optimisation effort on meaningful measurements of the actual program, and bearing in mind Michael A. Jackson's advice on program optimisation:
The First Rule of Program Optimisation: Don't do it.
The Second Rule of Program Optimisation (for experts only!): Don't do it yet.
Post-note: This specific question may be a great example of programmers looking at the wrong performance issue; the if tests are much less costly than the floating point arithmetic. So the question was focused on the wrong place. I'd take that as an indication that the actual performance issues are likely elsewhere.