I wrote a simple loop to test the processor speed on my Arduino Uno. The numbers I'm getting are much worse than the 16MHz advertised, by a factor of ~5. Trying to figure out what I'm missing.
long counter = 0;
int sum = 0;
uint32_t t = 0;
void setup() {
Serial.begin(9600);
t = micros();
}
void loop() {
sum += counter;
if (counter == 10000) {
float usecsPerIteration = (micros() - t) / (counter * 1.0);
Serial.print(usecsPerIteration);
Serial.println(" microseconds per iteration");
// Print result to avoid loop getting optimized away
Serial.println(sum);
}
++counter;
}
The output is:
3.77 microseconds per iteration
Here's how I'm estimating clock rate from this. The body of the loop involves two additions and an equality check. Even if we assume that these simple operations take 10 clock cycles, the implied clock rate is still just (1 iter / 3.77 us) * (1e6 us / sec) * (10 cycles / iter) = 2.65 MHz.
What am I missing?
-
You forgot that the printing itself also contributes to the equation. This is VERY EXPENSIVE. You should reset your time after every printing.Kwasmich– Kwasmich2019年08月21日 14:28:14 +00:00Commented Aug 21, 2019 at 14:28
-
I only print once after 10,000 iters and it is after I compute time elapsed.rampatowl– rampatowl2019年08月21日 14:29:57 +00:00Commented Aug 21, 2019 at 14:29
2 Answers 2
As Michael says there is a lot more going on than you think.
Firstly there are interrupts triggering which can slow things down (used to calculate milllis()).
Secondly you vastly under estimate the number of instructions that are being called. For a start the loop
is being called from a for(;;)
loop in main()
:
for (;;) {
loop();
4 572: 0e 94 73 00 call 0xe6 ; 0xe6 <loop>
if (serialEventRun) serialEventRun();
2 576: 20 97 sbiw r28, 0x00 ; 0
2 578: e1 f3 breq .-8 ; 0x572 <main+0x10>
- 57a: 0e 94 1d 01 call 0x23a ; 0x23a <_Z14serialEventRunv>
2 57e: f9 cf rjmp .-14 ; 0x572 <main+0x10>
10 total
Then for the call to loop()
the registers used get pushed to the stack, then your code executed, and the registers popped off again before returning. This is the code for loop when disassembled:
000000e6 <loop>:
// Save the registers to the stack
2 e6: cf 92 push r12
2 e8: df 92 push r13
2 ea: ef 92 push r14
2 ec: ff 92 push r15
8 total
// Add to the sum
2 ee: 40 91 44 01 lds r20, 0x0144 ; 0x800144 <counter>
2 f2: 50 91 45 01 lds r21, 0x0145 ; 0x800145 <counter+0x1>
2 f6: 60 91 46 01 lds r22, 0x0146 ; 0x800146 <counter+0x2>
2 fa: 70 91 47 01 lds r23, 0x0147 ; 0x800147 <counter+0x3>
2 fe: 80 91 42 01 lds r24, 0x0142 ; 0x800142 <sum>
2 102: 90 91 43 01 lds r25, 0x0143 ; 0x800143 <sum+0x1>
1 106: 84 0f add r24, r20
1 108: 95 1f adc r25, r21
2 10a: 90 93 43 01 sts 0x0143, r25 ; 0x800143 <sum+0x1>
2 10e: 80 93 42 01 sts 0x0142, r24 ; 0x800142 <sum>
18 total
// Check the value to see if it's time to print
1 112: 40 31 cpi r20, 0x10 ; 16
1 114: 57 42 sbci r21, 0x27 ; 39
1 116: 61 05 cpc r22, r1
1 118: 71 05 cpc r23, r1
2 11a: d1 f5 brne .+116 ; 0x190 <loop+0xaa>
6 total
// From here on is the "IF" section - skipped by the BRNE above
- 11c: 0e 94 f5 04 call 0x9ea ; 0x9ea <micros>
- 120: c0 90 3e 01 lds r12, 0x013E ; 0x80013e <__data_end>
- 124: d0 90 3f 01 lds r13, 0x013F ; 0x80013f <__data_end+0x1>
- 128: e0 90 40 01 lds r14, 0x0140 ; 0x800140 <__data_end+0x2>
- 12c: f0 90 41 01 lds r15, 0x0141 ; 0x800141 <__data_end+0x3>
- 130: 6c 19 sub r22, r12
- 132: 7d 09 sbc r23, r13
- 134: 8e 09 sbc r24, r14
- 136: 9f 09 sbc r25, r15
- 138: 0e 94 68 06 call 0xcd0 ; 0xcd0 <__floatunsisf>
- 13c: 6b 01 movw r12, r22
- 13e: 7c 01 movw r14, r24
- 140: 60 91 44 01 lds r22, 0x0144 ; 0x800144 <counter>
- 144: 70 91 45 01 lds r23, 0x0145 ; 0x800145 <counter+0x1>
- 148: 80 91 46 01 lds r24, 0x0146 ; 0x800146 <counter+0x2>
- 14c: 90 91 47 01 lds r25, 0x0147 ; 0x800147 <counter+0x3>
- 150: 0e 94 6a 06 call 0xcd4 ; 0xcd4 <__floatsisf>
- 154: 9b 01 movw r18, r22
- 156: ac 01 movw r20, r24
- 158: c7 01 movw r24, r14
- 15a: b6 01 movw r22, r12
- 15c: 0e 94 c7 05 call 0xb8e ; 0xb8e <__divsf3>
- 160: ab 01 movw r20, r22
- 162: bc 01 movw r22, r24
- 164: 22 e0 ldi r18, 0x02 ; 2
- 166: 30 e0 ldi r19, 0x00 ; 0
- 168: 88 e4 ldi r24, 0x48 ; 72
- 16a: 91 e0 ldi r25, 0x01 ; 1
- 16c: 0e 94 a9 04 call 0x952 ; 0x952 <_ZN5Print5printEdi>
- 170: 60 e0 ldi r22, 0x00 ; 0
- 172: 71 e0 ldi r23, 0x01 ; 1
- 174: 88 e4 ldi r24, 0x48 ; 72
- 176: 91 e0 ldi r25, 0x01 ; 1
- 178: 0e 94 0b 03 call 0x616 ; 0x616 <_ZN5Print7printlnEPKc>
- 17c: 60 91 42 01 lds r22, 0x0142 ; 0x800142 <sum>
- 180: 70 91 43 01 lds r23, 0x0143 ; 0x800143 <sum+0x1>
- 184: 4a e0 ldi r20, 0x0A ; 10
- 186: 50 e0 ldi r21, 0x00 ; 0
- 188: 88 e4 ldi r24, 0x48 ; 72
- 18a: 91 e0 ldi r25, 0x01 ; 1
- 18c: 0e 94 b0 03 call 0x760 ; 0x760 <_ZN5Print7printlnEii>
// All this is just counter++
2 190: 80 91 44 01 lds r24, 0x0144 ; 0x800144 <counter>
2 194: 90 91 45 01 lds r25, 0x0145 ; 0x800145 <counter+0x1>
2 198: a0 91 46 01 lds r26, 0x0146 ; 0x800146 <counter+0x2>
2 19c: b0 91 47 01 lds r27, 0x0147 ; 0x800147 <counter+0x3>
2 1a0: 01 96 adiw r24, 0x01 ; 1
1 1a2: a1 1d adc r26, r1
1 1a4: b1 1d adc r27, r1
2 1a6: 80 93 44 01 sts 0x0144, r24 ; 0x800144 <counter>
2 1aa: 90 93 45 01 sts 0x0145, r25 ; 0x800145 <counter+0x1>
2 1ae: a0 93 46 01 sts 0x0146, r26 ; 0x800146 <counter+0x2>
2 1b2: b0 93 47 01 sts 0x0147, r27 ; 0x800147 <counter+0x3>
20 total
// Get the registers back and return
2 1b6: ff 90 pop r15
2 1b8: ef 90 pop r14
2 1ba: df 90 pop r13
2 1bc: cf 90 pop r12
4 1be: 08 95 ret
12 total
That's more than 10 instructions. I make it 74 clocks to run one full iteration. (counter++
alone is 20 clock cycles - double your estimation for the total...)
You have to remember two important things:
- AVR is a RISC CPU - that means that seemingly simple operations can take multiple instructions to perform.
- AVR is an 8-bit CPU - that means that working with any variables bigger than 8 bits in size require far more complex code, and you work with 32-bit variables (and floats, but that's outside the timing loop section of your code).
-
I am counting 72 clock cycles per iteration in your disassembly (20 for
counter++
). It looks like you compiled without-flto
.Edgar Bonet– Edgar Bonet2019年08月22日 11:37:00 +00:00Commented Aug 22, 2019 at 11:37 -
@EdgarBonet Quite possibly, yes. I use UECIDE which may not use the same compilation flags as Arduino if they decide to change things like that randomly - also probably not the same version of the compiler (I have lots of versions to choose from).Majenko– Majenko2019年08月22日 11:39:02 +00:00Commented Aug 22, 2019 at 11:39
-
@EdgarBonet ADIW is two clocks not one I see, even though it only needs one read from the flash. I was counting flash reads as clocks. So yes, there's even more than at first it appears.Majenko– Majenko2019年08月22日 11:41:48 +00:00Commented Aug 22, 2019 at 11:41
-
Indeed, the Arduino folks changed the compilation flags, adding LTO in 1.6.10.Edgar Bonet– Edgar Bonet2019年08月22日 11:48:35 +00:00Commented Aug 22, 2019 at 11:48
-
1@EdgarBonet I should add that to the arduino core in UECIDE then really. It's time to build a new version anyway I guess. Any way - I have listed (and summed) the actual clock cycle values in the answer now.Majenko– Majenko2019年08月22日 11:54:28 +00:00Commented Aug 22, 2019 at 11:54
The loop function is called within the framework of the Arduino IDE. Beyond the code you write, also an interrupt check is done, which also cost time, and explains the difference.
The following function CAN be called after every loop call. See Majenko's answer about the details.
/*
SerialEvent occurs whenever a new data comes in the hardware serial RX. This
routine is run between each time loop() runs, so using delay inside loop can
delay response. Multiple bytes of data may be available.
*/
void serialEvent() {
while (Serial.available()) {
// get the new byte:
char inChar = (char)Serial.read();
// add it to the inputString:
inputString += inChar;
// if the incoming character is a newline, set a flag so the main loop can
// do something about it:
if (inChar == '\n') {
stringComplete = true;
}
}
}
See SerialEvent for more info.
However, as Kwasmich's comment says, you can easily circumvent this by performing your test inside a while
loop.
-
Does the interrupt loop check really take ~50 cycles?rampatowl– rampatowl2019年08月21日 14:25:32 +00:00Commented Aug 21, 2019 at 14:25
-
2Repeat your experiment by moving the loop code into a
while(1) {}
loop inside ofsetup()
. That way you can eliminate the contribution of the arduino framework.Kwasmich– Kwasmich2019年08月21日 14:25:36 +00:00Commented Aug 21, 2019 at 14:25 -
I'm trying to find the code.Michel Keijzers– Michel Keijzers2019年08月21日 14:26:15 +00:00Commented Aug 21, 2019 at 14:26
-
2
SerialEvent
is an optional function the user may implement in their sketch if they want. If they don't implement it then nothing gets called. It does, however, do a quick check to see if the function has been defined or not though, which takes a couple of clock cycles.Majenko– Majenko2019年08月21日 14:35:01 +00:00Commented Aug 21, 2019 at 14:35 -
@Majenko ... thanks, I will update my answer, and upvote yours.Michel Keijzers– Michel Keijzers2019年08月21日 14:37:36 +00:00Commented Aug 21, 2019 at 14:37