3

I wrote a simple loop to test the processor speed on my Arduino Uno. The numbers I'm getting are much worse than the 16MHz advertised, by a factor of ~5. Trying to figure out what I'm missing.

long counter = 0;
int sum = 0;
uint32_t t = 0;
void setup() {
 Serial.begin(9600);
 t = micros();
}
void loop() {
 sum += counter;
 if (counter == 10000) {
 float usecsPerIteration = (micros() - t) / (counter * 1.0);
 Serial.print(usecsPerIteration);
 Serial.println(" microseconds per iteration");
 // Print result to avoid loop getting optimized away
 Serial.println(sum);
 }
 ++counter;
}

The output is:

3.77 microseconds per iteration

Here's how I'm estimating clock rate from this. The body of the loop involves two additions and an equality check. Even if we assume that these simple operations take 10 clock cycles, the implied clock rate is still just (1 iter / 3.77 us) * (1e6 us / sec) * (10 cycles / iter) = 2.65 MHz.

What am I missing?

asked Aug 21, 2019 at 14:09
2
  • You forgot that the printing itself also contributes to the equation. This is VERY EXPENSIVE. You should reset your time after every printing. Commented Aug 21, 2019 at 14:28
  • I only print once after 10,000 iters and it is after I compute time elapsed. Commented Aug 21, 2019 at 14:29

2 Answers 2

8

As Michael says there is a lot more going on than you think.

Firstly there are interrupts triggering which can slow things down (used to calculate milllis()).

Secondly you vastly under estimate the number of instructions that are being called. For a start the loop is being called from a for(;;) loop in main():

 for (;;) {
 loop();
4 572: 0e 94 73 00 call 0xe6 ; 0xe6 <loop>
 if (serialEventRun) serialEventRun();
2 576: 20 97 sbiw r28, 0x00 ; 0
2 578: e1 f3 breq .-8 ; 0x572 <main+0x10>
- 57a: 0e 94 1d 01 call 0x23a ; 0x23a <_Z14serialEventRunv>
2 57e: f9 cf rjmp .-14 ; 0x572 <main+0x10>
10 total

Then for the call to loop() the registers used get pushed to the stack, then your code executed, and the registers popped off again before returning. This is the code for loop when disassembled:

000000e6 <loop>:
 // Save the registers to the stack
2 e6: cf 92 push r12
2 e8: df 92 push r13
2 ea: ef 92 push r14
2 ec: ff 92 push r15
8 total
 // Add to the sum
2 ee: 40 91 44 01 lds r20, 0x0144 ; 0x800144 <counter>
2 f2: 50 91 45 01 lds r21, 0x0145 ; 0x800145 <counter+0x1>
2 f6: 60 91 46 01 lds r22, 0x0146 ; 0x800146 <counter+0x2>
2 fa: 70 91 47 01 lds r23, 0x0147 ; 0x800147 <counter+0x3>
2 fe: 80 91 42 01 lds r24, 0x0142 ; 0x800142 <sum>
2 102: 90 91 43 01 lds r25, 0x0143 ; 0x800143 <sum+0x1>
1 106: 84 0f add r24, r20
1 108: 95 1f adc r25, r21
2 10a: 90 93 43 01 sts 0x0143, r25 ; 0x800143 <sum+0x1>
2 10e: 80 93 42 01 sts 0x0142, r24 ; 0x800142 <sum>
18 total
 // Check the value to see if it's time to print
1 112: 40 31 cpi r20, 0x10 ; 16
1 114: 57 42 sbci r21, 0x27 ; 39
1 116: 61 05 cpc r22, r1
1 118: 71 05 cpc r23, r1
2 11a: d1 f5 brne .+116 ; 0x190 <loop+0xaa>
6 total
 // From here on is the "IF" section - skipped by the BRNE above
- 11c: 0e 94 f5 04 call 0x9ea ; 0x9ea <micros>
- 120: c0 90 3e 01 lds r12, 0x013E ; 0x80013e <__data_end>
- 124: d0 90 3f 01 lds r13, 0x013F ; 0x80013f <__data_end+0x1>
- 128: e0 90 40 01 lds r14, 0x0140 ; 0x800140 <__data_end+0x2>
- 12c: f0 90 41 01 lds r15, 0x0141 ; 0x800141 <__data_end+0x3>
- 130: 6c 19 sub r22, r12
- 132: 7d 09 sbc r23, r13
- 134: 8e 09 sbc r24, r14
- 136: 9f 09 sbc r25, r15
- 138: 0e 94 68 06 call 0xcd0 ; 0xcd0 <__floatunsisf>
- 13c: 6b 01 movw r12, r22
- 13e: 7c 01 movw r14, r24
- 140: 60 91 44 01 lds r22, 0x0144 ; 0x800144 <counter>
- 144: 70 91 45 01 lds r23, 0x0145 ; 0x800145 <counter+0x1>
- 148: 80 91 46 01 lds r24, 0x0146 ; 0x800146 <counter+0x2>
- 14c: 90 91 47 01 lds r25, 0x0147 ; 0x800147 <counter+0x3>
- 150: 0e 94 6a 06 call 0xcd4 ; 0xcd4 <__floatsisf>
- 154: 9b 01 movw r18, r22
- 156: ac 01 movw r20, r24
- 158: c7 01 movw r24, r14
- 15a: b6 01 movw r22, r12
- 15c: 0e 94 c7 05 call 0xb8e ; 0xb8e <__divsf3>
- 160: ab 01 movw r20, r22
- 162: bc 01 movw r22, r24
- 164: 22 e0 ldi r18, 0x02 ; 2
- 166: 30 e0 ldi r19, 0x00 ; 0
- 168: 88 e4 ldi r24, 0x48 ; 72
- 16a: 91 e0 ldi r25, 0x01 ; 1
- 16c: 0e 94 a9 04 call 0x952 ; 0x952 <_ZN5Print5printEdi>
- 170: 60 e0 ldi r22, 0x00 ; 0
- 172: 71 e0 ldi r23, 0x01 ; 1
- 174: 88 e4 ldi r24, 0x48 ; 72
- 176: 91 e0 ldi r25, 0x01 ; 1
- 178: 0e 94 0b 03 call 0x616 ; 0x616 <_ZN5Print7printlnEPKc>
- 17c: 60 91 42 01 lds r22, 0x0142 ; 0x800142 <sum>
- 180: 70 91 43 01 lds r23, 0x0143 ; 0x800143 <sum+0x1>
- 184: 4a e0 ldi r20, 0x0A ; 10
- 186: 50 e0 ldi r21, 0x00 ; 0
- 188: 88 e4 ldi r24, 0x48 ; 72
- 18a: 91 e0 ldi r25, 0x01 ; 1
- 18c: 0e 94 b0 03 call 0x760 ; 0x760 <_ZN5Print7printlnEii>
 // All this is just counter++
2 190: 80 91 44 01 lds r24, 0x0144 ; 0x800144 <counter>
2 194: 90 91 45 01 lds r25, 0x0145 ; 0x800145 <counter+0x1>
2 198: a0 91 46 01 lds r26, 0x0146 ; 0x800146 <counter+0x2>
2 19c: b0 91 47 01 lds r27, 0x0147 ; 0x800147 <counter+0x3>
2 1a0: 01 96 adiw r24, 0x01 ; 1
1 1a2: a1 1d adc r26, r1
1 1a4: b1 1d adc r27, r1
2 1a6: 80 93 44 01 sts 0x0144, r24 ; 0x800144 <counter>
2 1aa: 90 93 45 01 sts 0x0145, r25 ; 0x800145 <counter+0x1>
2 1ae: a0 93 46 01 sts 0x0146, r26 ; 0x800146 <counter+0x2>
2 1b2: b0 93 47 01 sts 0x0147, r27 ; 0x800147 <counter+0x3>
20 total 
 // Get the registers back and return
2 1b6: ff 90 pop r15
2 1b8: ef 90 pop r14
2 1ba: df 90 pop r13
2 1bc: cf 90 pop r12
4 1be: 08 95 ret
12 total 

That's more than 10 instructions. I make it 74 clocks to run one full iteration. (counter++ alone is 20 clock cycles - double your estimation for the total...)

You have to remember two important things:

  • AVR is a RISC CPU - that means that seemingly simple operations can take multiple instructions to perform.
  • AVR is an 8-bit CPU - that means that working with any variables bigger than 8 bits in size require far more complex code, and you work with 32-bit variables (and floats, but that's outside the timing loop section of your code).
answered Aug 21, 2019 at 14:32
6
  • I am counting 72 clock cycles per iteration in your disassembly (20 for counter++). It looks like you compiled without -flto. Commented Aug 22, 2019 at 11:37
  • @EdgarBonet Quite possibly, yes. I use UECIDE which may not use the same compilation flags as Arduino if they decide to change things like that randomly - also probably not the same version of the compiler (I have lots of versions to choose from). Commented Aug 22, 2019 at 11:39
  • @EdgarBonet ADIW is two clocks not one I see, even though it only needs one read from the flash. I was counting flash reads as clocks. So yes, there's even more than at first it appears. Commented Aug 22, 2019 at 11:41
  • Indeed, the Arduino folks changed the compilation flags, adding LTO in 1.6.10. Commented Aug 22, 2019 at 11:48
  • 1
    @EdgarBonet I should add that to the arduino core in UECIDE then really. It's time to build a new version anyway I guess. Any way - I have listed (and summed) the actual clock cycle values in the answer now. Commented Aug 22, 2019 at 11:54
0

The loop function is called within the framework of the Arduino IDE. Beyond the code you write, also an interrupt check is done, which also cost time, and explains the difference.

The following function CAN be called after every loop call. See Majenko's answer about the details.

/*
 SerialEvent occurs whenever a new data comes in the hardware serial RX. This
 routine is run between each time loop() runs, so using delay inside loop can
 delay response. Multiple bytes of data may be available.
*/
void serialEvent() {
 while (Serial.available()) {
 // get the new byte:
 char inChar = (char)Serial.read();
 // add it to the inputString:
 inputString += inChar;
 // if the incoming character is a newline, set a flag so the main loop can
 // do something about it:
 if (inChar == '\n') {
 stringComplete = true;
 }
 }
}

See SerialEvent for more info.

However, as Kwasmich's comment says, you can easily circumvent this by performing your test inside a while loop.

answered Aug 21, 2019 at 14:23
5
  • Does the interrupt loop check really take ~50 cycles? Commented Aug 21, 2019 at 14:25
  • 2
    Repeat your experiment by moving the loop code into a while(1) {} loop inside of setup(). That way you can eliminate the contribution of the arduino framework. Commented Aug 21, 2019 at 14:25
  • I'm trying to find the code. Commented Aug 21, 2019 at 14:26
  • 2
    SerialEvent is an optional function the user may implement in their sketch if they want. If they don't implement it then nothing gets called. It does, however, do a quick check to see if the function has been defined or not though, which takes a couple of clock cycles. Commented Aug 21, 2019 at 14:35
  • @Majenko ... thanks, I will update my answer, and upvote yours. Commented Aug 21, 2019 at 14:37

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.