+ method summary, fixed pin toggling overflow.

edited Jan 28, 2018 at 22:04

Edgar Bonet

Method summary

^{(section added on 2018-01-28)}

There are several methods available for timing code. I am adding this preliminary section to my answer in order to provide comparative data on them. The methods covered in the table below are those proposed as answers to both this question and a duplicate of it. As this question is tagged "arduino-uno", all this data assumes an AVR-based board clocked at 16 MHz.

Method comparison:

| method | resolution | max. time | typ. overhead |
|----------------|---------------|-----------------|----------------|
| `millis()` | 1 – 2 ms | 49.7 days | 0.69 – 1.3 μs |
| `micros()` | 4 μs | 71.6 min | 2.8 – 2.9 μs |
| Timer 1 | 0.0625 μs | 4.096 ms | 0.25 – 0.5 μs |
| pin toggling | scope-limited | ∞ | 0.125 μs |
| cycle counting | 0.0625 μs | boredom-limited | 0 |
| looping | N.A. | N.A. | 0.25 μs |
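Most figures in the resolution and max. time columns follow from simple arithmetic. As a rough check, here is a sketch in plain C++, meant for a desktop compiler rather than the Arduino. It assumes a 16 MHz clock, 32-bit counters behind `millis()` and `micros()`, a 16-bit Timer 1, and the 1024 μs tick period mentioned in the `millis()` entry of this answer. All function and constant names are made up for illustration.

```cpp
// Assumption: 16 MHz CPU clock, as on an Arduino Uno.
constexpr double kCpuHz = 16e6;

// Number of distinct values of a 32-bit and of a 16-bit counter.
constexpr double kCount32 = 4294967296.0;  // 2^32
constexpr double kCount16 = 65536.0;       // 2^16

// Duration of one CPU cycle in microseconds: the Timer 1 resolution.
constexpr double cycle_us() { return 1e6 / kCpuHz; }

// Days before a 32-bit millisecond counter wraps around.
constexpr double millis_rollover_days() { return kCount32 / 1000.0 / 86400.0; }

// Minutes before a 32-bit microsecond counter wraps around.
constexpr double micros_rollover_min() { return kCount32 / 1e6 / 60.0; }

// Milliseconds before a 16-bit timer clocked at the CPU frequency wraps.
constexpr double timer1_rollover_ms() { return kCount16 / kCpuHz * 1e3; }

// Each 1024 us tick falls 24 us behind a true millisecond count; once the
// backlog reaches 1000 us, one tick has to count double. Interval in ms.
constexpr double millis_double_step_interval_ms() {
    return (1000.0 / 24.0) * 1.024;
}

// Compile-time checks against the table.
static_assert(cycle_us() == 0.0625, "one cycle at 16 MHz is 62.5 ns");
static_assert(timer1_rollover_ms() > 4.09 && timer1_rollover_ms() < 4.10,
              "about 4.096 ms");
```

The other functions evaluate to roughly 49.7 days, 71.6 minutes and 43 ms respectively, in line with the values quoted in this answer.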

The methods are characterized by the following criteria:

- Resolution is the granularity of the measurement: the smaller, the better.
- Maximum measurable time: any method that does timing arithmetic on the Arduino is prone to overflow when the measured interval is too long. Note that a timer rolling over to zero during the measurement is not a problem, as long as the period being measured is shorter than the rollover period.
- Typical measurement overhead: the code used to measure the timings itself takes a finite time to execute, so one ends up measuring the execution time of the "instrumented" code, which is slightly longer than that of the code one is trying to profile. This overhead should in principle be subtracted from the result, but it is often not known exactly, as it depends on how the compiler optimizes both the instrumented and the non-instrumented code.
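The remark about rollover can be illustrated with unsigned arithmetic. The following is a minimal sketch in plain C++ (for a desktop compiler), assuming a free-running 16-bit counter such as Timer 1; the helper name `elapsed_ticks` is hypothetical.

```cpp
#include <cstdint>

// Elapsed ticks between two readings of a free-running 16-bit counter.
// Casting the difference back to uint16_t makes it wrap modulo 65536,
// so the result is right even if the counter rolled over once in between.
constexpr uint16_t elapsed_ticks(uint16_t start, uint16_t finish) {
    return static_cast<uint16_t>(finish - start);
}

// No rollover between the two readings.
static_assert(elapsed_ticks(100, 350) == 250, "plain difference");

// The counter rolled over from 65535 to 0 between the readings.
static_assert(elapsed_ticks(65530, 10) == 16, "rollover is harmless");

// A 70000-tick interval is longer than the 65536-tick period: the
// reading aliases to 70000 - 65536 = 4464 and the true value is lost.
static_assert(elapsed_ticks(0, static_cast<uint16_t>(70000u)) == 4464,
              "too-long intervals alias");
```

The last check shows the actual limitation: it is not the rollover itself, but an interval longer than one full counter period.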

The methods listed in the table are:

- `millis()`, which may be the most obvious choice, as it is so well known. Its low resolution, however, makes it ill-suited for timing code execution. It should be noted that the `millis()` counter is incremented every 1024 μs. Most of the time it is incremented by 1 but, roughly every 43 ms, it is incremented by 2 in order to avoid creeping drift. This is why its resolution is stated as "1 – 2 ms" in the table.
- `micros()`, as proposed in ratchet freak's answer, is usually a good choice, the main caveat being the 4 μs resolution, when one could naively expect 1 μs. It also has a significant overhead.
- Timer 1, which is discussed in the second part of this answer, is my favorite: it has single-cycle resolution and low overhead. However, it is incompatible with other uses of the timer (PWM, Servo library...). It is also limited to measuring short delays.
- Pin toggling, as proposed by 4ilo, and by myself in the third part of this answer, is ideal if you have an oscilloscope handy. Any half-decent scope should provide single-cycle resolution. It is also minimally invasive on the code being measured and has minimal overhead.
- Cycle counting, as proposed in Majenko's answer, is arguably the "perfect" method: it is cycle-accurate, does not modify the code and has zero overhead. However, for anything beyond a handful of instructions, it quickly becomes tedious. It also requires some understanding of AVR assembly.
- Looping, as proposed in Michel Keijzers' answer, is not a measurement technique per se. It is meant to be used in conjunction with another technique in order to improve the resolution and dilute the overhead. However, looping carries its own overhead, which is typically 4 CPU cycles per iteration, assuming a 16-bit loop counter.
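To make the "dilution" provided by looping concrete, here is a small sketch in plain C++ (for a desktop compiler). It assumes a simple linear model, which is my own simplification: the measured total is the one-off measurement overhead plus, for each iteration, the code time and the loop overhead. The struct and function names are hypothetical, and the numbers in the example are illustrative, not measured.

```cpp
// All times are in microseconds.
struct LoopTiming {
    double total_us;             // value reported by the timing method
    double measure_overhead_us;  // one-off cost of taking the readings
    double loop_overhead_us;     // cost of the loop itself, per iteration
    unsigned long iterations;    // number of times the code was run
};

// Estimated time of one execution of the code under test:
// remove the one-off overhead, average, then remove the loop overhead.
constexpr double per_iteration_us(const LoopTiming& m) {
    return (m.total_us - m.measure_overhead_us) / m.iterations
           - m.loop_overhead_us;
}

// Effective resolution once the reading is averaged over the iterations.
constexpr double effective_resolution_us(double resolution_us,
                                         unsigned long iterations) {
    return resolution_us / iterations;
}

// Illustrative numbers: a reading of 4252 us over 1000 iterations, with
// the micros() overhead (2.8 us) and loop overhead (0.25 us) from the table.
static_assert(per_iteration_us(LoopTiming{4252.0, 2.8, 0.25, 1000UL}) > 3.99 &&
              per_iteration_us(LoopTiming{4252.0, 2.8, 0.25, 1000UL}) < 4.01,
              "about 4 us per iteration");
```

With 1000 iterations, the 4 μs granularity of `micros()` contributes only about 0.004 μs of uncertainty per iteration, whereas the loop overhead is not diluted and has to be subtracted explicitly.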

Using Timer 1

^{(original answer of 2017-10-23)}

One technique I often use is to make Timer 1 count at the full CPU speed and use it to time the code I want to profile. For example:

Toggling a pin

^{(section added on 2017-10-23)}

If you have a scope, there is another method that is minimally invasive on your code: toggle a pin just before and just after the thing you want to time. E.g., assuming you have previously called `pinMode(13, OUTPUT)`:

This will create a pulse that you can measure on the scope. Note that with direct port access, as used here, the overhead is only two CPU cycles, or 125 ns. Also, direct port access won't use any CPU register, so chances are the compiler will not generate less efficient code than when not including the timing part.

+ pin toggle and scope method.

edited Oct 23, 2017 at 19:21

Edgar Bonet

One technique I often use is to make Timer 1 count at the full CPU speed and use it to time the code I want to profile. For example:

    volatile uint8_t pin = 2;
    volatile uint8_t value = HIGH;

    void setup()
    {
        Serial.begin(9600);

        // Set Timer 1 to normal mode at F_CPU.
        TCCR1A = 0;
        TCCR1B = 1;

        // Time digitalWrite().
        cli();
        uint16_t start = TCNT1;
        digitalWrite(pin, value);
        uint16_t finish = TCNT1;
        sei();
        uint16_t overhead = 8;
        uint16_t cycles = finish - start - overhead;
        Serial.print("digitalWrite() took ");
        Serial.print(cycles);
        Serial.println(" CPU cycles.");
    }

    void loop(){}

Note the `volatile` variables, used to prevent the compiler from optimizing them as constants.

Note also that when you profile some code you inevitably slow it down, because of the time taken by the profiling operations themselves. This is what the `overhead` variable above accounts for. In order to know the exact overhead, I start with a guess, compile and disassemble, and then count the number of clock cycles spent in profiling that will be counted by the timer. Then I adjust the `overhead` value, compile and disassemble again, and make sure the overhead has not changed. This is also when you have to ask yourself what exactly you want to count. Here I am counting the time needed to execute `call digitalWrite`, but not the time needed to get the arguments into the proper registers, as I am artificially slowing this down by making them `volatile`.

This method is good for anything that takes more than roughly a dozen, and less than 65,536 clock cycles. Below that range, cycle counting would be simpler, since you have to cycle-count the overhead anyway. Above it, the count would overflow, and you could instead just use `micros()` and live with its inherent inaccuracy.

    // Set pin 13 HIGH.
    PORTB |= _BV(PB5);

    // The thing we want to time.
    ...

    // Set pin 13 LOW.
    PORTB &= ~_BV(PB5);

answered Oct 23, 2017 at 17:53

Edgar Bonet