One technique I often use is to make Timer 1 count at the full CPU speed and use it to time the code I want to profile. For example:
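volatile uint8_t pin = 2;       // volatile: keep the compiler from
volatile uint8_t value = HIGH;  // treating these as constants

void setup()
{
    Serial.begin(9600);

    // Set Timer 1 to normal mode, counting at F_CPU (no prescaler).
    TCCR1A = 0;
    TCCR1B = 1;  // only CS10 set: clk/1

    // Time digitalWrite() with interrupts disabled.
    cli();
    uint16_t start = TCNT1;
    digitalWrite(pin, value);
    uint16_t finish = TCNT1;
    sei();

    // Cycles spent in the profiling code itself, found by
    // counting cycles in the disassembly.
    uint16_t overhead = 8;
    uint16_t cycles = finish - start - overhead;

    Serial.print("digitalWrite() took ");
    Serial.print(cycles);
    Serial.println(" CPU cycles.");
}

void loop() {}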
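For reference, TCCR1B = 1 writes the value 1 into Timer 1's clock-select bits CS12:CS10 (only CS10 set), which on the ATmega328P selects the undivided CPU clock, while leaving the other bits of TCCR1A and TCCR1B at zero gives normal mode. The host-side C++ sketch below is not Arduino code, and its function names are made up for illustration. It encodes the clock-select table from the ATmega328P datasheet and works out what it means on a 16 MHz board (an assumed clock): one count every 62.5 ns, and 4,096 µs before the 16-bit counter wraps.

```cpp
#include <cassert>  // used by the checks that accompany this sketch
#include <cstdint>

// Hypothetical helper: Timer 1 clock-select value (CS12:CS10 in TCCR1B)
// -> clock divisor, per the ATmega328P datasheet. Returns 0 for
// "timer stopped" (0) and for the external-clock settings (6, 7).
uint32_t timer1_prescaler(uint8_t cs) {
    switch (cs & 0x07) {
        case 1: return 1;
        case 2: return 8;
        case 3: return 64;
        case 4: return 256;
        case 5: return 1024;
        default: return 0;
    }
}

// Duration of one timer count in nanoseconds for a given CPU clock.
double tick_ns(uint8_t cs, uint32_t f_cpu) {
    return 1e9 * timer1_prescaler(cs) / f_cpu;
}

// Longest span the 16-bit counter covers before wrapping (65,536 counts),
// in microseconds.
double max_span_us(uint8_t cs, uint32_t f_cpu) {
    return 65536.0 * tick_ns(cs, f_cpu) / 1000.0;
}
```

With CS value 3 (clk/64, the setting the Arduino core normally leaves Timer 1 in on 16 MHz ATmega328P boards), each count would instead last 4 µs, which is far too coarse for this kind of profiling.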
Note the volatile variables, used to prevent the compiler from optimizing them as constants.
Note also that when you profile some code you inevitably slow it down, because of the time taken by the profiling operations themselves. This is what the overhead variable above accounts for. To know the exact overhead, I start with a guess, compile and disassemble, and then count the clock cycles spent in profiling that will be counted by the timer. Then I adjust the overhead value, compile and disassemble again, and make sure the overhead has not changed. This is also when you have to ask yourself what exactly you want to count. Here I am counting the time needed to execute the instruction call digitalWrite (the call itself and the whole function it runs), but not the time needed to get the arguments into the proper registers, as I am artificially slowing that down by making them volatile.
This method is good for anything that takes more than roughly a dozen and fewer than 65,536 clock cycles. For shorter code, counting clock cycles in the disassembly would be simpler, since you have to clock-count the overhead anyway. For longer code, the 16-bit counter would overflow, and you could instead just use micros() and live with its inherent inaccuracy (its resolution is 4 µs on 16 MHz boards).
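The 65,536 limit comes from the 16-bit counter. Because cycles is computed as an unsigned 16-bit difference, a single wrap of TCNT1 between the two readings is harmless; what breaks is a span of 65,536 cycles or more (strictly, code plus overhead), which silently aliases to a smaller value. A minimal host-side C++ sketch of that arithmetic, with hypothetical helper names:

```cpp
#include <cassert>  // used by the checks that accompany this sketch
#include <cstdint>

// Same expression as in the sketch above: the subtraction is done in int,
// then converted back to uint16_t, i.e. taken modulo 65,536. A single
// wrap of the counter between the two readings therefore cancels out.
uint16_t elapsed_cycles(uint16_t start, uint16_t finish, uint16_t overhead) {
    return static_cast<uint16_t>(finish - start - overhead);
}

// What the measurement would report for code that really took
// `true_cycles` cycles: the counter only keeps the low 16 bits of
// start + overhead + true_cycles, so long sections alias.
uint16_t reported_cycles(uint16_t start, uint32_t true_cycles, uint16_t overhead) {
    uint16_t finish = static_cast<uint16_t>(start + overhead + true_cycles);
    return elapsed_cycles(start, finish, overhead);
}
```

For example, with start = 65,530 and an 8-cycle overhead, a 16-cycle section reads finish = 18, and the difference still comes out as 16. A 70,000-cycle section, on the other hand, would be reported as 4,464 cycles.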
Edit: If you have a scope, there is another method that is minimally invasive: toggle a pin just before and just after the thing you want to time. E.g., assuming you have previously called pinMode(13, OUTPUT):
// Set pin 13 HIGH (pin 13 is PB5 on ATmega328P boards such as the Uno).
PORTB |= _BV(PB5);
// The thing we want to time.
...
// Set pin 13 LOW.
PORTB &= ~_BV(PB5);
This creates a pulse that you can measure on the scope. Note that with direct port access, as here, the overhead is only one CPU cycle, or 62.5 ns at 16 MHz. Also, direct port access doesn't use any CPU register, so the compiler is unlikely to generate less efficient code for the timed section than it would without the timing instructions.
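Reading the result back off the scope is a matter of multiplying the pulse width by the clock frequency. Below is a small host-side C++ sketch. It assumes a 16 MHz clock and takes the one-cycle overhead figure stated above. The name pulse_to_cycles is hypothetical, and rounding to the nearest cycle is an assumption made here to absorb the imprecision of scope cursors:

```cpp
#include <cassert>  // used by the checks that accompany this sketch
#include <cmath>
#include <cstdint>

// Hypothetical helper: convert a pulse width measured on the scope (ns)
// into CPU cycles, rounded to the nearest cycle, then subtract the
// instrumentation overhead (one cycle by default, per the text).
long pulse_to_cycles(double pulse_ns, uint32_t f_cpu, long overhead_cycles = 1) {
    double cycles = pulse_ns * f_cpu / 1e9;
    return std::lround(cycles) - overhead_cycles;
}
```

With these assumptions, a hypothetical pulse measured as 1.000 µs on a 16 MHz board corresponds to 16 cycles, 15 of them spent in the code being timed.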