I'm trying to read input data from the parallel port of a PC and send it to another device. To do this, the input data is saved when it becomes available and is sent when a flag is set.
The whole procedure should take less than 10 ms, so I need to know the running time. How can I compute the number of CPU cycles taken by the `ReadParallel` and `WriteParallel` functions (`digitalRead`, `digitalWrite`, `bitRead`, `bitWrite`) in my code, or by any other instructions?
Here is the code for the Arduino Uno:
#define InterruptInputPin 2 // For checking if the input data is available
#define InterruptSensorPin 3 // For checking when to send data
#define IN0 A0
#define IN1 A1
#define IN2 A2
#define IN3 A3
#define OUT0 5
#define OUT1 6
#define OUT2 7
#define OUT3 8
volatile int ParallelInput; // The input data
//*************************************
void ReadParallel() // Reads input and saves it in ParallelInput.
{
  if (digitalRead(IN0) == HIGH)
    bitWrite(ParallelInput,0,1);
  else
    bitWrite(ParallelInput,0,0);
  if (digitalRead(IN1) == HIGH)
    bitWrite(ParallelInput,1,1);
  else
    bitWrite(ParallelInput,1,0);
  if (digitalRead(IN2) == HIGH)
    bitWrite(ParallelInput,2,1);
  else
    bitWrite(ParallelInput,2,0);
  if (digitalRead(IN3) == HIGH)
    bitWrite(ParallelInput,3,1);
  else
    bitWrite(ParallelInput,3,0);
}
//***************************************
void WriteParallel() // Writes ParallelInput to output.
{
  detachInterrupt(digitalPinToInterrupt(InterruptInputPin));
  digitalWrite(OUT0,bitRead(ParallelInput,0));
  digitalWrite(OUT1,bitRead(ParallelInput,1));
  digitalWrite(OUT2,bitRead(ParallelInput,2));
  digitalWrite(OUT3,bitRead(ParallelInput,3));
  attachInterrupt(digitalPinToInterrupt(InterruptInputPin),ReadParallel,RISING);
}
void setup() {
  pinMode(IN0,INPUT);
  pinMode(IN1,INPUT);
  pinMode(IN2,INPUT);
  pinMode(IN3,INPUT);
  pinMode(OUT0,OUTPUT);
  pinMode(OUT1,OUTPUT);
  pinMode(OUT2,OUTPUT);
  pinMode(OUT3,OUTPUT);
  attachInterrupt(digitalPinToInterrupt(InterruptInputPin),ReadParallel,RISING);
  attachInterrupt(digitalPinToInterrupt(InterruptSensorPin),WriteParallel,RISING);
}
void loop() {
}
3 Answers
Method summary
(section added on 2018-01-28)
There are several methods available for timing code. I am adding this preliminary section to my answer in order to provide comparative data on them. The methods covered in the table below are those proposed as answers to both this question and a duplicate of it. As this question has been tagged "arduino-uno", all this data assumes an AVR-based board clocked at 16 MHz.
Method comparison:
| method | resolution | max. time | typ. overhead |
|----------------|---------------|-----------------|----------------|
| `millis()` | 1 – 2 ms | 49.7 days | 0.69 – 1.3 μs |
| `micros()` | 4 μs | 71.6 min | 2.8 – 2.9 μs |
| Timer 1 | 0.0625 μs | 4.096 ms | 0.25 – 0.5 μs |
| pin toggling | scope-limited | ∞ | 0.125 μs |
| cycle counting | 0.0625 μs | boredom-limited | 0 |
| looping | N.A. | N.A. | 0.25 μs |
The methods are characterized by the following criteria:
- Resolution is the granularity of the measurement, the smaller the better.
- Maximum measurable time: any method that does timing arithmetic on the Arduino is prone to overflows if the measured time is too long. Note that a timer rolling over to zero during the measurement is not a problem, as long as the period being measured is shorter than the rollover period.
- Typical measurement overhead: the code used to measure the timings takes itself a finite time to execute, thus one ends up measuring the execution time of the "instrumented" code, which is slightly larger than the time taken by the code one is trying to profile. This overhead should in principle be subtracted from the result, but it is often not known exactly, as it depends on how the compiler optimizes both the instrumented and the non-instrumented code.
The methods listed in the table are:
- `millis()`, which may be the most obvious choice, as it is so well known. Its low resolution, however, makes it ill-suited for timing code execution. It should be noted that the `millis()` counter is incremented every 1024 μs. Most of the time it is incremented by 1 but, roughly every 43 ms, it is incremented by 2 in order to avoid creeping drift. This is why its resolution is stated as "1 – 2 ms" in the table.
- `micros()`, as proposed in ratchet freak's answer, is usually a good choice, the main caveat being the 4 μs resolution, when one could naively expect 1 μs. It also has a significant overhead.
- Timer 1, which is discussed in the second part of this answer, is my favorite: it has single-cycle resolution and low overhead. However, it is incompatible with other uses of the timer (PWM, Servo library...). It is also limited to measuring small delays.
- Pin toggling, as proposed by 4ilo, and by myself in the third part of this answer, is ideal if you have an oscilloscope handy. Any half-decent scope should provide single-cycle resolution. It is also minimally invasive on the code being measured and has minimal overhead.
- Cycle counting, as proposed in Majenko's answer, is arguably the "perfect" method: it is cycle-accurate, does not modify the code and has zero overhead. However, for anything beyond a handful of instructions, it quickly becomes tedious. It also requires some understanding of AVR assembly.
- Looping, as proposed in Michel Keijzers' answer, is not a measurement technique per se. It is meant to be used in conjunction with another technique in order to improve the resolution and dilute the overhead. However, looping carries its own overhead, which is typically 4 CPU cycles per iteration, assuming a 16-bit loop counter.
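Most of the figures in the comparison table follow from simple arithmetic on the 16 MHz clock. Here is a minimal sketch of that arithmetic in plain C++. The function names are made up for illustration, and the code assumes a 16-bit Timer 1 and 32-bit `millis()` and `micros()` counters, as on AVR-based boards:

```cpp
// Assumption: 16 MHz CPU clock, as on the Arduino Uno.
constexpr double kCpuHz = 16e6;

// Duration of `cycles` CPU cycles, in microseconds.
constexpr double cyclesToMicros(double cycles) {
    return cycles * 1e6 / kCpuHz;
}

// Longest interval a 16-bit timer clocked at F_CPU can measure, in ms.
constexpr double timer1MaxMillis() {
    return cyclesToMicros(65536.0) / 1000.0;
}

// Rollover period of a 32-bit millisecond counter, in days.
constexpr double millisRolloverDays() {
    return 4294967296.0 / 1000.0 / 86400.0;
}

// Rollover period of a 32-bit microsecond counter, in minutes.
constexpr double microsRolloverMinutes() {
    return 4294967296.0 / 1e6 / 60.0;
}
```

For example, `cyclesToMicros(1)` gives the 0.0625 μs resolution quoted for Timer 1 and cycle counting, and `cyclesToMicros(4)` gives the 0.25 μs per-iteration overhead quoted for looping.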
Using Timer 1
(original answer of 2017-10-23)
One technique I often use is to make Timer 1 count at the full CPU speed and use it to time the code I want to profile. For example:
volatile uint8_t pin = 2;
volatile uint8_t value = HIGH;

void setup()
{
    Serial.begin(9600);

    // Set Timer 1 to normal mode at F_CPU.
    TCCR1A = 0;
    TCCR1B = 1;

    // Time digitalWrite().
    cli();
    uint16_t start = TCNT1;
    digitalWrite(pin, value);
    uint16_t finish = TCNT1;
    sei();
    uint16_t overhead = 8;
    uint16_t cycles = finish - start - overhead;
    Serial.print("digitalWrite() took ");
    Serial.print(cycles);
    Serial.println(" CPU cycles.");
}

void loop(){}
Note the `volatile` variables used to prevent the compiler from optimizing them as constants.
Note also that when you profile some code you are inevitably slowing it down, because of the time taken by the profiling operations themselves. This is what the `overhead` variable above accounts for. In order to know the exact overhead, I start with a guess, compile and disassemble, and then count the number of clock cycles spent in profiling that will be counted by the timer. Then I adjust the `overhead` value, compile and disassemble again, and make sure the overhead has not changed. This is also when you have to ask yourself what exactly you want to count. Here I am counting the time needed to execute `call digitalWrite`, but not the time needed to get the arguments in the proper registers, as I am artificially slowing this down by making them `volatile`.
This method is good for anything that takes more than roughly a dozen, and less than 65,536 clock cycles. Below that range, cycle counting would be simpler, since you have to cycle-count the overhead anyway. Above it, the count would overflow, and you could instead just use `micros()` and live with its inherent inaccuracy.
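The subtraction `finish - start` in the sketch above still gives the right answer if the timer rolls over once between the two reads, provided the arithmetic is done modulo 65,536. Here is a minimal sketch of that computation as a stand-alone helper. The name `elapsedCycles` is hypothetical and not part of any library:

```cpp
#include <stdint.h>

// Cycles elapsed between two reads of a free-running 16-bit counter,
// minus the known measurement overhead.
// The cast back to uint16_t matters: the operands are promoted to int
// for the subtraction, and the cast restores the modulo-65536 wrap.
uint16_t elapsedCycles(uint16_t start, uint16_t finish, uint16_t overhead) {
    return static_cast<uint16_t>(finish - start - overhead);
}
```

With `start = 65530`, `finish = 10` and an overhead of 8, the counter has advanced 16 cycles across the rollover and the helper returns 8.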
Toggling a pin
(section added on 2017-10-23)
If you have a scope, there is another method that is minimally invasive on your code: toggle a pin just before and just after the thing you want to time. E.g., assuming you have previously called `pinMode(13, OUTPUT)`:
// Set pin 13 HIGH.
PORTB |= _BV(PB5);
// The thing we want to time.
...
// Set pin 13 LOW.
PORTB &= ~_BV(PB5);
This will create a pulse that you can measure on the scope. Note that using direct port access, like here, the overhead is only two CPU cycles, or 125 ns. Also, direct port access won't use any CPU register, so chances are the compiler will not generate less efficient code than when not including the timing part.
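Converting the pulse width read off the scope into CPU cycles is a one-line calculation. The sketch below assumes the 16 MHz clock and the two-cycle overhead quoted above. The function and constant names are made up for illustration:

```cpp
#include <stdint.h>

// Assumptions: 16 MHz clock and the 2-cycle pin-toggling overhead.
const uint32_t kCyclesPerMicro = 16;
const uint32_t kToggleOverheadCycles = 2;

// Convert a pulse width measured on the scope (in nanoseconds) to the
// number of CPU cycles spent in the timed code, rounded to the nearest
// cycle. Returns 0 if the pulse is no longer than the overhead itself.
uint32_t pulseToCycles(uint32_t pulseNs) {
    const uint64_t cycles =
        (static_cast<uint64_t>(pulseNs) * kCyclesPerMicro + 500) / 1000;
    if (cycles <= kToggleOverheadCycles) {
        return 0;
    }
    return static_cast<uint32_t>(cycles - kToggleOverheadCycles);
}
```

A 625 ns pulse, for instance, corresponds to 10 cycles on the scope, of which 8 belong to the code being timed.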
- And without a 'scope, I/O pin timing can be done with some DMMs. If the meter can measure pulse frequency and duty cycle, you can calculate the interval (pulse width). The Tekpower TP4000ZC is one such DMM, and an inexpensive one at that. – JRobert, Oct 23, 2017 at 21:14
- Thanks for the answer. I haven't worked with registers and direct port access. Can you send me a document or link for more details? – Mehran, Oct 23, 2017 at 23:33
- @Mehran: The link I already sent you about direct port manipulation is a good introduction. You may then want to take a look at the description of the avr-libc macros commonly used for that task (mostly `_BV()`, which is avr-libc's equivalent of Arduino's `bit()`). Then, the ultimate reference is the microcontroller's datasheet. – Edgar Bonet, Oct 24, 2017 at 8:57
There are two methods:
- Profiling.
- Clock counting.
The first method involves recording timestamps at different points in your program and calculating the time difference between them. That's the simplest, but not always the most accurate.
The second method is far harder. It involves disassembling the program after you have compiled it (or obtaining the assembly language from part way through the compilation sequence), examining all the assembly instructions, looking them up in the instruction list, and totalling the clock cycles used by each instruction. That gets very complex, especially when you have loops. However, it will tell you precisely how many clock cycles, and thus how long, a block of code will take to execute (not including interrupts, which always throw a big spanner in the works).
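As an illustration of the second method, suppose the disassembly of a small function shows the four instructions below. This listing is hypothetical, written by hand and not actual compiler output. The cycle counts are those given in the AVR instruction set manual for the ATmega328P, and the struct and function names are made up for this sketch:

```cpp
#include <stddef.h>
#include <stdint.h>

struct Instruction {
    const char *mnemonic;  // text of the disassembled instruction
    uint8_t cycles;        // cycle count looked up in the instruction list
};

// Hypothetical straight-line listing (not actual compiler output).
const Instruction kListing[] = {
    {"in   r24, PINC",          1},
    {"andi r24, 0x0f",          1},
    {"sts  ParallelInput, r24", 2},
    {"ret",                     4},
};

// Total cycle count of a straight-line instruction sequence.
uint32_t totalCycles(const Instruction *listing, size_t count) {
    uint32_t total = 0;
    for (size_t i = 0; i < count; ++i) {
        total += listing[i].cycles;
    }
    return total;
}
```

Here `totalCycles(kListing, 4)` is 8 cycles, or 0.5 μs at 16 MHz. This simple sum only works for straight-line code: branches and loops need a separate count for each path.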
To measure the time a block of code takes with good resolution, run it many times (say 1,000 or 1,000,000) and divide the total time by the number of iterations.
In some cases initialization or variables can be cached, but in principle the times are quite accurate. You can easily check this by running the test for e.g. 1,000 and then 2,000 iterations and verifying that the measured time also doubles.
For the time counter, use an unsigned long type to accommodate values above 65,535 (in ms, μs or whatever unit is used).
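The idea above can be sketched as a small helper that takes the clock as a parameter, so the same code works with `micros()` on the Arduino and with a fake clock on a PC. The name `averageTime` and its signature are assumptions made for this example, not an existing API:

```cpp
#include <assert.h>
#include <stdint.h>

// Average time per iteration, in the clock's own units (μs for micros()).
// `now` is any callable returning a 32-bit timestamp and `code` is the
// block to time. The result still includes the per-iteration loop overhead.
template <typename Clock, typename Code>
double averageTime(Clock now, Code code, uint32_t iterations) {
    assert(iterations > 0);
    const uint32_t start = now();
    for (uint32_t i = 0; i < iterations; ++i) {
        code();
    }
    // Unsigned subtraction stays correct across one counter rollover.
    const uint32_t elapsed = now() - start;
    return static_cast<double>(elapsed) / iterations;
}
```

On an Arduino, a call could look like `averageTime(micros, [] { digitalWrite(5, HIGH); }, 1000)`. Running it for 1,000 and then 2,000 iterations should give nearly the same average if the measurement is sound.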
- Keep in mind that managing the loop carries some overhead, typically about 4 CPU cycles per iteration for a 16-bit loop counter. You will have to subtract this from your result if you want sub-µs accuracy. – Edgar Bonet, Nov 3, 2017 at 10:37
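    void ReadParallel() { ParallelInput = PINC & 0x0f; }
should compile to not much more than 6 instructions, or even 4 if you declare `ParallelInput` as a `byte` rather than an `int`. (The pin levels are read through the `PINC` register; `PORTC` holds the output and pull-up settings.)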