-
-
Notifications
You must be signed in to change notification settings - Fork 7k
rework pulseIn function to solve issue #2409 #2495
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Hi @facchinm
Thanks for the patch!
I see that there are three different asm loops (that checks the pins respectively on PORTA, PORTB and PORTC), is it possible to refactor them into a single loop and pass the PORT register as a parameter? I see that the port register is actually passed as parameter, but ignored in the asm subroutine. I'm thinking about the Mega boards that uses also PORTD and PORTE.
Another question: have you made the assembly subroutine from scratch? or have you generated them with gcc?
Hi @cmaglie
I've been struggling with the problem of making the code more generic by passing the port register as a parameter (as you saw in the function signature) for many days but no solution was enough portable (all involved Z register). Also, GCC refuses to compile the line if the in
opcode's second parameter was not static.
Since the common use case does not involve changing the port, the additional loops could be purged by the compiler (I'm thinking about the dead-strip
flag on ARM architecture but I don't know if it applies also to newer avr-gcc)
The routine(s) are all handwritten and the introduction of in
opcode gives a good granularity boost (2x) over the previous version, so I thought that the (maybe removable) increased code size could be an acceptable downside.
I also noticed some typos in the ASM section looking back at the code, so I'll make an updated pull request including PORTE handling and these fixes
Uhmm... are you sure that the asm part is keeping the original behaviour?
I can see at least two discrepancies in your implementation:
-
width
is an unsigned long (== uint32_t) but you are using 16 bit operations in the asm loops. -
the
timeout
in the original pulseIn is intended for the wholepulseIn
operation (including the time spent waiting for the start pulse) instead you seems to discard the elapsed time because you don't pass the variablenumloops
to the asm subroutine.
This from a raw analysis of the patch. I guess that the asm part, being handwritten from scratch, should be carefully checked.
Instead, what do you think about generating the asm with gcc and brutally copying it into a .s file? or even better as inline __asm__
but I guess that this requires much more work.
Hi @cmaglie, you are right on (almost :) ) everything so the expanded patcheset includes fixes for:
-
width as uint32_t -> I only tested the routine with short pulses ( > 100 HZ - my fault) and it was broken with for pulses < 80Hz (fixed in adca790)
-
the timeout now works as intended (patch a72e1b7), passing to the asm function only the "remaining" timeout
-
additional ports support (for ATMega 2560 powered boards) added in ca18193 (untested for lack of hardware)
Now the philosophical questions:
-
__inline asm
is not guaranteed to leave everything as-is (I tried it at first and it gave very different results on gcc 4.3 and 4.9) -
copy-paste gcc-generated asm code would lead (in my opinion) to unreadable and non understandable code hanging there forever... The call does not involve 4 instructions that can be left there as black working magic but if far more complicated and writing it by hand with good comments can leave space to other people for enhancements.
At the end, I macroified the ASM call (patch f06cb4f) to be maintainable without going crazy
If someone could test it on ArduinoMega it would be great!
Hi @facchinm
I tried to compile the following sketch for a Uno with your patch:
void setup() { pulseIn(9, HIGH, 10000L); } void loop() { }
There is an increase in size from 832 to 1142 (+310 bytes), and this is not acceptable. I suspect that the big switch/case force the linker to keep all the instances of the macro.
I beg you to give the compiler a go, you may be surprised by the quality of produced code.
I usually use the following command line options to generate the asm listing:
-gstabs -Wa,-ahlmsd=output.lst -dp -fverbose-asm
To avoid the need to define macros we can obtain the port address once and use it into the compiled function using a volatile register definition like:
unsigned long ASM_busyLoop(uint8_t bit, volatile uint8_t *port, uint8_t stateMask, unsigned long maxloops) { ... while ((*port & bit) == stateMask) ... }
and call it from pulseIn(..) with:
unsigned long width = busyLoop(bit, portInputRegister(port), stateMask, maxloops);
doing this way the compiler will take care of all the boring portInputRegister(...)
stuff.
Finally, I'd like also to have all the three loops as asm code not only the last one.
Hi @cmaglie,
I was aware that the increased size could be an issue, so I patched the routine to pass the pointer to the port as a parameter (thank you for the tip, really) and the main problem should be solved (150d430)
By the way, I also tried to produce a readable generated asm listing using your command line options and the result was great (apparently) but not enough optimized.
In fact, the C routine
unsigned long pulseInSimpl(volatile uint8_t *port, uint8_t bit, uint8_t stateMask, unsigned long maxloops); { unsigned long width = 0; // wait for any previous pulse to end while ((*port & bit) == stateMask) if (--maxloops == 0) return 0; // wait for the pulse to start while ((*port & bit) != stateMask) if (--maxloops == 0) return 0; // wait for the pulse to stop while ((*port & bit) == stateMask) { if (++width == maxloops) return 0; } return width; }
performed very well with GCC 4.8, using only 16 instructions per cycle, while the readable ASM listing takes more than 48. Of course the objdumped
object file also takes 16 instructions but is completely unreadable. So I decided to give another chance to the handwritten asm (which takes 17).
2- I understand your request to rewrite the setup loops in assembly, because of course also the timeout calculation is compiler dependent.
As they are not critical code I decided to let them as they are for the moment, only changing the maxloops
calculation divider to be something more realistic.
3- I noticed that the new git HEAD contains another solution. As this pull request has gone way too long I'd like to simplify the history and rebase it upon the current head. Then I'd open another pull request and close this one. Is it ok?
19e108a
to
c437761
Compare
Fixed as requested with GCC generated asm code including all three loops
@ArduinoBot build this please
Build successful. Please test this code using one of the following:
http://arduino.cc/download.php?f=/javaide/pull_requests/arduino-PR-2495-BUILD-207-linux32.tar.xz
http://arduino.cc/download.php?f=/javaide/pull_requests/arduino-PR-2495-BUILD-207-linux64.tar.xz
http://arduino.cc/download.php?f=/javaide/pull_requests/arduino-PR-2495-BUILD-207-windows.zip
http://arduino.cc/download.php?f=/javaide/pull_requests/arduino-PR-2495-BUILD-207-macosx.zip
http://arduino.cc/download.php?f=/javaide/pull_requests/arduino-PR-2495-BUILD-207-macosx-java-latest.zip
This reverts commit 8ddc519. To be substituted by ASM generated code
this assembly code was generated by avr-gcc 4.8.3
pulseInLong is suitable for long pulses in interrupt context
return 0 if timeout has been reached
41b9e05
to
a7d81d0
Compare
rework pulseIn function to solve issue #2409
This patchset reworks pulseIn library function to get a fixed behavior, untouched by GCC/libc optimizations.
It also adds an helper function (pulseInLong) that uses a millis() implementation to handle in a better way IRQ affected and long pulse scenarios.
Tested against currently provided GCC and 4.9.1 version on all available pins with an external pulse generator.
Comments/enhancements (especially in the ASM part) are very welcome.