-
-
Notifications
You must be signed in to change notification settings - Fork 7k
Support extra parameter on attachInterrupt()
#4519
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
A very simple example of how this can be used to writing components:
class PinListener {
int pin;
static void isr(void* arg) {
PinListener* listener = (PinListener*)arg;
listener->onInterrupt();
}
public:
PinListener(int pin) {
this->pin = pin;
pinMode(pin, INPUT_PULLUP);
attachInterruptParam(digitalPinToInterrupt(pin), isr, CHANGE, this);
}
void onInterrupt() {
Serial.print("Pin ");
Serial.print(pin);
Serial.print(" changed to ");
Serial.println(digitalRead(pin));
}
};
PinListener listeners[] = {
PinListener(0),
PinListener(1),
PinListener(2),
PinListener(7)
};
void setup() {
Serial.begin(9600);
}
void loop() {
}
Doesnt this break the current API with no extra param? I am not sure if the code works in this case. Have you tested it?
Also this adds a lot of overhead, even though it could be very useable. Not sure about that.
However I like the compact definitions in WInterrupts.c
.
If you ask me I'd rethink the whole API in general. I consumes quite some flash, even though only a singe PinInterrupt is used.
And I'd also change those:
#define CHANGE 1
#define FALLING 2
#define RISING 3
to
#define FALLING 1
#define RISING 2
#define CHANGE 3
So you have bit0 indicate falling, bit1 rising and both together a change.
The code looks good to me. The compatibility wrapper is a clever way to still support the existing parameterless version (@NicoHood, I think that still works), though it might be better for performance to not have the extra function call. One could instead store void*
for the function pointers and cast them to the right type when calling, which I presume would be more efficient (perhaps smaller as well). The challenge then is how to decide whether to pass a parameter. Using NULL as a marker would work, but mean that you can never pass NULL, which isn't so great. Another option would be to use the LSB of the function pointer (since that is always 0 due to alignment), but that might be fragile as well. Perhaps storing a bitmask (which would only take 1 byte with the current boards) is ok, though.
The introduction of IMPLEMENT_ISR
looks good to me, but ideally that would be in a separate commit, since it is totally unrelated to the main change in this PR.
Has anyone tested the performance impact?
Thanks for the feedback.
I'll split the IMPLEMENT_ISR
macro into a separate PR.
@PaulStoffregen: How can I measure the performance impact?
@matthijskooijman, what do you think of using a poison value on voidFuncPtrParam (Maybe nothing
) to mean "parameterless funcion".
#define IMPLEMENT_ISR(vect, interrupt) \
ISR(vect) { \
if (intFuncParam [interrupt] == nothing) {
((voidFuncPtr)intFunc[interrupt])();
} else {
((voidFuncPtrParam)intFunc[interrupt])(intFuncParam[interrupt]);
}
}
The tradeoff is that ISR functions will take a little more flash to handle the condition inline
@NicoHood, Although I agree the current implementation wastes some flash with unused interrupts, I really don't know how to rewrite it in a more efficient way. I'm taking suggestions :D
compatFunc
maintains compatibility with current parameterless API. I've tested it and it works.
About the constant values, I agree with you, that makes more sense, but looks like the values are defined by the hardware
@matthijskooijman, what do you think of using a poison value on voidFuncPtrParam (Maybe nothing) to mean "parameterless funcion".
That sounds reasonable, checking a pointer is probably a lot faster than calling a function. There is still overhead, of course, from loading the parameter and checking it.
As for measuring the overhead - disassembling and counting cycles would work, or doing some kind of runtime timekeeping (though that's more tricky to get accurate).
@matthijskooijman, what do you think of using a poison value on voidFuncPtrParam (Maybe nothing) to mean "parameterless funcion".
Wait, that's what I proposed with letting "NULL" to mean "no parameter". However, this will remove one value from the available parameter values, which I think is not a good plan (there is a fair chance that a user will pass that value anyway, especially if the void* argument is used to pass integers instead of pointers, and because function and data pointers can overlap).
What I thought you were saying, was to use a poisoned as the function pointer value, which means that the parameter contains the actual (parameterless) function pointer to call. This is more reliable because valid values for the function pointer array are only pointers into the flash, and never pointers into RAM or plain integers.
You are right.
Something like this?
if (intFunc [interrupt] == compatFunc) {
((voidFuncPtr)intFuncParam[interrupt])();
} else {
intFunc[interrupt])(intFuncParam[interrupt]);
}
Yup, that's what I had in mind indeed. You should probably get intFunc[interrupt]
once and put it in a variable, though, since intFunc
is volatile so the compiler likely won't optimize it otherwise. As for the compatFunc
(which probably needs some better name), ideally it would be a function containing of just a "ret" instruction, so it only takes up 2 bytes. There might be some gcc attributes to achieve that (to prevent any pre- and postamble from being generated, possibly by making it a naked ISR).
The current code is very verbose and a painful to maintain (Change ISR implementation in 20 different places? No Thanks!). (This was originally part of arduino#4519, but we all agreeded it deserved it's own PR)
I've made a separate PR for the IMPLEMENT_ISR
macro on #4521
I haven't compared the performance yet, but the memory use on a pro micro (atmega32u4) is
- using
compatFunc
: +66 bytes on flash, +10 bytes of ram - using @matthijskooijman's suggestion: +132 bytes on flash, +10 bytes of ram
Current code:
#define IMPLEMENT_ISR(vect, interrupt) \
ISR(vect) { \
intFunc[interrupt](); \
}
ISR: Executes 3 instructions (7 cycles) from code + 34 instructions of boilerplate
..... Here goes prologue boilerplace: 17 instructions (Mostly `push`)
1e8: e0 91 00 01 lds r30, 0x0100
1ec: f0 91 01 01 lds r31, 0x0101
1f0: 09 95 icall
..... Here goes epilogue boilerplace: 17 instructions (Mostly `pop`)
My code, using compatFunc
:
static void compatFunc(void* realCallback) {
voidFuncPtr f = realCallback;
f();
}
#define IMPLEMENT_ISR(vect, interrupt) \
ISR(vect) { \
intFunc[interrupt](intFuncParam [interrupt]); \
}
ISR: Executes 5 instructions (11 cycles) from code + 34 instructions of boilerplate
..... Here goes prologue boilerplace: 17 instructions (Mostly `push`)
202: e0 91 00 01 lds r30, 0x0100
206: f0 91 01 01 lds r31, 0x0101
20a: 80 91 32 01 lds r24, 0x0132
20e: 90 91 33 01 lds r25, 0x0133
212: 09 95 icall
..... Here goes epilogue boilerplace: 17 instructions (Mostly `pop`)
compatFunc: Executes 2 instructions (3 cycles)
186: fc 01 movw r30, r24
188: 09 94 ijmp
Summary:
- Flash: +66 bytes
- RAM: +10 bytes
- ISR execution: +2 instructions and +4 cycles (with parameters), +4 instructions and +7 cycles (without parameters)
Suggested code, using if
inside the ISR:
#define IMPLEMENT_ISR(vect, interrupt) \
ISR(vect) { \
voidFuncPtrParam cb = intFunc[interrupt]; \
void* param = intFuncParam [interrupt]; \
if (cb == NOARG_CALLBACK) { \
((voidFuncPtr)param)(); \
} else { \
cb(param); \
} \
}
ISR: Executes 9-11 (16-18 cycles) out of 12 instructions from code + 34 instructions of boilerplate
(note that the brne is 2 cycles when branching and 1 cycle when not)
..... Here goes prologue boilerplace: 17 instructions (Mostly `push`)
1fe: e0 91 00 01 lds r30, 0x0100
202: f0 91 01 01 lds r31, 0x0101
206: 80 91 32 01 lds r24, 0x0132
20a: 90 91 33 01 lds r25, 0x0133
20e: ef 3f cpi r30, 0xFF ; 255
210: 2f ef ldi r18, 0xFF ; 255
212: f2 07 cpc r31, r18
214: 19 f4 brne .+6 ; 0x21c <__vector_1+0x40>
216: fc 01 movw r30, r24
218: 09 95 icall
21a: 01 c0 rjmp .+2 ; 0x21e <__vector_1+0x42>
21c: 09 95 icall
..... Here goes epilogue boilerplace: 17 instructions (Mostly `pop`)
Summary:
- Flash: +132 bytes
- RAM: +10 bytes
- ISR execution: +4 instructions and +9 cycles (with parameters), +6 instructions and +11 cycles (without parameters)
Could be optimized to +4 instructions and +9 cycles (with parameters), +5 instructions and +9 cycles (without parameters)
91a68d9
to
847bc9e
Compare
@paulo-raca, thanks for the thorough comparison :-D
I'm actually quite surprised about the mess the compiler makes when compiling my suggestion. There's a lot of overhead (12 instructions from my count) from storing and reloading variables on the stack, when they could just as well be kept around in registers (AFAICS all of the call-clobbered registers are pushed by the boilerplate, so they should be available). I wonder if it would be worth implementing this in hand-coded assembly perhaps...
Also interesting is that compatfunction has something similar going on, which is even more silly since it stores r24 and r25 on the stack and reloads them immediately to the same registers twice. It's almost as if you compiled without optimizations enabled?
You are right, I completely forgot to enable optimizations :/
I'll update it later
(I hope most of the ISR boilerplate goes away with it)
@paulo-raca, that suggests you compiled with avr-gcc manually? For a fair comparison, it would seem best to actually use the Arduino IDE for compilation, so you get the "official" compiler flags and compilation process. To extract the .elf file from the IDE for disassembly, just look through the verbose compilation output to find where it is stored.
@matthijskooijman, I updated the previous comment with optimized disassembly generated from the IDE.
Turns out that using compatFunction
is extremely efficient and gives only +4 instructions overhead compared with the current implementation (2 instructions if you are using the callback with parameter)
I looked up the number of cycles in addition to the number of instructions, and I took the liberty of adding them in your comment. I believe the last code snippet could be optimized (by hand-coding in assembly, or perhaps by making the compiler smarter) to remove one icall
and rjmp
instruction, saving four bytes of flash and two cycles, but it' still slower than the compatFunc
approach.
To me the compatFunc
approach seems decent, the compiler seems smart enough to make it really small without any boilerplate. I tend to think about calls, and especially indirect ones, as expensive, but that seems to mostly true on x86 (which does heavy pipelining and speculative execution). On the AVR, an indirect calll is just three cycles (even a cycle faster than a direct call, interestingly enough).
So, your original approach seems the most efficient. Any users that do not need this new parameter feature will pay an additional 7 cycles. The current latency is 17*2+7 (assuming all 17 preamble boilerplate instructions are push at 2 cycles per instruction) 41 cycles. This new feature brings a measureable, but not enormous, cost.
On additional option we could consider is to simply pass the parameter, whether it is valid or not. Knowing AVR's calling conventions, we know that passing a parameter will simply occupy two registers. If the called function does not expect these parameters, it will just ignore them, without any additional problems. I don't see any way to actually verify this during compilation (e.g. by writing things in such a way that the compiler will see this optimization opportunity and exploit it, and/or an error be thrown if the compiler at some point handles calling things differently), but that does not need to be a problem necessarily (a decent comment should be written, though, to ensure this does not break when ported to other platforms).
If you would just pass the parameter, the additional overhead is two lds
instructions, or 4 cycles for both cases, which seems reasonable to me.
I like the idea!
847bc9e
to
c926025
Compare
I've implemented it, and everything is looking good.
Thanks you for help, @matthijskooijman
Is there anything else I can do about this?
Can we have it merged?
As soon as it gets settled, I plan to do the same on Due
Hi @paulo-raca ,
I merged #4521, would you mind rebasing this one against current master
? 😄
c926025
to
c26f0ca
Compare
Rebased. Thanks!
It's been a while... Should we merge or discard?
tni
commented
May 5, 2017
On ARM Cortex M4, invoking that callback is 5 vs. 4 instructions.
The void* parameter would be passed in a register, so a function(void) could be called as function(void*).
\\
The void* parameter works fine with lamdas, e.g.:
class S {
public:
void callback();
};
void registerCallback(void (*callback_fn)(void*), void* context) {
// ...
}
void test(S* s) {
// ...
registerCallback([](void* ctx){ ((S*) ctx)->callback(); }, s);
}
I planned to follow up this patch with an ARM versions, but seems like it's not going anywhere :(
Anyway, I find that 5 instructions is still a pretty low overhead for what (I think) is a pretty useful feature.
Not sure about the lamdas, but I think it probably won't support lamdas with a capture list -- Just guessing, can you try it?
tni
commented
May 5, 2017
No, you can't use lambda captures. They need storage that a function pointer doesn't have (you would need to use std::function). Lambdas without captures decay to function pointers.
The Cortex M4 overhead is 1 extra instruction. :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This sort of dropped off my radar, but a post on the mailing list got my attention again.
I just had another look at the code, and it looks good and acceptable for mergingm to me. @cmaglie @facchinm, would you have a look?
@paulo-raca, could you perhaps update your comment with assembly comparison with the assembly for the current approach?
aaronw2
commented
Aug 29, 2017
This would be extremely useful for me as well. It works well with C++ since I can just pass the this pointer as a parameter.
I didn't know about this thread. In my implementation for the ESP32 I also pass the interrupt (pin) number to the handler, though with a bit more overhead I could live without that. It just means that I need to extract the pin number from my class instead.
When writing an arduino library, it is difficult to connect interrupt handlers with the component instance that should be notified. In C, the common pattern to fix this is to specifying a `void*` parameter with the callback, where the listener can store any context necessary. This patch adds the new function `attachInterruptParam()` to implement this pattern.
@matthijskooijman, the current approach adds 2 instructions / 4 cycles on AVR:
Before:
d78: e0 91 14 01 lds r30, 0x0114 ; 0x800114 <intFunc+0x8>
d7c: f0 91 15 01 lds r31, 0x0115 ; 0x800115 <intFunc+0x9>
d80: 09 95 icall
After:
d78: e0 91 14 01 lds r30, 0x0114 ; 0x800114 <intFunc+0x8>
d7c: f0 91 15 01 lds r31, 0x0115 ; 0x800115 <intFunc+0x9>
d80: 80 91 4a 01 lds r24, 0x014A ; 0x80014a <intFuncParam+0x8>
d84: 90 91 4b 01 lds r25, 0x014B ; 0x80014b <intFuncParam+0x9>
d88: 09 95 icall
On an Arduino Pro Micro (Atmega32U4) it adds 56 bytes of Flash and 10 bytes of RAM
c26f0ca
to
9eb8ba0
Compare
@aaronw2, thanks for bringing it up again :)
When it gets merged I'll also make patches for ESP8266, ESP32 and Due
@paulo-raca
Is the "non-portable" trick applicable also to ARM?
Do you have a reference implementation to compare the performance also there?
I haven't tried, but I believe it does.
According to @tni, the overhead on arm is only 1 instruction.
Also, even if ARM needs a bit more instructions, that's probably less of a problem given the usually faster MCUs. And since this is just an implementation detail, that does not affect the API, this should not hinder the changes on AVR.
I haven't tried, but I believe it does.
Ok, may you explain why calling a function with extra parameters is safe in the AVR case?
I'd like to have some clarification I'm not so easy to merge something that really looks like something you should not do :-)
Also, even if ARM needs a bit more instructions, that's probably less of a problem given the usually faster MCUs.
The point is how much is this "bit more" on the ARM side, so I'm going to give it a try to check it out.
Ok, may you explain why calling a function with extra parameters is safe in the AVR case?
On AVR, the extra arguments are sent on registers r24
and r25
. If the function doesn't expect these arguments, these values on these registers are simply be ignored.
The big red warning mostly applies to ABIs where arguments are stored on the stack.
E.g., If the ISR pushes 2 arguments on the stack but the callback only pops one when returning, everything breaks.
In C, the common pattern to fix this is to specifying a void* parameter with the callback, where the listener can store any context necessary. This patch adds the new function attachInterruptParam() to implement this pattern. This is a port of arduino/Arduino#4519, but for Arduino Due.
I've just written an equivalent PR for Arduino Due.
The same trick also worked on ARM, and the ISR overhead is only 2 instructions.
The big red warning mostly applies to ABIs where arguments are stored on the stack.
E.g., If the ISR pushes 2 arguments on the stack but the callback only pops one when returning,
everything breaks.
Is this really true? I believe the caller has to clean up the stack or else one could not realise functions with variable count of parameters like
int printf(char* format, ...);
where the called function cannot know the number of arguments passed to it.
At least that's how it's done in C on x86.
@dplasa, this is true for variadic functions, but the compiler might use a different calling convention on them.
E.g., on AVR:
void foo(char a, char b) {}
void bar(char a, ...) {}
void callFoo() {
foo(1,2);
}
void callBar() {
bar(1,2);
}
compiles to
[...]
void callFoo() {
[...]
foo(1,2);
2e: 62 e0 ldi r22, 0x02 ; 2
30: 81 e0 ldi r24, 0x01 ; 1
32: e6 df rcall .-52 ; 0x0 <foo>
}
[...]
void callBar() {
[...]
bar(1,2);
42: 1f 92 push r1
44: 82 e0 ldi r24, 0x02 ; 2
46: 8f 93 push r24
48: 81 e0 ldi r24, 0x01 ; 1
4a: 8f 93 push r24
4c: e5 df rcall .-54 ; 0x18 <bar>
4e: 0f 90 pop r0
50: 0f 90 pop r0
52: 0f 90 pop r0
}
[...]
On x86 there is a number of calling conventions where the callee should clean up the task: https://en.wikipedia.org/wiki/X86_calling_conventions#Callee_clean-up
dplasa
commented
Oct 18, 2017
Still my argument holds :)
- If the caller passes arguments on the stack - it cleans them up after the call (void callBar()...)
- If the caller passes arguments in registers (void callFoo()) nothing needs to be cleaned
If an void ISR() without argument was called the compiler would probably just rcall the ISR code when the interrupt occurs.
If we change this to ISR(void* arg) then arg is either passed by register which is no deal to an ISR(void) function which just ignores the register...
If the compiler would generate code such that arg is passed over the stack, the caller will also clean up the stack - and again the ISR(void) just ignores the bit on the stack.
Hello,
Since the AVR core has been moved into a separate sepository, I created a new PR with the same changes: arduino/ArduinoCore-avr#58
There is also a version for SAM: arduino/ArduinoCore-sam#44
If these are merged in the official API, I'll also port this feature for ESP8266 / ESP32
aaronw2
commented
Feb 1, 2019
I'm not aware of any Arduino compatible core that would have a problem with this. In every ABI I see supported, the first argument is always passed in a register.
I'm not aware either, but it's better to make it very explicit, since it is an abuse of the ABI
aaronw2
commented
Feb 6, 2019
I might also add that in C, it is always the caller's responsibility to clean up the stack from any parameters that are passed. Passing an extra parameter will not cause a function call to fail. If passed on a stack, parameters are passed in a right to left order so that extra parameters on the stack will not cause a function to fail either. If the Pascal calling convention were used instead of C then this would be an issue since it is up to the callee to clean up the stack, but since this is C it is up to the caller to clean it up.
Ok, so what if I replace the comment with this?
// To support callbacks with and without parameters with minimum overhead,
// this relies on that fact that in C calling conventions extra argument on a
// function call are safely ignored without side-effects.
aaronw2
commented
Feb 6, 2019
That sounds good to me.
When writing an arduino library, it is difficult to connect interrupt handlers with the component instance that should be notified.
In C, the common pattern to fix this is to specifying a
void*
parameter with the callback, where the listener can store any context necessary.This patch adds the new function
attachInterruptParam()
to implement this pattern.