-
Notifications
You must be signed in to change notification settings - Fork 13.3k
Fix #9194 - Allow more control over crash dump #9196
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
custom_panic_callback
Probably just one callback would be ok? We do have custom 'USER' reasons which are private now, but obviously can be moved to a public header. See rst_reason_sw
. Suppose, this can be extended with something like esp_panic_info()
returning a struct of those pointers?
NO_CUT_HERE
Any reason not to fully hide the whole internal printf with a similar flag? Or, at least hide the part of postmortem & funcs calling postmortem that print stuff, structuring it a bit different in the main func especially.
(or simply replacing such internal printf with an empty macro on demand?)
rst_info.exccause used instead of local exccause for the 6 exception (divide by zero): unless I'm missing something, this allows the custom_crash_handler at the end of the postmortem function to also receive a correct exception number. Useful for projects such as EspCrashSave and EspCrashSaveSpiffs
see #7496
This helps out exception decoder by pretending the hw does have instructions that would've triggered the exception. If only to be consistent with the output... I think the only issue is exc code would be different between rst info struct kept in memory between reboots and the one passed to the callback?
Thank you for the feedback! I agree with your points. Let me elaborate a bit on the original intentions behind these changes while also addressing your suggestions.
custom_panic_callback
The idea behind introducing a custom panic callback was to provide users with more flexibility in handling panics, especially for scenarios where recording the crash information in flash memory is necessary. I agree that consolidating this into a single callback would be best as it simplifies the interface and reduces the addition of unnecessary callbacks.
I suppose the reason for adding this to the PR was to avoid modification of the existing custom_crash_callback
's signature.
Probably just one callback would be ok? ... Suppose, this can be extended with something like
esp_panic_info()
returning a struct of those pointers
That's a great idea! I very much prefer to use something like esp_panic_info()
instead of what I came with (custom_panic_callback
).
Moving the rst_reason_sw
to a public header and extending it with something like an esp_panic_info()
function that returns a structured set of pointers would be an excellent approach. Do you have any specific ideas on what that struct should contain, or how the implementation might be extended best?
NO_CUT_HERE
The purpose of the NO_CUT_HERE
macro was to provide users with an option to minimize the output, especially when the crash dump is being transmitted over limited bandwidth or recorded in constrained storage environments. I can see the benefit of hiding more of the internal printf
calls, particularly those within the postmortem functions.
Any reason not to fully hide the whole internal printf with a similar flag, ... or simply replacing such internal printf with an empty macro on demand?
The second idea is exactly what I tried to pursue by introducing the new crash_ets_uart_putc1()
func. Since it can be overridden by the user, it can also be used to point to a no-op function, or to simply not print anything to the serial. I also agree with using a macro instead, however I have found that the linkage approach works better. Alternatively, replacing the higher-level internal printf
with a macro as needed could offer a more straightforward solution.
As to why the cut_here()
function specifically, it's currently the only part that produces static content, and can be forgoed when saving or transmitting the crash dump. I just needed a way to signal to prevent it from being generated. I agree this is not the best approach and leads to clutter in code, but it's currently the best I could come up with 😅 I would appreciate other solutions as well!
Structuring the main function differently to conditionally include or exclude these outputs is also a good approach. Would you prefer the more granular control of individual printf
calls, or a broader flag that suppresses all related output?
rst_info.exccause
Your point about using rst_info.exccause
for the divide by zero exception is well taken. The intention here was to ensure consistency across the crash reports, especially for tools like EspCrashSave and EspCrashSaveSpiffs that rely on accurate exception information.
I think the only issue is exc code would be different between rst info struct kept in memory between reboots and the one passed to the callback
As for my personal opinion, I can accept that the consequent exception code wouldn't be the correct one (6), that is due to ESP8266's hardware design. The more important factor is that during the crash instance, the code is correctly reported (as is the current case when considering print to serial), and the code is also correctly saved, so that further in the line it can also be correctly reported over the network, or something else.
However, you raise a valid concern about the discrepancy between the rst_info
struct retained between reboots and the one passed to the callback. This could potentially lead to confusion in diagnosing issues.
Currently, the exception is correctly only displayed when a crash happens. We don't care about the next reboot, because we have lost the context of the actual crash. This PR aims to also correctly save the exception code. This means that later, when we access the crash information, we will also see the correct code. This can either be another print to the uart (triggered by user), or in the case I'm interested, transmit it over the network on the next reboot. Which means the code that is available in next reboots will be ignored in favor of the saved one, received from the crash callback.
I'm open to suggestions on how to handle this more effectively. Sadly, it appears that getting the same exception code across both instances or providing additional context in the callback to clarify the source of the exception code are both unlikely to implement.
Once again, I appreciate your insights and would love to hear any further thoughts you have on these topics. Your expertise in this area is invaluable, and I'm keen to ensure that the final implementation meets both the flexibility and reliability requirements of our users.
ref. esp8266#9193, esp8266#9194, esp8266#9196 - persistent & public REASON_USER_... for user callback rst_info reason - public postmortem callback functions, don't overly hide the implementation - separate reasons for panic, assert, abort and c++ exceptions - put postmortem func one level down in the function chain, allow to override both the function itself or individual internals - stack offset for -fexceptions, skip unwind data to the actual point of origin
Uh oh!
There was an error while loading. Please reload this page.
@mcspr This is a proof of concept PR for an alternative solution to #9193 and also closes #9194. Could you please verify? I would appreciate any input and feedback. Thanks!
Description
This PR adds more granular control over how the crash dump is delivered, and some more features:
custom_panic_callback(panic_file, panic_line, panic_func, panic_what)
: Allows the user to additionally listen for panics (useful to record them in flash memory)ets_uart_putc1()
→crash_ets_uart_putc1()
: renamed function that prints to the Serial port. This allows the function to be overridden, possibly applying transformations on the text (useful if transmitting over a Bus-like UART, such as RS-485), remove clutter or insert additional comments (such as uptime), or completely inhibit the dump (useful is uart is connected to a peripheral device)NO_CUT_HERE
: Introduce a macro that allows disabling the "CUT HERE" message and the divider lines. Can be used to minify the number of transmitted/recorded characters over the wire or storage. Can be used with the-D
flag.ets_println()
instead ofets_putc('\n')
: This makes the new lines use the "cooked" interface instead of the "raw" one, which is also used on the staticets_printf_P()
function, which allows:printf_P(...)
crash_ets_uart_putc1()
is overridenrst_info.exccause
used instead of localexccause
for the6
exception (divide by zero): unless I'm missing something, this allows the custom_crash_handler at the end of the postmortem function to also receive a correct exception number. Useful for projects such as EspCrashSave and EspCrashSaveSpiffsAdditional notes
The naming of some functions might not be the optimal value, if you would like to suggest a better one please either modify this PR or submit a comment.