Fix #9194 - Allow more control over crash dump #9196

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Jump to bottom

Open

DRSDavidSoft wants to merge 4 commits into esp8266:master

from DRSDavidSoft:pr-postmortem-extension

Open

Fix #9194 - Allow more control over crash dump #9196

DRSDavidSoft wants to merge 4 commits into esp8266:master from DRSDavidSoft:pr-postmortem-extension

Conversation

DRSDavidSoft

Copy link

Contributor

@DRSDavidSoft DRSDavidSoft commented Aug 30, 2024 •

edited

Loading

@mcspr This is a proof of concept PR for an alternative solution to #9193 and also closes #9194. Could you please verify? I would appreciate any input and feedback. Thanks!

Description

This PR adds more granular control over how the crash dump is delivered, and some more features:

custom_panic_callback(panic_file, panic_line, panic_func, panic_what): Allows the user to additionally listen for panics (useful to record them in flash memory)
ets_uart_putc1() → crash_ets_uart_putc1(): renamed function that prints to the Serial port. This allows the function to be overridden, possibly applying transformations on the text (useful if transmitting over a Bus-like UART, such as RS-485), remove clutter or insert additional comments (such as uptime), or completely inhibit the dump (useful is uart is connected to a peripheral device)
NO_CUT_HERE: Introduce a macro that allows disabling the "CUT HERE" message and the divider lines. Can be used to minify the number of transmitted/recorded characters over the wire or storage. Can be used with the -D flag.
ets_println() instead of ets_putc('\n'): This makes the new lines use the "cooked" interface instead of the "raw" one, which is also used on the static ets_printf_P() function, which allows:
- uniform line endings (always "\r\n", instead of just the ones used within printf_P(...)
- making the line endings also silent when the crash_ets_uart_putc1() is overriden
rst_info.exccause used instead of local exccause for the 6 exception (divide by zero): unless I'm missing something, this allows the custom_crash_handler at the end of the postmortem function to also receive a correct exception number. Useful for projects such as EspCrashSave and EspCrashSaveSpiffs

Additional notes

The naming of some functions might not be the optimal value, if you would like to suggest a better one please either modify this PR or submit a comment.

DRSDavidSoft added 3 commits

August 30, 2024 03:26

@DRSDavidSoft


 expose more crash reporting apis

6765e0d

@DRSDavidSoft


 change tab to spaces

6b3e716

@DRSDavidSoft


 rename symbols to correct ones

2716f3c

@mcspr

Copy link

Collaborator

mcspr commented Aug 30, 2024

custom_panic_callback

Probably just one callback would be ok? We do have custom 'USER' reasons which are private now, but obviously can be moved to a public header. See rst_reason_sw. Suppose, this can be extended with something like esp_panic_info() returning a struct of those pointers?

NO_CUT_HERE

Any reason not to fully hide the whole internal printf with a similar flag? Or, at least hide the part of postmortem & funcs calling postmortem that print stuff, structuring it a bit different in the main func especially.
(or simply replacing such internal printf with an empty macro on demand?)

rst_info.exccause used instead of local exccause for the 6 exception (divide by zero): unless I'm missing something, this allows the custom_crash_handler at the end of the postmortem function to also receive a correct exception number. Useful for projects such as EspCrashSave and EspCrashSaveSpiffs

see #7496
This helps out exception decoder by pretending the hw does have instructions that would've triggered the exception. If only to be consistent with the output... I think the only issue is exc code would be different between rst info struct kept in memory between reboots and the one passed to the callback?

@DRSDavidSoft

Copy link

Contributor Author

DRSDavidSoft commented Aug 30, 2024

Thank you for the feedback! I agree with your points. Let me elaborate a bit on the original intentions behind these changes while also addressing your suggestions.

`custom_panic_callback`

The idea behind introducing a custom panic callback was to provide users with more flexibility in handling panics, especially for scenarios where recording the crash information in flash memory is necessary. I agree that consolidating this into a single callback would be best as it simplifies the interface and reduces the addition of unnecessary callbacks.

I suppose the reason for adding this to the PR was to avoid modification of the existing custom_crash_callback's signature.

Probably just one callback would be ok? ... Suppose, this can be extended with something like esp_panic_info() returning a struct of those pointers

That's a great idea! I very much prefer to use something like esp_panic_info() instead of what I came with (custom_panic_callback).

Moving the rst_reason_sw to a public header and extending it with something like an esp_panic_info() function that returns a structured set of pointers would be an excellent approach. Do you have any specific ideas on what that struct should contain, or how the implementation might be extended best?

`NO_CUT_HERE`

The purpose of the NO_CUT_HERE macro was to provide users with an option to minimize the output, especially when the crash dump is being transmitted over limited bandwidth or recorded in constrained storage environments. I can see the benefit of hiding more of the internal printf calls, particularly those within the postmortem functions.

Any reason not to fully hide the whole internal printf with a similar flag, ... or simply replacing such internal printf with an empty macro on demand?

The second idea is exactly what I tried to pursue by introducing the new crash_ets_uart_putc1() func. Since it can be overridden by the user, it can also be used to point to a no-op function, or to simply not print anything to the serial. I also agree with using a macro instead, however I have found that the linkage approach works better. Alternatively, replacing the higher-level internal printf with a macro as needed could offer a more straightforward solution.

As to why the cut_here() function specifically, it's currently the only part that produces static content, and can be forgoed when saving or transmitting the crash dump. I just needed a way to signal to prevent it from being generated. I agree this is not the best approach and leads to clutter in code, but it's currently the best I could come up with 😅 I would appreciate other solutions as well!

Structuring the main function differently to conditionally include or exclude these outputs is also a good approach. Would you prefer the more granular control of individual printf calls, or a broader flag that suppresses all related output?

`rst_info.exccause`

Your point about using rst_info.exccause for the divide by zero exception is well taken. The intention here was to ensure consistency across the crash reports, especially for tools like EspCrashSave and EspCrashSaveSpiffs that rely on accurate exception information.

I think the only issue is exc code would be different between rst info struct kept in memory between reboots and the one passed to the callback

As for my personal opinion, I can accept that the consequent exception code wouldn't be the correct one (6), that is due to ESP8266's hardware design. The more important factor is that during the crash instance, the code is correctly reported (as is the current case when considering print to serial), and the code is also correctly saved, so that further in the line it can also be correctly reported over the network, or something else.

However, you raise a valid concern about the discrepancy between the rst_info struct retained between reboots and the one passed to the callback. This could potentially lead to confusion in diagnosing issues.

Currently, the exception is correctly only displayed when a crash happens. We don't care about the next reboot, because we have lost the context of the actual crash. This PR aims to also correctly save the exception code. This means that later, when we access the crash information, we will also see the correct code. This can either be another print to the uart (triggered by user), or in the case I'm interested, transmit it over the network on the next reboot. Which means the code that is available in next reboots will be ignored in favor of the saved one, received from the crash callback.

I'm open to suggestions on how to handle this more effectively. Sadly, it appears that getting the same exception code across both instances or providing additional context in the callback to clarify the source of the exception code are both unlikely to implement.

Once again, I appreciate your insights and would love to hear any further thoughts you have on these topics. Your expertise in this area is invaluable, and I'm keen to ensure that the final implementation meets both the flexibility and reliability requirements of our users.

@DRSDavidSoft


 Merge branch 'esp8266:master' into pr-postmortem-extension

2a6b758

@DRSDavidSoft DRSDavidSoft mentioned this pull request

Aug 17, 2025

Feature Request: Don't print stack trace to Serial on crash #9194

Open

5 tasks

mcspr added a commit to mcspr/esp8266-Arduino that referenced this pull request

Aug 27, 2025

@mcspr


 Postmortem - export user reset reason & extra callbacks

6d5627c

ref. esp8266#9193, esp8266#9194, esp8266#9196
- persistent & public REASON_USER_... for user callback rst_info reason
- public postmortem callback functions, don't overly hide the implementation
- separate reasons for panic, assert, abort and c++ exceptions
- put postmortem func one level down in the function chain, allow to
 override both the function itself or individual internals
- stack offset for -fexceptions, skip unwind data to the actual point of origin

@mcspr mcspr mentioned this pull request

Aug 27, 2025

Postmortem - allow more control over crash dump #9271

Open

Labels

None yet

2 participants

@DRSDavidSoft @mcspr

Navigation Menu

Search code, repositories, users, issues, pull requests...

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Fix #9194 - Allow more control over crash dump #9196

Are you sure you want to change the base?

Fix #9194 - Allow more control over crash dump #9196

Uh oh!

Conversation

@DRSDavidSoft DRSDavidSoft commented Aug 30, 2024 •

edited

Loading

Uh oh!

Description

Additional notes

Uh oh!

mcspr commented Aug 30, 2024

Uh oh!

DRSDavidSoft commented Aug 30, 2024

`custom_panic_callback`

`NO_CUT_HERE`

`rst_info.exccause`

Uh oh!

Uh oh!

Fix #9194 - Allow more control over crash dump #9196

Are you sure you want to change the base?

Fix #9194 - Allow more control over crash dump #9196

Uh oh!

Conversation

@DRSDavidSoft DRSDavidSoft commented Aug 30, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Additional notes

Uh oh!

mcspr commented Aug 30, 2024

Uh oh!

DRSDavidSoft commented Aug 30, 2024

custom_panic_callback

NO_CUT_HERE

rst_info.exccause

Uh oh!

Uh oh!

@DRSDavidSoft DRSDavidSoft commented Aug 30, 2024 •

edited

Loading

`custom_panic_callback`

`NO_CUT_HERE`

`rst_info.exccause`