Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

iOS device startup: upload .logarchive and pull crash reports on cleanup failure#5232

Open
matouskozak wants to merge 3 commits into
main from
lm-ios-diagnostic-extraction-pr
Open

iOS device startup: upload .logarchive and pull crash reports on cleanup failure #5232
matouskozak wants to merge 3 commits into
main from
lm-ios-diagnostic-extraction-pr

Conversation

@matouskozak

@matouskozak matouskozak commented May 28, 2026

Copy link
Copy Markdown
Member

Summary

When the post-iteration iOS app cleanup fails (devicectl returns 'No such process'), the app already exited before we could terminate it. This PR adds diagnostic data collection to help investigate why — crash, iOS-initiated termination, or natural exit.

Changes

src/scenarios/shared/runner.py:

  • Added make_archive to the shutil import
  • Wrapped killCmdCommand.run() in a try/except CalledProcessError block that:
    1. Zips and uploads the iOS device's .logarchive to HELIX_WORKITEM_UPLOAD_ROOT
    2. Copies device-side crash reports (.ips) via xcrun devicectl device copy from --domain-type systemCrashLogs
    3. Re-raises the original exception after diagnostics are saved

Why

  • Adds observability for iOS device startup/cleanup failures — a daily triage pain point
  • Investigative-only: triggers only on the cleanup failure that already terminates the iteration
  • No behavior change for successful runs

Origin

Cherry-picked from matouskozak/ios-sdk-jobs-macos-26 (commits 71bfd4a1, 3101d30c) which were left unmerged after PR #5231 was merged.

Verification

  • ✅ No conflict markers
  • python3 -m py_compile src/scenarios/shared/runner.py passes

...nup failure
When post-iteration app cleanup fails (devicectl 'No such process'), the app
already exited before we could terminate it. To diagnose why — crash, iOS-
initiated termination, or natural exit:
1. Zip and upload the iOS device's .logarchive to the Helix results container
2. Pull device-side crash reports (.ips) via xcrun devicectl
Investigative-only: triggers only on the cleanup failure that already
terminates the iteration.
Cherry-picked from matouskozak/ios-sdk-jobs-macos-26 (71bfd4a1, 3101d30c).
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings May 28, 2026 15:11

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds diagnostic collection for iOS device startup cleanup failures, helping investigate cases where the app process is already gone when the runner tries to kill it.

Changes:

  • Wraps the iOS app kill step in CalledProcessError handling.
  • Uploads the collected .logarchive and attempts to copy device crash logs before re-raising.
  • Adds make_archive import for zipping the log archive.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/scenarios/shared/runner.py Outdated
Comment thread src/scenarios/shared/runner.py Outdated
- Wrap os.makedirs into the same try/except as the crash copy so a
 diagnostic-path failure cannot mask the original CalledProcessError
 from the failed app kill.
- Prune the systemCrashLogs copy after devicectl to entries relevant
 to this iteration: keep files matching the bundle name or with
 mtime >= the iteration's log-collect start. Drops dozens of unrelated
 .ips files (WiFiLQMMetrics, wifip2pd, old crashes, etc.) per upload.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

matouskozak commented Jun 8, 2026
edited
Loading

Copy link
Copy Markdown
Member Author

Copilot AI review requested due to automatic review settings June 8, 2026 16:45

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 2 comments.

Comment thread src/scenarios/shared/runner.py Outdated
Comment on lines +771 to +792
# The Mac.iPhone.17.Perf devices are shared across many test runs and iOS retains
# crash reports in /var/mobile/Library/Logs/CrashReporter/ until rotated out, so
# systemCrashLogs accumulates days of stale .ips files from previous runs of the
# same app bundles. Wipe it now so the crash collection on a kill failure below
# only contains crashes generated during this test run. devicectl has no per-file
# delete; copying an empty directory in with --remove-existing-content replaces
# the destination contents.
getLogger().info("Clearing systemCrashLogs on device.")
try:
with tempfile.TemporaryDirectory() as emptyCrashDir:
crashClearCmd = [
'xcrun', 'devicectl', 'device', 'copy', 'to',
'--device', deviceUDID,
'--domain-type', 'systemCrashLogs',
'--source', emptyCrashDir,
'--destination', '/',
'--remove-existing-content', 'true',
]
RunCommand(crashClearCmd, verbose=True).run()
except Exception as ex:
getLogger().warning(f"Failed to clear systemCrashLogs (continuing): {ex}")

Comment on lines +872 to +876
# The kill is cleanup-only; the measurement data is already in the .logarchive above.
# devicectl returns non-zero when the app process is already gone (e.g. iOS terminated
# it, the app crashed, or it self-exited). Upload the .logarchive AND any device-side
# crash reports for this bundle to the Helix results container so we can diagnose
# why the app was already gone before re-raising.
Replace devicectl-based copy-everything-then-filter approach with
XHarness's CrashSnapshotReporter pattern, ported to Python:
* Before the iteration loop, run 'mlaunch --list-crash-reports' to
 capture the device's existing crash report set.
* In the kill-failure handler, re-list and download only the diff
 via 'mlaunch --download-crash-report --download-crash-report-to'.
* Poll the final snapshot for up to 60s so iOS has time to finish
 writing the crash report after the process dies (matches
 CrashSnapshotReporter.EndCaptureAsync).
The previous bundle/mtime filter still uploaded the historical
backlog of unrelated crashes that accumulate on shared
Mac.iPhone.17.Perf devices (8-20 stale .ips per work item seen in
build 2986965). The snapshot diff scopes uploads to only the
crashes generated during this test run.
It's a list+copy, not a move — device state is unchanged, matching
XHarness behaviour.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@matouskozak matouskozak force-pushed the lm-ios-diagnostic-extraction-pr branch from 0d442be to bad02e6 Compare June 8, 2026 17:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Reviewers

Copilot code review Copilot Copilot left review comments

At least 1 approving review is required to merge this pull request.

Assignees

No one assigned

Labels

None yet

Projects

None yet

Milestone

No milestone

Development

Successfully merging this pull request may close these issues.

2 participants

AltStyle によって変換されたページ (->オリジナル) /