ci(macrobenchmark): Run startup benchmark on Sauce Labs (POC) #5693
Draft
runningcode wants to merge 3 commits into
The sentry-uitest-android-macrobenchmark module currently only runs on a locally connected device. This wires it to Sauce Labs so we can evaluate whether the cold-start timeToInitialDisplay benchmark can run on the real-device cloud already used by our other benchmarks. This is a proof-of-concept, gated behind workflow_dispatch and kept off the per-PR path. Device-guard errors are intentionally not suppressed: on non-rooted, unlocked-clock cloud devices the guards (UNLOCKED, DEBUGGABLE, ...) are expected to fire, and seeing which ones fire is the point of the POC. It also tells us whether timeToInitialDisplay can be retrieved from Sauce artifacts (benchmark JSON) or must be parsed from the device log.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Temporary scaffolding so the Sauce run executes on this branch without first landing on the default branch (workflow_dispatch is not dispatchable until the workflow exists on the default branch). Remove before merge.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Drop the assembleMacrobenchmark Makefile target and invoke the two gradle assemble tasks directly from the workflow. Only one workflow uses them, so the extra indirection isn't worth it.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Performance metrics 🚀
| Revision | Plain | With Sentry | Diff |
|---|---|---|---|
| abfcc92 | 304.04 ms | 370.33 ms | 66.29 ms |
| 4e3e79d | 365.83 ms | 477.62 ms | 111.79 ms |
| d8912da | 329.94 ms | 389.68 ms | 59.74 ms |
| d15471f | 294.13 ms | 399.49 ms | 105.36 ms |
| abf451a | 332.82 ms | 403.67 ms | 70.85 ms |
| 604a261 | 380.65 ms | 451.27 ms | 70.62 ms |
| 806307f | 357.85 ms | 424.64 ms | 66.79 ms |
| 22f4345 | 307.87 ms | 354.51 ms | 46.64 ms |
| d217708 | 375.27 ms | 415.68 ms | 40.41 ms |
| c3ee041 | 310.64 ms | 361.90 ms | 51.26 ms |
App size
| Revision | Plain | With Sentry | Diff |
|---|---|---|---|
| abfcc92 | 1.58 MiB | 2.13 MiB | 557.31 KiB |
| 4e3e79d | 0 B | 0 B | 0 B |
| d8912da | 0 B | 0 B | 0 B |
| d15471f | 1.58 MiB | 2.13 MiB | 559.54 KiB |
| abf451a | 1.58 MiB | 2.20 MiB | 635.29 KiB |
| 604a261 | 1.58 MiB | 2.10 MiB | 533.42 KiB |
| 806307f | 1.58 MiB | 2.10 MiB | 533.42 KiB |
| 22f4345 | 1.58 MiB | 2.29 MiB | 719.83 KiB |
| d217708 | 1.58 MiB | 2.10 MiB | 532.97 KiB |
| c3ee041 | 0 B | 0 B | 0 B |
What
Proof-of-concept wiring to run the `sentry-uitest-android-macrobenchmark` cold-start benchmark on Sauce Labs real devices, instead of only on a locally connected device.

- `.sauce/sentry-uitest-android-macrobenchmark.yml`: an `espresso` config with two APKs: `app` = the `sentry-samples-android` release APK (the cold-start target), `testApp` = the self-instrumenting Macrobenchmark APK. Single high-end device (Pixel 9 Pro XL, API 35). No test orchestrator and no `clearPackageData`, because Macrobenchmark drives its own process restarts/AOT compilation, and `StartupMode.COLD` intentionally keeps app data/permissions.
- `Makefile`: `make assembleMacrobenchmark` builds both APKs.
- `.github/workflows/integration-tests-macrobenchmark.yml`: a `workflow_dispatch`-only job (deliberately not per-PR) that assembles, runs the Sauce config, and uploads whatever artifacts come back.

Why this is a POC, not a replacement
This does not replace the existing "Performance metrics 🚀" comment (the `app-metrics` job in `integration-tests-benchmarks.yml`). That comment measures SDK overhead: `test-app-plain` vs `test-app-sentry`, startup and binary size, with tuned thresholds. The Macrobenchmark measures the absolute `timeToInitialDisplay` of a single already-instrumented app, has no plain baseline and no size metric, and can only resolve changes on the order of tens of ms. They answer different questions.

Two things this POC is meant to learn
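1. Which device guards fire. On non-rooted, unlocked-clock cloud devices the guards (`UNLOCKED`, `DEBUGGABLE`, ...) are expected to fire and may fail the run. Seeing which fire tells us exactly what a real integration would have to accept or suppress (via `androidx.benchmark.suppressErrors`).
2. Whether `timeToInitialDisplay` can be retrieved from Sauce artifacts, i.e. whether Sauce returns `*-benchmarkData.json` from the additional-test-output dir. The config also grabs the device `*.log` as a fallback, since Macrobenchmark always logs `timeToInitialDisplay` there.

How to run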
Trigger the Integration Tests - Macrobenchmark (POC) workflow via Run workflow (needs `SAUCE_USERNAME`/`SAUCE_ACCESS_KEY`), then inspect the `macrobenchmark-artifacts` upload.

🤖 Generated with Claude Code
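Once the `macrobenchmark-artifacts` upload is downloaded, both POC questions can be answered with a small script. The following is a minimal Python sketch, not part of this PR: all helper names are hypothetical, and the JSON layout (`benchmarks[].metrics.timeToInitialDisplayMs.median`), the log summary line, and the `ERRORS (not suppressed):` message format are assumptions about typical Jetpack Benchmark output that should be checked against a real Sauce run.

```python
import json
import re
from typing import List, Optional

# Hypothetical helpers for inspecting the `macrobenchmark-artifacts` upload.
# ASSUMPTION: Jetpack Benchmark JSON keeps per-metric stats under
# benchmarks[].metrics.<metricName>.{minimum,median,maximum,runs}.
METRIC = "timeToInitialDisplayMs"

# ASSUMPTION: the device log carries a summary line such as
#   timeToInitialDisplayMs   min 300.1,   median 310.2,   max 320.3
# where numbers may use "," as a thousands separator (e.g. 1,170.3).
_NUM = r"(\d+(?:,\d{3})*(?:\.\d+)?)"
_LOG_LINE = re.compile(METRIC + r"\s+min\s+" + _NUM + r",\s+median\s+" + _NUM)

# ASSUMPTION: fired device guards are reported as
#   ERRORS (not suppressed): UNLOCKED, DEBUGGABLE
_GUARDS = re.compile(r"ERRORS \(not suppressed\):([^\n]*)")
_GUARD_ID = re.compile(r"[A-Z][A-Z0-9_-]*")


def ttid_from_json(text: str) -> List[float]:
    """Median TTID (ms) per benchmark found in a *-benchmarkData.json file."""
    data = json.loads(text)
    medians = []
    for bench in data.get("benchmarks", []):
        metric = bench.get("metrics", {}).get(METRIC)
        if metric is not None and "median" in metric:
            medians.append(float(metric["median"]))
    return medians


def ttid_from_log(text: str) -> List[float]:
    """Fallback: median TTID (ms) parsed from summary lines in the device log."""
    return [float(m.group(2).replace(",", "")) for m in _LOG_LINE.finditer(text)]


def fired_guards(text: str) -> List[str]:
    """Sorted, de-duplicated unsuppressed guard IDs (e.g. UNLOCKED) in the log."""
    found = set()
    for m in _GUARDS.finditer(text):
        for token in m.group(1).split(","):
            token = token.strip()
            # Keep only tokens that look like guard IDs; drop trailing prose.
            if _GUARD_ID.fullmatch(token):
                found.add(token)
    return sorted(found)


def suppress_errors_value(guards: List[str]) -> str:
    """Comma-separated value for the androidx.benchmark.suppressErrors argument."""
    return ",".join(sorted(set(guards)))


def summarize(json_text: Optional[str], log_text: str) -> dict:
    """Prefer the benchmark JSON; fall back to the device log if it yields nothing."""
    ttid = ttid_from_json(json_text) if json_text else []
    source = "json"
    if not ttid:
        ttid, source = ttid_from_log(log_text), "log"
    return {"ttid_ms": ttid, "source": source, "guards": fired_guards(log_text)}
```

Preferring the JSON keeps the structured per-run stats; the log regex is inherently brittle and is only meant as the fallback the Sauce config already collects the device log for. The `fired_guards` output is exactly the list a real integration would have to either fix on the device side or pass to `androidx.benchmark.suppressErrors`.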