ci(macrobenchmark): Run startup benchmark on Sauce Labs (POC) by runningcode · Pull Request #5693 · getsentry/sentry-java

runningcode · 2026-07-02T17:26:05Z

What

Proof-of-concept wiring to run the sentry-uitest-android-macrobenchmark cold-start benchmark on Sauce Labs real devices, instead of only on a locally connected device.

.sauce/sentry-uitest-android-macrobenchmark.yml: an espresso config with two APKs. app is the sentry-samples-android release APK (the cold-start target); testApp is the self-instrumenting Macrobenchmark APK. It targets a single high-end device (Pixel 9 Pro XL, API 35). There is no test orchestrator and no clearPackageData, because Macrobenchmark drives its own process restarts and AOT compilation, and StartupMode.COLD intentionally keeps app data and permissions.
Makefile: make assembleMacrobenchmark builds both APKs.
.github/workflows/integration-tests-macrobenchmark.yml: a workflow_dispatch-only job (deliberately not per-PR) that assembles both APKs, runs the Sauce config, and uploads whatever artifacts come back.

Why this is a POC, not a replacement

This does not replace the existing "Performance metrics 🚀" comment (the app-metrics job in integration-tests-benchmarks.yml). That comment measures SDK overhead: test-app-plain vs test-app-sentry, for both startup time and binary size, with tuned thresholds. The Macrobenchmark measures the absolute timeToInitialDisplay of a single already-instrumented app. It has no plain baseline and no size metric, and it can only resolve changes on the order of tens of milliseconds. The two answer different questions.

Two things this POC is meant to learn

Device guards. Device-guard errors are intentionally not suppressed. On non-rooted, unlocked-clock cloud devices the guards (UNLOCKED, DEBUGGABLE, ...) are expected to fire and may fail the run. Seeing which fire tells us exactly what a real integration would have to accept or suppress (via androidx.benchmark.suppressErrors).
Result retrieval. It's unclear whether Sauce downloads Macrobenchmark's *-benchmarkData.json from the additional-test-output dir. The config also grabs the device *.log as a fallback, since Macrobenchmark always logs timeToInitialDisplay there.
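
For the result-retrieval question, here is a minimal sketch of how a follow-up step might read the median timeToInitialDisplay: prefer the benchmark JSON, and fall back to the device log if Sauce does not return it. The JSON shape and the log summary line are assumptions based on typical Macrobenchmark output, the class name is hypothetical, and the numbers are placeholders rather than measured results. None of this is part of the PR.

```python
import json
import re
from typing import Optional

# Assumed (trimmed) shape of a Macrobenchmark *-benchmarkData.json file.
# Values are placeholders, not real measurements.
SAMPLE_JSON = """
{
  "benchmarks": [
    {
      "name": "startup",
      "className": "HypotheticalStartupBenchmark",
      "metrics": {
        "timeToInitialDisplayMs": {
          "minimum": 301.2,
          "maximum": 352.9,
          "median": 318.4,
          "runs": [301.2, 318.4, 352.9]
        }
      }
    }
  ]
}
"""

# Assumed shape of the summary line in the device log; verify against a real run.
SAMPLE_LOG = "timeToInitialDisplayMs   min 301.2,   median 318.4,   max 352.9"

LOG_PATTERN = re.compile(
    r"timeToInitialDisplayMs\s+min\s+([\d.]+),\s+median\s+([\d.]+),\s+max\s+([\d.]+)"
)


def median_from_json(text: str) -> Optional[float]:
    """Return the first timeToInitialDisplayMs median found in the JSON, if any."""
    data = json.loads(text)
    for bench in data.get("benchmarks", []):
        metric = bench.get("metrics", {}).get("timeToInitialDisplayMs")
        if metric is not None:
            return float(metric["median"])
    return None


def median_from_log(text: str) -> Optional[float]:
    """Return the median from the first matching summary line in the log, if any."""
    match = LOG_PATTERN.search(text)
    return float(match.group(2)) if match else None


def startup_median(json_text: Optional[str], log_text: str) -> Optional[float]:
    """Prefer the JSON artifact; fall back to the log when it is missing or empty."""
    if json_text:
        value = median_from_json(json_text)
        if value is not None:
            return value
    return median_from_log(log_text)
```

Calling startup_median(None, log_text) exercises the log-only path, which is the case this POC needs to rule in or out.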

How to run

Trigger the Integration Tests - Macrobenchmark (POC) workflow via Run workflow (needs SAUCE_USERNAME/SAUCE_ACCESS_KEY), then inspect the macrobenchmark-artifacts upload.

🤖 Generated with Claude Code

The sentry-uitest-android-macrobenchmark module currently only runs on a locally connected device. This wires it to Sauce Labs so we can evaluate whether the cold-start timeToInitialDisplay benchmark can run on the real-device cloud already used by our other benchmarks. This is a proof-of-concept, gated behind workflow_dispatch and kept off the per-PR path. Device-guard errors are intentionally not suppressed: on non-rooted, unlocked-clock cloud devices the guards (UNLOCKED, DEBUGGABLE, ...) are expected to fire, and seeing which ones fire is the point of the POC. It also tells us whether timeToInitialDisplay can be retrieved from Sauce artifacts (benchmark JSON) or must be parsed from the device log.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Temporary scaffolding so the Sauce run executes on this branch without first landing on the default branch (workflow_dispatch is not dispatchable until the workflow exists on the default branch). Remove before merge.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Drop the assembleMacrobenchmark Makefile target and invoke the two gradle assemble tasks directly from the workflow. Only one workflow uses them, so the extra indirection isn't worth it.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

sentry · 2026-07-02T17:32:07Z

📲 Install Builds

Android

🔗 App Name	App ID	Version	Configuration
SDK Size	io.sentry.tests.size	8.47.0 (1)	release

⚙️ sentry-android Build Distribution Settings

github-actions · 2026-07-02T17:42:03Z

Performance metrics 🚀

Metric	Plain	With Sentry	Diff
Startup time	323.74 ms	373.10 ms	49.36 ms
Size	0 B	0 B	0 B

Baseline results on branch: main

Startup times

Revision	Plain	With Sentry	Diff
`abfcc92`	304.04 ms	370.33 ms	66.29 ms
`4e3e79d`	365.83 ms	477.62 ms	111.79 ms
`d8912da`	329.94 ms	389.68 ms	59.74 ms
`d15471f`	294.13 ms	399.49 ms	105.36 ms
`abf451a`	332.82 ms	403.67 ms	70.85 ms
`604a261`	380.65 ms	451.27 ms	70.62 ms
`806307f`	357.85 ms	424.64 ms	66.79 ms
`22f4345`	307.87 ms	354.51 ms	46.64 ms
`d217708`	375.27 ms	415.68 ms	40.41 ms
`c3ee041`	310.64 ms	361.90 ms	51.26 ms
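
As a rough illustration of how the Diff column relates to the other two, the sketch below recomputes the SDK overhead (With Sentry minus Plain) for each baseline revision and for this PR's run, using only the startup numbers shown in the tables above. The function and variable names are hypothetical, and this is not how the app-metrics job itself is implemented.

```python
from statistics import median

# (plain_ms, with_sentry_ms) per revision, copied from the "Startup times" table.
BASELINE = {
    "abfcc92": (304.04, 370.33),
    "4e3e79d": (365.83, 477.62),
    "d8912da": (329.94, 389.68),
    "d15471f": (294.13, 399.49),
    "abf451a": (332.82, 403.67),
    "604a261": (380.65, 451.27),
    "806307f": (357.85, 424.64),
    "22f4345": (307.87, 354.51),
    "d217708": (375.27, 415.68),
    "c3ee041": (310.64, 361.90),
}

# This PR's run, from the "Performance metrics" table.
CURRENT = (323.74, 373.10)


def overhead_ms(plain: float, with_sentry: float) -> float:
    """SDK startup overhead: instrumented startup minus plain startup."""
    return round(with_sentry - plain, 2)


baseline_overheads = [overhead_ms(*pair) for pair in BASELINE.values()]
current_overhead = overhead_ms(*CURRENT)
baseline_median = round(median(baseline_overheads), 2)

print(f"current overhead: {current_overhead} ms")
print(f"baseline median:  {baseline_median} ms")
print(f"baseline range:   {min(baseline_overheads)} to {max(baseline_overheads)} ms")
```

The baseline overheads range from about 40 ms to about 112 ms across revisions, which gives a sense of the run-to-run noise any single startup measurement has to be read against.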

App size

Revision	Plain	With Sentry	Diff
`abfcc92`	1.58 MiB	2.13 MiB	557.31 KiB
`4e3e79d`	0 B	0 B	0 B
`d8912da`	0 B	0 B	0 B
`d15471f`	1.58 MiB	2.13 MiB	559.54 KiB
`abf451a`	1.58 MiB	2.20 MiB	635.29 KiB
`604a261`	1.58 MiB	2.10 MiB	533.42 KiB
`806307f`	1.58 MiB	2.10 MiB	533.42 KiB
`22f4345`	1.58 MiB	2.29 MiB	719.83 KiB
`d217708`	1.58 MiB	2.10 MiB	532.97 KiB
`c3ee041`	0 B	0 B	0 B

runningcode and others added 3 commits July 2, 2026 19:25

runningcode wants to merge 3 commits into main from no/macrobenchmark-on-sauce-poc
