ci(macrobenchmark): Run startup benchmark on Sauce Labs (POC) #5693

Draft
runningcode wants to merge 3 commits into main from no/macrobenchmark-on-sauce-poc

@runningcode

Contributor

What

Proof-of-concept wiring to run the sentry-uitest-android-macrobenchmark cold-start benchmark on Sauce Labs real devices, instead of only on a locally connected device.

  • .sauce/sentry-uitest-android-macrobenchmark.yml — an espresso config with two APKs: app = the sentry-samples-android release APK (the cold-start target), testApp = the self-instrumenting Macrobenchmark APK. Single high-end device (Pixel 9 Pro XL, api 35). No test orchestrator and no clearPackageData — Macrobenchmark drives its own process restarts/AOT compilation, and StartupMode.COLD intentionally keeps app data/permissions.
  • Makefile: make assembleMacrobenchmark builds both APKs.
  • .github/workflows/integration-tests-macrobenchmark.yml — a workflow_dispatch-only job (deliberately not per-PR) that assembles, runs the Sauce config, and uploads whatever artifacts come back.

Why this is a POC, not a replacement

This does not replace the existing "Performance metrics 🚀" comment (the app-metrics job in integration-tests-benchmarks.yml). That comment measures SDK overhead: test-app-plain vs test-app-sentry, for startup time and binary size, with tuned thresholds. The Macrobenchmark measures the absolute timeToInitialDisplay of a single already-instrumented app, has no plain baseline and no size metric, and can only resolve changes on the order of tens of ms. They answer different questions.
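
To make the distinction concrete, here is a minimal Python sketch. It uses the startup figures from the "Performance metrics 🚀" comment on this PR (323.74 ms plain, 373.10 ms with Sentry) and the "With Sentry" baseline for revision abfcc92 (370.33 ms). The function names are hypothetical, and treating the "With Sentry" column as a stand-in for an absolute single-app measurement is an assumption made purely for illustration; neither job necessarily computes its numbers this way.

```python
def sdk_overhead_ms(plain_ms: float, with_sentry_ms: float) -> float:
    """Overhead-style metric: instrumented app minus plain baseline (same run)."""
    return round(with_sentry_ms - plain_ms, 2)


def run_to_run_delta_ms(current_ms: float, previous_ms: float) -> float:
    """Absolute-style metric: one app compared against its own earlier run.

    With no plain baseline, this difference mixes SDK changes with device
    and run-to-run noise, so only fairly large shifts stand out.
    """
    return round(current_ms - previous_ms, 2)


# Overhead on this PR: With Sentry minus Plain.
print(sdk_overhead_ms(323.74, 373.10))

# Change of the instrumented app alone versus baseline revision abfcc92.
print(run_to_run_delta_ms(373.10, 370.33))
```

The first call reproduces the 49.36 ms "Diff" shown in the comment. The second yields a difference of only a few milliseconds, which is below the tens-of-ms resolution the description attributes to the Macrobenchmark.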

Two things this POC is meant to learn

  1. Device guards. Device-guard errors are intentionally not suppressed. On non-rooted, unlocked-clock cloud devices the guards (UNLOCKED, DEBUGGABLE, ...) are expected to fire and may fail the run. Seeing which fire tells us exactly what a real integration would have to accept or suppress (via androidx.benchmark.suppressErrors).
  2. Result retrieval. It's unclear whether Sauce downloads Macrobenchmark's *-benchmarkData.json from the additional-test-output dir. The config also grabs the device *.log as a fallback, since Macrobenchmark always logs timeToInitialDisplay there.
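
The JSON-first, log-fallback retrieval described in item 2 could be sketched in Python roughly as follows. This is not part of the PR. The JSON layout (`benchmarks[].metrics.timeToInitialDisplayMs.median`) and the log line shape are assumptions about what Macrobenchmark emits and should be checked against real Sauce artifacts; `read_median` and the other function names are hypothetical.

```python
import json
import re
from typing import Optional

# Assumed metric key inside *-benchmarkData.json; verify against a real file.
METRIC = "timeToInitialDisplayMs"

# Assumed log shape, e.g.:
#   timeToInitialDisplayMs   min 294.8,   median 301.5,   max 314.8
LOG_PATTERN = re.compile(
    r"timeToInitialDisplayMs\s+min\s+([\d.]+),\s+median\s+([\d.]+),\s+max\s+([\d.]+)"
)


def median_from_json(text: str) -> Optional[float]:
    """Return the first median found for METRIC, or None if absent."""
    data = json.loads(text)
    if not isinstance(data, dict):
        return None
    for bench in data.get("benchmarks", []):
        metric = bench.get("metrics", {}).get(METRIC)
        if isinstance(metric, dict) and "median" in metric:
            return float(metric["median"])
    return None


def median_from_log(text: str) -> Optional[float]:
    """Return the median from the first matching log line, or None."""
    match = LOG_PATTERN.search(text)
    return float(match.group(2)) if match else None


def read_median(json_text: Optional[str], log_text: Optional[str]) -> Optional[float]:
    """Prefer the benchmark JSON; fall back to the device log."""
    if json_text:
        try:
            value = median_from_json(json_text)
        except json.JSONDecodeError:
            value = None
        if value is not None:
            return value
    return median_from_log(log_text) if log_text else None
```

A caller would pass the contents of whichever artifacts Sauce actually returned, for example `read_median(None, log_contents)` when only the device log came back.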

How to run

Trigger the Integration Tests - Macrobenchmark (POC) workflow via Run workflow (needs SAUCE_USERNAME/SAUCE_ACCESS_KEY), then inspect the macrobenchmark-artifacts upload.

🤖 Generated with Claude Code

runningcode and others added 3 commits July 2, 2026 19:25
The sentry-uitest-android-macrobenchmark module currently only runs on a
locally connected device. This wires it to Sauce Labs so we can evaluate
whether the cold-start timeToInitialDisplay benchmark can run on the
real-device cloud already used by our other benchmarks.

This is a proof-of-concept, gated behind workflow_dispatch and kept off the
per-PR path. Device-guard errors are intentionally not suppressed: on
non-rooted, unlocked-clock cloud devices the guards (UNLOCKED, DEBUGGABLE,
...) are expected to fire, and seeing which ones fire is the point of the
POC. It also tells us whether timeToInitialDisplay can be retrieved from
Sauce artifacts (benchmark JSON) or must be parsed from the device log.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Temporary scaffolding so the Sauce run executes on this branch without first
landing on the default branch (workflow_dispatch is not dispatchable until
the workflow exists on the default branch). Remove before merge.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Drop the assembleMacrobenchmark Makefile target and invoke the two gradle
assemble tasks directly from the workflow. Only one workflow uses them, so
the extra indirection isn't worth it.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@sentry

sentry Bot commented Jul 2, 2026


📲 Install Builds

Android

| App Name | App ID | Version | Configuration |
| --- | --- | --- | --- |
| SDK Size | io.sentry.tests.size | 8.47.0 (1) | release |


@github-actions

github-actions Bot commented Jul 2, 2026

Contributor

Performance metrics 🚀

| | Plain | With Sentry | Diff |
| --- | --- | --- | --- |
| Startup time | 323.74 ms | 373.10 ms | 49.36 ms |
| Size | 0 B | 0 B | 0 B |

Baseline results on branch: main

Startup times

| Revision | Plain | With Sentry | Diff |
| --- | --- | --- | --- |
| abfcc92 | 304.04 ms | 370.33 ms | 66.29 ms |
| 4e3e79d | 365.83 ms | 477.62 ms | 111.79 ms |
| d8912da | 329.94 ms | 389.68 ms | 59.74 ms |
| d15471f | 294.13 ms | 399.49 ms | 105.36 ms |
| abf451a | 332.82 ms | 403.67 ms | 70.85 ms |
| 604a261 | 380.65 ms | 451.27 ms | 70.62 ms |
| 806307f | 357.85 ms | 424.64 ms | 66.79 ms |
| 22f4345 | 307.87 ms | 354.51 ms | 46.64 ms |
| d217708 | 375.27 ms | 415.68 ms | 40.41 ms |
| c3ee041 | 310.64 ms | 361.90 ms | 51.26 ms |

App size

| Revision | Plain | With Sentry | Diff |
| --- | --- | --- | --- |
| abfcc92 | 1.58 MiB | 2.13 MiB | 557.31 KiB |
| 4e3e79d | 0 B | 0 B | 0 B |
| d8912da | 0 B | 0 B | 0 B |
| d15471f | 1.58 MiB | 2.13 MiB | 559.54 KiB |
| abf451a | 1.58 MiB | 2.20 MiB | 635.29 KiB |
| 604a261 | 1.58 MiB | 2.10 MiB | 533.42 KiB |
| 806307f | 1.58 MiB | 2.10 MiB | 533.42 KiB |
| 22f4345 | 1.58 MiB | 2.29 MiB | 719.83 KiB |
| d217708 | 1.58 MiB | 2.10 MiB | 532.97 KiB |
| c3ee041 | 0 B | 0 B | 0 B |
