AIR CLI Integration: Implement the air get command #5600

Open
riddhibhagwat-db wants to merge 7 commits into air-integration-m0 from air-integration-m1-1

@riddhibhagwat-db riddhibhagwat-db commented Jun 14, 2026

Changes

Implements databricks experimental air get RUN_ID, the Go port of the Python air get command. It fetches the run via Jobs.GetRun and renders:

  • Core fields: run ID, status, submitted time, duration, retries, experiment, accelerators, creator (User), and the run's dashboard URL.
  • An MLflow deep-link, built from jobs/runs/get-output (the gen_ai_compute_output field is not modeled by the typed SDK, so it's fetched via a direct REST call).
  • For foreach/sweep runs, an iteration summary (counts plus a per-iteration table) instead of the single-run view.
  • The run's training-config YAML, downloaded from the workspace and printed before the status (text mode only).

Why

get is the first real command integrated from the air CLI, and it sets the conventions the rest of the CLI will follow. The {v, ts, data} envelope mirrors the Python CLI so existing machine consumers keep working. The implementation is a faithful port of handle_status plus the cli_display helpers, verified field-by-field against the Python source:

  • The text view shows the foreach branch (_display_foreach_sweep_status) and the training-config panel (_fetch_and_display_yaml_config); JSON output omits both, exactly matching air get <run> --json.
  • MLflow IDs live under an unmodeled gen_ai_compute_output field (fetched via a direct REST call), and the MLflow link and YAML fetch are best-effort, matching the Python CLI's logic.

Tests

  • Unit tests cover every formatting/extraction helper, buildGetData, and all template branches (single-run minimal/all-fields, sweep, sweep-with-no-tasks).
  • Mock-backed unit tests (mirroring the Python unittest.mock suite) cover buildSweepInfo, printConfigYAML, mlflowURL (over httptest, since it bypasses the typed SDK), and the RunE invalid-id / not-found branches.
  • An acceptance test (acceptance/experimental/air/get) runs the command end-to-end against a stubbed Jobs API: text output, -o json, and an invalid run ID.
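
As a rough illustration of the httptest approach mentioned for mlflowURL, the sketch below stubs an endpoint and reads one top-level field as raw JSON, which is one way to reach a field the typed SDK does not model. The helper names fetchRawField and newStub, the /api/2.2 path prefix, and the placeholder payload are assumptions; the real shape of gen_ai_compute_output is not shown in this PR.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// fetchRawField GETs url and returns the raw JSON of one top-level field.
// Hypothetical helper: it stands in for the PR's direct REST call.
func fetchRawField(url, field string) (json.RawMessage, error) {
	resp, err := http.Get(url)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("unexpected status %d", resp.StatusCode)
	}
	var body map[string]json.RawMessage
	if err := json.NewDecoder(resp.Body).Decode(&body); err != nil {
		return nil, err
	}
	raw, ok := body[field]
	if !ok {
		return nil, fmt.Errorf("field %q not present", field)
	}
	return raw, nil
}

// newStub starts a local server that returns a placeholder payload.
func newStub() *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "application/json")
		fmt.Fprint(w, `{"gen_ai_compute_output":{"placeholder":"value"}}`)
	}))
}

func main() {
	srv := newStub()
	defer srv.Close()
	// The path below is an assumed example; the stub answers any path.
	raw, err := fetchRawField(srv.URL+"/api/2.2/jobs/runs/get-output?run_id=123", "gen_ai_compute_output")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(raw))
}
```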

Implement the read-only run-details command (renamed from `status` to `get`).
It fetches a job run via the Jobs API and renders the run's status, start time,
duration, retries, experiment, accelerators, dashboard URL, MLflow deep-link,
and a foreach/sweep summary. Output is the air-style {v, ts, data} JSON envelope
under -o json, or a text view.

Renames the command-level identifiers (status -> get) while keeping the run's
"status" field/label. Adds format/mlflow/sweep/output helpers with unit tests
and an acceptance test, and drops `get` from the not-implemented stub coverage.

Co-authored-by: Isaac

github-actions Bot commented Jun 14, 2026

Contributor

Approval status: pending

/acceptance/experimental/air/ - needs approval

6 files changed
Eligible: @apeforest, @bfontain, @lu-wang-dl, @panchalhp-db, @vinchenzo-db, @maggiewang-db, @ben-hansen-db, @pardis-beikzadeh-db

/experimental/air/ - needs approval

11 files changed
Eligible: @apeforest, @bfontain, @lu-wang-dl, @panchalhp-db, @vinchenzo-db, @maggiewang-db, @ben-hansen-db, @pardis-beikzadeh-db

Any maintainer (@andrewnester, @anton-107, @denik, @pietern, @shreyas-goenka, @simonfaltum, @renaudhartert-db) can approve all areas.
See OWNERS for ownership rules.

@riddhibhagwat-db riddhibhagwat-db changed the title from "experimental/air: implement the air status command" to "AIR CLI Integration: Implement the air get command" Jun 14, 2026

eng-dev-ecosystem-bot commented Jun 14, 2026

Collaborator

Integration test report

Commit: 8bfb38b

Run: 27649269043

| Env | 🟨 KNOWN | 💚 RECOVERED | 🙈 SKIP | ✅ pass | 🙈 skip | Time |
|---|---|---|---|---|---|---|
| 🟨 aws linux | 7 | | 15 | 264 | 997 | 9:07 |
| 🟨 aws windows | 7 | | 15 | 266 | 995 | 13:34 |
| 💚 aws-ucws linux | | 7 | 15 | 360 | 911 | 6:48 |
| 💚 aws-ucws windows | | 7 | 15 | 362 | 909 | 7:42 |
| 💚 azure linux | | 1 | 17 | 267 | 995 | 5:53 |
| 💚 azure windows | | 1 | 17 | 269 | 993 | 8:29 |
| 💚 azure-ucws linux | | 1 | 17 | 365 | 907 | 7:12 |
| 💚 azure-ucws windows | | 1 | 17 | 367 | 905 | 8:50 |
| 💚 gcp linux | | 1 | 17 | 263 | 998 | 6:21 |
| 💚 gcp windows | | 1 | 17 | 265 | 996 | 8:00 |
22 interesting tests: 15 SKIP, 7 KNOWN
Test Name aws linux aws windows aws-ucws linux aws-ucws windows azure linux azure windows azure-ucws linux azure-ucws windows gcp linux gcp windows
🟨​ TestAccept 🟨​K 🟨​K 💚​R 💚​R 💚​R 💚​R 💚​R 💚​R 💚​R 💚​R
🙈​ TestAccept/bundle/invariant/no_drift 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/permissions 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🟨​ TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/with_permissions 🟨​K 🟨​K 💚​R 💚​R 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🟨​ TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/with_permissions/DATABRICKS_BUNDLE_ENGINE=direct 🟨​K 🟨​K 💚​R 💚​R
🟨​ TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/with_permissions/DATABRICKS_BUNDLE_ENGINE=terraform 🟨​K 🟨​K 💚​R 💚​R
🟨​ TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/without_permissions 🟨​K 🟨​K 💚​R 💚​R 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🟨​ TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/without_permissions/DATABRICKS_BUNDLE_ENGINE=direct 🟨​K 🟨​K 💚​R 💚​R
🟨​ TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/without_permissions/DATABRICKS_BUNDLE_ENGINE=terraform 🟨​K 🟨​K 💚​R 💚​R
🙈​ TestAccept/bundle/resources/postgres_branches/basic 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/postgres_branches/recreate 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/postgres_branches/replace_existing 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/postgres_branches/update_protected 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/postgres_branches/without_branch_id 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/postgres_endpoints/basic 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/postgres_endpoints/recreate 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/postgres_projects/update_display_name 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/synced_database_tables/basic 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/vector_search_endpoints/drift/recreated_same_name 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/vector_search_indexes/basic 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/vector_search_indexes/grants/select 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/ssh/connection 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
Top 25 slowest tests (at least 2 minutes):
duration env testname
4:18 gcp windows TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
4:14 gcp linux TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
4:11 gcp windows TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
3:59 gcp linux TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
3:46 aws linux TestSecretsPutSecretStringValue
3:37 azure-ucws linux TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
3:25 azure windows TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
3:25 aws windows TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
3:24 azure-ucws linux TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
3:21 aws linux TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
3:21 azure-ucws windows TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
3:19 aws-ucws linux TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
3:18 aws linux TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
3:17 azure windows TestAccept
3:09 aws-ucws linux TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
3:00 aws-ucws windows TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
2:55 azure linux TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
2:52 aws-ucws windows TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
2:46 azure windows TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
2:39 gcp windows TestAccept
2:38 aws windows TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
2:37 azure-ucws windows TestAccept
2:35 azure-ucws windows TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
2:35 azure linux TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
2:33 aws-ucws windows TestAccept

Comment thread experimental/air/cmd/get.go Outdated

cmdio.LogString(ctx, "Training Configuration:")
cmdio.LogString(ctx, string(content))
cmdio.LogString(ctx, "")

The LogString helper writes to stderr, whereas the original Python code wrote this block to stdout: https://github.com/databricks/cli/blob/main/libs/cmdio/log.go#L14-L18

Author

fixed, thanks for the catch!

Comment thread experimental/air/cmd/get.go Outdated

runID, err := strconv.ParseInt(args[0], 10, 64)
if err != nil || runID <= 0 {
return fmt.Errorf("invalid RUN_ID %q: must be a positive integer", args[0])

In JSON mode, does this return a plain Go error instead of the JSON envelope?

This and a few other places should return JSON if the --json flag is passed.

Author

Fixed this, thanks for the catch!

// Accelerators describes the run's GPUs, e.g. "8x H100".
Accelerators string `json:"-"`
// User is the run's creator. Text-only; JSON omits it, matching `air get --json`.
User string `json:"-"`

I think this is one area we can update: User can be included in the JSON output.

@@ -0,0 +1,36 @@

=== get (text)
>>> [CLI] experimental air get 123

This looks different from the output on the wheel side, right? Can you add a before/after screenshot to the PR description for easy review?

It's OK if matching the format is "coming next"; I just want to make sure I understand exactly how big the diff is.

@pardis-beikzadeh-db pardis-beikzadeh-db left a comment

Independent review against the Python air source (handle_status + cli_display/json_output). The success JSON envelope shape, MLflow URL construction, sweep table, and YAML panel all port faithfully — nice work. A few divergences inline below (#1, the -o json error path, is the one I'd treat as the most important; the rest are retry/rounding correctness).

Two more are easiest to review visually — could you add before/after side-by-sides to the PR description showing the old air vs the new databricks experimental air get output for: (a) the text view of a run, and (b) -o json of a run, plus a not-found case? That lets us confirm at a glance:

  • text field ordering (Retries/Duration order and MLflow/User placement differ from the Python table at cli_display.py:249), and
  • the JSON started_at format — Python emits …+00:00 via .isoformat() (cli_entrypoint.py:1931), while the Go side emits …Z via RFC3339 (format.go:44), which is a value change for strict consumers.
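
The started_at difference can be reproduced with two Go time layouts. This is a sketch: the layout constant is an approximation of Python's isoformat(), which omits the fractional part when microseconds are zero, and the function names are made up for illustration.

```go
package main

import (
	"fmt"
	"time"
)

// isoformatLayout prints a numeric offset ("+00:00") and six fractional
// digits. It approximates Python's isoformat() for non-zero microseconds.
const isoformatLayout = "2006-01-02T15:04:05.000000-07:00"

// formatRFC3339 renders epoch milliseconds the way time.RFC3339 does ("Z" for UTC).
func formatRFC3339(ms int64) string {
	return time.UnixMilli(ms).UTC().Format(time.RFC3339)
}

// formatISO renders epoch milliseconds with an explicit "+00:00" offset.
func formatISO(ms int64) string {
	return time.UnixMilli(ms).UTC().Format(isoformatLayout)
}

func main() {
	fmt.Println(formatRFC3339(1500)) // 1970-01-01T00:00:01Z
	fmt.Println(formatISO(1500))     // 1970-01-01T00:00:01.500000+00:00
}
```

Both strings denote the same instant, so the difference only matters to consumers that compare or parse the text strictly.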

Comment thread experimental/air/cmd/get.go Outdated
if err != nil {
// The backend returns this when the run ID is unknown to the user.
if errors.Is(err, apierr.ErrResourceDoesNotExist) {
return fmt.Errorf("run %d not found: check the run ID and that it is a job run ID", runID)

In -o json mode the Python CLI emits a structured error envelope and exits 1: print_json_error("NOT_FOUND"/"INTERNAL_ERROR", kind, msg, retryable) produces {v, ts, error:{...}} (cli_entrypoint.py:2017-2022). Here RunE returns a bare error regardless of output mode, so the framework prints a plain Error: … string. A consumer parsing the JSON error envelope from air get --json would break. Consider rendering the error envelope when output is JSON. (This JSON not-found branch is also currently untested.)

Author

resolved, thanks!

Comment thread experimental/air/cmd/format.go Outdated
endMillis = time.Now().UnixMilli()
}

d := (endMillis - run.StartTime) / 1000

@pardis-beikzadeh-db pardis-beikzadeh-db Jun 16, 2026

nit: Python rounds to the nearest second: round((end - started_ms) / 1000) (cli_entrypoint.py:1934). Integer / 1000 truncates here, so e.g. an 11,500 ms run reports 11 vs Python's 12. Suggest rounding, e.g. (endMillis - run.StartTime + 500) / 1000.
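
The difference is easy to see in isolation. The function names below are illustrative, and the sketch assumes the end time is not before the start time.

```go
package main

import "fmt"

// truncSeconds mirrors the reviewed code: integer division drops the fraction.
func truncSeconds(startMs, endMs int64) int64 {
	return (endMs - startMs) / 1000
}

// roundSeconds applies the suggested +500 adjustment (round half up).
// It assumes endMs >= startMs; negative spans would round differently.
func roundSeconds(startMs, endMs int64) int64 {
	return (endMs - startMs + 500) / 1000
}

func main() {
	fmt.Println(truncSeconds(0, 11500)) // 11
	fmt.Println(roundSeconds(0, 11500)) // 12
}
```

One caveat: Python 3's round() rounds exact halves to the nearest even integer, so the +500 form can still differ from Python on values such as 12,500 ms (13 here, 12 in Python).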

Author

Fixed this, thanks!

Comment thread experimental/air/cmd/format.go Outdated
Comment thread experimental/air/cmd/mlflow.go Outdated
return nil
}
// The MLflow output is attached to the task run, not the parent job run.
taskRunID := run.Tasks[0].RunId

Python resolves the task for runs/get-output as tasks[-1] (latest attempt, to handle retries; jobs_api_client.py:68). Using Tasks[0] here links a retried run to its stale first attempt's MLflow output. Suggest run.Tasks[len(run.Tasks)-1].RunId.
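
A small sketch of the suggested selection, with a guard for an empty task list. The task and run types are hypothetical stand-ins for the SDK types; only the Tasks slice and RunId field matter here.

```go
package main

import "fmt"

// task and run are reduced, hypothetical stand-ins for the SDK types.
type task struct{ RunId int64 }
type run struct{ Tasks []task }

// latestTaskRunID returns the run ID of the last task, which the review
// treats as the latest attempt. The bool is false when there are no tasks.
func latestTaskRunID(r run) (int64, bool) {
	if len(r.Tasks) == 0 {
		return 0, false
	}
	return r.Tasks[len(r.Tasks)-1].RunId, true
}

func main() {
	retried := run{Tasks: []task{{RunId: 101}, {RunId: 102}}}
	id, ok := latestTaskRunID(retried)
	fmt.Println(id, ok) // 102 true
}
```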

The training-config block is command result data, but it was emitted via
cmdio.LogString, which targets stderr. Write it to cmd.OutOrStdout() instead so
it lands on stdout, matching the Python `air get`. Download/read failures stay
on stderr as warnings.

Co-authored-by: Isaac

`air get` derived Submitted and Duration from run-level start/end and truncated
milliseconds to seconds. Port Python's _reported_attempt_timing so a retried run
reports its latest attempt, and round to the nearest second to match Python's
round(). Drops the run-level RunDuration shortcut, which diverged on retries.

Co-authored-by: Isaac

mlflowURL resolved runs/get-output against Tasks[0], linking a retried run to its
stale first attempt. Use the last task (latest attempt) to match Python
(jobs_api_client.py:68).

Co-authored-by: Isaac

…N with Python

In -o json mode, error paths now emit the structured error envelope
({v, ts, error:{code, kind, message, retryable}}) and exit non-zero, matching
the Python air CLI's print_json_error instead of letting the framework print a
bare "Error: ..." string. Covers invalid RUN_ID, run-not-found, backend
failures, and client/auth failures (wrapped PreRunE).

Also align the success envelope with the Python CLI:
- dashboard_url: construct {host}/jobs/runs/{id}?o={workspace_id} (via
  CurrentWorkspaceID) instead of using the API's run_page_url
- started_at: datetime.isoformat() form ("+00:00" with microseconds), not
  RFC3339 "Z"
- duration_seconds: rounded half-to-even to match Python's round()
- use run-level start/end times for started_at and duration_seconds, dropping
  the last-attempt preference, which had no Python equivalent

Co-authored-by: Isaac
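
Two of the alignment points above can be sketched in a few lines of Go. The function names, the int64 workspace ID type, and the example host are assumptions. The URL pattern and the half-to-even rule come from the commit message.

```go
package main

import (
	"fmt"
	"math"
)

// durationSeconds rounds a millisecond span to whole seconds with halves
// going to the nearest even integer, as Python 3's round() does.
func durationSeconds(startMs, endMs int64) int64 {
	return int64(math.RoundToEven(float64(endMs-startMs) / 1000))
}

// dashboardURL follows the {host}/jobs/runs/{id}?o={workspace_id} pattern.
// host is assumed to carry no trailing slash.
func dashboardURL(host string, runID, workspaceID int64) string {
	return fmt.Sprintf("%s/jobs/runs/%d?o=%d", host, runID, workspaceID)
}

func main() {
	fmt.Println(durationSeconds(0, 11500)) // 12
	fmt.Println(durationSeconds(0, 12500)) // 12
	fmt.Println(dashboardURL("https://example.cloud.databricks.com", 123, 456))
}
```

The 12,500 ms case is where half-to-even and the simpler +500 integer form disagree, so it is a useful value to pin in a unit test.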
5 participants