Add status and public/secret scores to /user/submissions list #502
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -1257,7 +1257,11 @@ def get_user_submissions( | |
| offset: Offset for pagination | ||
|
|
||
| Returns: | ||
| List of submission dictionaries with summary info and runs | ||
| List of submission dictionaries with summary info and runs. Each | ||
| entry includes ``status`` ("pending"/"failed"/"done") and | ||
| ``secret_score`` (the secret leaderboard geomean score, the ranking | ||
| metric; ``None`` if absent). The public leaderboard score remains | ||
| available per-run in ``runs[].score``. | ||
| """ | ||
| # Validate and clamp inputs | ||
| limit = max(1, min(limit, 100)) | ||
|
|
@@ -1325,16 +1329,59 @@ def get_user_submissions( | |
| "score": run_row[2], | ||
| }) | ||
|
|
||
| # Per-submission status + secret score. The `runs` above already | ||
| # carry the public leaderboard score (in runs[].score), but they are | ||
| # ranking-filtered (anti-cheat: only public runs whose matching | ||
| # secret run passed) and never include secret runs, so two things | ||
| # are not derivable from them: | ||
| # - secret_score: the secret leaderboard run's score (the actual | ||
| # ranking metric). Visible to the owner, as the detail endpoint | ||
| # already exposes it; the list endpoint just never selected it. | ||
| # - whether any run failed, so a finished-but-failed submission can | ||
| # be told apart from a clean one (both otherwise look "done"). | ||
| # One extra aggregate over the same runs rows (keyed by | ||
| # submission_id, like runs_query) avoids an N+1 detail fetch per row. | ||
| # | ||
| # MIN(score): a submission can have a secret leaderboard run per GPU; | ||
| # take the best (lowest) to match how the public score is summarized. | ||
| agg_query = """ | ||
| SELECT submission_id, | ||
| MIN(score) FILTER ( | ||
| WHERE mode = 'leaderboard' AND secret AND passed | ||
| ) AS secret_score, | ||
| bool_or(NOT passed) AS has_failed_run | ||
| FROM leaderboard.runs | ||
| WHERE submission_id = ANY(%s) | ||
| GROUP BY submission_id | ||
|
Comment on lines +1348 to +1355
Did you benchmark how long this SQL query will take?
Author
Yes. On the docker-compose test Postgres, seeded with 100 submissions × 6 runs (600 rows), the added aggregate query runs at ~0.25 ms/call for a full 100-submission page, versus ~21 ms for the whole
||
| """ | ||
| self.cursor.execute(agg_query, (submission_ids,)) | ||
| agg_by_submission: dict = { | ||
| row[0]: {"secret_score": row[1], "has_failed_run": row[2]} | ||
| for row in self.cursor.fetchall() | ||
| } | ||
|
|
||
| # Build result with runs grouped by submission | ||
| results = [] | ||
| for row in submissions: | ||
| sub_id = row[0] | ||
| done = row[4] | ||
| agg = agg_by_submission.get(sub_id, {}) | ||
|
|
||
| if not done: | ||
| status = "pending" | ||
| elif agg.get("has_failed_run"): | ||
| status = "failed" | ||
| else: | ||
| status = "done" | ||
|
|
||
| results.append({ | ||
| "id": sub_id, | ||
| "leaderboard_name": row[1], | ||
| "file_name": row[2], | ||
| "submission_time": row[3], | ||
| "done": row[4], | ||
| "done": done, | ||
| "status": status, | ||
| "secret_score": agg.get("secret_score"), | ||
| "runs": runs_by_submission.get(sub_id, []), | ||
| }) | ||
| return results | ||
|
|
||
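For reviewers, the status rule in the loop above can be read as a small pure function. This is only a sketch: `derive_status` is a hypothetical helper name, not something in this PR, and it assumes a submission with no row in the aggregate result carries `has_failed_run=None`, as `agg.get("has_failed_run")` would return.

```python
from typing import Optional


def derive_status(done: bool, has_failed_run: Optional[bool]) -> str:
    """Map the `done` flag and the aggregate's failure flag to a status string.

    Hypothetical helper mirroring the if/elif/else in the diff.
    """
    if not done:
        # Unfinished submissions are pending even if a run has already failed.
        return "pending"
    if has_failed_run:
        return "failed"
    # has_failed_run is False or None (no runs rows for this submission).
    return "done"


print(derive_status(False, True))
print(derive_status(True, True))
print(derive_status(True, None))
```

Note the ordering: `done` is checked first, so a failed run on a still-running submission does not surface as "failed" until the submission finishes.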
I don't understand this here. Can you explain this to me? Why `MIN(score)`?

A submission can have a secret `leaderboard` run per GPU type, so there can be more than one secret score. `MIN` takes the best (lowest = fastest) one, matching how the existing public score is summarized for the row (the `runs` are likewise per-GPU and the caller/CLI takes the min). For single-GPU leaderboards like qr_v2 there's exactly one, so `MIN` is just that value. Happy to switch to per-GPU secret scores instead if you'd prefer symmetry with `runs`, but a single ranking number seemed more useful for the list view.
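To make the `MIN(score) FILTER (...)` behaviour concrete, here is a rough Python equivalent of that one aggregate for a single submission. It is a sketch under assumptions: `best_secret_score` is a hypothetical name, and the sample rows (GPU labels, scores) are invented for illustration rather than taken from the database.

```python
from typing import Optional


def best_secret_score(runs: list[dict]) -> Optional[float]:
    """Lowest score among passing secret leaderboard runs, or None if none qualify.

    Mirrors MIN(score) FILTER (WHERE mode = 'leaderboard' AND secret AND passed);
    like SQL MIN, it skips NULL (None) scores.
    """
    scores = [
        r["score"]
        for r in runs
        if r["mode"] == "leaderboard"
        and r["secret"]
        and r["passed"]
        and r["score"] is not None
    ]
    return min(scores) if scores else None


# Hypothetical rows for one submission with a secret run on two GPU types.
sample_runs = [
    {"gpu": "A100", "mode": "leaderboard", "secret": True, "passed": True, "score": 0.0021},
    {"gpu": "H100", "mode": "leaderboard", "secret": True, "passed": True, "score": 0.0017},
    {"gpu": "H100", "mode": "leaderboard", "secret": False, "passed": True, "score": 0.0012},
    {"gpu": "A100", "mode": "test", "secret": True, "passed": True, "score": None},
]

print(best_secret_score(sample_runs))
print(best_secret_score([]))
```

With one secret run per submission, as on a single-GPU leaderboard, the function simply returns that run's score.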