eval: hand kernel a fresh input each timed iteration (close result-replay in scored path) #501

Draft: nikhilbarhate99 wants to merge 1 commit into gpu-mode:main from nikhilbarhate99:fix/scored-path-replay-vector

@nikhilbarhate99

Problem

examples/eval.py::_run_single_benchmark runs the scored (benchmark / leaderboard) path with recheck=False: it calls generate_input() once, checks correctness once, then invokes custom_kernel(data) for all max_repeats timed iterations on the same data object (same id(), same .data_ptr()).

This leaves the dominant "result replay" hack open in the ranked path: a submission can cache its output on the first call (keyed on object identity, .data_ptr(), or ._version) and return the cached result with zero compute on every later timed iteration, reporting an enormous false speedup. Test mode already uses recheck=True; the ranked timing path does not.
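For discussion, here is a minimal sketch of the replay pattern described above. It is not code from any real submission. FakeTensor is a hypothetical stand-in that exposes only data_ptr(), so the sketch runs without PyTorch or a GPU, and the doubling "kernel" is a placeholder for real work.

```python
import itertools


class FakeTensor:
    """Hypothetical stand-in for a tensor: exposes only data_ptr()."""

    _addresses = itertools.count(0x1000, 0x100)  # pretend device addresses

    def __init__(self, values):
        self.values = list(values)
        self._ptr = next(FakeTensor._addresses)

    def data_ptr(self):
        return self._ptr


compute_calls = 0  # counts how often real work is done
_cache = {}


def replaying_kernel(data):
    """Cheating kernel: computes once per recognized input, then replays."""
    global compute_calls
    key = (id(data), data.data_ptr())
    if key not in _cache:
        compute_calls += 1
        _cache[key] = [2 * v for v in data.values]  # placeholder for real work
    return _cache[key]


# Scored path as described above: one input object reused for every iteration.
data = FakeTensor([1, 2, 3])
for _ in range(100):
    out = replaying_kernel(data)

print(compute_calls)  # real work happened only on the first call
```

Because the same object is passed every time, the cache key never changes and every iteration after the first is a dictionary lookup.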

Fix

In the non-recheck timed loop, hand the kernel a fresh tensor clone each iteration (new object + new data_ptr()). The clone happens outside the timed region (before start_event.record()), so the measured kernel time is unaffected. The two most recent inputs are retained so the caching allocator can't immediately recycle a freed data_ptr.

This defeats identity / ._version / pointer-based result replay in the scored path at negligible (untimed) cost.
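A rough sketch of the loop shape described above, under stated assumptions: FakeTensor is again a hypothetical stand-in for a tensor, time.perf_counter() stands in for the CUDA events (start_event.record() and its counterpart), and timed_loop is an illustrative name rather than the harness function. The toy address counter never reuses an address, unlike a real caching allocator, so this shows only the object-identity side of the fix.

```python
import itertools
import time
from collections import deque


class FakeTensor:
    """Hypothetical stand-in for a tensor with clone() and data_ptr()."""

    _addresses = itertools.count(0x1000, 0x100)  # toy addresses, never reused

    def __init__(self, values):
        self.values = list(values)
        self._ptr = next(FakeTensor._addresses)

    def data_ptr(self):
        return self._ptr

    def clone(self):
        return FakeTensor(self.values)  # new object, new address


compute_calls = 0
_cache = {}


def replaying_kernel(data):
    """Cheating kernel keyed on object identity and pointer."""
    global compute_calls
    key = (id(data), data.data_ptr())
    if key not in _cache:
        compute_calls += 1
        _cache[key] = [2 * v for v in data.values]
    return _cache[key]


def timed_loop(kernel, data, max_repeats):
    recent = deque(maxlen=2)  # keep the two most recent inputs alive
    durations = []
    for _ in range(max_repeats):
        fresh = data.clone()  # untimed: happens before the timer starts
        recent.append(fresh)
        start = time.perf_counter()  # stands in for start_event.record()
        kernel(fresh)
        durations.append(time.perf_counter() - start)
    return durations


durations = timed_loop(replaying_kernel, FakeTensor([1, 2, 3]), max_repeats=10)
print(compute_calls)  # the cache never hits, so every iteration does real work
```

The clone sits before the timer starts, so only the kernel call contributes to each recorded duration.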

A stronger alternative (already supported by the harness) is to default recheck=True for ranked runs, which additionally regenerates the input with a fresh seed and re-checks correctness every iteration. The clone approach is the minimal, low-overhead fix that closes the replay vector specifically.
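To make the comparison concrete, here is a hedged sketch of why regenerating and re-checking is stronger. Every name and signature here (generate_input taking a seed, reference_kernel, run_with_recheck) is hypothetical and chosen for illustration; it does not reproduce the harness's recheck implementation.

```python
import random


def generate_input(seed, size=8):
    """Hypothetical generator: a different seed yields different data."""
    rng = random.Random(seed)
    return [rng.randint(-100, 100) for _ in range(size)]


def reference_kernel(data):
    """Placeholder for the trusted reference implementation."""
    return [2 * v for v in data]


_replayed = None


def replaying_kernel(data):
    """Cheating kernel: computes once, then always returns that result."""
    global _replayed
    if _replayed is None:
        _replayed = reference_kernel(data)
    return _replayed


def run_with_recheck(kernel, base_seed, max_repeats):
    """Return the index of the first failing iteration, or None if all pass."""
    for i in range(max_repeats):
        data = generate_input(base_seed + i)  # fresh seed each iteration
        if kernel(data) != reference_kernel(data):
            return i
    return None


honest = run_with_recheck(reference_kernel, base_seed=0, max_repeats=5)
cheat = run_with_recheck(replaying_kernel, base_seed=0, max_repeats=5)
print(honest, cheat)
```

Since the input contents change on every iteration, a replayed output stops matching the reference and fails the check, regardless of which key the cache uses. The cost is an extra generate and check per iteration.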

Marking as draft for discussion.
