Add BENCHMARK_RANK_TRUNCATION: compile-method-independent fixed-limit truncation detector #275
Draft
nikhilbarhate99 wants to merge 1 commit into
…rary fixed-limit truncation detector
Catches input-structure classification + truncation regardless of compile method (closes the
load_inline evasion of BENCHMARK_UNSAFE_ALGO_DISPATCH) and ARBITRARY fixed-limit truncation via
value-independent fingerprints (output tail-zeroing + limit<n + loop-bounded-by-limit), not just
the qr_v2 ranks {256,384,768}. Exempts legitimate adaptive rank-revealing QR (runtime data-derived
bound). Validated zero-FP on clean full-QR + adaptive-RRQR; flips known load_inline truncators to
auto_filter (suki/tenzin/porco/cholopt).
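The three value-independent fingerprints named above (output tail-zeroing, `limit < n`, and a loop bounded by the limit), together with the adaptive-rank exemption, can be sketched as a source-text check. This is a minimal illustration only: the regex approach, the pattern names, and `fixed_limit_fingerprint` are assumptions made for this example and are not taken from the detector's actual implementation, which this PR page does not show. The classifier / data-family / hardcoded-rank signals are not modeled.

```python
import re

# Hypothetical patterns for illustration; not the detector's real rules.
# Tail-zeroing such as `tau[:, limit:] = 0`; captures the slice bound's name.
TAIL_ZERO = re.compile(
    r"\w+\s*\[[^\]\n]*?\b(?P<bound>[A-Za-z_]\w*)\s*:\s*\]\s*=\s*0\b"
)
# Signs that the bound is derived from the data at runtime (adaptive RRQR).
ADAPTIVE = re.compile(
    r"matrix_rank|count_nonzero|\(\s*\w+\s*>\s*\w+\s*\)\s*\.sum\(\)"
)


def fixed_limit_fingerprint(source: str) -> bool:
    """Return True if `source` shows all three fixed-limit fingerprints
    for the same identifier and no adaptive-rank computation."""
    if ADAPTIVE.search(source):
        return False  # exemption: runtime data-derived bound present
    for match in TAIL_ZERO.finditer(source):
        bound = re.escape(match.group("bound"))
        # `<bound> < something`, e.g. `limit < n`
        less_than = re.search(rf"\b{bound}\s*<\s*\w+", source)
        # `range(<bound>)` or `range(0, <bound>[, step])`
        loop = re.search(rf"range\(\s*(?:0\s*,\s*)?{bound}\s*[,)]", source)
        if less_than and loop:
            return True
    return False


TRUNCATING = """
limit = 384
if limit < n:
    tau[:, limit:] = 0
for j in range(0, limit, 32):
    pass
"""

ADAPTIVE_RRQR = """
k = int((diag > tol).sum())
if k < n:
    tau[:, k:] = 0
for j in range(0, k):
    pass
"""

FULL_QR = """
for j in range(0, n):
    pass
"""

flags = {
    "truncating": fixed_limit_fingerprint(TRUNCATING),
    "adaptive_rrqr": fixed_limit_fingerprint(ADAPTIVE_RRQR),
    "full_qr": fixed_limit_fingerprint(FULL_QR),
}
```

Tying the comparison and the loop bound to the identifier captured from the tail-zeroing keeps the check independent of the numeric limit. A text-level exemption like `ADAPTIVE` is coarse (it would also fire on a comment that merely mentions `matrix_rank`), so a real rule would presumably need something stricter.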
Problem
Two existing fixture-specialization detectors can be evaded, letting submissions that classify the input's numerical structure and then truncate the computation (factoring fewer columns/rows than the full problem) pass as clean:
- `BENCHMARK_UNSAFE_ALGO_DISPATCH` requires `raw_native_cache`, which needs a literal `nvcc` build and explicitly excludes `load_inline`. A submission compiled via `torch.utils.cpp_extension.load_inline` has no `nvcc` string, so this rule cannot fire even when the submission classifies and then truncates.
- `INPUT_STRUCTURE_TRUNCATION_DISPATCH` requires `subset_set >= 2` (gather/scatter), so a submission that classifies and truncates the whole batch uniformly (no per-element gather) stays under the threshold.
Observed in the wild on the `qr_v2` (linalg QR) leaderboard: submissions probe rank structure (`classify_512/1024`) and truncate the QR (`_cqr_blocked_limit(384)`, `tau[:, limit:]=0`, loop `range(0, limit)` with `limit<n`), skipping columns that are ~zero for the benchmark inputs, and still pass KernelGuard.
Fix
New detector `BENCHMARK_RANK_TRUNCATION` (`auto_filter`, `fixture_specialization` family) that is:
- Compile-method-independent: it makes no `nvcc`/`load_inline` distinction.
- Value-independent: it fingerprints output tail-zeroing (`X[...k:] = 0`), `limit < n`, and a loop bounded by `limit` (`range(0, limit, ...)`), plus classifier / data-family / per-matrix-limit / hardcoded-rank signals. It is not restricted to the qr_v2 ranks {256,384,768}.
- Adaptive-RRQR-exempt: legitimate rank-revealing QR derives its bound from the data at runtime (`matrix_rank`, `(diag>tol).sum()`, `count_nonzero`); the value-independent fixed-limit rule is suppressed when an adaptive-rank computation is present.
Validation
- Zero false positives on clean full-QR and adaptive-RRQR submissions.
- Flips the known `load_inline` truncators to `hacked`/`should_filter=True`.
Marking as draft for discussion. Happy to adjust thresholds or move it to telemetry-first + enrichment per the precision-first policy in the blog.
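For discussion, the reason a fixed limit counts as fixture specialization can be shown numerically. The sketch below is a stand-in, not code from any submission: it uses plain-Python modified Gram-Schmidt instead of the blocked Householder QR the submissions use, and the helper name `truncated_qr_residual`, the 8x8 size, and the rank of 3 are all assumptions chosen for the example. It orthonormalizes only the first `limit` columns and measures how much of the matrix is left unexplained.

```python
import math
import random


def truncated_qr_residual(a, limit):
    """Orthonormalize only the first `limit` columns of `a` (modified
    Gram-Schmidt), then return max |A - Q Q^T A| over all entries."""
    m, n = len(a), len(a[0])
    cols = [[a[i][j] for i in range(m)] for j in range(n)]
    q = []  # orthonormal basis built from the first `limit` columns only
    for j in range(limit):
        v = cols[j][:]
        for u in q:
            dot = sum(x * y for x, y in zip(u, v))
            v = [y - dot * x for x, y in zip(u, v)]
        norm = math.sqrt(sum(y * y for y in v))
        if norm > 1e-12:  # skip numerically dependent columns
            q.append([y / norm for y in v])
    worst = 0.0
    for c in cols:
        r = c[:]  # remove the component of each column inside span(Q)
        for u in q:
            dot = sum(x * y for x, y in zip(u, r))
            r = [y - dot * x for x, y in zip(u, r)]
        worst = max(worst, max(abs(y) for y in r))
    return worst


rng = random.Random(0)
n, rank = 8, 3  # illustrative sizes, not the benchmark's

# Rank-deficient input (product of n x rank and rank x n factors),
# standing in for benchmark inputs whose trailing columns add nothing new.
b = [[rng.gauss(0, 1) for _ in range(rank)] for _ in range(n)]
c = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(rank)]
low_rank = [[sum(b[i][k] * c[k][j] for k in range(rank)) for j in range(n)]
            for i in range(n)]
# Generic full-rank input of the same shape.
full_rank = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(n)]

res_low_fixed = truncated_qr_residual(low_rank, rank)    # fixed limit, low-rank input
res_full_fixed = truncated_qr_residual(full_rank, rank)  # fixed limit, generic input
res_full_all = truncated_qr_residual(full_rank, n)       # no truncation
```

Under these assumptions, `res_low_fixed` and `res_full_all` should sit near rounding error while `res_full_fixed` is of order one: the fixed limit is only correct for inputs that happen to be rank-deficient, which is why a bound derived from the data at runtime is treated differently from a hardcoded one.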