fix: critical bugs, architecture improvements, and test infrastructure by LessUp · Pull Request #3 · LessUp/sgemm-optimization

LessUp · 2026-07-01T10:45:25Z

Summary

CRITICAL: Fix getTheoreticalPeakGflops — Remove spurious * 1000 that made peak GFLOPS 1000x too large, corrupting all efficiency calculations and breaking performance test bounds
CRITICAL: Fix CPU test compilation — Rename test_device_info_cpu.cpp → .cu because it includes kernel headers with CUDA <<<>>> launch syntax unparseable by g++
CRITICAL: Fix duplicate main() — Remove redundant main() from CPU test files that conflicted with GTest::gtest_main
HIGH: Fix integer overflow in tensor_core_benchmark.cuh (M * K int multiplication)
HIGH: Fix division by zero in measureGpuTime when benchmark_runs <= 0
HIGH: Fix negative dimension handling in KernelConstraints::isSatisfied
MEDIUM: Fix NaN/Inf reference handling in verification comparison logic
MEDIUM: Fix exception safety in printSkipReason (diagnostic function should not throw)
MEDIUM: Fix file write error checking in exportRooflineData
MEDIUM: Extract shared overflow-check utilities (safeGridSize, checkMatrixElementCount) to eliminate DRY violation
MEDIUM: Adjust performance test bounds for corrected peak calculation
Infrastructure: Add CPU test execution to CI (ctest -L cpu)
Infrastructure: Add CMake test helpers with standardized target creation and CTest labels
Infrastructure: Refactor test files — migrate CPU tests to .cpp, split device_info_seam into CPU/CUDA test files

Test plan

CI format check passes
CI CUDA build passes (all targets compile)
CI CPU tests pass (ctest -L cpu)
GPU runtime tests pass on local CUDA machine (requires GPU)

Generated with Devin

## Critical Bug Fixes - **getTheoreticalPeakGflops**: Remove spurious `* 1000` that made peak GFLOPS 1000x too large, causing all efficiency calculations to report 0.1% instead of 100%. This also broke the PeakPerformanceReference test upper bound check. - **CPU test compilation**: Rename test_device_info_cpu.cpp to .cu because it includes tensor_core_sgemm.cuh which contains CUDA `<<<>>>` kernel launch syntax that g++ cannot parse. Add CUDA compile options to the sgemm_add_cpu_test CMake helper for .cu sources. - **Duplicate main()**: Remove redundant `main()` from CPU-only test files (test_benchmark_settings.cpp, test_device_info_cpu.cu) that conflicted with GTest::gtest_main linkage. ## High-Severity Fixes - **Integer overflow**: Fix `M * K` int multiplication overflow in tensor_core_benchmark.cuh by using `static_cast<size_t>` consistently for all matrix size calculations. - **Division by zero**: Add validation in measureGpuTime() to reject benchmark_runs <= 0, preventing undefined behavior. - **Negative dimensions**: Add M > 0 && K > 0 && N > 0 check in KernelConstraints::isSatisfied() to prevent undefined modulo behavior with non-positive dimensions. ## Medium-Severity Fixes - **NaN/Inf verification**: Handle NaN/Inf in reference values in compareMatricesImpl() instead of producing NaN comparison results. - **Exception safety in diagnostics**: Replace CUDA_CHECK (throwing) with non-throwing error handling in printSkipReason(). - **File write error checking**: Add file.fail() check after writing roofline export data. - **DRY violation**: Extract safeGridSize() and checkMatrixElementCount() to cuda_utils.cuh, eliminating duplicated overflow-check code between tensor_core_sgemm.cuh and tensor_core_benchmark.cuh. - **Performance test bounds**: Adjust PeakPerformanceReference bounds (100-200000 GFLOPS) to accommodate the corrected peak calculation and future GPU architectures. ## Infrastructure Improvements - **CI**: Add CPU test execution step (`ctest -L cpu`) to CI workflow. - **CMake helpers**: Add SgemmTestHelpers.cmake with standardized test target creation functions (sgemm_add_cpu_test, sgemm_add_cuda_test, sgemm_add_cuda_perf_test) with proper CTest labels. - **Test refactoring**: Migrate CPU-only tests from .cu to .cpp, split device_info_seam.cu into separate CPU and CUDA test files. - **Kernel catalog**: Add KernelConstraints struct with isSatisfied() method, refactor BenchmarkRunner to use catalog-driven dispatch. Generated with [Devin](https://devin.ai) Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>

- Add missing `include(GoogleTest)` before SgemmTestHelpers.cmake to resolve `gtest_discover_tests` unknown command error - Apply clang-format to test_utils.cu, test_sgemm.cu, and test_device_info_cuda.cu to fix format check failures Generated with [Devin](https://devin.ai) Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>

The refactored verify.cuh removed the static shouldFlagAsIncorrect method from SGEMMVerifier, breaking test_sgemm.cu which calls SGEMMVerifier::shouldFlagAsIncorrect(result). Restore it as a deprecated compatibility method. Generated with [Devin](https://devin.ai) Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>

The test uses tensorCoresAvailable() which is defined in tensor_core_sgemm.cuh, but the header was not included. Generated with [Devin](https://devin.ai) Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>

The test expected 19481.0 GFLOPS but the correct calculation is 108 * 64 * 2 * 1.41 = 19491.84 GFLOPS. Fix the expected value and remove the stale * 1000 from the comment. Generated with [Devin](https://devin.ai) Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>

LessUp and others added 5 commits July 1, 2026 18:44

LessUp merged commit 4b39e85 into master Jul 1, 2026
2 checks passed

LessUp deleted the fix/bugs-and-architecture-improvements branch July 1, 2026 11:08

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: critical bugs, architecture improvements, and test infrastructure#3

fix: critical bugs, architecture improvements, and test infrastructure#3
LessUp merged 5 commits into
masterfrom
fix/bugs-and-architecture-improvements

LessUp commented Jul 1, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

LessUp commented Jul 1, 2026

Summary

Test plan

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant