feat: refactor InfiniCore CPU runtime to InfiniRT #8

Merged: voltjia merged 1 commit into master from feat/extract-infinicore-runtime on Jul 3, 2026
Conversation

spike-zhu (Contributor) commented Jun 24, 2026

Summary

  • Extends scripts/generate_public_headers.py so generated public runtime dispatch only emits functions supported by the enabled backend runtime headers.
  • Adds CPU runtime support for host allocation, memory info, stream, event, and async API entries in src/native/cpu/runtime_.h.
  • Keeps the CPU runtime API aligned with the generated C++ InfiniRT runtime surface used by downstream InfiniCore work.

Motivation

This prepares InfiniRT CPU runtime coverage for replacing InfiniCore runtime calls with InfiniRT runtime APIs.

Related: InfiniTensor/InfiniCore#1342

Type of Change

  • feat - new feature / new operator / new platform
  • fix - bug fix
  • perf - performance improvement (no behavioral change)
  • refactor - code restructuring without behavior change
  • test - adding or fixing tests only
  • docs - documentation only
  • build / ci - build system or CI configuration
  • chore - tooling, formatting, or other non-code changes
  • Breaking change (requires a ! in the Conventional Commits prefix or a BREAKING CHANGE: footer)

Platforms Affected

  • CPU (WITH_CPU)
  • NVIDIA (WITH_NVIDIA)
  • Iluvatar (WITH_ILUVATAR)
  • MetaX (WITH_METAX)
  • Cambricon (WITH_CAMBRICON)
  • Moore (WITH_MOORE)
  • Ascend (WITH_ASCEND)
  • PyTorch C++ bindings (WITH_TORCH)
  • Build system / CMake / CI
  • Python bindings / user-facing API

Smoke Test Result

Not rerun while updating this PR description.

Current PR checks:
- ruff: passed
- clang-format: passed

Previous manual InfiniCore single-op validation evidence from the original PR description:
- https://github.com/user-attachments/assets/c5642073-3ce5-43c0-8002-11e0c44ac0bb
- https://github.com/user-attachments/assets/076a3af7-738a-43b3-a90d-80fc29221aee

Test Results on Supported Platforms

| Platform | Affected | Build / Smoke Result | Full Result / Notes |
| --- | --- | --- | --- |
| NVIDIA | No | N/A - not affected | N/A |
| Iluvatar | No | N/A - not affected | N/A |
| MetaX | No | N/A - not affected | N/A |
| Cambricon | No | N/A - not affected | N/A |
| Moore | No | N/A - not affected | N/A |
| Ascend | No | N/A - not affected | N/A |
| CPU | Yes | CI format checks passed; smoke not rerun in this update | Related downstream validation is in InfiniCore#1342 |

Full `pytest` output (optional)
N/A

Benchmark / Performance Impact

N/A. This PR changes runtime API coverage and dispatch generation, not performance-sensitive kernels.

Notes for Reviewers

  • The CPU async memory/copy entries return an error where CPU has no asynchronous implementation.
  • MemGetInfo reports values read from /proc/meminfo on non-Windows platforms and returns an invalid-value error if they are unavailable.
  • This PR is intended to be reviewed together with InfiniCore#1342.
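
To make the MemGetInfo note above concrete, here is a minimal sketch of reading memory figures from a /proc/meminfo-style stream. It is not the code in src/native/cpu/runtime_.h: the `Status` enum, the function names `ParseMemInfo` and `CpuMemGetInfo`, and the choice of `MemAvailable` as the "free" figure are all assumptions made for illustration.

```cpp
#include <cstddef>
#include <fstream>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical status codes for this sketch; not InfiniRT's real error type.
enum class Status { kSuccess, kInvalidValue };

// Parses "MemTotal:" and "MemAvailable:" (both reported in kB) from a
// /proc/meminfo-style stream. Returns kInvalidValue when either field is
// missing or an output pointer is null.
inline Status ParseMemInfo(std::istream& in, std::size_t* free_bytes,
                           std::size_t* total_bytes) {
  if (free_bytes == nullptr || total_bytes == nullptr) {
    return Status::kInvalidValue;
  }
  bool has_total = false;
  bool has_available = false;
  std::size_t total_kb = 0;
  std::size_t available_kb = 0;
  std::string line;
  while (std::getline(in, line)) {
    std::istringstream fields(line);
    std::string key;
    std::size_t value_kb = 0;
    if (!(fields >> key >> value_kb)) {
      continue;  // Skip lines that are not "<Key>: <number> ...".
    }
    if (key == "MemTotal:") {
      total_kb = value_kb;
      has_total = true;
    } else if (key == "MemAvailable:") {
      available_kb = value_kb;
      has_available = true;
    }
  }
  if (!has_total || !has_available) {
    return Status::kInvalidValue;
  }
  *free_bytes = available_kb * 1024;
  *total_bytes = total_kb * 1024;
  return Status::kSuccess;
}

// Convenience wrapper that reads the real file; fails if it cannot be opened.
inline Status CpuMemGetInfo(std::size_t* free_bytes,
                            std::size_t* total_bytes) {
  std::ifstream meminfo("/proc/meminfo");
  if (!meminfo.is_open()) {
    return Status::kInvalidValue;
  }
  return ParseMemInfo(meminfo, free_bytes, total_bytes);
}
```

Splitting the parser from the file access keeps the parsing logic testable on any platform, including ones without /proc.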

spike-zhu (Contributor, Author) commented Jun 25, 2026

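@voltjia Jiacheng, could you please check whether the overall approach for connecting InfiniCore to the InfiniRT CPU runtime is correct after these changes? I will refine the details afterward. Thanks!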

@spike-zhu spike-zhu requested a review from voltjia June 25, 2026 13:27
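voltjia (Collaborator) commented Jun 26, 2026

> @voltjia Jiacheng, could you please check whether the overall approach for connecting InfiniCore to the InfiniRT CPU runtime is correct after these changes? I will refine the details afterward. Thanks!

I took a look and it is basically fine. One principle for this refactor: reuse the CUDA Runtime API interfaces wherever possible. In other words, for some interfaces we need to check whether they exist in the CUDA Toolkit. For example, I could not find any interface related to GetDeviceResourceSnapshot (I may have missed it). For the ones that are not in the CUDA Toolkit, we can put together a table and decide later whether they really need to migrate into the new InfiniRT. Besides the interface names, the parameter lists need to be checked as well. Nothing else stands out for now.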

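spike-zhu (Contributor, Author) commented

> > @voltjia Jiacheng, could you please check whether the overall approach for connecting InfiniCore to the InfiniRT CPU runtime is correct after these changes? I will refine the details afterward. Thanks!
>
> I took a look and it is basically fine. One principle for this refactor: reuse the CUDA Runtime API interfaces wherever possible. In other words, for some interfaces we need to check whether they exist in the CUDA Toolkit. For example, I could not find any interface related to GetDeviceResourceSnapshot (I may have missed it). For the ones that are not in the CUDA Toolkit, we can put together a table and decide later whether they really need to migrate into the new InfiniRT. Besides the interface names, the parameter lists need to be checked as well. Nothing else stands out for now.

OK, I will also survey the CUDA Runtime API interfaces and list them.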

spike-zhu (Contributor, Author) commented Jun 29, 2026

InfiniCore CPU Runtime API migration decision table

CUDA API source: cuda_runtime_api.h, CUDA Toolkit 12.9.

| API name in InfiniCore | Name in InfiniRT | Corresponding CUDA API | CUDA API signature | Migrated to InfiniRT? |
| --- | --- | --- | --- | --- |
| `setDevice(int device_id)` | `infini::rt::SetDevice(Device device)` | `cudaSetDevice` | `cudaError_t cudaSetDevice(int device);` | Migrated |
| `getDevice(...)` / `infinirtGetDevice(...)` | `infini::rt::GetDevice(Device* device)` | `cudaGetDevice` | `cudaError_t cudaGetDevice(int *device);` | Migrated |
| `getDeviceCount(int *count)` | `infini::rt::GetDeviceCount(int* count, Device::Type type)` | `cudaGetDeviceCount` | `cudaError_t cudaGetDeviceCount(int *count);` | Migrated; InfiniRT adds `Device::Type` |
| `deviceSynchronize()` | `infini::rt::DeviceSynchronize()` | `cudaDeviceSynchronize` | `cudaError_t cudaDeviceSynchronize(void);` | Migrated |
| `mallocDevice(void **p_ptr, size_t size)` | `infini::rt::Malloc(void** ptr, std::size_t size)` | `cudaMalloc` | `cudaError_t cudaMalloc(void **devPtr, size_t size);` | Migrated |
| `freeDevice(void *ptr)` | `infini::rt::Free(void* ptr)` | `cudaFree` | `cudaError_t cudaFree(void *devPtr);` | Migrated |
| `memcpy(void *dst, const void *src, size_t size, infinirtMemcpyKind_t kind)` | `infini::rt::Memcpy(void* dst, const void* src, std::size_t count, MemcpyKind kind)` | `cudaMemcpy` | `cudaError_t cudaMemcpy(void *dst, const void *src, size_t count, enum cudaMemcpyKind kind);` | Migrated |
| `memsetDevice(void *ptr, int value, size_t count)` | `infini::rt::Memset(void* ptr, int value, std::size_t count)` | `cudaMemset` | `cudaError_t cudaMemset(void *devPtr, int value, size_t count);` | Migrated |
| `mallocHost(void **p_ptr, size_t size)` | `infini::rt::MallocHost(void** ptr, std::size_t size)` | `cudaMallocHost` | `cudaError_t cudaMallocHost(void **ptr, size_t size);` | Migrated |
| `freeHost(void *ptr)` | `infini::rt::FreeHost(void* ptr)` | `cudaFreeHost` | `cudaError_t cudaFreeHost(void *ptr);` | Migrated |
| `memcpyAsync(void *dst, const void *src, size_t size, infinirtMemcpyKind_t kind, infinirtStream_t stream)` | `infini::rt::MemcpyAsync(void* dst, const void* src, std::size_t count, MemcpyKind kind, void* stream)` | `cudaMemcpyAsync` | `cudaError_t cudaMemcpyAsync(void *dst, const void *src, size_t count, enum cudaMemcpyKind kind, cudaStream_t stream);` | Migrated |
| `memsetDeviceAsync(void *ptr, int value, size_t count, infinirtStream_t stream)` | `infini::rt::MemsetAsync(void* ptr, int value, std::size_t count, void* stream)` | `cudaMemsetAsync` | `cudaError_t cudaMemsetAsync(void *devPtr, int value, size_t count, cudaStream_t stream);` | Migrated |
| `mallocAsync(void **p_ptr, size_t size, infinirtStream_t stream)` | `infini::rt::MallocAsync(void** ptr, std::size_t size, void* stream)` | `cudaMallocAsync` | `cudaError_t cudaMallocAsync(void **devPtr, size_t size, cudaStream_t hStream);` | Migrated |
| `freeAsync(void *ptr, infinirtStream_t stream)` | `infini::rt::FreeAsync(void* ptr, void* stream)` | `cudaFreeAsync` | `cudaError_t cudaFreeAsync(void *devPtr, cudaStream_t hStream);` | Migrated |
| `streamCreate(infinirtStream_t *stream_ptr)` | `infini::rt::StreamCreate(void** stream)` | `cudaStreamCreate` | `cudaError_t cudaStreamCreate(cudaStream_t *pStream);` | Migrated |
| `streamDestroy(infinirtStream_t stream)` | `infini::rt::StreamDestroy(void* stream)` | `cudaStreamDestroy` | `cudaError_t cudaStreamDestroy(cudaStream_t stream);` | Migrated |
| `streamSynchronize(infinirtStream_t stream)` | `infini::rt::StreamSynchronize(void* stream)` | `cudaStreamSynchronize` | `cudaError_t cudaStreamSynchronize(cudaStream_t stream);` | Migrated |
| `streamWaitEvent(infinirtStream_t stream, infinirtEvent_t event)` | `infini::rt::StreamWaitEvent(void* stream, void* event)` | `cudaStreamWaitEvent` | `cudaError_t cudaStreamWaitEvent(cudaStream_t stream, cudaEvent_t event, unsigned int flags);` | Migrated, but InfiniRT currently lacks the `flags` parameter |
| `eventCreate(infinirtEvent_t *event_ptr)` | `infini::rt::EventCreate(void** event)` | `cudaEventCreate` | `cudaError_t cudaEventCreate(cudaEvent_t *event);` | Migrated |
| `eventCreateWithFlags(infinirtEvent_t *event_ptr, uint32_t flags)` | `infini::rt::EventCreateWithFlags(void** event, uint32_t flags)` | `cudaEventCreateWithFlags` | `cudaError_t cudaEventCreateWithFlags(cudaEvent_t *event, unsigned int flags);` | Migrated |
| `eventRecord(infinirtEvent_t event, infinirtStream_t stream)` | `infini::rt::EventRecord(void* event, void* stream)` | `cudaEventRecord` | `cudaError_t cudaEventRecord(cudaEvent_t event, cudaStream_t stream);` | Migrated |
| `eventQuery(infinirtEvent_t event, infinirtEventStatus_t *status_ptr)` | `infini::rt::EventQuery(void* event, int* status)` | `cudaEventQuery` | `cudaError_t cudaEventQuery(cudaEvent_t event);` | Migrated; InfiniRT reports complete/not-ready through an output parameter |
| `eventSynchronize(infinirtEvent_t event)` | `infini::rt::EventSynchronize(void* event)` | `cudaEventSynchronize` | `cudaError_t cudaEventSynchronize(cudaEvent_t event);` | Migrated |
| `eventDestroy(infinirtEvent_t event)` | `infini::rt::EventDestroy(void* event)` | `cudaEventDestroy` | `cudaError_t cudaEventDestroy(cudaEvent_t event);` | Migrated |
| `eventElapsedTime(float *ms_ptr, infinirtEvent_t start, infinirtEvent_t end)` | `infini::rt::EventElapsedTime(float* ms, void* start, void* end)` | `cudaEventElapsedTime` | `cudaError_t cudaEventElapsedTime(float *ms, cudaEvent_t start, cudaEvent_t end);` | Migrated |
| `getMemInfo(int device_id, size_t *free_bytes, size_t *total_bytes)` | `infini::rt::GetMemInfo(Device device, std::size_t* free_bytes, std::size_t* total_bytes)` | `cudaMemGetInfo` | `cudaError_t cudaMemGetInfo(size_t *free, size_t *total);` | Migrated; InfiniRT adds a `Device` parameter |
| `getDeviceResourceSnapshot(int device_id, infinirtDeviceResourceSnapshot_t *snapshot)` | | No direct counterpart | CUDA Runtime has no `GetDeviceResourceSnapshot` or aggregated resource-snapshot interface | Not migrated; stays in InfiniCore |
| `streamBeginCapture(infinirtStream_t stream, infinirtStreamCaptureMode_t mode)` | Not yet migrated for CPU | `cudaStreamBeginCapture` | `cudaError_t cudaStreamBeginCapture(cudaStream_t stream, enum cudaStreamCaptureMode mode);` | Not migrated for CPU for now; remains unsupported |
| `streamEndCapture(infinirtStream_t stream, infinirtGraph_t *graph_ptr)` | Not yet migrated for CPU | `cudaStreamEndCapture` | `cudaError_t cudaStreamEndCapture(cudaStream_t stream, cudaGraph_t *pGraph);` | Not migrated for CPU for now; remains unsupported |
| `graphDestroy(infinirtGraph_t graph)` | Not yet migrated for CPU | `cudaGraphDestroy` | `cudaError_t cudaGraphDestroy(cudaGraph_t graph);` | Not migrated for CPU for now; remains unsupported |
| `graphInstantiate(...)` | Not yet migrated for CPU | `cudaGraphInstantiate` | `cudaError_t cudaGraphInstantiate(cudaGraphExec_t *pGraphExec, cudaGraph_t graph, unsigned long long flags);` | Not migrated for CPU for now; remains unsupported. The old InfiniCore interface's parameters do not fully match the newer CUDA prototype |
| `graphExecDestroy(infinirtGraphExec_t graph_exec)` | Not yet migrated for CPU | `cudaGraphExecDestroy` | `cudaError_t cudaGraphExecDestroy(cudaGraphExec_t graphExec);` (to be confirmed against the CUDA Toolkit version in use) | Not migrated for CPU for now; remains unsupported |
| `graphLuanch(infinirtGraphExec_t graph_exec, infinirtStream_t stream)` | Not yet migrated for CPU | `cudaGraphLaunch` | `cudaError_t cudaGraphLaunch(cudaGraphExec_t graphExec, cudaStream_t stream);` | Not migrated for CPU for now; remains unsupported |
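
The eventQuery row in the table is the one place where the calling convention differs in shape from CUDA: `cudaEventQuery` folds readiness into its return code (`cudaSuccess` or `cudaErrorNotReady`), while the InfiniRT entry takes an `int* status` output parameter. The sketch below illustrates that output-parameter style only. The `Status` enum, the `kEventComplete` / `kEventNotReady` encoding, the `CpuEvent` struct, and the `IsEventDone` helper are hypothetical stand-ins, not the actual InfiniRT definitions.

```cpp
#include <atomic>

// Hypothetical stand-ins for this sketch; InfiniRT's real types may differ.
enum class Status { kSuccess, kInvalidValue };
constexpr int kEventComplete = 0;  // Assumed encoding, not taken from the PR.
constexpr int kEventNotReady = 1;  // Assumed encoding, not taken from the PR.

// Toy CPU event: `complete` is true once the recorded work has finished.
struct CpuEvent {
  std::atomic<bool> complete{true};
};

// Output-parameter style, shaped like EventQuery(void* event, int* status):
// the return value only says whether the call itself was valid, and the
// readiness of the event is written to *status.
inline Status EventQuery(void* event, int* status) {
  if (event == nullptr || status == nullptr) {
    return Status::kInvalidValue;
  }
  const auto* cpu_event = static_cast<const CpuEvent*>(event);
  *status = cpu_event->complete.load() ? kEventComplete : kEventNotReady;
  return Status::kSuccess;
}

// Caller-side helper: poll without treating "not ready" as a failure.
inline bool IsEventDone(void* event) {
  int status = kEventNotReady;
  return EventQuery(event, &status) == Status::kSuccess &&
         status == kEventComplete;
}
```

With this shape, a caller can tell "the query failed" apart from "the event is still pending" without special-casing one error code, which is presumably why the migration table flags the difference.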

@spike-zhu spike-zhu force-pushed the feat/extract-infinicore-runtime branch from 866fc8d to 7fd37b5 Compare June 30, 2026 01:49
@spike-zhu spike-zhu marked this pull request as ready for review June 30, 2026 01:51
Comment thread src/runtime.h Outdated
Comment thread src/native/cpu/runtime_.h
@voltjia

This comment was marked as outdated.

@spike-zhu spike-zhu force-pushed the feat/extract-infinicore-runtime branch 2 times, most recently from 1596dd5 to 187c34a Compare July 1, 2026 08:48
@spike-zhu spike-zhu requested a review from voltjia July 1, 2026 08:49
@voltjia voltjia force-pushed the feat/extract-infinicore-runtime branch from 37c4913 to a2aea14 Compare July 3, 2026 02:43
@voltjia voltjia changed the title from "feat: refactor InfiniCore cpu runtime to InfiniRT" to "feat: refactor InfiniCore CPU runtime to InfiniRT" Jul 3, 2026
@voltjia voltjia merged commit 568efd5 into master Jul 3, 2026
4 checks passed
@voltjia voltjia deleted the feat/extract-infinicore-runtime branch July 3, 2026 03:15
voltjia added a commit that referenced this pull request Jul 3, 2026
* feat!: align runtime API and add runtime dispatch (#11)

* Align runtime API with generated wrappers

* Add default runtime dispatch specialization

* Refactor runtime dispatch namespace

* Use Abseil status for runtime device API

* Revert "Use Abseil status for runtime device API"

This reverts commit a26ddff.

* Address runtime dispatch review feedback

* Keep runtime API list in generator

* Add TensorView constructor guard test

* Align runtime memcpy kind constants with CUDA API

* Use CUDA-style runtime memcpy constants

* Use CUDA-style runtime memcpy constants

* Move TensorView tests back into core test

* Remove standalone TensorView test target

* Remove standalone TensorView test file

* Use fully qualified runtime API names in README

* style: format runtime dispatch test

* feat: refactor InfiniCore CPU runtime to InfiniRT (#8)

Co-authored-by: Jiacheng Huang <huangjiacheng0709@outlook.com>

* feat: add platform-adaptive runtime tests (#15)

* feat: add runtime backend API foundation (#14)

---------

Co-authored-by: spike-zhu <74974704+spike-zhu@users.noreply.github.com>