Add Analysis design doc #446
The factory registers all combinations of scalar type (I4/I8/R4/R8), rank (1–5), and memory location (Device/Host/Both) for each operator template:
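To give a sense of scale, the registration described in this hunk can be sketched as a triple loop over the three axes. This is only an illustration: the helper `registrationKeys`, the key format, and the operator name `Sum` are hypothetical and do not come from the design document. Only the axis values (I4/I8/R4/R8, ranks 1–5, Device/Host/Both) are taken from the text.

```cpp
#include <array>
#include <string>
#include <vector>

// Axis values taken from the design-doc text; everything else is illustrative.
constexpr std::array<const char *, 4> ScalarTypes = {"I4", "I8", "R4", "R8"};
constexpr std::array<int, 5> Ranks = {1, 2, 3, 4, 5};

// Hypothetical helper: builds one registry key per (type, rank, location)
// combination for a single operator template.
std::vector<std::string>
registrationKeys(const std::string &OpName,
                 const std::vector<std::string> &MemLocs) {
   std::vector<std::string> Keys;
   for (const char *Type : ScalarTypes)
      for (int Rank : Ranks)
         for (const std::string &Loc : MemLocs)
            Keys.push_back(OpName + "_" + Type + "_" + std::to_string(Rank) +
                           "D_" + Loc);
   return Keys;
}
```

Calling `registrationKeys("Sum", {"Device", "Host", "Both"})` produces 4 × 5 × 3 = 60 keys for one operator. Passing only two memory locations reduces that to 40.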
For my understanding, why is it important to support both device and host locations? Is it that the actual operations would be conducted on the device but we would accept inputs that originate on the host and are copied to device within the operator?
Actually, after thinking it over, I'm leaning towards not supporting HostArray types. Given that the compute work is currently done almost exclusively with ArrayDDTT types (ArrayMemLoc::Device arrays for GPU builds and ArrayMemLoc::Both for CPU-only builds), supporting HostArrayDDTT (ArrayMemLoc::Host) seems unnecessary right now, especially given deadlines.
Adding HostArray support to the analysis code would add significant bloat and complexity. It would require either:
- Memory transfers (as you mention). This would also complicate the operator templating, since output arrays could no longer be straightforwardly templated off the input arrays.
- Branching compute methods. Each operator compute method would need if-else branches, with duplicated code for Kokkos kernels or ordinary loops depending on ArrayMemLoc.
If a compelling need comes up later, we can add this support then. The analysis module will continue to evolve after the immediate delivery, so I think we can avoid taking on this extra complexity prematurely.
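For readers following the thread, the "branching compute methods" option might look roughly like the sketch below. It is not the module's code. `computeSum` and `deviceStyleSum` are hypothetical names, `std::vector` stands in for the real array types, and the "device" path is an ordinary function so that the example compiles without Kokkos. Only the `ArrayMemLoc` enum and its three values come from the discussion. C++17 is assumed for `if constexpr`.

```cpp
#include <cstddef>
#include <vector>

// Memory-location tags named in the discussion above.
enum class ArrayMemLoc { Device, Host, Both };

// Stand-in for a device kernel launch. In real code this would be a Kokkos
// kernel; here it is an ordinary function so the sketch builds without Kokkos.
template <typename T>
T deviceStyleSum(const std::vector<T> &Data) {
   T Sum = T(0);
   for (const T &Val : Data)
      Sum += Val;
   return Sum;
}

// Hypothetical operator compute method that accepts every memory location.
// The Host branch repeats the reduction as a plain loop, which is the
// per-operator duplication the comment above is concerned about.
template <ArrayMemLoc Loc, typename T>
T computeSum(const std::vector<T> &Data) {
   if constexpr (Loc == ArrayMemLoc::Host) {
      T Sum = T(0);
      for (std::size_t I = 0; I < Data.size(); ++I)
         Sum += Data[I];
      return Sum;
   } else {
      return deviceStyleSum(Data);
   }
}
```

Every operator would carry a pair of branches like this, so dropping `ArrayMemLoc::Host` removes one code path per operator rather than one in total.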
Sounds good. I support this simpler path.
Add Analysis module design document. Compiled here.
Describes the configurable operator framework for in-situ analysis computation, including:
Checklist