Computing on accelerators (GPUs, TPUs, FPGAs) demands both high performance and correctness, yet traditional accelerator code relies heavily on C/C++ with manual memory management and ...