Why is half-precision complex float arithmetic not supported in Python and CUDA?
NumPy has complex64, which corresponds to two float32s. It also has float16, but no complex32. How come? I have a signal-processing calculation involving FFTs where I think I'd be fine with complex32, but I don't see how to get there.

In particular, I was hoping for a speedup on an NVIDIA GPU with CuPy. However, it seems that float16 is slower on the GPU rather than faster. Why is half precision unsupported and/or overlooked? Also related is why we don't have complex integers, as this may also present an opportunity for a speedup.
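
For context, here is a minimal sketch of the workaround I've been experimenting with (the array size and the two-plane float16 storage are just my own illustration, not an established API): upcast to complex64 for the FFT itself, then keep the result as separate float16 real/imaginary planes to at least save memory.

```python
import cupy as cp

# There is no complex32 dtype in NumPy or CuPy, so the FFT has to run
# in complex64; only the input signal itself can live in float16.
x = cp.random.standard_normal(1 << 20).astype(cp.float16)

# Upcast for the transform...
X = cp.fft.fft(x.astype(cp.complex64))

# ...then store the result as two float16 planes (real and imaginary)
# to roughly halve the memory footprint, at the cost of extra casts.
X_re = X.real.astype(cp.float16)
X_im = X.imag.astype(cp.float16)
```

This obviously buys no arithmetic speedup, since all the actual work still happens in single precision, which is exactly what I'm asking about.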