OpenCL Optimised SGEMM implementation

abherc2 · October 10, 2019, 4:05am

I am trying to implement SGEMM for integrated GPUs such as ARM Mali Midgard or Intel GPUs.
The problem is that my GPU implementations are considerably slower than the CPU implementation. My arrays are in row-major order and I don’t want to reshape them in memory. What’s the best way to do this?
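
For concreteness, here is a minimal sketch in plain C of the computation being asked about: C = alpha * A * B + beta * C, with every matrix indexed in row-major order and no transposition or repacking. The function name `sgemm_ref` and its parameter list are hypothetical, chosen only for illustration. It is a CPU-side correctness baseline against which a GPU kernel's output could be compared, not an optimised implementation.

```c
#include <stddef.h>

/* Hypothetical reference SGEMM (illustration only):
 *   C = alpha * A * B + beta * C
 * All matrices are row-major:
 *   A is M x K with leading dimension lda (element (i,k) at A[i*lda + k])
 *   B is K x N with leading dimension ldb (element (k,j) at B[k*ldb + j])
 *   C is M x N with leading dimension ldc (element (i,j) at C[i*ldc + j])
 */
void sgemm_ref(size_t M, size_t N, size_t K,
               float alpha, const float *A, size_t lda,
               const float *B, size_t ldb,
               float beta, float *C, size_t ldc)
{
    for (size_t i = 0; i < M; ++i) {
        for (size_t j = 0; j < N; ++j) {
            /* In an OpenCL kernel, each (i, j) pair would typically be one
             * work-item, e.g. j = get_global_id(0), i = get_global_id(1). */
            float acc = 0.0f;
            for (size_t k = 0; k < K; ++k)
                acc += A[i * lda + k] * B[k * ldb + j];
            C[i * ldc + j] = alpha * acc + beta * C[i * ldc + j];
        }
    }
}
```

With this indexing, neighbouring values of `j` read neighbouring elements of a row of B and write neighbouring elements of a row of C, so mapping `j` to the fastest-varying work-item dimension keeps those accesses contiguous without reshaping the data.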
