Nvidia GPU - OpenCL Profiling


I am using OpenCL on an Nvidia GTX 970 (Ubuntu Linux).

I want to profile my OpenCL kernels for metrics like cache miss rates, SIMD utilization, branch divergence, etc. I have searched online but could not find anything for this case. AMD has its own APP Profiler, which reports these statistics, but I could not find anything similar for OpenCL kernels running on an Nvidia GPU.

Any ideas? I also asked this question on Nvidia’s developer forums but didn’t get any reply.