I’m having trouble getting correct timings from the OpenCL profiling functions. I’m using the CL_DEVICE_PROFILING_TIMER_RESOLUTION property combined with clGetEventProfilingInfo to try to get timings in nanoseconds, but following the information in the OpenCL spec seems to give incorrect results, unless I’m doing something wrong.
The spec states that:
The CL_DEVICE_PROFILING_TIMER_RESOLUTION specifies the resolution of the timer i.e. the number of nanoseconds elapsed before the timer is incremented.
i.e. One tick on the timer is equal to CL_DEVICE_PROFILING_TIMER_RESOLUTION nanoseconds.
With that in mind, I’m using the following (cut down) code:
// Get timer resolution
cl_ulong resolution;
clGetDeviceInfo(device, CL_DEVICE_PROFILING_TIMER_RESOLUTION,
                sizeof(cl_ulong), &resolution, NULL);

// Get start & end timer values
cl_ulong start, end;
clGetEventProfilingInfo(event, CL_PROFILING_COMMAND_START,
                        sizeof(cl_ulong), &start, NULL);
clGetEventProfilingInfo(event, CL_PROFILING_COMMAND_END,
                        sizeof(cl_ulong), &end, NULL);

// Convert ticks to nanoseconds, then to seconds
cl_ulong timeTaken = (end - start) * resolution;
printf("Time taken = %.4lf seconds\n", timeTaken * 1e-9);
The timings reported by this code are off by a factor of 1,000 on NVIDIA devices, and by a factor of 1,000,000 on AMD CPUs. That suggests, first, that NVIDIA and AMD have interpreted the resolution value differently, and second, that either my code above is wrong or both vendors are.
Incidentally, ignoring the resolution and assuming the timer is in nanoseconds gives the right results.
Can anyone shed some light on what I’m doing wrong, if anything?