We are profiling an OpenCL application running on an NVidia GPU, on both the host and the device. We were surprised to find that, according to gperftools, the host spends 44% of its time in `clGetPlatformInfo`, a function our own code calls only once. It is called from `clEnqueueCopyBuffer_hid`, `clEnqueueWriteBuffer_hid`, and `clEnqueueNDRangeKernel_hid` (and presumably from the other `clEnqueue*` functions, which our code calls less often). Since this takes so much of our host time, and we currently appear to be bound by host speed, I need to know whether there is a way to eliminate these extra calls.
Why is `clGetPlatformInfo` called on every `clEnqueue*` call? (Presumably it returns static information that could be cached in the context.) Did we perhaps initialize our context incorrectly?
EDIT: As requested, here is a minimal working example (MWE):
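```cpp
#include <CL/opencl.h>
#include <vector>

using namespace std;

int main ()
{
    // Enumerate the available platforms.
    cl_uint numPlatforms;
    clGetPlatformIDs (0, nullptr, &numPlatforms);
    vector<cl_platform_id> platformIdArray (numPlatforms);
    clGetPlatformIDs (numPlatforms, platformIdArray.data (), nullptr);
    // Assume the NVidia GPU is the first platform
    cl_platform_id platformId = platformIdArray[0];

    // Enumerate the GPU devices on that platform.
    cl_uint numDevices;
    clGetDeviceIDs (platformId, CL_DEVICE_TYPE_GPU, 0, nullptr, &numDevices);
    vector<cl_device_id> deviceArray (numDevices);
    clGetDeviceIDs (platformId, CL_DEVICE_TYPE_GPU, numDevices, deviceArray.data (), nullptr);
    // Assume the NVidia GPU is the first device
    cl_device_id deviceId = deviceArray[0];

    // Context with default properties (no explicit CL_CONTEXT_PLATFORM).
    cl_context context = clCreateContext (
        nullptr,    // properties
        1,          // number of devices
        &deviceId,  // devices
        nullptr,    // pfn_notify
        nullptr,    // user_data
        nullptr);   // errcode_ret
    // In-order command queue with default properties (0).
    cl_command_queue commandQueue = clCreateCommandQueue (context, deviceId, 0, nullptr);
    // Device buffer holding a single cl_int.
    cl_mem mem = clCreateBuffer (context, CL_MEM_READ_WRITE, sizeof (cl_int),
                                 nullptr, nullptr);

    // Repeatedly issue a tiny blocking write; this loop is what gets profiled.
    // It never exits; the profile is collected over several seconds.
    cl_int i = 0;
    while (true)
    {
        clEnqueueWriteBuffer (
            commandQueue,
            mem,
            CL_TRUE,     // blocking write
            0,           // offset
            sizeof (i),  // size in bytes
            &i,          // host source pointer
            0,           // no events in the wait list
            nullptr,     // event wait list
            nullptr);    // no event returned
        ++i;
    }
}
```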
Running this MWE for several seconds produces a profile in which 99% of the host time is spent in `clGetPlatformInfo`. (The profile diagram is in Stack Overflow question 61663830; I can't post links yet.)
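One way to cross-check the sampling profiler, independently of how it maps samples to function names, would be to time the loop body directly on the host. The helper below is a hypothetical sketch (`meanMicrosecondsPerCall` is not an OpenCL or gperftools API); it uses only `std::chrono::steady_clock` and reports the mean wall-clock time per call.

```cpp
#include <chrono>
#include <cstddef>

// Hypothetical helper (not part of OpenCL or gperftools): calls fn () the
// given number of times and returns the mean host wall-clock time per call,
// in microseconds. Returns 0.0 when iterations == 0.
template <typename Fn>
double meanMicrosecondsPerCall (Fn&& fn, std::size_t iterations)
{
    if (iterations == 0)
        return 0.0;

    // steady_clock is monotonic, so it is suitable for measuring intervals.
    const auto start = std::chrono::steady_clock::now ();
    for (std::size_t n = 0; n < iterations; ++n)
        fn ();
    const auto stop = std::chrono::steady_clock::now ();

    // Convert the elapsed time to floating-point microseconds.
    const std::chrono::duration<double, std::micro> elapsed = stop - start;
    return elapsed.count () / static_cast<double> (iterations);
}
```

In the MWE, the `while (true)` loop could be replaced with something like `double us = meanMicrosecondsPerCall ([&] { clEnqueueWriteBuffer (commandQueue, mem, CL_TRUE, 0, sizeof (i), &i, 0, nullptr, nullptr); ++i; }, 100000);` followed by printing `us`. That gives the actual host cost of each blocking write, whichever function name the profiler assigns the samples to.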
This is a repost of Stack Overflow question 61663830, which is still awaiting an answer.