I am taking my first steps with OpenCL and have run into the following problem.
The kernel signature looks like this:
__kernel void knapsack(__global value_type *val,
__global weight_type *wgt,
__global value_type *soln,
__global int *i)
- value_type and weight_type are unsigned int
- val, wgt, soln and i are allocated using clCreateBuffer
- val and wgt (5 elements each) are created with CL_MEM_READ_ONLY; soln (126 elements) and i (a single element) are created with CL_MEM_READ_WRITE
I set the arguments as required and launch the kernel. I then enqueue a blocking read on ‘i’ and observe the following:
- if the kernel performs a write on soln, clEnqueueReadBuffer returns CL_OUT_OF_RESOURCES
- if the kernel performs no writes on soln (reads are OK, and writes on ‘i’ are OK), clEnqueueReadBuffer returns CL_SUCCESS
I would like to know if I am doing something wrong.
I am using an Nvidia card. Please let me know if you need more information.
The problem you are seeing looks similar to this one. Could there be a bug in NVidia’s latest drivers?
I decided to go through the code thoroughly once more before drawing conclusions about this behaviour. The cause was an invalid memory access by a few threads.
I would consider this thread solved.
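For readers who land here with the same symptom: one common source of an invalid access by "a few threads" is a global size that is larger than the buffer, so the surplus work-items index past the end. The thread does not say whether that was the cause here, so the following is only a minimal sketch under that assumption. It is plain C that emulates the work-items on the host; in an OpenCL C kernel, gid would come from get_global_id(0), and soln_len would be an extra kernel argument (hypothetical, not part of the original signature). The global size of 128 is also an assumption chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned int value_type;

/* Stand-in for one work-item that wants to write soln[gid].
 * Returns 1 if the write was performed, 0 if the work-item was out of range. */
int guarded_write(value_type *soln, size_t soln_len, size_t gid, value_type v)
{
    if (gid >= soln_len)
        return 0;              /* surplus work-item: touch nothing */
    soln[gid] = v;
    return 1;
}

/* Emulates a 1-D launch of global_size work-items on the host.
 * Returns the number of work-items that actually wrote to soln. */
size_t emulate_launch(value_type *soln, size_t soln_len, size_t global_size)
{
    size_t writes = 0;
    assert(soln != NULL);
    for (size_t gid = 0; gid < global_size; ++gid)
        writes += (size_t)guarded_write(soln, soln_len, gid, (value_type)gid);
    return writes;
}
```

With soln_len = 126 and global_size = 128, only the first 126 work-items write; the last two return without touching memory. Without the guard they would write past the end of soln.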
By invalid access, do you mean an invalid read or an invalid write?
It seems to me that an invalid read in a kernel poses no problem for the kernel itself: it terminates correctly, full profiling info is available, and you can even retrieve the correct result from a cl_mem object NOT AFFECTED by the invalid read. However, the next operation (read, write, or use as a kernel argument) on the cl_mem object affected by the invalid read returns CL_OUT_OF_RESOURCES, so that read, write, or NDRange launch fails.
My advice (rule of thumb) for the NVidia SDK:
when a kernel fails,
1- search for out-of-bound writes in the kernel that crashed
2- search for out-of-bound reads in the kernel that last used the cl_mem objects passed as arguments to the crashed kernel.
i.e. in the case of an invalid READ, the guilty kernel is generally an ancestor of the crashed kernel (checking the return status of a blocking clEnqueueReadBuffer on the arguments of the crashed kernel may help).
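The rule of thumb above can be sketched as a small host-side helper. This is only an illustration, not part of any SDK: the struct launch record, the buffer bitmask, and both function names are hypothetical. It assumes the host keeps a log of the kernels it enqueued, which buffers each one used, and the status obtained after waiting on each one (0 for CL_SUCCESS, -5 for CL_OUT_OF_RESOURCES).

```c
#include <assert.h>
#include <stddef.h>

/* One enqueued kernel launch, as recorded by the host (hypothetical record). */
struct launch {
    const char *kernel_name;
    unsigned    buffers;  /* bitmask: bit k is set if the launch used buffer k */
    int         status;   /* 0 on success, negative OpenCL error code otherwise */
};

/* Step 1: index of the first launch that reported an error, or -1 if none.
 * This is the kernel to inspect for out-of-bound writes. */
int first_failed(const struct launch *log, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        if (log[i].status != 0)
            return (int)i;
    return -1;
}

/* Step 2: index of the most recent launch before `failed` that shares at
 * least one buffer with it, or -1 if there is none.
 * This is the kernel to inspect for out-of-bound reads. */
int read_suspect(const struct launch *log, int failed)
{
    assert(failed >= 0);
    for (int i = failed - 1; i >= 0; --i)
        if (log[i].buffers & log[failed].buffers)
            return i;
    return -1;
}
```

For a log of three launches where only the third one fails and the second one used the same buffers, first_failed returns 2 and read_suspect returns 1, which points at the earlier launch as the candidate for the invalid read.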
An example (log translated from French, with comments added after #):
kernel «rremodul2» called on 353×1 blocks of 32×32
processing time for kernel «rremodul2»=0.000112 s
queued t=30.403193, submitted t=30.403195, start t=30.403200, end t=30.403312 s
kernel «preillumin» called on 267 blocks of 1
processing time for kernel «preillumin»=0.000041 s
queued t=30.403351, submitted t=30.403354, start t=30.403359, end t=30.403400 s # THIS KERNEL MADE AN INVALID READ, BUT ITS RETURN STATUS AND PROFILING INFO ARE OKAY
kernel «preillumin» called on 267 blocks of 1 # A SECOND GRID WITH THE SAME KERNEL (WITH THE SAME ARRAY READ OUT-OF-BOUND): clEnqueueNDRangeKernel returns CL_SUCCESS...
ERROR (clWaitForEvents): Out of resources # ...but waiting for the second grid to complete (on the event_out event) returns CL_OUT_OF_RESOURCES
ERROR: while executing kernel preillumin
ERROR (timeCL:clGetEventInfo of the command's execution status): Out of resources
With AMD APP on CPU, it seems that an out-of-bound read (even a masked one?) results in a core dump in the faulty kernel, so the kernel that crashed is the guilty one.