On what basis does kernel_work_group_size vary from kernel to kernel on the same board?


I wrote an OpenCL kernel in plain C, and it supported a kernel_work_group_size of 128. I then modified the same code to use vector operations instead of loops, and the supported kernel_work_group_size dropped to 64. I don't understand why it was reduced. I have read that the value depends on the kernel, but can anyone tell me which properties of the kernel it depends on, so that I can optimize my code to support the maximum kernel_work_group_size?


This is very implementation dependent, so it could be anything. But I know that some architectures reduce the maximum allowed work-group size based on the number of registers the kernel uses. Given that you moved from loops to vectors, your register pressure likely increased, so this could very well be the scenario you are in.