Let’s say I have several work groups of size 349. According to what I read, it’s better to use a work group size of 2^n instead of, for example, 349. Can I expect a considerable performance gain from padding the work groups up to such a size?
Theoretically yes, because the work group size should ideally be a multiple of the warp size or, more precisely, of the half-warp. I recommend reading the OpenCL Programming Guide and the related guides to learn about warps, coalescing, and memory banks.
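To make the padding concrete, here is a minimal sketch of the arithmetic in Python. It assumes a warp width of 32 work-items, which is the NVIDIA value and may differ on other hardware. The helper names `round_up_to_multiple` and `next_power_of_two` are made up for this illustration and are not part of any OpenCL API.

```python
# Assumption: 32 work-items per warp (NVIDIA); other devices may differ.
WARP_SIZE = 32


def round_up_to_multiple(n, multiple):
    """Return the smallest multiple of `multiple` that is >= n."""
    return ((n + multiple - 1) // multiple) * multiple


def next_power_of_two(n):
    """Return the smallest power of two that is >= n (for n >= 1)."""
    return 1 << (n - 1).bit_length()


work_items = 349  # the work group size from the question

# Option 1: pad only up to the next multiple of the warp size.
padded_warp = round_up_to_multiple(work_items, WARP_SIZE)

# Option 2: pad up to the next power of two (2^n).
padded_pow2 = next_power_of_two(work_items)

# Padded work-items do no useful work, so count how many each option adds.
idle_warp = padded_warp - work_items
idle_pow2 = padded_pow2 - work_items

print(f"multiple of warp: {padded_warp} ({idle_warp} idle work-items)")
print(f"power of two:     {padded_pow2} ({idle_pow2} idle work-items)")
```

Under these assumptions, padding to a multiple of the warp size wastes far fewer work-items than padding to 2^n, so it may be the better first thing to try. Either way, the kernel needs a bounds check so the padded work-items skip the real work, and the actual gain should be measured on the target device. On OpenCL 1.1 or later, the device's preferred multiple can be queried with `clGetKernelWorkGroupInfo` and `CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE` rather than hard-coded.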