Actually, I am working on the memory coalescing technique. I have searched a lot but have hardly been able to find any code for it. Do you have code for this?
I think the best explanation of that will be the lectures here:
For memory coalescing, have a look at:
CUDA University Courses
University of Illinois : ECE 498AL
Taught by Professor Wen-mei W. Hwu and David Kirk, NVIDIA CUDA Scientist.
–> Memory Bank Conflicts (115 MB)
There are many nuances and details, but for simple kernels the key point is this: adjacent work items should access adjacent memory addresses. Sometimes this means doing things in a counter-intuitive way. An example is using a 1D kernel to process a 2D image (there are reasons why you'd want to do this): you should assign one work item per column, not per row (i.e., interpret get_global_id(0) as X), because then on each iteration of the Y loop inside the kernel the work items access horizontally adjacent pixels, which are also adjacent in memory, assuming the usual row-major image layout.
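Since the question was about code, here is a minimal sketch of that column-per-work-item pattern. It is plain C that emulates the work items on the host rather than a real OpenCL kernel, so it can be compiled and checked anywhere. The names (brighten_column, run_all_work_items, the stride helpers) and the "multiply by a gain" operation are made up for illustration, and a row-major float image is assumed.

```c
#include <assert.h>
#include <stddef.h>

/* Linear index of pixel (x, y) in a row-major image (assumed layout). */
static size_t pixel_index(size_t x, size_t y, size_t width) {
    return y * width + x;
}

/* Body of one work item, written as plain C. In OpenCL this would be a
 * __kernel function and gid would come from get_global_id(0).
 * Each work item owns column x = gid and loops over the rows. */
static void brighten_column(size_t gid, const float *in, float *out,
                            size_t width, size_t height, float gain) {
    assert(in != NULL && out != NULL);
    if (gid >= width) return;            /* guard for padded global sizes */
    for (size_t y = 0; y < height; ++y) {
        size_t i = pixel_index(gid, y, width);
        out[i] = in[i] * gain;           /* neighbours gid, gid+1 touch i, i+1 */
    }
}

/* Host-side emulation: run every work item one after another.
 * On a GPU these would run in parallel. */
static void run_all_work_items(const float *in, float *out,
                               size_t width, size_t height, float gain) {
    for (size_t gid = 0; gid < width; ++gid)
        brighten_column(gid, in, out, width, height, gain);
}

/* Distance, in elements, between the addresses that work items 0 and 1
 * touch on the same loop iteration. */
static size_t stride_column_mapping(size_t width) {
    /* gid is X, the loop runs over Y: neighbours differ in x. */
    return pixel_index(1, 0, width) - pixel_index(0, 0, width);
}

static size_t stride_row_mapping(size_t width) {
    /* gid is Y, the loop runs over X: neighbours differ in y. */
    return pixel_index(0, 1, width) - pixel_index(0, 0, width);
}
```

With the column mapping the stride between neighbouring work items is 1 element, so their loads and stores can be combined into a few wide memory transactions. With the row mapping the stride is the full image width, so each work item hits a different memory segment. Calling run_all_work_items(in, out, width, height, gain) on a small test image is enough to confirm the result is the same either way; only the access pattern differs.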