Cache miss in kernel


Should I consider the caches of a single core?

The input data consists of two 3D matrices, each containing 16x256x16 elements.

When the core accesses the data, it does so slowly.

So I guess I am causing a lot of cache misses.

Where can I find information about the sizes of the L1 and L2 caches of a graphics card?

I’m using NVIDIA’s GeForce 9400 GT: NVIDIA GeForce Graphics Cards

The spec does not contain this information.


Hi Zvika,

GeForce 9400 GT is compute capability 1.0 (see here:

Look at the CUDA programming guide, Appendix G.3, for an explanation of the Compute Capability 1.x architecture and of how to access memory (it’s a split-warp architecture: global memory requests are handled per half-warp). CUDA C++ Programming Guide
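
To make the point about memory access concrete, here is a minimal sketch, not your actual kernel. As far as I know, compute capability 1.x devices do not cache global memory at all, so the slowdown is more likely an access-pattern (coalescing) problem than a cache-miss problem. The sketch assumes the two matrices are stored row-major as [16][256][16] floats and that the kernel does an element-wise add; both are assumptions, as are the names `linearIndex`, `addCoalesced`, `addStrided` and `timeKernel`. It first prints what the runtime reports about the device, then times a version where neighbouring threads read neighbouring elements against one where neighbouring threads are 4096 elements apart.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Assumed layout: row-major [D0][D1][D2], last index varies fastest.
#define D0 16
#define D1 256
#define D2 16
#define N (D0 * D1 * D2)  // 65536 elements per matrix

// Flatten (i0, i1, i2) into a linear offset for the assumed layout.
__host__ __device__ inline int linearIndex(int i0, int i1, int i2) {
    return (i0 * D1 + i1) * D2 + i2;
}

// Neighbouring threads touch neighbouring addresses (coalescing-friendly).
__global__ void addCoalesced(const float* a, const float* b, float* out) {
    int t = blockIdx.x * blockDim.x + threadIdx.x;
    if (t < N) out[t] = a[t] + b[t];
}

// Neighbouring threads differ in the slowest index i0, so their addresses
// are D1 * D2 = 4096 elements apart (not coalesced on 1.x hardware).
__global__ void addStrided(const float* a, const float* b, float* out) {
    int t = blockIdx.x * blockDim.x + threadIdx.x;
    if (t < N) {
        int i0 = t % D0;
        int i1 = (t / D0) % D1;
        int i2 = t / (D0 * D1);
        int idx = linearIndex(i0, i1, i2);
        out[idx] = a[idx] + b[idx];
    }
}

typedef void (*Kernel)(const float*, const float*, float*);

// Time one launch of kernel k with CUDA events; returns milliseconds.
static float timeKernel(Kernel k, const float* a, const float* b, float* out) {
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    cudaEventRecord(start);
    k<<<N / 256, 256>>>(a, b, out);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);
    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return ms;
}

int main() {
    // Ask the runtime what it knows about device 0.
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) {
        fprintf(stderr, "no CUDA device found\n");
        return 1;
    }
    printf("%s: compute capability %d.%d\n", prop.name, prop.major, prop.minor);
    printf("shared memory per block: %lu bytes\n",
           (unsigned long)prop.sharedMemPerBlock);
    printf("reported L2 cache size:  %d bytes\n", prop.l2CacheSize);

    size_t bytes = N * sizeof(float);
    float* hA = (float*)malloc(bytes);
    float* hB = (float*)malloc(bytes);
    float* hOut = (float*)malloc(bytes);
    for (int i = 0; i < N; ++i) { hA[i] = (float)i; hB[i] = 1.0f; }

    float *dA, *dB, *dOut;
    cudaMalloc((void**)&dA, bytes);
    cudaMalloc((void**)&dB, bytes);
    cudaMalloc((void**)&dOut, bytes);
    cudaMemcpy(dA, hA, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, hB, bytes, cudaMemcpyHostToDevice);

    float msCoalesced = timeKernel(addCoalesced, dA, dB, dOut);
    float msStrided = timeKernel(addStrided, dA, dB, dOut);
    printf("coalesced: %.3f ms, strided: %.3f ms\n", msCoalesced, msStrided);

    // Both kernels compute the same sums; verify the last one that ran.
    cudaMemcpy(hOut, dOut, bytes, cudaMemcpyDeviceToHost);
    int errors = 0;
    for (int i = 0; i < N; ++i) {
        if (hOut[i] != hA[i] + hB[i]) ++errors;
    }
    printf("mismatches: %d\n", errors);

    cudaFree(dA); cudaFree(dB); cudaFree(dOut);
    free(hA); free(hB); free(hOut);
    return errors != 0;
}
```

If the strided version turns out much slower on your card, try remapping your thread indices so that `threadIdx.x` walks the fastest-varying dimension of the matrices. A single launch is a rough measurement, so average several runs before drawing conclusions.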