Cache coherency?

Can anyone explain to me exactly what ‘cache coherency’ means in relation to GPUs? I keep hearing phrases like “texture cache coherency” and “vertex cache coherency” in NVIDIA and ATI talks, but I can’t seem to find any documentation that actually explains these terms.

Near the end of the vertex pipeline, there’s a cache that stores the results of the last 16 to 32 or so vertices transformed and lit by the pipeline. If the same vertex is submitted twice within that 16-32 vertex history window, the cached result can be reused instead of transforming the vertex again.

This cache is only used when rendering from locked indexed vertex arrays or indexed VBOs.

Thus optimizing your mesh for cache coherency means reorganizing your triangles so that vertices that are rendered more than once fall within 16-32 indices of each other in your array.
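To get a feel for how much triangle order matters, here is a rough sketch that simulates such a cache and counts how many vertices actually get transformed for a given index list. It assumes a simple FIFO cache of 16 entries; the real replacement policy and size vary by card and are not something I can confirm. The names count_transforms and grid_indices are made up for this example.

```python
from collections import deque


def count_transforms(indices, cache_size=16):
    """Count vertex transforms for an index list.

    Assumes a FIFO cache of recently transformed vertices; real
    hardware may use a different size or replacement policy.
    """
    cache = deque(maxlen=cache_size)  # oldest entry drops out when full
    misses = 0
    for idx in indices:
        if idx not in cache:
            misses += 1        # vertex has to be transformed again
            cache.append(idx)
    return misses


def grid_indices(cols, rows):
    """Triangle-list indices for a grid of cols x rows quads, row by row."""
    stride = cols + 1          # vertices per row
    out = []
    for r in range(rows):
        for c in range(cols):
            v0 = r * stride + c
            v1 = v0 + 1
            v2 = v0 + stride
            v3 = v2 + 1
            out += [v0, v1, v2, v1, v3, v2]
    return out


narrow = grid_indices(4, 100)   # short rows: previous row stays cached
wide = grid_indices(100, 4)     # long rows: previous row is evicted
print(count_transforms(narrow), count_transforms(wide))
```

Both grids have 400 quads and 505 unique vertices. With the narrow grid each vertex is transformed once (505 transforms), while the wide grid needs 808, because by the time a row is finished the shared vertices from its start have long since fallen out of the 16-entry window.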

IIRC, GF3 has 16 entries, GF4 has 24 entries, and there was some other card with around 32. Anyone, please jump in and correct me here!

I see… so would texture cache coherency mean the same thing, keeping the last X amount of texels to be fetched in a texture cache?

Can you remember where you got those numbers by any chance? I’d love to read up more on this sort of nitty-gritty stuff, but seeing as it’s so close to the proprietary hardware, the information isn’t that easy to come by.

Thanks

WRT texture cache: yes, and then some. My understanding is that, like a CPU’s cache, data near the data you use is also cached. This is particularly interesting when comparing NVIDIA and ATI 3D textures. Last I heard, ATI stores the data in linear (naive) order in graphics memory, while NVIDIA breaks the texture into bricks so that a small subvolume is contiguous in memory. Then, no matter how you slice the volume, you get more-or-less consistent cache behavior, rather than ideal behavior when sampling in x-y and terrible behavior when sampling in y-z. For this reason, ATI cards tend to produce varying frame rates when used for volume rendering.
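A toy model of the two layouts, to show why slice direction matters. Everything here is an assumption for illustration: the 64-cubed volume, the 4x4x4 brick size, and the 64-texel cache line are made-up numbers, and the actual hardware layouts are proprietary. The sketch counts how many distinct cache-line-sized blocks get touched when reading one full slice of the volume.

```python
def linear_offset(x, y, z, size):
    """Texel offset in plain x-fastest, then y, then z order."""
    return (z * size + y) * size + x


def bricked_offset(x, y, z, size, brick=4):
    """Texel offset when the volume is stored as contiguous bricks."""
    bricks_per_axis = size // brick
    bx, by, bz = x // brick, y // brick, z // brick
    brick_index = (bz * bricks_per_axis + by) * bricks_per_axis + bx
    lx, ly, lz = x % brick, y % brick, z % brick
    local = (lz * brick + ly) * brick + lx   # position inside the brick
    return brick_index * brick ** 3 + local


def lines_touched(offset_fn, plane, size=64, line_texels=64):
    """Distinct cache-line-sized blocks read for one full slice."""
    lines = set()
    for a in range(size):
        for b in range(size):
            if plane == "xy":
                x, y, z = a, b, 0
            else:                            # "yz" slice at x = 0
                x, y, z = 0, a, b
            lines.add(offset_fn(x, y, z, size) // line_texels)
    return len(lines)


for name, fn in (("linear", linear_offset), ("bricked", bricked_offset)):
    print(name, lines_touched(fn, "xy"), lines_touched(fn, "yz"))
```

In this model the linear layout touches 64 lines for an x-y slice but 4096 for a y-z slice, while the bricked layout touches 256 either way: a bit worse than the linear best case, but the same in every direction.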

Can you remember where you got those numbers by any chance? I’d love to read up more on this sort of nitty-gritty stuff, but seeing as it’s so close to the proprietary hardware, the information isn’t that easy to come by.
Some of them I got from NVIDIA presentations available on their website. The numbers are there; you just have to dig around in the older presentations and papers from around the GF3/GF4 era. I can’t remember where I saw the cache size for the FX cards, though.

There are multiple caches in the pipeline. One, already mentioned, is the vertex cache, which caches the transformation results of vertices in indexed primitives. There are also texture caches, which reduce the latency and bandwidth cost of memory fetches for texture operations. Different architectures have different cache implementations, and there can also be caches for framebuffer contents of various types. A cache is a generic mechanism for reducing memory latency and bandwidth use, so it is no surprise that GPUs use them to improve pipeline efficiency.

Right. Seems obvious when I think about it! Thanks guys.