A tool like this (for measuring the post-T&L cache size) can easily be made by anyone.
Imagine a regular grid of N by M cells (N is the number you want to find).
Simply build an indexed mesh draw call (quad strips, for example, for simplicity, separated by the primitive_restart index, since it doesn't invalidate the cache) whose indices look like this:
0,N, 1,N+1, 2,N+2, … , N-2,2N-2, N-1,2N-1, RESTART
The number of such strips is M (the exact value of M doesn't matter, but it must be big enough to show the performance difference: 100 or more).
The very first line of indices must prefill the cache, so it must draw a degenerate quad strip with these indices:
0,0, 1,1, 2,2, … , N-2,N-2, N-1,N-1, RESTART
Make one draw call with all these strips (there should be about M+1 restarts in the index buffer).
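Here is a minimal sketch of how that index buffer could be generated. It is my own illustration, not part of any real tool: the function name `build_index_buffer` and the restart value `0xFFFFFFFF` are assumptions (use whatever restart index you actually set), and I am assuming each following strip moves one row down the grid (strip k joins row k and row k+1).

```python
RESTART = 0xFFFFFFFF  # assumed restart index; must match what the API is told


def build_index_buffer(n, m):
    """Return the index list: one degenerate prefill strip, then m quad strips.

    n is the number of vertices per grid row (the candidate cache size),
    m is the number of real strips.
    """
    indices = []

    # Prefill strip: 0,0, 1,1, ..., N-1,N-1, RESTART
    for i in range(n):
        indices += [i, i]
    indices.append(RESTART)

    # Strip k: k*N, (k+1)*N, k*N+1, (k+1)*N+1, ..., RESTART
    for row in range(m):
        base = row * n
        for i in range(n):
            indices += [base + i, base + n + i]
        indices.append(RESTART)

    return indices
```

The result holds M+1 restarts and (M+1)*(2N+1) entries in total, and can be uploaded as a 32-bit index buffer for a single draw call.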
What do we expect to see? If N is less than or equal to the cache size, every vertex will be computed ONLY ONCE. If N is larger than the cache size, some vertices will push out cache entries that are still needed, and almost every vertex will be computed twice (except maybe the very first degenerate row).
In the good first case, after the degenerate strip the cache holds the first N vertices (0 through N-1).
Then each strip takes its 1st vertex from the cache (0, for example) and puts in the 1st vertex that is not in the cache yet (N). The same holds for 1, 2, and so on. So after the 2nd strip (counting the degenerate one) the cache holds the second N vertices (N through 2N-1).
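To check this reasoning without a GPU, here is a small simulation sketch. It assumes the cache is a plain FIFO (the oldest entry is pushed out, hits don't reorder anything), which is what the description above implies; real hardware may behave differently. The names `build_index_buffer` and `count_transforms` and the restart value are made up for this example.

```python
from collections import deque

RESTART = 0xFFFFFFFF  # assumed restart index, skipped by the simulation


def build_index_buffer(n, m):
    """Prefill strip plus m quad strips, each ending with RESTART."""
    indices = [i for i in range(n) for _ in range(2)] + [RESTART]
    for row in range(m):
        base = row * n
        for i in range(n):
            indices += [base + i, base + n + i]
        indices.append(RESTART)
    return indices


def count_transforms(indices, cache_size):
    """Count cache misses (vertex shader runs) for a FIFO cache model."""
    cache = deque(maxlen=cache_size)  # appending to a full deque drops the oldest
    misses = 0
    for idx in indices:
        if idx == RESTART:
            continue  # restart is assumed not to touch the cache
        if idx not in cache:
            misses += 1
            cache.append(idx)
    return misses
```

Under this model, `count_transforms(build_index_buffer(n, m), c)` gives N*(M+1) when N equals the cache size (every vertex once) and N*(2M+1) when N is one larger (every vertex twice except the last row), which is the jump the test looks for.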
So, by varying N (from 16 upward) and looking for an abrupt performance drop, we can find the exact cache size. On nVidia GeForceFX and GeForce6 (I didn't try it on the 7-series, but I expect the same result),
a sharp drop (about 10-15 percent) shows up when going from 24 to 25, so the answer is evident ))
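If you collect timings for a whole range of N, picking out the step can be automated. This is only a sketch: `find_sharpest_drop` is a made-up helper, and it assumes you pass in your own measurements as a dict mapping N to some "higher is better" number such as vertices per second.

```python
def find_sharpest_drop(throughput):
    """Return (n, drop): the last N before the biggest relative drop, and its size.

    throughput maps N -> measured throughput (higher is better).
    drop is a fraction, e.g. 0.12 for a 12 percent drop going from n to the next N.
    Returns (None, 0.0) if throughput never decreases.
    """
    sizes = sorted(throughput)
    best_n, best_drop = None, 0.0
    for prev, cur in zip(sizes, sizes[1:]):
        drop = (throughput[prev] - throughput[cur]) / throughput[prev]
        if drop > best_drop:
            best_n, best_drop = prev, drop
    return best_n, best_drop
```

The returned N is the estimate of the cache size; a small drop fraction means there was no clear step in the measured range and the sweep should probably be widened.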