VAR performance

You’re right, it’s not per frame but per second… my mistake.
But still, 80,000 tris/frame…
What’s boosting the performance like that?
Vertex arrays?


Evil-Dog
Let’s have a funny day

Display lists can do that as well; it shouldn’t be a major problem.

Just don’t assume that because the cube spins at 100 fps, you can’t spin the same cube subdivided into 80,000 triangles at 30 fps.

Originally posted by jwatte:
[b]Write combining is the lowest priority functionality of the LFBs. If a fetch is necessary for anything (including promoting data from L2 to L1) that has higher priority than write combining.

I’ve heard someone say that he’d seen a CPU in simulation choose to evict a partially full LFB (write combiner) rather than choose one that’s available and empty. I almost believe him.

Anyway, even an L1 miss (L2 hit) is likely to evict your write combiners. Make sure you only operate on an L1-sized working set at a time: batch your operations/updates and use cache pre-warming so that fetches into L1 during processing don’t blow away your write combiners.[/b]
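The batching advice above can be sketched roughly as follows. This is a minimal illustration, not jwatte's code: the 8 KB batch size, the 64-byte cache line and the `batched_scale`/`prewarm` names are all hypothetical, and the destination would in practice be write-combined AGP memory rather than an ordinary buffer. Each batch is pre-warmed by reading one value per cache line, so the write pass that follows hits L1 and is less likely to trigger fills that evict partially filled write combiners.

```c
#include <stddef.h>

/* Hypothetical tuning constants: pick values that match the target CPU. */
#define CHUNK_FLOATS 2048      /* 2048 floats = 8 KB per batch */
#define FLOATS_PER_LINE 16     /* assumes 64-byte cache lines */

/* Read one float per cache line so the whole source chunk is in L1
   before the write pass starts. The volatile sink keeps the compiler
   from discarding the loads. */
static void prewarm(const float *src, size_t count)
{
    volatile float sink = 0.0f;
    for (size_t i = 0; i < count; i += FLOATS_PER_LINE)
        sink += src[i];
    (void)sink;
}

/* Scale 'count' floats from src into dst, one L1-sized batch at a time.
   Within a batch the reads hit the pre-warmed cache, so the sequential
   stores to dst are less likely to be interrupted by L1 fills. */
void batched_scale(float *dst, const float *src, size_t count, float scale)
{
    for (size_t base = 0; base < count; base += CHUNK_FLOATS) {
        size_t n = count - base;
        if (n > CHUNK_FLOATS)
            n = CHUNK_FLOATS;
        prewarm(src + base, n);
        for (size_t i = 0; i < n; ++i)
            dst[base + i] = src[base + i] * scale;
    }
}
```

The batch size is the knob that matters: it should leave headroom inside the L1 data cache, since the stack, the loop's own data and anything else touched during processing compete for the same lines.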

What precisely is write combining, and how does it relate to cache warming and memcpy?

Can you explain the eviction process when a cache miss occurs? And what exactly counts as a miss here: something that happens while memcpy is executing, or a task switch?

PS: I don’t know much about this stuff, but it appears to be important for VAR, which is why I’m asking.

V-man

Using VAR, I get this performance on a Duron 800 + GeForce2 GTS:
22.272 fps * 907698 triangles ~ 20.22 MT/s
The VAR buffers are allocated and filled only once; all my geometry fits into the allocated memory.
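Because the geometry is uploaded only once, the upload loop can be written to suit write-combined memory: fill the destination strictly front to back, write every byte, and never read it back from the CPU. A minimal sketch, assuming an interleaved position + normal layout (tfpsly doesn't say whether the arrays are interleaved); the `Vertex` type and `upload_interleaved` are hypothetical names, and `malloc` stands in for the driver allocation, which with NV_vertex_array_range would be `wglAllocateMemoryNV` followed by `glVertexArrayRangeNV` over the block.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical interleaved vertex: position followed by normal. */
typedef struct {
    float pos[3];
    float nrm[3];
} Vertex;

/* Pack separate position and normal arrays (3 floats each per vertex)
   into one buffer, writing strictly front to back and never reading the
   destination, which is the access pattern write-combined memory favors.
   malloc() stands in for the driver allocation; in a real VAR setup the
   block would come from wglAllocateMemoryNV() and be registered with
   glVertexArrayRangeNV(). Returns NULL on allocation failure. */
Vertex *upload_interleaved(const float *positions, const float *normals,
                           size_t vertex_count)
{
    Vertex *dst = malloc(vertex_count * sizeof *dst);
    if (dst == NULL)
        return NULL;
    for (size_t i = 0; i < vertex_count; ++i) {
        Vertex v;                    /* assemble in ordinary cached memory */
        memcpy(v.pos, positions + 3 * i, sizeof v.pos);
        memcpy(v.nrm, normals + 3 * i, sizeof v.nrm);
        dst[i] = v;                  /* copy the finished vertex out in order */
    }
    return dst;
}
```

Assembling each vertex in a local and copying it out keeps the stores to the destination contiguous, instead of interleaving them with reads from two different source arrays.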

But I have never been able to get such performance using display lists. Display lists give me less than 5 fps instead.

I suppose the difference comes from the fact that I am using many vertex arrays or display lists; a single display list for the whole scene would probably improve performance. But that would be too restrictive for my apps.

Correction: sorry, fog was on =)
Without fog, VAR gives me:
25.316 fps * 907698 triangles ~ 22.98 MT/s

[This message has been edited by tfpsly (edited 05-16-2002).]

tfpsly, are you using only a vertex array, or normal, texcoord, etc. arrays too? And what fps do you get with plain vertex arrays (no VAR)?

This was done with one light on an untextured 3DS model. I repeat the model (the Capitol) 22 times to get a large number of faces.
With one texture it might be a bit slower, but not by much (I’m not fill-rate limited).

So: vertex + normal arrays (stored in AGP memory; only the indices are sent to the card).
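A back-of-the-envelope model of why keeping the arrays in AGP memory helps: with VAR, the per-frame data the CPU side pushes is just the index list, while with plain client-side vertex arrays the driver typically has to copy the referenced vertices at draw time as well. The sketch below assumes float[3] positions and normals, counts each vertex once (the best case for the non-VAR path), and uses a hypothetical `frame_bytes` helper; the index size is a parameter because the thread doesn't say whether 16- or 32-bit indices are used.

```c
#include <stddef.h>

/* Bytes per vertex for separate position and normal arrays of float[3]. */
#define BYTES_PER_VERTEX (2 * 3 * sizeof(float))

/* Rough per-frame data pushed from the CPU side for an indexed triangle
   list. If the vertex/normal arrays are resident (VAR in AGP memory),
   only the indices count; otherwise each referenced vertex is assumed
   to be copied once by the driver as well. */
size_t frame_bytes(size_t triangles, size_t vertices, size_t index_size,
                   int vertices_resident)
{
    size_t bytes = triangles * 3 * index_size;
    if (!vertices_resident)
        bytes += vertices * BYTES_PER_VERTEX;
    return bytes;
}
```

For example, 1,000 triangles over 600 vertices with 16-bit indices is 6,000 bytes per frame when resident, versus 6,000 + 600 * 24 bytes when the vertices must be copied.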

Using plain vertex arrays (no VAR), I get only:
5.115 fps * 907698 triangles ~ 4.64 MT/s
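The MT/s figures in this thread are simply fps multiplied by triangles per frame. A small check against the posted numbers, using hypothetical helper names; it shows the fog-free VAR run (~22.98 MT/s) at roughly 5 times the plain vertex array run (~4.64 MT/s).

```c
#include <assert.h>

/* Throughput in millions of triangles per second. */
double mtris_per_sec(double fps, long tris_per_frame)
{
    return fps * (double)tris_per_frame / 1.0e6;
}

/* How many times faster run a is than run b at the same triangle count. */
double speedup(double fps_a, double fps_b)
{
    assert(fps_b > 0.0);
    return fps_a / fps_b;
}
```

With the thread's numbers, `mtris_per_sec(25.316, 907698)` is about 22.98, `mtris_per_sec(5.115, 907698)` about 4.64, and `speedup(25.316, 5.115)` about 4.95. The triangle count also works out to 907,698 / 22 = 41,259 triangles per copy of the Capitol model.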

I do not use triangle strips (I could, but stripification is just too slow to compute on this mesh).
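For context on what stripping would buy: an indexed triangle list spends three indices per triangle, while a strip of n indices encodes n - 2 triangles, each new index forming a triangle with the previous two. A minimal sketch (the function name is hypothetical) that expands a strip back into a list, swapping the first two vertices of every odd triangle so all triangles keep the winding OpenGL assigns to strips:

```c
#include <assert.h>
#include <stddef.h>

/* Expand a triangle strip of strip_len indices into a triangle list.
   out must hold 3 * (strip_len - 2) indices. Odd-numbered triangles
   (counting from 0) swap their first two vertices so every triangle
   keeps the same winding order. Returns the number of triangles. */
size_t strip_to_list(const unsigned short *strip, size_t strip_len,
                     unsigned short *out)
{
    assert(strip_len >= 3);
    size_t tris = strip_len - 2;
    for (size_t t = 0; t < tris; ++t) {
        if (t % 2 == 0) {
            out[3 * t + 0] = strip[t];
            out[3 * t + 1] = strip[t + 1];
        } else {
            out[3 * t + 0] = strip[t + 1];
            out[3 * t + 1] = strip[t];
        }
        out[3 * t + 2] = strip[t + 2];
    }
    return tris;
}
```

The strip {0, 1, 2, 3, 4} expands to (0, 1, 2), (2, 1, 3), (2, 3, 4): 5 indices in strip form versus 9 as a list, which is the index traffic a good stripifier saves.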

V-man, Intel has some articles covering caches, write combining and AGP memory in its “Maximum FPS” series. Check out:
http://cedar.intel.com/cgi-bin/ids.dll/topic.jsp?catCode=CLM

The trick to rendering detailed environments is not to aim for Quake-style frame rates. 30 fps is quite playable in most games, and you can push 100,000 tris/frame at 30 fps even on low-end cards (like an MX 200) if you’re careful about fill rate and vertex formats. We use a combination of VAR and display lists.
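The frame-budget arithmetic behind this advice, as a small sketch with hypothetical helper names: 100,000 tris/frame at 30 fps needs a sustained 3 million tris/s, and the earlier 80,000-triangle cube at 30 fps needs 2.4 million. Going the other way, a measured throughput divided by the target frame rate gives an upper bound on triangles per frame; it is only an upper bound, since a real frame also pays for fill, state changes and everything else.

```c
#include <assert.h>

/* Sustained triangle rate needed to draw tris_per_frame at target_fps. */
double required_tris_per_sec(double tris_per_frame, double target_fps)
{
    return tris_per_frame * target_fps;
}

/* Upper bound on triangles per frame for a measured throughput. */
double tris_per_frame_budget(double tris_per_sec, double target_fps)
{
    assert(target_fps > 0.0);
    return tris_per_sec / target_fps;
}
```

For instance, `required_tris_per_sec(100000, 30)` gives 3,000,000, and `tris_per_frame_budget(3.0e6, 30)` gives back 100,000.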

If you want insane benchmark-style tris/second numbers, though, you need to drop your frame rate fairly low (although that situation IS getting better!)