win2K and slow opengl?!?

> I do use tri strips, but I average only
> about 5 tri per strip (still cuts verts
> down by 50%).

There is a fair bit of per-call overhead in issuing calls to the driver (although the amount of overhead varies depending on whether you change the modelview, enable states, etc).

Also, if you issue the same triangle strip as a triangle index list, the verts will cache as well as in the strip, and the number of transformed verts will be the same, so it really doesn’t cost measurably more.

Chances are that if you're currently using 5-triangle strips, bunching all your strips into one big indexed triangle list instead may be faster.
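
For illustration only (a rough sketch, not code from this thread; the function and parameter names are made up), here is the difference between issuing each small strip as its own call and issuing one big indexed triangle list:

#include <GL/gl.h>

// One driver call per small strip: the per-call overhead adds up.
void DrawPerStrip(const GLushort* stripIndices, int stripCount, int vertsPerStrip)
{
    for (int s = 0; s < stripCount; ++s)
        glDrawElements(GL_TRIANGLE_STRIP, vertsPerStrip,
                       GL_UNSIGNED_SHORT, stripIndices + s * vertsPerStrip);
}

// The same geometry pre-converted to one indexed triangle list: one call total.
void DrawBatched(const GLushort* listIndices, int indexCount)
{
    glDrawElements(GL_TRIANGLES, indexCount, GL_UNSIGNED_SHORT, listIndices);
}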

I've looked into this a few times and I'm not sure that it would help me much. I only looked into NVIDIA's caching, and I think it only caches a handful of verts (around 4-10?). My strips are not laid out to ensure that the verts would repeat frequently enough to help; I estimated that at best I would see a 1%-2% repeat likelihood. Normally any increase is worth a quick change, but I'm looking at a major overhaul.

So indexing is out…

Thanks anyway…

John.

john,

Think again. If you’re drawing a strip like this:

1 3 5 7

2 4 6 8

Then the stripification of that would look like:

1 2 3 4 5 6 7 8

The triangle list would look something like:

1 2 3 2 4 3 3 4 5 4 6 5 5 6 7 6 8 7

Note that the cache utilization here is 66%, which means that your actual vertex transform throughput will be exactly the same as for the strip. However, when you batch multiple strips in one triangle list, you will save on driver call overhead.
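
To make that concrete, here's a rough sketch (illustrative only, names invented) of unrolling strips into one indexed triangle list, flipping the winding on every other triangle so facing stays consistent:

#include <GL/gl.h>
#include <vector>

// Unroll one strip's indices (1 2 3 4 5 6 7 8 above) into triangles
// (1 2 3), (3 2 4), (3 4 5), ... -- the odd ones are rotations of the
// "2 4 3" style above, so the winding is the same -- and append them
// to the shared list.
void AppendStripAsTriangles(const std::vector<GLushort>& strip,
                            std::vector<GLushort>& list)
{
    for (size_t i = 2; i < strip.size(); ++i) {
        if (i % 2 == 0) {                    // even triangle: keep order
            list.push_back(strip[i - 2]);
            list.push_back(strip[i - 1]);
            list.push_back(strip[i]);
        } else {                             // odd triangle: swap to fix winding
            list.push_back(strip[i - 1]);
            list.push_back(strip[i - 2]);
            list.push_back(strip[i]);
        }
    }
}

// After every strip has been appended, one call draws the whole batch.
void DrawList(const std::vector<GLushort>& list)
{
    glDrawElements(GL_TRIANGLES, (GLsizei)list.size(),
                   GL_UNSIGNED_SHORT, &list[0]);
}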

I do see what you are saying; however, the 66% utilization only brings me back up to my current performance. My chances of hitting any of verts 1-8 in the cache are not very good right now, back to my 1%-2%. So my performance would be 67%-68%, where right now it is the equivalent of 66%. Besides, if the tris received a bad sort I wouldn't be guaranteed the original 66%.

John.

You’re saying you’re using several tristrips, each of which is very small.

I’m telling you to use a single, big triangle list.

Assuming the vertex cost is the same, making a single large buffer call is typically more efficient than many small buffer calls, unless “big” goes beyond some per-card limit, which is in the thousands for even the most restrictive card.

Now, if each tri strip needs its own modelview and texture state, then you might want to start thinking about pre-transform and texture sheeting, so you can pack everything into a single triangle list and make it go faster.
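
As a very rough sketch of the pre-transform part of that (invented names, positions only; texture sheeting would additionally pack the textures into one atlas so the bound texture never has to change):

#include <vector>

struct Vec3 { float x, y, z; };
struct Mat4 { float m[16]; };   // column-major, OpenGL style

// Transform a point by a modelview matrix on the CPU.
static Vec3 Mul(const Mat4& M, const Vec3& v)
{
    Vec3 r;
    r.x = M.m[0]*v.x + M.m[4]*v.y + M.m[8]*v.z  + M.m[12];
    r.y = M.m[1]*v.x + M.m[5]*v.y + M.m[9]*v.z  + M.m[13];
    r.z = M.m[2]*v.x + M.m[6]*v.y + M.m[10]*v.z + M.m[14];
    return r;
}

// Append one strip's verts, already transformed to world space, to the
// shared vertex array that the single big triangle list will index into.
void AppendPreTransformed(const Mat4& stripModelview,
                          const std::vector<Vec3>& stripVerts,
                          std::vector<Vec3>& sharedVerts)
{
    for (size_t i = 0; i < stripVerts.size(); ++i)
        sharedVerts.push_back(Mul(stripModelview, stripVerts[i]));
}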

AGP 4X is a gigabyte per second, give or take. PC133 can (just barely) be provoked to go that fast, in a single direction.


Originally posted by jwatte:

Note that the cache utilization here is 66% …

How did you calculate 66%, or is this a given number that you are quoting?

V-man

jwatte,

I do agree with you that the call overhead stinks, but until it becomes my major bottleneck I can't justify a major overhaul.

John.

V-man,

If you look at the triangle list, each triangle is drawn with two old verts and one previously un-issued vert. Thus there’s a 66% cache hit rate except for the first triangle (assuming the vertex cache is at least 4 elements deep :-).

I know this is a little off the original topic, but:

My current bottleneck is the memory bus, and I'm going to use VARs to render parts of my level faster. I'm assuming that the CPU and the AGP bus will fight for use of the system memory. That means that if the VAR data is stored in AGP memory, I will still be limited by the system memory speed and not the AGP bus speed. Is this correct, or can the AGP bus and the CPU access the memory at the same time? That would effectively double the system memory speed if used properly. I've read a lot on Intel's site about AGP; however, a lot of that info is based on evenly distributing work between the CPU and GPU, which is not the problem right now.

Thanks…

John.

yes, more stupid questions…

I've been reading a lot of AGP info on the Intel site, and the more I read the more confused I get. AGP 4x is said to have around a 1 GB/s peak transfer rate, but my memory is PC133 with a 133M peak transfer (right?). So is the blazing speed something seen only by people with faster memory buses, or can the AGP bus magically access the memory faster?

Also, if my system memory is the bottleneck, then the AGP bus and the CPU will still be fighting over that same 133M, right?

John.

PC133 memory has a peak throughput (unidirectional) of about 1 GB / second, so AGP 4x is well matched to that. It’s 8 bytes per clock (64 bits wide) at 133 MHz clock.

AGP memory access will indeed contend with the CPU for memory bus bandwidth. However, AGP DMA is sufficiently loose in its specification that the north bridge can make much more efficient use of the bandwidth that's there than if you were doing PCI DMA. Also, unless your algorithm is somehow degenerate, your CPU will slurp in a cache chunk, chew on it for a while, then slurp in the next one. While it's chewing, AGP can get at the memory "for free".

If you’re not using AGP memory, but still using vertex arrays, then it’s likely that the driver will copy data out of your system memory buffer, and into its own AGP buffer, from which it will then issue the geometry. This is likely to suck.

If you really have a fully streaming bottleneck, and have done everything you can to batch your processing to avoid unnecessary bus turnarounds and partial cache line evictions, then the only thing you can do to increase performance much is to move your vertex data to VRAM. Note that it'll compete with texture data on the card at that point.
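
For what it's worth, here's a rough sketch of that setup with NV_vertex_array_range (assuming the extension is present and the entry points were already fetched with wglGetProcAddress; a priority near 0.5 usually gives AGP memory, near 1.0 video memory; the Vertex struct and function names are just stand-ins):

#include <GL/gl.h>
#include <GL/glext.h>    // GL_VERTEX_ARRAY_RANGE_NV
#include <string.h>

// Entry points assumed to be fetched elsewhere via wglGetProcAddress.
extern void* (*wglAllocateMemoryNV)(GLsizei, GLfloat, GLfloat, GLfloat);
extern void  (*glVertexArrayRangeNV)(GLsizei, const GLvoid*);

struct Vertex { float x, y, z; };    // stand-in for the app's real format

bool SetupVertexArrayRange(const Vertex* sysVerts, int numVerts)
{
    GLsizei bytes = (GLsizei)(numVerts * sizeof(Vertex));
    void* fastMem = wglAllocateMemoryNV(bytes, 0.0f, 0.0f, 0.5f);
    if (!fastMem)
        return false;                 // fall back to plain vertex arrays
    memcpy(fastMem, sysVerts, bytes); // fill once, draw many times
    glVertexArrayRangeNV(bytes, fastMem);
    glEnableClientState(GL_VERTEX_ARRAY_RANGE_NV);
    glVertexPointer(3, GL_FLOAT, sizeof(Vertex), fastMem);
    glEnableClientState(GL_VERTEX_ARRAY);
    return true;                      // index data can stay in system memory
}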

jwatte,

Thanks for the response. I almost have the code in place to start testing VARs with my app. After reading your post, I think I should see a significant performance boost just from AGP storage. I've been very concerned about starving my video card of video memory, because my app switches textures more frequently than I would like. I'm hoping that the driver will continue to do a good job of caching the textures.

Thanks for the help…

John.