multiple indices for different arrays (vertex, color, ...)

Originally posted by jwatte:
That’s not what I said. I said I measured the total size of vertex data + index data when storing meshes as normalized arrays, and when storing meshes as separate arrays with separate index streams.
Sorry about that.

On an unrelated note, let’s do some calculations:
assume a mesh has texture coordinates (S, T), each pair stored as two 32-bit floats (8 bytes), and there are 1000 of these.

You decide to use the multi-index feature, with 16-bit unsigned indices.

1000 * 2 bytes = 2000 bytes of extra index data

You need to remove 2000 bytes / 8 bytes = 250 tex coords from the original 1000 to break even.

250/1000 = 25%

You need to remove at least 25% of the tex coords from any model to break even (as far as tex coords are concerned).

For color, as the OP mentioned, assuming each color is 32-bit (4 bytes), you need to remove 50% to break even.
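For what it’s worth, here is a small sketch of the break-even rule behind those two percentages. The sizes are the ones assumed above; the rule is simply removal fraction = index size / attribute size:

```cpp
#include <cstdio>

int main()
{
    // Sizes assumed in the post above.
    const double indexSize    = 2.0;  // 16-bit unsigned index
    const double texCoordSize = 8.0;  // (S, T) as two 32-bit floats
    const double colorSize    = 4.0;  // packed 32-bit color

    // Fraction of attributes that must be removed (through sharing)
    // before the extra index stream pays for itself.
    std::printf("tex coords: %.0f%%\n", 100.0 * indexSize / texCoordSize); // 25%
    std::printf("color:      %.0f%%\n", 100.0 * indexSize / colorSize);    // 50%
    return 0;
}
```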

Even if you’re using shadow volumes, Korval?
Yes.

Assuming modern hardware (something that can run HL2 or Doom3 with a good level of effects reasonably well), we can therefore assume 100M polys per second, theoretically. Now, we drop this to 50M right off the bat (turning theory into fact). At 75 fps, that gives you about 666 thousand polys per frame. 10,000-poly characters means that you could have 66 of them, or 33 with one shadow, or 22 with 2. Plenty, as long as you refrain from multipassing too much.
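For reference, a tiny sketch of the arithmetic behind those character counts (50M polys/s, 75 Hz and 10,000-poly characters are the assumptions stated above; the extra passes for one or two shadows are counted as 2x and 3x the geometry):

```cpp
#include <cstdio>

int main()
{
    const long trisPerSecond = 50000000L;            // half the theoretical 100M
    const long trisPerFrame  = trisPerSecond / 75;   // at 75 fps: ~666,666 per frame
    const long trisPerChar   = 10000;

    std::printf("characters, no shadow:   %ld\n", trisPerFrame / trisPerChar);        // 66
    std::printf("characters, one shadow:  %ld\n", trisPerFrame / (2 * trisPerChar));  // 33
    std::printf("characters, two shadows: %ld\n", trisPerFrame / (3 * trisPerChar));  // 22
    return 0;
}
```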

I don’t see this at all. I see a single, unique mapping (a.k.a. “parameterization”). Then I see a large number of maps mapped over this unique parameterization. The maps can be different resolutions, but I think they all share the same unique texture coordinate set.
Why? Why would you ever do that? Why would you limit your texture artists and modellers in this fashion?

It is, however, definitely NOT the smallest way to store a mesh in any modern art pipe I’ve worked with or looked at (and I’ve looked at several).
Have all of those art pipes not interfaced directly with some form of hardware that required single indexing? Certainly, I wouldn’t bother with such a representation if I knew I was just going to have to unpack it later.

As to the veracity of the claim that it is not the smallest way to store a mesh, I disagree.

Let’s say you have the following vertex format:

Position
Color
Normal
UV1
UV2 (a bump map)
Tangent for UV2
Binormal for UV2

Position and color are almost never going to cause a crease. Positions and colors are virtually always 1:1. The Normal will crease when the bump texture coordinates do, usually. The UV2, Tangent, and Binormal only crease simultaneously.

This leaves the following sets of creasing elements:

1: Position/Color (16 bytes)
2: Normal/UV2/Tangent/Binormal (44 bytes approx.)
3: UV1 (8 bytes)

The cost of a crease due to #2 (the other sets keep the same values, but in the single-index case they must be duplicated anyway) is 16 bytes + 8 bytes. The cost of a crease due to #3 is 44 bytes + 16 bytes.
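As a sketch only, the three sets could be written out as structs like these (field names and packing are illustrative; the sizes assume 3-float positions/normals/tangents/binormals, 2-float UVs, and a packed 32-bit color, which is where the 16, 44, and 8 above come from):

```cpp
#include <cstdint>

// Set 1: position + packed color (creases rarely, almost always 1:1)
struct Set1 {
    float         position[3];  // 12 bytes
    std::uint32_t color;        //  4 bytes
};

// Set 2: normal + bump-map UV + tangent basis (these crease together)
struct Set2 {
    float normal[3];            // 12 bytes
    float uv2[2];               //  8 bytes
    float tangent[3];           // 12 bytes
    float binormal[3];          // 12 bytes
};

// Set 3: the first texture-coordinate set
struct Set3 {
    float uv1[2];               //  8 bytes
};

static_assert(sizeof(Set1) == 16, "set 1 is 16 bytes");
static_assert(sizeof(Set2) == 44, "set 2 is 44 bytes");
static_assert(sizeof(Set3) == 8,  "set 3 is 8 bytes");
```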

Let’s assume that, in the single index case, the total number of elements is 12,000. This makes the memory cost a total of 840,000 bytes.

Now, the multi-index case is difficult to compute. It is:

(12,000 * X) * 18 bytes +
(12,000 * Y) * 46 bytes +
(12,000 * Z) * 10 bytes

Where X, Y, Z are a scale of the single-index case to match the new index count. So, if the position/color was repeated 10% of the time, X would be 0.9.

To determine which one is smaller, we need to set the equations equal:

(X * 18) + (Y * 46) + (Z * 10) = 70.

Since we don’t have specific data for this mesh, we can’t really go any further. However, we do know the following. If Z isn’t smaller than 0.8 (8/10), then Z isn’t going to make up the cost of its own indices. However, Y makes up the cost of its indices easily enough at 44/46, or about 0.96. That means that only about 1 in 20 indices in the single-index case have to duplicate set 2 in order for this to be a win (purely for set 2).

We can see what happens with some specific values. We’ll assume that X is the smallest, since a position defines when a crease happens and the color is almost always 1:1 with position, so color doesn’t induce creases.

If X=0.6, Y=0.9, and Z=1.0 (effectively, these numbers mean that UV1 dominates the creasing behavior), then we get 62.2, which is a win over 70.

If, however, Y dominates the creasing (more creases due to normals and normal-like things like bump maps. This may be more likely), then these seem reasonable: X=0.6, Y=1.0, Z=0.9. Neither Y nor Z makes up the cost of its indices. That still leaves us with 65.8.

The real question seems to be how low X is. If X can cover the cost of Y and Z’s indices, then you win with multiple indexing. X being small means that set 1 (position & color) is frequently repeated. If X is less than 14/18, or about 0.78, then you always win. That means roughly 1 out of every 4 positions/colors in the single-index case is a repetition.
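A hedged sketch of the same comparison in code, following the simplified model above (per-element costs, a 2-byte index charged per set, and index counts scaling with X, Y, Z, which is the assumption questioned later in the thread). The function name is made up:

```cpp
#include <cstdio>

// Single-index case: 16 + 44 + 8 bytes of attributes plus one 2-byte index per element.
const double kSingleIndexBytes = 16 + 44 + 8 + 2;  // 70

// Multi-index case: each set carries its own 2-byte index, scaled by the fraction
// of elements that remain once duplicates are shared (x, y, z as in the text).
double multiIndexBytes(double x, double y, double z)
{
    return x * (16 + 2) + y * (44 + 2) + z * (8 + 2);  // 18x + 46y + 10z
}

int main()
{
    std::printf("single index:          %.1f bytes/element\n", kSingleIndexBytes);
    std::printf("UV1-dominated creases: %.1f\n", multiIndexBytes(0.6, 0.9, 1.0));  // 62.2, a win
    std::printf("normal-dominated:      %.1f\n", multiIndexBytes(0.6, 1.0, 0.9));  // 65.8, still a win
    std::printf("always-win bound on X: %.2f\n", (kSingleIndexBytes - 46 - 10) / 18);  // ~0.78
    return 0;
}
```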

The key seems to be either a lot of position-based creasing (a small X), or a modest amount of creasing of something large (a Y of relatively small size).

Realistically, I would say that getting X down to 0.78 is not reasonable with small vertex formats (one set of UVs, one color, one normal/binormal/tangent, etc.). However, if you have many mesh parameterizations for several textures (diffuse/specular, bump, detail, bump-detail), the number of possible creases in position/color shoots up dramatically, and X decreases. 0.78 is not unreasonable for cases of multiple changing attributes.

Of course, since Jwatte doesn’t believe in having multiple parameterizations, he won’t see the need for this, but those who don’t force their texture and mesh artists to conform to such stringent requirements may find a memory reduction. Granted, the memory reduction is not necessarily dramatic, but as the number of parameterizations increases, it will become increasingly significant, especially since more parameterizations mean more memory to pay for in the first place.

Also, consider this: it is easier to get a net gain if you have fewer sets of indices. Granted, it is harder to make that gain significant, since having only 2 sets of indexed data effectively means that you’ll have more replicated data due to creasing. This analysis of the proper format (the number of index sets) sounds like something that should be programmed into a tool that optimizes meshes for rendering.

More importantly, if hardware is already going to give us this in the future (D3D 10 requiring it will force the issue), then GL may as well support it. It’d be silly not to.

The problem with V-Man’s logic is that he takes each individual attribute in turn, rather than looking at the whole set of attributes. The savings due to X and Z can add up to an overall saving, even if Y is 1.0 (never repeated).

Yes.

Assuming modern hardware (something that can run HL2 or Doom3 with a good level of effects reasonably well), we can therefore assume 100M polys per second, theoretically. Now, we drop this to 50M right off the bat (turning theory into fact). At 75 fps, that gives you about 666 thousand polys per frame. 10,000-poly characters means that you could have 66 of them, or 33 with one shadow, or 22 with 2. Plenty, as long as you refrain from multipassing too much.

No offense, but this quote sounds very naive to me.

Your numbers are only valid in the “perfect case”, i.e. an infinitely powerful CPU and no shaders or textures, because switching states will decrease your numbers by a lot, especially in a complex scene, even if you sort your meshes by material.

In addition, shadow volumes require multi-passing. They consume an enormous amount of CPU (to compute the silhouettes, and then to fill some dynamic vertex buffers), as well as an enormous amount of fillrate (even with 2-sided stencil). In summary, if you get 30 fps with 100k polys and only one light, you’re already lucky.

Y.
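To make the CPU cost Ysaneya mentions concrete, here is a rough, hypothetical sketch of the per-light silhouette search a stencil-volume renderer typically runs every frame; the mesh representation and names are made up for illustration:

```cpp
#include <vector>

struct Vec3 { float x, y, z; };

static Vec3  sub(Vec3 a, Vec3 b)   { return { a.x - b.x, a.y - b.y, a.z - b.z }; }
static Vec3  cross(Vec3 a, Vec3 b) { return { a.y*b.z - a.z*b.y, a.z*b.x - a.x*b.z, a.x*b.y - a.y*b.x }; }
static float dot(Vec3 a, Vec3 b)   { return a.x*b.x + a.y*b.y + a.z*b.z; }

// An edge of a closed mesh, shared by exactly two triangles.
struct Edge { int v0, v1; int tri[2]; };

// True if a triangle (three indices into pos) faces a point light.
static bool facesLight(const Vec3* pos, const int* tri, Vec3 light)
{
    Vec3 n = cross(sub(pos[tri[1]], pos[tri[0]]), sub(pos[tri[2]], pos[tri[0]]));
    return dot(n, sub(light, pos[tri[0]])) > 0.0f;
}

// Run per light, per frame: keep every edge whose two adjacent triangles
// disagree about facing the light. These silhouette edges then get extruded
// into the (dynamic) shadow-volume geometry.
std::vector<Edge> findSilhouette(const Vec3* pos, const std::vector<int>& tris,
                                 const std::vector<Edge>& edges, Vec3 light)
{
    std::vector<Edge> silhouette;
    for (const Edge& e : edges) {
        const bool f0 = facesLight(pos, &tris[3 * e.tri[0]], light);
        const bool f1 = facesLight(pos, &tris[3 * e.tri[1]], light);
        if (f0 != f1)
            silhouette.push_back(e);
    }
    return silhouette;
}
```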

Your numbers are only valid in the “perfect case”, i.e. an infinitely powerful CPU and no shaders or textures, because switching states will decrease your numbers by a lot, especially in a complex scene, even if you sort your meshes by material.
No, I accounted for that. That’s part of the cutting down from 100M to 50M. It accounts for time lost due to state changes, vertex programs running, and so forth.

In addition, shadow volumes require multi-passing. They consume an enormous amount of CPU (to compute the silhouettes, and then to fill some dynamic vertex buffers), as well as an enormous amount of fillrate (even with 2-sided stencil). In summary, if you get 30 fps with 100k polys and only one light, you’re already lucky.
Then stop using crappy rendering techniques.

Shadow maps don’t require any CPU computation of anything. No silhouette edges, nothing. You just render the mesh from the light’s point of view. They need fillrate, but far less than what stencil shadow volumes need.

In short: if you use a technique that is known for leeching the performance out of a GPU as a matter of course, you can’t blame the GPU for it.
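For contrast, a minimal sketch of the shadow-map depth pass Korval describes, written against modern GL with an FBO (FBO-style render-to-texture postdates this thread; GLEW or a similar loader is assumed to be initialized, and drawSceneFromLight() and the map size are placeholders):

```cpp
#include <GL/glew.h>

struct ShadowMap { GLuint fbo; GLuint depthTex; };

ShadowMap createShadowMap(int size)
{
    ShadowMap sm = { 0, 0 };

    // Depth texture that the lighting pass will later sample.
    glGenTextures(1, &sm.depthTex);
    glBindTexture(GL_TEXTURE_2D, sm.depthTex);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_DEPTH_COMPONENT24, size, size, 0,
                 GL_DEPTH_COMPONENT, GL_FLOAT, nullptr);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);

    // Depth-only framebuffer: no color attachment needed at all.
    glGenFramebuffers(1, &sm.fbo);
    glBindFramebuffer(GL_FRAMEBUFFER, sm.fbo);
    glFramebufferTexture2D(GL_FRAMEBUFFER, GL_DEPTH_ATTACHMENT,
                           GL_TEXTURE_2D, sm.depthTex, 0);
    glDrawBuffer(GL_NONE);
    glReadBuffer(GL_NONE);
    glBindFramebuffer(GL_FRAMEBUFFER, 0);
    return sm;
}

void shadowPass(const ShadowMap& sm, int size)
{
    glBindFramebuffer(GL_FRAMEBUFFER, sm.fbo);
    glViewport(0, 0, size, size);
    glClear(GL_DEPTH_BUFFER_BIT);

    // Just draw the casters with the light's view/projection bound:
    // no silhouette extraction, no dynamic vertex buffers.
    // drawSceneFromLight();  // placeholder
    glBindFramebuffer(GL_FRAMEBUFFER, 0);
}
```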

Let’s assume that, in the single index case, the total number of elements is 12,000. This makes the memory cost a total of 840,000 bytes.

There are two ways to implement multi-indexing (see the sketch after this list):

  • one index array per attribute
  • grouping attributes into sets, with one index array per group
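In code, the two options might look roughly like this (type and field names are purely illustrative):

```cpp
#include <cstdint>
#include <vector>

// Option 1: one index stream per attribute (maximum sharing, most index data).
struct PerAttributeMesh {
    std::vector<float>         positions;  // xyz triples
    std::vector<std::uint32_t> colors;     // packed RGBA
    std::vector<float>         normals;    // xyz triples
    std::vector<float>         uv1;        // st pairs
    std::vector<std::uint16_t> posIdx, colIdx, nrmIdx, uv1Idx;  // one index per attribute per corner
};

// Option 2: attributes grouped into sets that crease together
// (e.g. the three sets discussed above), with one index stream per group.
struct GroupedMesh {
    struct PosColor { float pos[3]; std::uint32_t color; };       // 16 bytes
    struct NrmBump  { float nrm[3], uv2[2], tan[3], binrm[3]; };  // 44 bytes
    struct Uv1      { float uv[2]; };                             //  8 bytes

    std::vector<PosColor> set1;
    std::vector<NrmBump>  set2;
    std::vector<Uv1>      set3;
    std::vector<std::uint16_t> idx1, idx2, idx3;  // one index per group per corner
};
```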

Now for your calculations:
840,000 bytes?

It should be:
(16 bytes + 44 bytes + 8 bytes) * 12,000 vertices = 816,000 bytes

The index array is most likely larger than 12,000 entries, because your models should have vertex sharing.
It’s a good idea to throw in a made-up factor, like 12,000 * 1.12 indices.
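A tiny sketch of that recount, using the made-up 1.12 sharing factor and 16-bit indices:

```cpp
#include <cstdio>

int main()
{
    const long vertexCount = 12000;
    const long indexCount  = (long)(vertexCount * 1.12);   // made-up sharing factor: 13,440 indices

    const long vertexBytes = vertexCount * (16 + 44 + 8);  // 816,000 bytes of attributes
    const long indexBytes  = indexCount * 2;               // 26,880 bytes of 16-bit indices

    std::printf("vertex data: %ld bytes, index data: %ld bytes\n", vertexBytes, indexBytes);
    return 0;
}
```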

Now, the multi-index case is difficult to compute. It is:

(12,000 * X) * 18 bytes +
(12,000 * Y) * 46 bytes +
(12,000 * Z) * 10 bytes
Here as well. Why is the index count tied to the vertex count?
Can you explain your reasoning?

Originally posted by Korval:
Then stop using crappy rendering techniques.

Shadow maps don’t require any CPU computation of anything. No silhouette edges, nothing. You just render the mesh from the light’s point of view. They need fillrate, but far less than what stencil shadow volumes need.

In short: if you use a technique that is known for leeching the performance out of a GPU as a matter of course, you can’t blame the GPU for it.
huh???
Can you think of no advantages that shadow volumes have over shadow maps, korval?
Oooo, beautiful character models, but what the hell is that lego-land slab of darkness hanging off 'em?

Here as well. Why is the index count tied to the vertex count?
Can you explain your reasoning?
Hmmm… Not really. I must have gotten confused somewhere.

I had intended the 12,000 to be the number of indices, but I forgot that, even in the single index case, the number of indices is not the number of vertices.

Sorry about that.

Can you think of no advantages that shadow volumes have over shadow maps, korval?
I can think of some, but the massive performance disadvantages of shadow volumes seem to outweigh the advantages. At least to me.

I agree.

Shadow maps don’t require any CPU computation of anything. No silhouette edges, nothing. You just render the mesh from the light’s point of view. They need fillrate, but far less than what stencil shadow volumes need.

Oops, sorry, you are obviously right. For some reason I (incorrectly) assumed that you were speaking of stencil shadows, not shadow maps, probably because I saw Doom 3 mentioned in your post.

Although honestly, 50M tris/second, even with shadow maps and a recent graphics card, still looks quite high to me. 15 to 25 Mtris maybe, but 50M? Especially if you perform multisampling and implement per-pixel lighting; your framerate is going to decrease pretty quickly…

Y.

Originally posted by Ysaneya:
Although honestly, 50M tris/second, even with shadow maps and a recent graphics card, still looks quite high to me. 15 to 25 Mtris maybe, but 50M? Especially if you perform multisampling and implement per-pixel lighting; your framerate is going to decrease pretty quickly…

Y.
Having more polys should be affordable.
Multisampling and fragment shaders are another part of the pipe. You have to balance the pipe.

The performance problem with both shadow methods is fillrate, among other things. Shadow maps add the issues that come with render-to-texture (RTT).

IDR, you say ‘whenever this issue comes up’ - this suggests it comes up frequently - who brings it up at ARB meetings and what reasoning do they give?
To the best of my knowledge, and I’ve only been going to ARB meetings since September 2002, this has never come up at an ARB meeting. However, it comes up all the time on newsgroups and message boards around the net.