VAO probs :)

I’m using strips and I’m implementing VAO in my rasteriser.

Unfortunately, my vertices look like this: ;))

typedef struct SVR_VERTEX
{
	VR_SHORT	x;										//										$0
	VR_SHORT	y;										//										$2
	VR_SHORT	z;										//										$4
	VR_SHORT	rienz;									//padding								$6

	VR_CHAR		nx;										//non transformed normal (-128,127)		$8
	VR_CHAR		ny;										//										$9
	VR_CHAR		nz;										//										$0a
	VR_CHAR		rien;									//padding								$0b

	VR_BYTE		r;										//Fantastic colors..					$0c
	VR_BYTE		g;										//										$0d
	VR_BYTE		b;										//										$0e
	VR_BYTE		a;										//										$0f

	union {
		VR_UV		texCoord[4];						//										$10
		struct {
			VR_FLOAT	u0,v0;							//										$10,$14
			VR_FLOAT	u1,v1;							//										$18,$1c
			VR_FLOAT	u2,v2;							//										$20,$24
			VR_FLOAT	u3,v3;							//										$28,$2c
		};
	};
} VR_VERTEX;



As you can see, I’m only using floating-point values for my texCoords… :slight_smile:
And note that primitive sizes can vary depending on the number of texture channels used… (minPrimSize $10, maxPrimSize $30)

Below is the code needed to store the prims onboard… (simply using GL_STATIC_ATI)

size = sizePrim * pInfos->nb;
pInfos->object = glNewObjectBufferATI(size, pList, GL_STATIC_ATI);

and the rendering stuff…
static VR_VOID displayVertexListGlATI(VR_LONG handle, VR_PRIM_TYPES type, VR_DWORD start, VR_DWORD nb)
{
	VR_DWORD sizePrim, offset, maxChannels, channel;
	VR_DWORD object;

	pInfos = &pListInfos[handle];
	pList = (VR_VERTEX*) pInfos->pList;

	if (pList == NULL) return;

	sizePrim = vrRasterPrimSize[pInfos->bits.uvChannels];
	object = (VR_DWORD) pInfos->plistCache;

	glEnableClientState(GL_VERTEX_ARRAY);
	glArrayObjectATI(GL_VERTEX_ARRAY, 3, GL_SHORT, sizePrim, object, 0);

	if (pInfos->bits.useColors){
		glEnableClientState(GL_COLOR_ARRAY);
		glArrayObjectATI(GL_COLOR_ARRAY_EXT, 4, GL_UNSIGNED_BYTE, sizePrim, object, 0x0c);
	}
	if (pInfos->bits.useNormals){
		glEnableClientState(GL_NORMAL_ARRAY);
		glArrayObjectATI(GL_NORMAL_ARRAY_EXT, 3, GL_BYTE, sizePrim, object, 0x8);
	}

	if (pInfos->bits.useTexCoords){
		if (pInfos->bits.uvChannels > rasterInfos.caps.nbTextureUnits)
			maxChannels = rasterInfos.caps.nbTextureUnits;
		else
			maxChannels = pInfos->bits.uvChannels;
		offset = 0x10;
		for (channel = 0; channel < maxChannels; channel++){
			glClientActiveTextureARB(GL_TEXTURE0_ARB + channel);
			glEnableClientState(GL_TEXTURE_COORD_ARRAY);
			glArrayObjectATI(GL_TEXTURE_COORD_ARRAY, 2, GL_FLOAT, sizePrim, object, offset);
			offset += 8;
		}
	}

	glDrawArrays(GL_TRIANGLE_STRIP, start, nb);		// prims are triangle strips

	if (pInfos->bits.useColors)
		glDisableClientState(GL_COLOR_ARRAY);

	if (pInfos->bits.useNormals)
		glDisableClientState(GL_NORMAL_ARRAY);

	if (pInfos->bits.useTexCoords){
		for (channel = 0; channel < maxChannels; channel++){
			glClientActiveTextureARB(GL_TEXTURE0_ARB + channel);
			glDisableClientState(GL_TEXTURE_COORD_ARRAY);
		}
	}
}

Ok, nothing terrific! :slight_smile: I’m using glDrawArrays because my prims are triangle strips, and I think I’m using
all the glArrayObjectATI stuff correctly…
So what? Well, nothing is displayed! What I suspect is that the VAO mechanism has problems with my vertex data structure;
I mean, when the structure is not using FLOAT values (the most common way in sample code etc.) it doesn’t work…
Anyway, I have also modified the simpleVAO sample from ATI to use SHORT values for coords, and it seems that the
same problem occurs. :)=

Has anybody noticed this before? I’m using the latest beta drivers from ATIDevrel…


ad!:=) cool ‘gallery’ to look at for coffee &| cigarette time ->

eh? did i miss anything?

For those interested if any…

I’ve played with a modified version of the simpleVAO sample to make sure it was not a silly bug on my side :wink:

here are the results:
using vertex coords with GL_INT: works (thus integers work!)
using vertex coords with GL_SHORT: no display (misinterpreted data??)
using vertex coords with GL_BYTE: crash!

Finally some feedback from ATI:
“I just ran the sample and found that with everything running normally I get nothing rendered, is this what you get? Further testing I found that our own SW ICD renders properly. Further still I found that
without TCL everything renders properly. This indicates to me that it is a driver bug and I’ve submitted the bug formally as EPR 64242 for you. I’ll let you know the results of the driver team investigation as I get updates.”

just have to wait for next beta drivers and keep fingers crossed! ;))

I’ll add that to my list of driver bugs, between “alpha buffer doesn’t work with blending” and “tex env combine scale is bugged”. I’m the first to admit ATI’s response is very good, but unfortunately their drivers are still incredibly buggy, to the point where I cannot use any serious advanced feature in our application. There’s still a lot of work left, ATI!


Possibly you should just get some other drivers… As far as I know, Humus has no problems with the pixel-pipeline stuff, meaning fragment shaders and blending and render-to-texture etc… And the scaling bug is solved too, I think…

VAO is evolving, I guess, so it’s not yet finished…

Evolving, yes that’s the word. How polite.
It’s been ‘evolving’ for quite a while now, hasn’t it?

I was not discussing the ATI drivers development roadmap, guys! :wink:
Only VAO is the subject, eh? :slight_smile:

Yes, these bugs have been fixed in the beta drivers. These drivers are not public yet, so it makes no difference for the customer, heh?

Back to the topic (sorry Ozzy). I never got VAO to work under Win98. In Win2k it works, but AFAIK it crashes my machine when using dynamic arrays in special conditions. I have not debugged it yet, so it might be a code problem, not a driver one. Though I wouldn’t be that surprised…


For those interested, here is my end of the story concerning the ATIVertexArrayObject bug, and a performance test vs. VertexArrayRangeNV.…ember&year=2002

that’s it. :slight_smile:

What exactly are you saying? Are you saying that VAO is slower than not using VAO, or that VAR is simply faster?
My own tests show VAO (on 8500, not LE) to be at least comparable to the GeForce3’s VAR in terms of performance.

One thing I noticed in the past: using the 8500 in a low-end system (P3 500MHz, AGP2x) will show poor performance compared to the GeForce3. I was initially disappointed with my 8500 until I upgraded the rest of the system.

I’m saying that VAO is far slower than VAR but faster than CVA (hopefully, for an implementation which is supposed to store vertices onboard).
Now, maybe you’re right and the CPU is too involved in the VAO implementation.
Anyhow, who knows? But my question would be: why should the CPU be so involved when the geometry is stored onboard? At that point it is supposed to become a GPU problem.

Btw, upgrading your system generally gives a significant performance boost when using CVA, because the data is loaded over the bus to the board. But that isn’t the case when using static data onboard.

That’s it! And here are the perfs on my system.

[This message has been edited by Ozzy (edited 09-18-2002).]

I just want to add that these benchmarks show abnormal results when you consider that VAO was written as the answer to NV’s VAR.
Now, considering the CPU overhead, maybe VAO stores data only in AGP memory instead of VRAM, but this is pure speculation. Only ATI could answer. Moreover, I would like to know why it takes them so long to fix this silly bug in VAO. :((

The whole issue is not being able to use GL_SHORT, GL_INT, GL_BYTE? Only float?

Whoaaa! déjà vu, baby!

I think that came up almost a year ago.

Anyways, why not use floats for vertex & tex? The card will probably convert to floats anyway.


For sure I know about this strange feeling of déjà vu! Just read the date of the first post :wink:

1 float = 4 bytes, 1 short = 2 bytes, 1 byte = 1 byte.

Now multiply by however many vertices you want: which of them will take less memory?
This is quite an important consideration when you need to store static data. Moreover, I admit the importance of floating-point values for calculations, but definitely not for storage.
Anyhow, vertex arrays are supposed to handle different data types, that’s all; that’s been an unresolved bug for a long time now.

What happens, most likely, is that it goes into software mode and converts shorts and bytes into floats; I’d guess the hardware doesn’t support those formats natively.
I have noticed remarkable performance increases by going from normal vertex arrays to VAO, but then I’m only using floats.
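(If that theory is right, one workaround, purely a sketch with hypothetical types and names, would be to expand the packed coords to floats once at load time and upload the float copy to the VAO instead:)

```c
#include <stddef.h>

/* Hypothetical layouts: packed shorts as stored on disk, floats as
   uploaded to the vertex array object. Names are illustrative only. */
typedef struct { short x, y, z, pad; } PackedPos;
typedef struct { float x, y, z; }      FloatPos;

/* One-time expansion at load time, so the driver never has to
   software-convert GL_SHORT positions per frame. */
static void expandPositions(const PackedPos *src, FloatPos *dst, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++) {
        dst[i].x = (float)src[i].x;
        dst[i].y = (float)src[i].y;
        dst[i].z = (float)src[i].z;
    }
}
```

The cost is the doubled memory footprint discussed above, paid in exchange for staying on the hardware path.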

Originally posted by Humus:
What happens, most likely, is that it goes into software mode and converts shorts and bytes into floats; I’d guess the hardware doesn’t support those formats natively.

Yep, possibly. This should be performed while creating the VertexArrayObject then. Btw, GeForce boards support these native formats, which keeps more VRAM for data and increases performance on primitive processing as well. :)=

I have noticed remarkable performance increases by going from normal vertex arrays to VAO, but then I’m only using floats.

I agree 100% with that! VAO is around twice as fast as CVA. Now, still comparing with Nvidia’s CVA implementation with lighting enabled, the 8500LE gets really poor results, even compared to a tornadoMx200! Of course, the Radeon 8500 has other interesting features with its pixel pipes etc., but I’m speaking about raw speed with single texturing (but lighting), this sort of thing. :wink:
Anyhow, the theory that all the geometry passed to VAO resides in AGP mem instead of VRAM would be a good explanation:

  1. AGP transfers break parallel processing.
  2. The transfer is obviously done by the CPU.
  3. It has a direct performance hit depending on the system.

Originally posted by Ozzy:
Yep possibly, this should be performed while creating the VertexArrayObject then.

The driver doesn’t know the format when you upload the data, so it can’t possibly preprocess anything.

That’s right! :wink: Oops…
Thus, if the data format doesn’t match, that could be the bottleneck.
Then, are there any details concerning the optimal format to use (floating-point values for each field? size? padding?)
which could prevent the software conversion?

From the specs, then:

Implementation Notes

For maximum hardware performance, all vertex arrays except for
color and secondary color should always be specified to use float
as the component type. Color and secondary color arrays may be
specified to use either float or 4-component unsigned byte as the
component type.

So I will make a few more tests and change colors from bytes to floats, but I think it will not change anything… ;))

[This message has been edited by Ozzy (edited 09-19-2002).]

Ok, after a few last tests, it seems that using GL_FLOAT for colors is slower than GL_UNSIGNED_BYTE! :slight_smile:
Moreover, I’ve also done a few tests using concurrent primitives (same handle with multiple displays) to avoid the overhead of primitive initialisation, and the gain is about 4% with 61% of concurrent prims in the scene. Note that in the same test with the VAR implementation, the gains are not significant.

the end.