Help with VAR.

[First I want to say that I did look at some tutorials and also had a look at that nvidia demo, which really confused me, so… Still, I’m sorry to bother you.]
Well, I am trying to use VAR in my application and I must be doing something really wrong! I got 1 fps at best, and a crash when it wasn’t going quite so well.
My idea was to post some code and ask you to help me with ‘translating’ it to working ‘VAR-Code’; from there I can maybe manage to add fences in some good way. I know that sounds really lame and it probably is, but if you have some time please help.

const int Sprites = 131072;
const int FChunkSize = 128;
static int FTexCoords[FChunkSize * 2 * 4];
GLfloat FVertices[FChunkSize * 3 * 4];

void InitVA()
{
  //I never change my TexCoords so I only compute them once…
  int TexCoordStep = 0;
  for (int Count = 0; Count < FChunkSize; ++Count)
  {
    FTexCoords[TexCoordStep + 0] = 0;
    FTexCoords[TexCoordStep + 1] = 0;

    FTexCoords[TexCoordStep + 2] = 0;
    FTexCoords[TexCoordStep + 3] = -1;

    FTexCoords[TexCoordStep + 4] = 1;
    FTexCoords[TexCoordStep + 5] = -1;

    FTexCoords[TexCoordStep + 6] = 1;
    FTexCoords[TexCoordStep + 7] = 0;

    TexCoordStep += 8;
  }
}


//This is where the drawing happens
//This chunk-approach with only rendering 128 at a time
//turned out to be the fastest method of some others I tried…
//PWPos holds the worldcoordinates of some particles (*Sprites)
void RENDER::PathFVAC()
{
  glVertexPointer(3, GL_FLOAT, 0, FVertices);
  glTexCoordPointer(2, GL_INT, 0, FTexCoords);

  int SpriteCount = 0;
  while (SpriteCount < Sprites)
  {
    int ChunkCount = 0;
    int VertexStep = 0;
    //Fill one chunk with up to FChunkSize billboards
    while ((ChunkCount < FChunkSize) && (ChunkCount + SpriteCount < Sprites))
    {
      int ActCount = SpriteCount + ChunkCount;
      cVector TempVec = Camera.RightVec * size[ActCount] * 0.5;
      FUR = PWPos[ActCount] + TempVec + Camera.UpVec * size[ActCount];
      FUL = PWPos[ActCount] - TempVec + Camera.UpVec * size[ActCount];
      FLR = PWPos[ActCount] + TempVec;
      FLL = PWPos[ActCount] - TempVec;

      FVertices[VertexStep + 0] = FUL.X;
      FVertices[VertexStep + 1] = FUL.Y;
      FVertices[VertexStep + 2] = FUL.Z;

      FVertices[VertexStep + 3] = FLL.X;
      FVertices[VertexStep + 4] = FLL.Y;
      FVertices[VertexStep + 5] = FLL.Z;

      FVertices[VertexStep + 6] = FLR.X;
      FVertices[VertexStep + 7] = FLR.Y;
      FVertices[VertexStep + 8] = FLR.Z;

      FVertices[VertexStep + 9]  = FUR.X;
      FVertices[VertexStep + 10] = FUR.Y;
      FVertices[VertexStep + 11] = FUR.Z;
      VertexStep += 12;
      ++ChunkCount; //next sprite in this chunk
    }
    //Draw the chunk: 4 vertices per sprite
    glDrawArrays(GL_QUADS, 0, ChunkCount * 4);
    SpriteCount += ChunkCount;
  }
}



Thanks again, I am really stuck.
I did not include my VAR-code here as it is disastrous anyway.

I can't see any error. In any case, this part is of minor importance, so there was no need to post it. Post the code where you set up your VAR, etc. That's the code that matters.
(Even if it is disastrous.)


The only error in this part is the absence of VAR.
I realize that I was a fool to ask for the right code straight away, sorry.
Here comes my VAR-code:

GLfloat *VARArray;

//In some Init function:
VARArray = (GLfloat*)wglAllocateMemoryNV(FChunkSize * 3 * 4 * sizeof(GLfloat), 0.2f, 0.2f, 0.5f);
glVertexArrayRangeNV(FChunkSize * 3 * 4 * sizeof(GLfloat), VARArray);

//In a DeInit function

And then just substitute FVertices with VARArray. VARArray is doing all the vertex stuff now. You will notice that the TexCoords stay in the old array (and I know that is bad), but I don’t know how to allocate one chunk of memory and use it for both a vertex and a TexCoord array.
The above code works, but it is way too slow. If you could help me bring it to a reasonable level, removing the obvious mistakes and so on, that would be really great for me and much appreciated!

My vertices change very often but that should be no problem as the nvidia demo shows us. I guess fences would work very well for me if I knew how to use them.


[This message has been edited by B_old (edited 12-28-2002).]

If you are interested, I have a small demo I made that renders two cubes using either VAR or VAO (ATI equivalent). One of the cubes is with interleaved arrays and the other without. The source is at:


Your problem is most likely from not having all the data in the vertex array. The driver then has to read back all the data and compose it into some other memory area. Reading back VAR is slow.

If your application is a typical game that loads geometry on start-up of a level, and then blows away all of it and re-loads later, then one way to deal with VertexArrayRange is to:

  • allocate a large chunk (large enough for all your geometry)
  • set aside some space for dynamic geometry (say a meg or two, unless you have a butt-load of it in which case you need more)
  • allocate storage space out of the rest by just incrementing a “top” pointer, and put static data into that space

When you’re at the end of the level, just set the “top” pointer back to the bottom. Here’s a mock-up of the implementation:

class AGPMemory {
public:
  // Create one of these for your GL context
  AGPMemory( size_t size ) : size_(size) {
    base_ = (char *)wglAllocateMemoryNV( size, 0, 0, 0.7f );
    malloced_ = false;
    if( !base_ ) {
      // no AGP memory available: fall back to ordinary system memory
      base_ = (char *)malloc( size );
      malloced_ = true;
    }
    glVertexArrayRangeNV( size, base_ );
    glEnableClientState( GL_VERTEX_ARRAY_RANGE_NV );
    dynamicTop_ = 0;
    staticTop_ = 0;
  }

  ~AGPMemory() {
    glDisableClientState( GL_VERTEX_ARRAY_RANGE_NV );
    if( malloced_ ) {
      free( base_ );
    }
    else {
      wglFreeMemoryNV( base_ );
    }
  }

  // call this at the start of each level
  void reset( size_t neededDynamic ) {
    assert( neededDynamic < size_ );
    // the dynamic area is [base_, dynamicTop_); static data goes above it
    dynamicTop_ = base_+neededDynamic;
    staticTop_ = dynamicTop_;
  }

  // call this to allocate static geometry space (sort of like malloc)
  void * allocStatic( size_t need ) {
    assert( need+staticTop_ <= base_+size_ );
    if( need+staticTop_ > base_+size_ ) {
      return 0;
    }
    void * ret = staticTop_;
    staticTop_ += need;
    return ret;
  }

private:
  char * base_;
  char * dynamicTop_;
  char * staticTop_;
  bool malloced_;
  size_t size_;
};

Obvious improvements could be made to pad each allocation out to the size of a fetch buffer line (64 or 128 bytes), to align the base pointers to the same size, and to actually manage (or even return) the dynamic memory, say using double-buffering and fencing. But this should be a decent start, once you get it to compile (I just typed it into the browser to show the concept).
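The padding idea can be sketched roughly like this. It is only an illustration working on offsets rather than real AGP pointers; alignUp and BumpAllocator are made-up names, and the 64-byte default is just one of the two line sizes mentioned above.

```cpp
#include <cassert>
#include <cstddef>

// Round n up to the next multiple of alignment.
// Assumption: alignment is a power of two (64 or 128 here).
constexpr std::size_t alignUp(std::size_t n, std::size_t alignment) {
    return (n + alignment - 1) & ~(alignment - 1);
}

// Hypothetical bump allocator that hands out offsets into one big block.
// Add the returned offset to the block's base pointer to get an address.
struct BumpAllocator {
    std::size_t size;  // total bytes in the block
    std::size_t top;   // offset of the next free byte

    // Returns the aligned offset of the allocation,
    // or size_t(-1) when the request does not fit.
    std::size_t alloc(std::size_t need, std::size_t alignment = 64) {
        std::size_t start = alignUp(top, alignment);
        if (start > size || need > size - start) {
            return static_cast<std::size_t>(-1);
        }
        top = start + need;
        return start;
    }
};
```

Every allocation then starts on a line boundary, at the cost of a few wasted bytes between allocations.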

Jwatte you are right that I don’t have all of the data in the vertex array. This is due to the fact that I usually don’t know how much data I will have any given frame.
(In this example we do know it but in my actual code particles may come and go as they please).
But I cannot really follow you (sorry) when you say the data has to be read back. I am feeding vertices to a small array which I pass to the GPU. Then I process new vertices and feed them to the same array, which I pass to the GPU again, and so on until all quads are rendered. Next frame.
Do you not think it would be possible to have this small array in AGP/video memory?

Is your class the easiest way to do what I want? Then maybe I am not ready for this.

If I have allocated one big chunk of AGP memory, how can I make ‘2’ arrays out of it?

I cannot find where you release the AGP memory again. Where is it located?

Is it right that you use atiVertex also in the nvidia-path? I was somehow confused by that.

Thanks for all your help.

[This message has been edited by B_old (edited 12-28-2002).]

If you’re not familiar with how pointers and memory allocation work in general, then I suggest you study up on the C/C++ programming languages a little bit. Not knowing the fundamentals of your programming language means that you’ll continually be frustrated trying to use a library expressed in that language.

If you pass half your data in the VAR and half of it outside, the driver will read the data back from the VAR to composite it with the data that’s not in the VAR. This is very slow. When VAR is enabled, all data must live within the VAR.

The class I posted shows an easy way of chopping up a large block into several smaller blocks.
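To make the chopping-up concrete, here is a minimal sketch (not taken from the class above) that splits one block into a position array and a texture-coordinate array with plain pointer arithmetic. The names kChunkSprites, ChunkArrays and carveChunk are made up for illustration, and storing the texture coordinates as floats is an assumption. In the real program the block would come from wglAllocateMemoryNV; any sufficiently large buffer works for trying it out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

const int kChunkSprites  = 128;                // sprites per chunk, as in the thread
const int kVertsPerChunk = kChunkSprites * 4;  // one quad = 4 vertices

// Byte layout of one chunk: all positions first, then all texture coordinates.
const std::size_t kPosBytes   = kVertsPerChunk * 3 * sizeof(float);
const std::size_t kTexBytes   = kVertsPerChunk * 2 * sizeof(float);
const std::size_t kChunkBytes = kPosBytes + kTexBytes;

struct ChunkArrays {
    float* pos;  // would be passed to glVertexPointer(3, GL_FLOAT, 0, pos)
    float* tex;  // would be passed to glTexCoordPointer(2, GL_FLOAT, 0, tex)
};

// Split a block of at least kChunkBytes into the two arrays.
inline ChunkArrays carveChunk(void* block) {
    char* bytes = static_cast<char*>(block);
    ChunkArrays out;
    out.pos = reinterpret_cast<float*>(bytes);              // starts at offset 0
    out.tex = reinterpret_cast<float*>(bytes + kPosBytes);  // starts right after the positions
    return out;
}
```

Both pointers lie inside the same allocation, so as long as the whole block is registered as the vertex array range, neither array is outside the VAR.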

If you re-write the same AGP memory, you have to either use a fence (NV_fence) or use glFlushVertexArrayRangeNV() to make sure the GPU is finished with the data before you overwrite the memory. Otherwise weird triangles will end up being drawn, because you are overwriting data the GPU is still rendering from.

Even if you don’t know how much memory you will need, you will have to:

  1. limit the total number of particles to some maximum
  2. do some double-buffering of your dynamic AGP memory, so that the GPU can draw one piece while your CPU is busy preparing the next
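The double-buffering in point 2 might look roughly like the sketch below. It uses a stand-in fence so that it runs without a GL context; MockFence and DoubleBuffer are hypothetical names. In a real renderer, set() would presumably wrap glSetFenceNV(id, GL_ALL_COMPLETED_NV), finish() would wrap glFinishFenceNV(id), and the two halves would live inside the VAR block instead of in std::vector.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for an NV_fence object (assumption: one fence per buffer half).
struct MockFence {
    bool pending = false;
    void set()    { pending = true;  }  // real code: glSetFenceNV
    void finish() { pending = false; }  // real code: glFinishFenceNV, blocks until the GPU is done
};

struct DoubleBuffer {
    std::vector<float> half[2];  // placeholder storage for the two halves
    MockFence fence[2];
    int current = 0;

    explicit DoubleBuffer(std::size_t floatsPerHalf) {
        half[0].resize(floatsPerHalf);
        half[1].resize(floatsPerHalf);
    }

    // Returns the half the CPU may fill next. If the GPU may still be
    // reading that half, wait on its fence first.
    float* beginWrite() {
        if (fence[current].pending) {
            fence[current].finish();
        }
        return half[current].data();
    }

    // Call right after issuing the draw call for the current half:
    // mark it as in flight and switch to the other half.
    void endDraw() {
        fence[current].set();
        current ^= 1;
    }
};
```

Per chunk the loop would then be: beginWrite(), fill the vertices, draw, endDraw(). The CPU only stalls when it gets a full half ahead of the GPU.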

I am not really sure whether any data is outside the VAR. The data is created immediately before it is drawn, so…
(I don’t doubt your knowledge at all, I was just not sure whether you understood my problem.)
Ah, you meant the texcoords that were not in the VAR memory? Right, well, help me get them inside.

Do you not think it would be enough for this special case to allocate memory for, say, 2*128 textured quads? Then we divide it into 2 fenced halves for the buffering, and every half has memory for 2 arrays (TexCoords and Vertices).
If this can be expressed without writing a memory-managing class I would be very interested.

Thanks for all the help!

[This message has been edited by B_old (edited 12-29-2002).]

struct Half {
  float pos[128][3];
  float tex[128][2];
};

struct Var {
  Half a;
  Half b;
};

Var * myVar = (Var *)wglAllocateMemoryNV( sizeof( Var ), 0, 0, 0.7f );

I still suggest you get a good book on C or C++ and figure out how pointers work, or you’ll just run into another roadblock a few days from now. A memory management class is nothing hard, nor magic; in fact, it’s usually the simplest way to use VAR.

Ah cool, I’ll try that after I get myself some sleep, thanks.

If we ignore all C++ related problems from my side for now, do you think I can stick to my drawing approach and just optimize with VAR?

Thanks for all the help so far.

It works now. The speed is OK, but not the fastest I can get.
I have to do that fence thing.
If anybody feels they can help me with this, I would be grateful.

[This message has been edited by B_old (edited 12-30-2002).]
