display lists: glVertexPointer and such

i was reading a thread here that gave me the impression glVertexPointer might be invalid for display list compilation.

in the msvc docs i see it listed amongst excluded commands.

i don’t, however, see glDrawElements listed. (i believe that is the correct command name)

could it possibly be that the VertexPointer state is compiled into the glDrawElements ‘opcode’? otherwise i’m not sure why DrawElements would not be excluded likewise.

provided this is the case, and there is no way to do VBO-type sequences in a display list… i’m quite disappointed, and would recommend this functionality be considered in the future.

i take it the issue here is non-VBO usage, and agp VBOs. but non-VBO usage is mutually exclusive, and display lists that require VBOs in agp memory could be executed by the driver for equivalent performance.

would compiling VBO sequences into video memory display lists not encourage applications to utilize smaller triangle batch sizes, so that more smaller independent geometries can be processed versus fewer larger geometries?

current hardware trends, as explained to me, favor a large boulder over 1000 pebbles. the pebbles would probably be more realistic, and this trend does not bode well for complex particle systems managed by the cpu and similar applications.

The main problem with display lists is that they are for static stuff, whereas arrays are intended for dynamic stuff. What’s more, arrays are now often faster than display lists even for static stuff.

I don’t think the msvc docs are the best way to learn how OpenGL works. A good book or this website is better.

I have done display lists using vertex arrays years ago. If your arrays don’t change (static), there’s no problem. Simply call the list, not DrawArrays, in order to render (obviously). But your display list definition should make a call to DrawArrays.
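For the record, a minimal sketch of that setup (the triangle data and list name here are made up; this assumes a current GL context and fixed-function GL 1.x):

```c
/* vertex data in client memory at the time the list is compiled */
static const GLfloat verts[] = {
    0.0f, 0.0f, 0.0f,
    1.0f, 0.0f, 0.0f,
    0.0f, 1.0f, 0.0f,
};

GLuint list = glGenLists(1);

glEnableClientState(GL_VERTEX_ARRAY);
glVertexPointer(3, GL_FLOAT, 0, verts);

/* the DrawArrays call goes inside the list definition... */
glNewList(list, GL_COMPILE);
glDrawArrays(GL_TRIANGLES, 0, 3);
glEndList();

/* ...and rendering is then just the list call, not DrawArrays */
glCallList(list);
```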

You must compile a display list, though, so I wonder whether it will be as quick as drawing with arrays directly.

Can you explain your last two or three paragraphs in more detail?

You can use vertex arrays during the creation of display lists, however, if you later change the bound array it will not change the contents of the display list - what you draw during the display list creation is what will always be drawn by the display list. So, you can’t use display lists as a mechanism for binding arrays or for sending changing geometry data.
Display lists are very fast (in my experience) for static data, but they lose out compared to vbos if you have to update them frequently.

sorry if it wasn’t obvious, but what i would like to be able to compile into a display list is basically this:

glBindBufferARB();
glVertexPointer();
glBindBufferARB();
glTexCoordPointer();
glBindBufferARB();
glDrawElements();

though i don’t expect it to be possible anytime soon.

with VBO, all the arguments are constants, which i figure could be compiled into a display list easily as long as the list is in video memory.
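spelled out with hypothetical buffer names and counts, that sequence would look something like this (this is what i would want compiled; per the GL spec, these particular calls are not actually captured by a display list):

```c
/* vboVerts, vboTexcoords, vboIndices and indexCount are hypothetical,
 * created earlier with glGenBuffersARB / glBufferDataARB;
 * client-state enables (GL_VERTEX_ARRAY etc.) assumed done elsewhere */
glBindBufferARB(GL_ARRAY_BUFFER_ARB, vboVerts);
glVertexPointer(3, GL_FLOAT, 0, (const GLvoid *)0);

glBindBufferARB(GL_ARRAY_BUFFER_ARB, vboTexcoords);
glTexCoordPointer(2, GL_FLOAT, 0, (const GLvoid *)0);

glBindBufferARB(GL_ELEMENT_ARRAY_BUFFER_ARB, vboIndices);
glDrawElements(GL_TRIANGLES, indexCount, GL_UNSIGNED_SHORT, (const GLvoid *)0);
```

every argument here is a constant or a fixed offset into a server-side buffer, which is the sense in which the whole sequence could, in principle, be baked down.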

the problem… or the reason i figure this isn’t possible… is that the cases where the VBO might be in agp, or where a VBO is not being used at all (i.e. classic glDrawElements), would complicate matters more than driver maintainers may be willing to put up with.

so just to clear things up… if this were possible, the display list would be 6 opcodes long. this sequence could then be dispatched with a single call to glCallLists, and the whole sequence would be executed on the hardware side… i figure that is at least how display lists work from the hardware’s perspective.

in other words, if display lists could do this, they would be much more useful, and hence better earn their spot in the API.

i use them for bitmap fonts, but nothing else i can think of, because their repertoire is too limited.

what do you think about nested display lists?

one layer could contain the body of the object (containing only glBegin/glEnd pairs with Vertex/Normal/Material definitions), and the other layer could be made of small display lists containing function calls for defining motion (glRotate, glTranslate, glMultMatrix, etc…) and calls for drawing the body of the object.

so in order to move part of an object, you would have to redefine only a portion of the display lists (the one in the second layer).

I have done this recently, and I’ve tried to describe it in one of my posts here. However, I am not certain that this is the optimal solution, although it sounds good.

There is a possibility that frequent changing of small display lists will result in video card memory fragmentation. :frowning:

With this technique I’ve been able to get over 120k triangles per scene at 20fps, with blending, texturing, fog and full lighting.

I’ve been using a low-end ATI card that is over two years old.

Since I’m only a beginner in OpenGL waters, I can’t say if this is a success or not… I’m hoping to get some comments from you.

Cheers,
Vlada

Originally posted by vlada:
[b]what do you think about nested display lists?

one layer could contain the body of the object (containing only glBegin/glEnd pairs with Vertex/Normal/Material definitions), and the other layer could be made of small display lists containing function calls for defining motion (glRotate, glTranslate, glMultMatrix, etc…) and calls for drawing the body of the object.

so in order to move part of an object, you would have to redefine only a portion of the display lists (the one in the second layer).

I have done this recently, and I’ve tried to describe it in one of my posts here. However, I am not certain that this is the optimal solution, although it sounds good.

There is a possibility that frequent changing of small display lists will result in video card memory fragmentation. :frowning:

With this technique I’ve been able to get over 120k triangles per scene at 20fps, with blending, texturing, fog and full lighting.

I’ve been using a low-end ATI card that is over two years old.

Since I’m only a beginner in OpenGL waters, I can’t say if this is a success or not… I’m hoping to get some comments from you.

Cheers,
Vlada[/b]
i guess you are asking me. i’m no expert on display lists… but i was actually intending to do something similar to that, more or less.

if you want to know what i think of display lists ( i could very likely be wrong )… they aren’t good for very much when it comes to serious applications as they stand. they are maybe useful when you want to draw something very small and quirky with gl-type transforms and flat shading or something, but not for much else right now.

if you are wanting to draw a big object, you need to read up on glDrawElements et al., and probably eventually look into the VBO extension when you are ready to work with extensions, if you are not already.

i was personally intending to use two lists per object. the first list would be a MultMatrix with a floating point matrix embedded in it, plus a VBO binding sequence. the second list i was planning would be a glDrawElements call. for my app it would be useful to be able to change glDrawElements independently. but you are right that, if nesting is implemented efficiently, i could maybe just tack the second onto the first in order to cut the size of the glCallLists string in half.

but unfortunately display lists aren’t good for DrawElements right now… well, it may work by transforming the call into a sequence of glVertex commands, but apparently it won’t encode glVertexPointer etc. they would be a lot more useful if they could, by allowing command sequences to be stored in video memory, so that the commands did not have to be sent across agp and touched by the driver.

as far as memory fragmentation goes, i would recommend always regenerating display lists with the same sequence of commands, only with different arguments. if you mix it up, it would inevitably cause some kind of fragmentation… then you would have to count on the driver to manage your memory more than necessary, and personally i always do my memory management myself when possible… but that is just me.

hope that was helpful.

the problem these days… or at least the impression i get from different sources… is that a single gl API call can require as much downtime as managing 1000-plus triangles asynchronously. this effectively punishes you for wanting to draw an object with fewer than 1000 triangles, like say a simple rock. so rather than doing a 1000 triangle rock, you may as well do a 1000 triangle boulder. it kind of warps the way people approach building scenes. the only real way to rectify this is to speed up the bus (agp etc)… but an expanded display list repertoire could also help in this vein in many cases.
still though, judging by the general complexity of geometry i see in video game adverts… i don’t think this is a big deal right now where moderately complex shaders are at work. but there appears to be a trend forming in this direction.

…or possibly draw lots of rocks with one draw call.
But no, michael, don’t tell me - I missed your point didn’t I? Like everybody misses your point.

glBindBufferARB();
glVertexPointer();
glBindBufferARB();
glTexCoordPointer();
glBindBufferARB();
glDrawElements();
Well, all of the gl*Pointer and glBindBuffer calls are mostly CPU stuff. The gl*Pointer call needs to check whether the buffer is where it needs to be and provoke some state changes in the hardware. And BindBuffer just changes the currently bound buffer state; it doesn’t really do anything by itself.

As such, display lists won’t help in this regard. Indeed, it is reasonable to assume that putting this in a list will be slower. Display lists that involve significant driver intervention are going to be slower than just making the function call, because the driver now has to interpret some kind of display list “language” and call the right functions with the right parameters.

A display list is fast when the functions called within it are purely hardware functions; things that directly set something into the graphics chip’s pipeline. If they have to go through any real complexity in terms of CPU driver code, it’s going to be slower than calling the right function directly.

Originally posted by Korval:
[b] [quote]glBindBufferARB();
glVertexPointer();
glBindBufferARB();
glTexCoordPointer();
glBindBufferARB();
glDrawElements();
Well, all of the gl*Pointer and glBindBuffer calls are mostly CPU stuff. The gl*Pointer call needs to check whether the buffer is where it needs to be and provoke some state changes in the hardware. And BindBuffer just changes the currently bound buffer state; it doesn’t really do anything by itself.

As such, display lists won’t help in this regard. Indeed, it is reasonable to assume that putting this in a list will be slower. Display lists that involve significant driver intervention are going to be slower than just making the function call, because the driver now has to interpret some kind of display list “language” and call the right functions with the right parameters.

A display list is fast when the functions called within it are purely hardware functions; things that directly set something into the graphics chip’s pipeline. If they have to go through any real complexity in terms of CPU driver code, it’s going to be slower than calling the right function directly.[/b][/QUOTE]yeah, like i said i realize all of this.

however, there is no reason why hardware accessing a video buffer should ever cause a security issue, unless maybe the memory is being shared by a proprietary application that doesn’t want its geometry accessed by another context… but context boundary registers could be maintained in hardware to prevent this.

i assume hardware must have the capacity to decode display lists on its own if display lists are to be reasonably efficient. i don’t see why hardware could not decode the sequence above. as long as the constant arguments don’t go outside the current context’s memory domain, everything should be legal. if you render garbage, that is the programmer’s fault.

as for knackered, i think it was… you can do small rocks in a single unified buffer if they can never move. but the second one moves, you either have to break it out of the static buffer, at least by obscuring its indices, then draw it separately so you can apply a transform to it. then finally, i guess, after it reaches equilibrium you would overwrite its vertices and reinstall it into the static buffer. this all might work fairly well, but it’s still no reason why it wouldn’t be useful to be able to access video memory buffers with display lists.

glVertexPointer and friends are not compiled into display lists.
glBindBuffer is not compiled into display lists.

GL core spec 1.5, section 5.4 (“Display lists”)
A display list is simply a group of GL commands and arguments that has been stored for subsequent execution.
<…>
The only exception pertains to commands that rely upon client state. When such a command is accumulated into the display list (that is, when issued, not when executed), the client state in effect at that time applies to the command. Only server state is affected when the command is executed. As always, pointers which are passed as arguments to commands are dereferenced when the command is issued. (Vertex array pointers are dereferenced when the commands ArrayElement, DrawArrays, DrawElements, or DrawRangeElements are accumulated into a display list.)

Issues section of ARB_vbo spec
[b]Which commands are compiled into display lists?

    RESOLVED: [i]None of the commands in this extension are compiled
    into display lists[/i].  The reasoning is that the server may not
    have up-to-date buffer bindings, since BindBuffer is a client
    command.
    [i]Just as without this extension, vertex data is dereferenced
    when ArrayElement, etc. are compiled into a display list[/i].[/b]

To sum it up, what you’re trying to do here is specified not to work.

Rule 1: display lists are immutable.

Rule 2: everything vertex array related is “client state”, it never ends up in a list.

Rule #2 can be seen as a means to “protect” rule #1 from being broken by display lists. There are good reasons to do so, from an implementation complexity pov. Not having rule 2 would encourage usage profiles that just don’t perform sensibly.

As a result, not only can you not change the command sequence inside a display list, there’s also no way to change what a display list does (without deleting it and compiling a new one). This automatically excludes being able to affect a display list via “side effects” such as changing buffer bindings and pointers.

To paraphrase the spec, if you dereference a vertex array while compiling a display list (by calling glArrayElement, glDrawArrays, glDrawElements et al), the vertex data at the time of the call is copied into the display list.
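A tiny illustration of that paraphrase (hypothetical data; assumes a current GL context):

```c
GLfloat tri[9] = { 0,0,0,  1,0,0,  0,1,0 };
GLuint list = glGenLists(1);

glEnableClientState(GL_VERTEX_ARRAY);
glVertexPointer(3, GL_FLOAT, 0, tri);

glNewList(list, GL_COMPILE);
glDrawArrays(GL_TRIANGLES, 0, 3);       /* tri is dereferenced and copied here */
glEndList();

tri[3] = 5.0f;                          /* no effect on the compiled list */
glVertexPointer(3, GL_FLOAT, 0, other); /* 'other' is hypothetical; also no effect */

glCallList(list);                       /* still draws the original triangle */
```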

i assume hardware must have the capacity to decode display lists on its own if display lists are to be reasonably efficient. i don’t see why hardware could not decode the sequence above. as long as the constant arguments don’t go outside the current context’s memory domain, everything should be legal.
It doesn’t work that way. Or, more specifically, it doesn’t have to and often doesn’t.

gl*Pointer when a VBO is bound needs to do more than the hardware alone is capable of. If a buffer has been paged out to system memory due to other needs, the driver needs to upload it back to wherever it should go. This is not something the hardware can do by itself; it involves system memory and therefore requires driver intervention.

Also, an intelligent driver needs to keep tabs on how often VBOs are used, so that it can keep track of more frequently used buffers and less frequently used ones.

Plus, there are the issues that Zeckensack pointed out. Which makes this discussion academic :wink:

Originally posted by michagl:

as for knackered, i think it was… you can do small rocks in a single unified buffer if they can never move. but the second one moves, you either have to break it out of the static buffer, at least by obscuring its indices, then draw it separately so you can apply a transform to it. then finally, i guess, after it reaches equilibrium you would overwrite its vertices and reinstall it into the static buffer. this all might work fairly well, but it’s still no reason why it wouldn’t be useful to be able to access video memory buffers with display lists.

Turns out there’s a much simpler solution. All modern video cards have the ability to program the vertex transform pipeline with what in OpenGL parlance are called vertex programs. These programs can be written either in an assembly-like language or in a high-level C-like language.

To draw the boulders with independent transforms in a single call do the following:

  • Create a single vertex buffer containing the vertices of all the boulders. In addition to your normal vertex attributes like position, color, texcoords etc you also add an integer specifying the index of the boulder the vertex belongs to.

  • As well as programming the pipeline you also have the ability to upload constants to the GPU to be used by the vertex program. Upload the array of boulder transforms to the GPU.

  • Finally you use the boulder index in your vertex to access the correct transform in the vertex program.

If you want to temporarily hide a single boulder you can just set its transform to a matrix of zeros.

/A.B.

yeah, of course it is ‘academic’, in the sense that this obviously won’t work now, but it is not necessarily beyond specification.

first off, everyone seems to draw the conclusion that i’m suggesting a ‘mutable’ display list. just to clear that up, i’m definitely not. the display list would not check client state at all, and the bound buffer and ‘pointers’ (offsets) would be static.

glEnableClientState would not be compilable, so the language does not even conflict. i assume ‘client states’ are for context switching between unrelated applications.

so there is nothing semantically wrong in the command names themselves. maybe glVertexPointer uses ‘client’ terminology in the specifications, but specifications are changeable as long as backwards compatibility is maintained.

so if it is understood that with VBO the result of BindBuffer and VertexPointer is an absolute address, then the resulting command could be compiled into a display list element residing in server-side memory, which would look like glVertexPointer with an absolute address and bounds.

if you deleted that buffer though, the display list would still point to that absolute memory address. hence it is static, server side.

i’m sure anyhow that it is almost completely unlikely to ever happen, at least in this form, but it would be a very handy functionality performance-wise.

if a driver needed to page the memory, then i’m sure it keeps track of which buffers are paged, so it could strip those lists while compiling them. i’m assuming right now that video memory is never read back from the card for paging. if that were possible, the implementation would be a little trickier.

i doubt the performance would be worse than immediate mode if the buffers and lists were paged by a reasonable implementation.

so like i said, i don’t expect anything, but it would be cool… and could fill in a gap where the bus might fall behind the rasterizer.

responding is essentially ‘academic’, so do so at your own peril.

thanks brinck… personally, if i was going to do that, i wouldn’t want to use a dedicated shader, and i don’t think it would be worthwhile to do so. there would be a lot of constant matrices to upload per buffer, more than is probably realistic for a shader’s constant memory limits.

you would have to assume that only the smallest fraction of say a few thousand rocks are likely moving at a time. the rest would share a single stationary transform.

the point of masking a rock out of the static buffer would be to avoid needing a transform for it. that way you could render the stationary rocks in a single pass and treat the mobile ones individually. after the mobile rocks stop moving, you would recompute their vertices relative to the static buffer, and slip them back into the buffer when the time is appropriate.

if i was going to mask a rock, assuming i could retrieve its indices, i would probably set its first index to the primitive restart index, then set all of its vertices to its first vertex, which would make it disappear, assuming you are using triangle strips… though i’m not sure how efficient that would be for hardware. i understand it doesn’t like degenerates for some reason… it might be faster to set all of its vertices to zero, but that would write over more memory, so it might be slower.

correction: i would probably definitely set the vertices to zero, because they are going to have to be written over again anyhow.

if you wanted to do some kind of cataclysmic event with a bunch of rocks though… that might be a different story, and tricky in general, logistically, for the gpu.

edit: there is an interesting particle system shader done entirely on the gpu, if you are willing to do your simulation against a planar heightfield. i believe it stores the physical data for each point particle in a texture map. each frame, i guess, it uses a pixel buffer to update the map by colliding it against a heightfield, also in texture format. the demo looks like a volcano. i believe i saw a screenshot and summary somewhere in nvidia’s online material… maybe in an sdk someone downloaded for me… i’m not sure.

Michael, I think you missed the point as usual.

For argument’s sake, say we’re drawing 1024 boulders.

Even the most low-end cards are guaranteed to have at least 96 constants; assuming you use a 3 x 4 transform, you can draw 32 boulders in one call.

The overhead of uploading the transforms and doing 32 glDrawElements compared to drawing the static buffer with all 1024 boulders is next to nothing (If you really want to pursue this subject I could pretty easily code up a test app and post the results on this board).

Even if all the boulders are static and are never meant to be moved this is a better solution since it requires 32 times less memory for the vertices (minus the extra vertex attribute for the boulder index).

I think you know you’re wrong but are too proud to admit it, much like some academics who continue to pursue their inferior solution to a problem even after a much better solution has been presented.

I think you’ll have a hard time finding anyone on this board or anywhere else that thinks you’re right.

Btw, I’ve worked in the game industry for a number of years. For my last game released, Midtown Madness 3 for the xbox, I wrote the city renderer which displayed thousands of objects each frame, so I think I know what I’m talking about…

About the particle systems, glad to see you’re finally catching up with the programmable gpu stuff. I attended Lutz Latta’s original presentation on the subject at last year’s GDC. The simulation is not limited to a heightfield; with a volume texture you can simulate collision against most anything.

/A.B.

this is really not the point of this thread.

Originally posted by brinck:
[b]Michael, I think you missed the point as usual.

For argument’s sake, say we’re drawing 1024 boulders.

Even the most low-end cards are guaranteed to have at least 96 constants; assuming you use a 3 x 4 transform, you can draw 32 boulders in one call.[/b]
sorry, i’m thinking pebbles, not boulders… the question is one of batch processing. and like i said, if i was going to do this, i wouldn’t use a dedicated shader… i would prefer to be able to use general-purpose shaders. personally i’m not really a game programmer… i think more general-purpose (robust)… it’s just a matter of perspective.

The overhead of uploading the transforms and doing 32 glDrawElements compared to drawing the static buffer with all 1024 boulders is next to nothing (If you really want to pursue this subject I could pretty easily code up a test app and post the results on this board).

the reason it is slow to do 32 DrawElements calls is that each pebble would contain too few vertices to justify its own batch… even in groups of 32 or whatever. it’s also not worth keeping transforms handy for every entity, assuming most are in equilibrium.

boy this is a waste of time subject.

Even if all the boulders are static and are never meant to be moved this is a better solution since it requires 32 times less memory for the vertices (minus the extra vertex attribute for the boulder index).
i don’t see how you reach this conclusion, as you are adding a significant memory overhead for transforms, and i can only imagine it would save memory if you are thinking something like every pebble using the same instanced mesh.

I think you know you’re wrong but are too proud to admit it, much like some academics who continue to pursue their inferior solution to a problem even after a much better solution has been presented.

i don’t know enough to know i’m wrong… i don’t have time to dedicate that much thought to a hypothetical matter. i just tried to phrase an argument into something you game heads might be able to get your heads around.

my only point is it would nice to be able to setup small VBO batches without going through the driver/agp.

I think you’ll have a hard time finding anyone on this board or anywhere else that thinks you’re right.
that’s just people in general; they tend to only agree on rote learning… critical thought, or empathy, is asking too much.

Btw, I’ve worked in the game industry for a number of years. For my last game released, Midtown Madness 3 for the xbox, I wrote the city renderer which displayed thousands of objects each frame, so I think I know what I’m talking about…
good for you. i never said you didn’t know what you were talking about. you just weren’t talking about what i was talking about.

nothing personal, but if you want to impress me (no harm in that) you have to be more specific than ‘thousands of objects’.

About the particle systems, glad to see you’re finally catching up with the programmable gpu stuff. I attended Lutz Latta’s original presentation on the subject at last year’s GDC. The simulation is not limited to a heightfield; with a volume texture you can simulate collision against most anything.

what do you mean finally? i’ve been working with shaders since 2001. and yes, you can collide against anything as long as you can fit it in the shader… but a heightfield is about the most interesting thing you can fit in a shader.

edit: beg your pardon, ‘volume texture’ must’ve landed in my blind spot. not really sure what a ‘volume texture’ is… i’m assuming it’s radial and convex though.

for what it’s worth, personally i don’t allocate time for ‘toy’ systems. by ‘toy’ i just mean limited scope. i would take a modest performance hit rather than tailor a system too tightly… and toy systems never scale in the end.

Originally posted by michagl:

boy this is a waste of time subject.

Yes, I’ll let you have the last word and leave it at that.

/A.B.

P.S. A volume texture is simply a three-dimensional texture. If you’re interested in particle systems I can outline how to use volume textures to collide against arbitrary geometry.

Originally posted by michagl:
boy this is a waste of time subject.

for what it’s worth, personally i don’t allocate time for ‘toy’ systems. by ‘toy’ i just mean limited scope.

…time is a running theme in your essays, michagl.
You seem to have had plenty of time to waffle on in these forums recently, about things even more trivial than this. I’ve wasted plenty of my time reading your stuff, only to slowly realise you haven’t the first idea what you’re talking about. Now I feel sorry for you.
Don’t worry, you’ve definitely earnt my pity.

Originally posted by brinck:
[b] [quote]Originally posted by michagl:

boy this is a waste of time subject.

Yes, I’ll let you have the last word and leave it at that.

/A.B.

P.S. A volume texture is simply a three-dimensional texture. If you’re interested in particle systems I can outline how to use volume textures to collide against arbitrary geometry.[/b][/QUOTE]i really appreciate the offer, but right now typical high-density particle systems are not on my radar.

personally i don’t see how the volume texture approach could be very useful beyond extremely localized and very ‘niche’ circumstances. i would love to hear a litany of cases where such an approach would be utilitarian, because i really can’t think of any for the life of me. as for a 3d volume texture, that sounds pretty opulent… is there any kind of compression involved in that?

just for a little background, i’m really a fan of run-time configurable systems. if i was going to implement this within the framework of such a system, the only way i could imagine doing it would be to expose the entire shader system (source and bindings) at run time so users could contrive their own little ‘toy’ effects.
it’s basically like dynamic lisp versus compiled c++. most people who really know the playing field will tell you that lisp is ultimately superior for all of its run-time flexibility… but of course that flexibility comes with a hopefully minimal raw performance hit. it generally accounts well for itself by the end of the day, in terms of turnaround, sustainability, and dynamic effects.

so basically that is the same way i approach graphics personally. i try to think beyond the ‘demo’ or ‘toy’ game, towards unifiable, scalable, immortal systems, rather than getting a product out on the market by the next quarter. i have no investments in that scene.

Originally posted by knackered:
[quote]Originally posted by michagl:
boy this is a waste of time subject.

for what it’s worth, personally i don’t allocate time for ‘toy’ systems. by ‘toy’ i just mean limited scope.

…time is a running theme in your essays, michagl.
You seem to have had plenty of time to waffle on in these forums recently, about things even more trivial than this. I’ve wasted plenty of my time reading your stuff, only to slowly realise you haven’t the first idea what you’re talking about. Now I feel sorry for you.
Don’t worry, you’ve definitely earnt my pity.
[/QUOTE]i know very well what i talk about, otherwise i would not talk. we just think through different paradigms… interact with the world through different lenses. neither is necessarily superior… it’s just a matter of priorities.

PS: i don’t waffle… sorry if you can’t follow, but i have no regrets here.