Originally posted by Thomas Harte:
- what is the cost of locking? I know that this causes the T&L to occur, but is it common that this is just a simple matter of passing the arguments to the card which already knows the original vertex locations, or is it at this stage that vertices are passed to the card, with a little ‘do these, please’ command?
- is there any large cost associated with unlocking? Or is it normally just an opportunity to clear up a few variables on the CPU (as opposed to the graphics card) side?
When using locked vertex arrays, the driver has several paths it can take:
1. It can map the user buffer through AGP and have the card pull the data directly from it via AGP.
2. It can copy the data into a system-memory AGP pool and have the card pull it via AGP, as in 1. The difference is that this avoids mapping the user buffer through AGP, which can be expensive relative to the amount of data when the buffer is small.
3. It can copy the data into a video-memory pool and have the card read it from local video memory.
4. It can ignore the lock/unlock hint and treat the array as a normal (unlocked) vertex array.
Note that locking doesn’t “cause the T & L to occur”; it only changes the way vertices are transferred to the graphics chip at rendering time. The driver cannot simply take your vertex array and transform it at lock time, because even while the arrays are locked you can still change the modelview matrix (or the texture matrix), which produces different transformed vertices (or texture coordinates).
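To make the matrix argument concrete, here is a small CPU-side sketch (not driver code; `transform`, `translation` and `draw_twice` are hypothetical names). It transforms the same locked vertices under two different modelview matrices, as two draw calls inside one lock/unlock pair could. The results differ, so eye-space positions computed once at lock time could not simply be reused for every draw.

```c
#include <assert.h>

/* 4x4 matrix stored column-major, as OpenGL stores it: m[col*4 + row]. */
typedef struct { float m[16]; } Mat4;
typedef struct { float x, y, z, w; } Vec4;

/* out = M * v */
static Vec4 transform(const Mat4 *M, Vec4 v)
{
    const float *m = M->m;
    Vec4 r;
    r.x = m[0] * v.x + m[4] * v.y + m[8]  * v.z + m[12] * v.w;
    r.y = m[1] * v.x + m[5] * v.y + m[9]  * v.z + m[13] * v.w;
    r.z = m[2] * v.x + m[6] * v.y + m[10] * v.z + m[14] * v.w;
    r.w = m[3] * v.x + m[7] * v.y + m[11] * v.z + m[15] * v.w;
    return r;
}

/* Translation matrix (column-major, translation in the last column). */
static Mat4 translation(float tx, float ty, float tz)
{
    Mat4 M = {{1, 0, 0, 0,  0, 1, 0, 0,  0, 0, 1, 0,  tx, ty, tz, 1}};
    return M;
}

/* Two draw calls between one lock/unlock pair: the locked vertices are
   identical, only the modelview matrix changes in between. */
static void draw_twice(const Vec4 *locked, int n,
                       const Mat4 *mv1, const Mat4 *mv2,
                       Vec4 *out1, Vec4 *out2)
{
    assert(n > 0);
    for (int i = 0; i < n; ++i) {
        out1[i] = transform(mv1, locked[i]);
        out2[i] = transform(mv2, locked[i]);
    }
}
```

With the vertex (1, 2, 3, 1), an identity modelview gives x = 1 while a translation by 5 along x gives x = 6, even though the locked data never changed.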
In each of these scenarios, lock and unlock behave differently. Which scenario the driver picks depends mainly on the size of the vertex array: for small vertex arrays, copying the data into a driver pool may be faster than mapping the user buffer through AGP. Obviously all this gets more complicated once you take into account that the driver pools can run out of memory (which forces the driver to synchronise with the graphics chip before it can reuse pool space), or that some arrays may be locked (for example geometry) while others are not (for example texture coordinates), so a mixture of scenarios may be needed.
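A driver's choice between these paths could look roughly like the heuristic below. This is only a sketch under stated assumptions: `choose_lock_path`, the enum names and the 64 KB cutoff are all hypothetical, and a real driver would use its own tuned thresholds and would likely wait for pool space rather than give up on copying.

```c
#include <assert.h>
#include <stddef.h>

/* The four paths listed above, numbered as in the post. */
typedef enum {
    PATH_MAP_USER_BUFFER = 1, /* map the application's buffer through AGP */
    PATH_COPY_TO_AGP     = 2, /* copy into a system-memory AGP pool */
    PATH_COPY_TO_VIDMEM  = 3, /* copy into a video-memory pool */
    PATH_IGNORE_LOCK     = 4  /* treat the array as unlocked */
} LockPath;

/* Hypothetical cutoff: below it, copying is assumed to be cheaper than
   setting up an AGP mapping. The value is made up for illustration. */
#define SMALL_ARRAY_BYTES ((size_t)64 * 1024)

static LockPath choose_lock_path(size_t bytes, size_t vidmem_pool_free,
                                 size_t agp_pool_free, int can_map_user_buffer)
{
    assert(bytes > 0);
    if (bytes < SMALL_ARRAY_BYTES) {
        /* Small array: copy it, preferring video memory (fastest to fetch). */
        if (bytes <= vidmem_pool_free) return PATH_COPY_TO_VIDMEM;
        if (bytes <= agp_pool_free)    return PATH_COPY_TO_AGP;
        /* Both pools full. A real driver might instead wait for the chip to
           release pool space; this sketch just falls through. */
    }
    /* Large array (or no pool space): map the user buffer if possible. */
    if (can_map_user_buffer) return PATH_MAP_USER_BUFFER;
    return PATH_IGNORE_LOCK;
}
```

For example, a 4 KB array with free video-memory pool space is copied there, while a 1 MB array is mapped in place.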
For scenario 1, the lock time is more or less constant regardless of the size of the buffer (it’s just locking the buffer down in memory and mapping it through AGP).
For scenarios 2 and 3, the lock time depends on the size of the vertex array, as the driver has to copy the array into its memory pool.
As for unlocking: in scenario 1, unlock forces the driver to make sure the graphics chip has consumed all the vertex data before returning to the application (a sort of glFinish), because the application may modify the buffer as soon as it is unlocked.
For the other scenarios, no synchronisation is needed at unlock time.
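The lock and unlock costs described above can be summarised in a toy cost model. The cost units, `lock_cost` and `unlock_must_sync` are hypothetical and exist only to show the shape of each path: a flat lock cost for path 1, a lock cost that grows with array size for paths 2 and 3, and a synchronising unlock only for path 1.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { MAP_USER = 1, COPY_AGP = 2, COPY_VIDMEM = 3, NO_LOCK = 4 } Path;

/* Hypothetical cost units, chosen only to show the shape of the curves. */
#define FIXED_COST    100.0  /* per-call overhead (pinning, mapping, bookkeeping) */
#define COST_PER_BYTE 0.01   /* cost of copying one byte into a driver pool */

/* Lock cost: constant for path 1, linear in the array size for paths 2 and 3. */
static double lock_cost(Path p, size_t bytes)
{
    assert(p >= MAP_USER && p <= NO_LOCK);
    switch (p) {
    case MAP_USER:
        return FIXED_COST;
    case COPY_AGP:
    case COPY_VIDMEM:
        return FIXED_COST + COST_PER_BYTE * (double)bytes;
    default:
        return 0.0; /* hint ignored: nothing happens at lock time */
    }
}

/* Only path 1 must wait for the chip at unlock, because the application may
   overwrite its own buffer as soon as unlock returns. */
static int unlock_must_sync(Path p)
{
    return p == MAP_USER;
}
```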
At rendering time, scenario 3 will perform best (fetching vertices from video memory is really fast), but you may not notice any difference between the scenarios if your bottleneck is elsewhere in the pipeline (T&L-limited, fill-rate-limited, texture-filter-rate-limited, etc.).
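The bottleneck point can be illustrated with an idealised model in which the pipeline stages overlap perfectly, so the slowest stage sets the frame time. The stage timings and names (`StageTimes`, `frame_time`) are made up for illustration; real pipelines do not overlap this cleanly.

```c
#include <assert.h>

/* Hypothetical per-frame time (ms) spent in each stage. In this idealised
   model the stages overlap perfectly, so the slowest one sets the frame time. */
typedef struct {
    double vertex_fetch; /* affected by the lock path (AGP vs video memory) */
    double transform;    /* T&L */
    double fill;         /* rasterisation, fill rate, texture filtering */
} StageTimes;

static double frame_time(StageTimes s)
{
    assert(s.vertex_fetch >= 0.0 && s.transform >= 0.0 && s.fill >= 0.0);
    double t = s.vertex_fetch;
    if (s.transform > t) t = s.transform;
    if (s.fill > t) t = s.fill;
    return t;
}
```

With made-up timings of {4, 3, 10} ms over AGP and {1, 3, 10} ms from video memory, both frames take 10 ms because fill is the limit; only when vertex fetch is the slowest stage (say {8, 3, 2} versus {2, 3, 2}) does the faster path show up, as 8 ms versus 3 ms.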
[This message has been edited by evanGLizr (edited 08-29-2002).]