Originally posted by jwatte: I would recommend against trying to put N modelview matrices into a vertex program, and then sending more per-vertex data to select one.
Nothing wrong with batching in a vertex shader this way if you are instancing identical objects in many locations, especially if your model-world transforms aren’t constant.
A low-priority thread should be scheduled by the OS so that it doesn't interfere with CPU caching; otherwise, what's the point of having a multi-threaded, multi-process operating system?
Just because you have more threads available doesn’t mean you’ll be more efficient.
If you touch “background” data in a “low priority” thread, that’s still work for the CPU. It still brings that data into the cache and the TLB, and in from disk at some point, assuming all threads get time now and then.
The point is that you don’t NEED to bring those things in, because they’re not being used, so they’re not candidates for the optimization. Thus, you’re better off only checking for the optimization inside the render function of the object itself, because then you automatically only worry about data that you actually need to worry about.
Originally posted by pocketmoon: Nothing wrong with batching in a vertex shader this way if you are instancing identical objects in many locations, especially if your model-world transforms aren’t constant.
How exactly is the selection supposed to happen?
And are we talking about ARB or NV (1.0, 1.1, 2.0)?
If you batch a bunch of instances like that, you have the following problems:
- you need to duplicate the triangle list N times for a batch of size N
- you need to additionally stream down a matrix index per vertex
- oh, wait, that means that you can’t actually re-use your vertices!
- suddenly this idea isn’t so great
Even if you could re-use your vertices, you’d have the extra bandwidth of the matrix index. Meanwhile, uploading all those matrices in a big batch would probably consume as much bandwidth as uploading them one at a time.
I believe that if you’re limited inside LoadMatrix, you’re actually limited on per-batch setup overhead, and the only way forward would be to do software transform/aggregation, and lose the instancing.