I am doing some work that involves using the graphics card’s 4x4 matrix multiplier. I did some experimenting and found that it took me about 200ms to do 1,000,000 4x4 matrix multiplications in software and only about 80ms to do 1,000,000 4x4 matrix multiplications in hardware using glMultMatrixf, which as I understand it also involves sending the information to the graphics card.
My problem is that doing a glGetFloatv(GL_MODELVIEW_MATRIX, x) 1,000,000 times in a row takes around 12,400ms on my machine. I did some reading and saw that glGet* is notoriously slow, apparently because it has to wait for the command queue to finish. I would have figured that doing all of the calls in a row wouldn’t be this bad, but it still is.
I’ve been reading up on VBOs/PBOs in hopes that they might yield some advantage in bulk data transfer, because one of the articles I read said that “any” function that took a pointer now takes an offset into the bound buffer. So I thought that maybe a glGet into on-card memory would be a faster operation, and I could send it all back at the end. Firstly, after more reading I’m thinking that the “any” was an exaggeration, and secondly, even if it works, a glGet into on-card memory may not actually be any faster than retrieving the data into main memory on my machine.
So does anyone know of any alternative that I can use to inspect the current modelview/projection/texture/color or whatever matrix that doesn’t use glGet, since I doubt there is any way to speed that up? Doing the multiplications in the graphics hardware isn’t going to do me a lot of good if I can’t get the information back and inspect the results.
If it helps, I really only have to inspect the 14th element (0-indexing column major storage) of the matrix to get what I need.
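To make clear which value I mean, here is a minimal CPU-side sketch, assuming OpenGL's column-major layout, where the element at (row, col) sits at index col * 4 + row. Index 14 is then row 2, column 3, and computing just that element of a product takes four multiplies rather than a full 4x4 multiplication. The function names product_element and product_element14 are made up for illustration and are not part of OpenGL.

```c
#include <assert.h>

/* Column-major 4x4 storage (the OpenGL convention):
 * the element at (row, col) is stored at index col * 4 + row. */

/* Returns the single element (row, col) of the product a * b.
 * Hypothetical helper, not an OpenGL function. */
static float product_element(const float a[16], const float b[16],
                             int row, int col)
{
    assert(row >= 0 && row < 4 && col >= 0 && col < 4);
    float sum = 0.0f;
    for (int k = 0; k < 4; ++k) {
        /* a(row, k) * b(k, col) */
        sum += a[k * 4 + row] * b[col * 4 + k];
    }
    return sum;
}

/* Index 14 = 3 * 4 + 2, i.e. row 2, column 3
 * (the z translation of an affine transform). */
static float product_element14(const float a[16], const float b[16])
{
    return product_element(a, b, 2, 3);
}
```

For example, if a is a translation by (1, 2, 3) and b is a translation by (10, 20, 30), product_element14(a, b) gives the z translation of the combined transform.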
Additionally, I know all the matrices I am going to need to multiply ahead of time, so if there is a way (which I would like to imagine there should be) all I really want to do is:
- Send a buffer of n matrices to the graphics card (16 * n floats)
- For each i from 0 to n - 1, multiply the ith matrix times what is currently on the card and inspect the result, either returning the 14th element or storing it in a buffer to be sent back at the end.
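As a reference for what those two steps should produce, here is a rough software sketch. It assumes each matrix is multiplied independently against the same current matrix (not accumulated), and that the product is current * mats[i], which is the order glMultMatrixf uses. The name batch_element14 and its parameters are hypothetical, and this runs on the CPU only; it is not a GPU solution.

```c
#include <assert.h>
#include <stddef.h>

/* For each i in [0, n), writes element 14 (row 2, column 3, column-major)
 * of the product current * mats[i] into out[i].
 * mats points to n matrices stored back to back (16 * n floats).
 * Assumption: every product uses the same unchanged current matrix. */
static void batch_element14(const float current[16], const float *mats,
                            size_t n, float *out)
{
    assert(current != NULL);
    assert(n == 0 || (mats != NULL && out != NULL));
    for (size_t i = 0; i < n; ++i) {
        const float *m = mats + 16 * i;
        /* Row 2 of current dotted with column 3 of m. */
        out[i] = current[2]  * m[12]
               + current[6]  * m[13]
               + current[10] * m[14]
               + current[14] * m[15];
    }
}
```

Since only one element per product is needed, this is 4 multiplies per matrix instead of 64, which may be worth timing against the hardware path before going further.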
I figure I may have to end up using shaders or something, but I was hoping there was a way to do it without that.
Any help would be appreciated.