I haven’t tried gDEBugger yet, but I’ll look into it :). I did some profiling using timers (timer queries and normal timers), and found that:
- Culling/tiling lights takes between 0 ms (below the timer's resolution) and 2 ms. I tried various tile sizes (16x16, 32x32, 64x64, 128x128) and got pretty much the same times.
- Rendering lights takes 8 ms on the GPU side
- The rendering on the CPU side takes 10 - 40 ms (unacceptable!!!)
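For context, the culling/tiling step timed above could be sketched roughly like this. It is a simplified CPU-side version, not the actual implementation: it assumes every light has already been projected to a screen-space bounding rectangle, and `ScreenRect` and `binLights` are names made up for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical screen-space bounds of a light in pixels
// (min is inclusive, max is exclusive).
struct ScreenRect { int minX, minY, maxX, maxY; };

// Returns one list of light indices per tile, tiles stored row-major.
std::vector<std::vector<int>> binLights(const std::vector<ScreenRect>& lights,
                                        int screenW, int screenH, int tileSize)
{
    assert(screenW > 0 && screenH > 0 && tileSize > 0);
    const int tilesX = (screenW + tileSize - 1) / tileSize; // round up
    const int tilesY = (screenH + tileSize - 1) / tileSize;
    std::vector<std::vector<int>> tiles(tilesX * tilesY);

    for (int i = 0; i < static_cast<int>(lights.size()); ++i) {
        const ScreenRect& r = lights[i];
        // Clamp the rectangle to the screen; skip lights that are fully outside.
        const int x0 = std::max(r.minX, 0), y0 = std::max(r.minY, 0);
        const int x1 = std::min(r.maxX, screenW), y1 = std::min(r.maxY, screenH);
        if (x0 >= x1 || y0 >= y1)
            continue;
        // Range of tiles touched by the clamped rectangle (inclusive).
        const int tx0 = x0 / tileSize, tx1 = (x1 - 1) / tileSize;
        const int ty0 = y0 / tileSize, ty1 = (y1 - 1) / tileSize;
        for (int ty = ty0; ty <= ty1; ++ty)
            for (int tx = tx0; tx <= tx1; ++tx)
                tiles[ty * tilesX + tx].push_back(i);
    }
    return tiles;
}
```

Tiles whose list comes back empty are the ones that get skipped when rendering.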
Lights are batched in groups of 1-16 (depending on how many are on a tile). I tried 32 once, but it just crashed (too many uniforms). Eventually, I will probably query this to make the batches as large as the particular machine can handle. If no lights are in a tile, the tile is just ignored.
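Querying the limit would presumably mean reading `GL_MAX_FRAGMENT_UNIFORM_COMPONENTS` with `glGetIntegerv` and dividing by the per-light cost. A minimal sketch of that arithmetic follows; `maxLightsPerBatch`, the per-light component count and the reserved amount are all assumptions for illustration, and the real usable limit may be lower depending on how the driver packs uniforms.

```cpp
#include <algorithm>
#include <cassert>

// maxComponents:      value reported for GL_MAX_FRAGMENT_UNIFORM_COMPONENTS
// componentsPerLight: floats one light occupies (assumed, e.g. 3 vec4s = 12)
// reservedComponents: floats kept free for all other uniforms (assumed)
int maxLightsPerBatch(int maxComponents, int componentsPerLight, int reservedComponents)
{
    assert(componentsPerLight > 0);
    // Never go negative if the reserved amount exceeds the reported limit.
    const int available = std::max(maxComponents - reservedComponents, 0);
    return available / componentsPerLight;
}
```

For example, with a reported limit of 1024 components, 64 reserved and 12 per light, this gives 80 lights per batch.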
So the issue is in setting up the uniforms, since that is pretty much all the CPU does when rendering the lights. It loops through all the tiles, sets the uniforms, and draws a quad for each. The quad drawing is fast.
For array uniforms, I just accumulated the values for each pass in an array and then set the uniform array at the end. Even though this version makes only 2 API calls per pass (setting the array and giving the number of lights used in the pass), it drops to as low as 12 fps when a lot of lights are in view. The glLightfv version dropped to as low as 30 fps. The non-tiled version never really dropped below 60 fps (this is all with about 150 lights), but I am running this on a pretty high-spec machine.
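To make the accumulation step concrete, here is a rough sketch of flattening one pass's values into a float array suitable for a `glUniform4fv`-style upload. `Vec4`, `packVec4s` and `vec4Count` are hypothetical names, not the real code. One detail worth keeping in mind is that the `count` argument of `glUniform4fv` is the number of vec4 elements, not the number of floats.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical 4-component value, e.g. a light position or color.
struct Vec4 { float x, y, z, w; };

// Flattens the values accumulated for one pass into a contiguous float array.
std::vector<float> packVec4s(const std::vector<Vec4>& values, std::size_t maxBatch)
{
    assert(values.size() <= maxBatch); // must fit the shader's array size
    std::vector<float> flat;
    flat.reserve(values.size() * 4);
    for (const Vec4& v : values) {
        flat.push_back(v.x);
        flat.push_back(v.y);
        flat.push_back(v.z);
        flat.push_back(v.w);
    }
    return flat;
}

// Number of vec4 elements in a flat array, i.e. the value to pass as `count`.
std::size_t vec4Count(const std::vector<float>& flat)
{
    assert(flat.size() % 4 == 0);
    return flat.size() / 4;
}
```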
Since UBOs bind so quickly, I tried using an array of them and having each light keep its own UBO that it binds when rendering the tile. Almost all of the lights are static, so they don’t even need to be updated that often.
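On the C++ side, the data behind one such per-light UBO might look something like the sketch below. The block layout (two vec4s under `std140`) and the names `LightBlock` and `makeLightBlock` are assumptions for illustration only; the point is just that using whole vec4s keeps the CPU struct and the std140 block byte-compatible.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical CPU mirror of a std140 uniform block such as:
//   layout(std140) uniform Light { vec4 positionAndRange; vec4 colorAndIntensity; };
struct LightBlock {
    float positionAndRange[4];  // xyz = position, w = range
    float colorAndIntensity[4]; // rgb = color,    w = intensity
};

// std140 aligns each vec4 to 16 bytes, so the struct must match exactly.
static_assert(sizeof(LightBlock) == 32, "LightBlock must be two tightly packed vec4s");
static_assert(offsetof(LightBlock, colorAndIntensity) == 16, "second vec4 must start at byte 16");

// Fills a block that could then be uploaded once for a static light.
LightBlock makeLightBlock(const float pos[3], float range,
                          const float color[3], float intensity)
{
    assert(range >= 0.0f);
    LightBlock b;
    for (int i = 0; i < 3; ++i) {
        b.positionAndRange[i] = pos[i];
        b.colorAndIntensity[i] = color[i];
    }
    b.positionAndRange[3] = range;
    b.colorAndIntensity[3] = intensity;
    return b;
}
```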
I set the uniforms like this:
void Shader::SetShaderParameter4fv(const std::string &name, const std::vector<float> &params)
{
    // Look up the cached uniform location; only query GL the first time a name is used
    int paramLoc;
    std::unordered_map<std::string, int>::iterator it = m_attributeLocations.find(name);
    if(it == m_attributeLocations.end())
        m_attributeLocations[name] = paramLoc = glGetUniformLocationARB(m_progID, name.c_str());
    else
        paramLoc = it->second;

    // If location was not found
#ifdef DEBUG
    if(paramLoc == -1)
    {
        std::cerr << "Could not find the uniform " << name << "!" << std::endl;
        return;
    }
#endif

    // The count argument is the number of vec4 elements, not the number of floats,
    // so the float count is divided by 4
    glUniform4fvARB(paramLoc, static_cast<GLsizei>(params.size() / 4), params.data());
}
Only the array uniform runs slowly; the others are fine. The only difference between the array and single-value forms is the glUniform…ARB(…) call.
EDIT 1: I found that if I purposely error out the shader, it shows the warning “warning(#312) uniform block with instance is supported in GLSL 1.5”. However, if I request #version 150, it complains about gl_TexCoord, and still fails to validate the program. Using #version 400 gets rid of all warnings, but it again fails to validate.
EDIT 2: I tried TBOs as well, since they allow me to submit ALL lights at once :)! However, I ran into the same problem as with UBOs: the shader does not validate. What could be the cause of this?