Some misc. questions

Hey! I had some miscellaneous questions about optimising a project I’m working on, and I was wondering if you guys (and girls!) could point me in the right direction on a few things.

First: with a vec2, which is quicker? a*a or pow(a,2)? In an ideal world they’d optimise to the same thing, but I don’t have a whole lot of modern hardware to try it out on, so I’m not sure what’s relevant today.

Second: we need a quick way to discard the current fragment based on a threshold. Is it quicker to use alpha_test or an if statement and discard? This would also skip some other calculations early.

Third: it’s part of the project that the scene is entirely paletted flat colours (with lighting, of course). We also intend to use a LOT of lights, so we’ve opted for the deferred approach. The intention is to use a 1D palette texture and only write a single U coordinate into a one-channel buffer. Is there a format that’d be optimal for this given we’re using at max 30-40 colours? It’d be nice to be able to upload an unsigned char with the particle data and have it rendered out to an unsigned char buffer, but it’s a bit unclear whether there might be a performance hit, since the pipeline seems well optimised for floats.

Fourth: Is there a best format for storing normals in a buffer?

Fifth (and finally): for the lighting stage, it’s necessary to have the current pixel’s position. Would it be best to have a super-expensive RGBA32F buffer to store the position (expensive in both memory and bandwidth), or is reconstructing the pixel position from the depth buffer and screen position cheap enough that it just doesn’t matter?

Thanks for your time!

which is quicker? a*a or pow(a,2)? In an ideal world they’d optimise to the same thing

Then you answered your own question. In the best possible scenario, neither is faster than the other. In every other scenario, a*a is faster.
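For reference, here’s a minimal GLSL sketch of the two forms side by side (the input a is hypothetical, just to make it compile). Note that pow() is also undefined for negative bases in GLSL, which a*a sidesteps entirely:

```glsl
#version 330 core
in vec2 a;            // hypothetical varying, just to make this compile
out vec4 fragColor;

void main()
{
    vec2 squaredMul = a * a;              // one component-wise multiply
    vec2 squaredPow = pow(a, vec2(2.0));  // may expand to exp2(2.0 * log2(a));
                                          // also undefined for a < 0 in GLSL
    fragColor = vec4(squaredMul + squaredPow, 0.0, 1.0);
}
```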

Second: we need a quick way to discard the current fragment based on a threshold. Is it quicker to use alpha_test or an if statement and discard? This would also skip some other calculations early.

Since your shader is going to execute one way or another (the fixed-function alpha test runs after the fragment shader, and it was removed from core OpenGL anyway), you may as well use discard.
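A minimal sketch of the discard path, with hypothetical names (diffuseTex, alphaThreshold):

```glsl
#version 330 core
in vec2 texCoord;
uniform sampler2D diffuseTex;    // hypothetical texture providing the alpha
uniform float alphaThreshold;    // hypothetical threshold uniform
out vec4 fragColor;

void main()
{
    vec4 texel = texture(diffuseTex, texCoord);

    // Bail out before the expensive work. (Keep in mind the GPU still
    // runs fragments in groups, so the savings are biggest when
    // neighbouring fragments discard together.)
    if (texel.a < alphaThreshold)
        discard;

    // ...the expensive calculations run only for surviving fragments...
    fragColor = texel;
}
```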

it’s part of the project that the scene is entirely paletted flat colours

What does that mean? Are these the final colors to be displayed to the user? Are these colors the diffuse absorption characteristics of the surface? Do they have some other meaning defined by your lighting equation?

Is there a format that’d be optimal for this given we’re using at max 30-40 colours?

Are we talking about an image format or a buffer object format (UBOs) or what?

Fourth: Is there a best format for storing normals in a buffer?

Better than what? You didn’t say what you’re currently using, or what those normals might be. Or anything.

Deferred rendering is an optimization, and there is no one-size-fits-all approach to optimization. The choice of image formats for the g-buffers depends greatly on the needs of your rendering system: what your material parameters are, and so on.

Would it be best to have a super-expensive RGBA32F buffer to store the position (expensive in both memory and bandwidth), or is reconstructing the pixel position from the depth buffer and screen position cheap enough that it just doesn’t matter?

That depends on how many deferred passes you have. If you render actual positions, this (theoretically) lowers the per-pass overhead, since you’re just reading from a texture rather than doing computations. The more passes you have, the more you save. One of the downsides is that you use up three g-buffer channels for something you could compute, which has significant memory and bandwidth costs.

That being said, transforming from window-space back to camera space (or wherever you intend to do your lighting) is fairly quick. Indeed, depending on your bandwidth concerns, it may be faster than doing a texture read. A good scheduler/optimizer may even hide the latency of a few texture accesses within this computation.

If you’re focused on more modern hardware, I’d go with computing things manually, but I’d also do some tests to see which is faster. Deferred rendering is, after all, an optimization.
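For what it’s worth, “computing things manually” amounts to only a few lines. A sketch, assuming a standard perspective projection, the default [0, 1] depth range, and a hypothetical invProjection uniform holding the inverse projection matrix:

```glsl
#version 330 core
in vec2 texCoord;                // full-screen quad UV in [0, 1]
uniform sampler2D depthTex;      // depth buffer from the geometry pass
uniform mat4 invProjection;      // inverse projection matrix (hypothetical)
out vec4 fragColor;

void main()
{
    float depth = texture(depthTex, texCoord).r;

    // Rebuild the NDC position from the screen position and depth,
    // then undo the projection to get back to camera space.
    vec4 ndc     = vec4(vec3(texCoord, depth) * 2.0 - 1.0, 1.0);
    vec4 view    = invProjection * ndc;
    vec3 viewPos = view.xyz / view.w;    // perspective divide

    // viewPos now feeds the lighting calculations.
    fragColor = vec4(viewPos, 1.0);
}
```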

or what those normals might be

The normals captured during the scene render into the deferred rendering buffers. Is there a texture format which can store the signed (-/+) values without packing/repacking or wasting a lot of texture space on something which realistically doesn’t need to be (massively) accurate?

The paletted colours are being used in place of standard diffuse/specular buffers to save on memory. If it’s only ever going to be one of a few flat colours, there’s no real point in storing a pair of full RGB colours. This would be applied as a final render pass, multiplying the light accumulation buffer with a sample from the palette texture, using the rendered index buffer to choose where in that palette to sample.
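Roughly, the final pass would look something like this (just a sketch; the texture names are placeholders):

```glsl
#version 330 core
in vec2 texCoord;
uniform sampler2D lightAccumTex;   // light accumulation buffer (placeholder name)
uniform sampler2D indexTex;        // buffer holding the palette index
uniform sampler1D paletteTex;      // 1D palette, 30-40 texels
out vec4 fragColor;

void main()
{
    // The stored index picks a flat colour out of the 1D palette...
    vec3 albedo = texture(paletteTex, texture(indexTex, texCoord).r).rgb;

    // ...which is then modulated by the accumulated lighting.
    vec3 light = texture(lightAccumTex, texCoord).rgb;
    fragColor  = vec4(albedo * light, 1.0);
}
```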

Thanks for answering.

Is there a texture format which can store the signed (-/+) values without packing/repacking or wasting a lot of texture space on something which realistically doesn’t need to be (massively) accurate?

First, what’s wrong with “packing/repacking”? Mapping the signed normal from [-1, 1] into [0, 1] is one multiply-add, and that single instruction (MAD) will mean nothing compared to the bandwidth savings.

Second, you will have to use your own judgment to decide what kind of image format to use. Picking image formats for deferred rendering isn’t just about taking each field and making it small. It’s about taking all of the parameters and packing them into as few images and bytes as possible.

For example, in the general case, I would use an RGB10_A2 texture to store normals. However, in your case, since you have this color index that maps between a single value and an RGB lookup table, I would suggest using an RGBA8 texture. The RGB would store your normals, and the alpha would store your color index. You would get two parameters with a single image.
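A sketch of the geometry-pass output under that scheme (names are hypothetical, and the index is assumed to be pre-normalised to [0, 1] on the CPU side):

```glsl
#version 330 core
in vec3 viewNormal;            // interpolated view-space normal
uniform float paletteIndex;    // palette index / 255.0, done on the CPU side
out vec4 gBufferOut;           // bound to an RGBA8 attachment

void main()
{
    // The single MAD mentioned above: map the signed normal from
    // [-1, 1] into the [0, 1] range of the unsigned-normalised format.
    gBufferOut = vec4(normalize(viewNormal) * 0.5 + 0.5, paletteIndex);
}
```

The lighting pass then inverts the mapping with normal = texel.rgb * 2.0 - 1.0 and reads the palette coordinate straight from texel.a.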

That’s a fantastic idea. Thank you again. I can’t believe I didn’t spot that!

YOU ROCK!

Also: can’t believe I never found that image formats page when searching around; I kept stumbling back upon the glTexImage2D page, which is a bit… sparse?
