I’ve written a basic particle system. I don’t need a GPU-based one, but I’d like to check that there aren’t some obvious improvements to what I’ve implemented.
Each frame I fill an array with a quad for each particle, i.e. 4 vertices, each with: position (3 floats), texture coordinates (2 floats) and colour (4 floats). I then call glBufferSubData to copy this into an existing write-only VBO that I created big enough to hold the maximum number of particles.
I then render the VBO of particles with glDrawArrays using GL_QUADS.
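For reference, here’s a minimal sketch of the CPU-side fill described above. The Vertex layout matches the 3+2+4 float description; buildQuad and its parameters are illustrative names I’ve made up, not anything from my actual code:

```cpp
#include <cassert>
#include <vector>

// 4 vertices per particle: position (3 floats), texcoords (2), colour (4).
struct Vertex {
    float pos[3];
    float uv[2];
    float color[4];
};

// Illustrative helper: expand one particle into a quad of the given
// half-size, with texture coordinates taken from an atlas sub-rectangle.
void buildQuad(std::vector<Vertex>& out,
               float x, float y, float z, float halfSize,
               float u0, float v0, float u1, float v1,
               const float rgba[4]) {
    const float dx[4] = {-halfSize,  halfSize, halfSize, -halfSize};
    const float dy[4] = {-halfSize, -halfSize, halfSize,  halfSize};
    const float us[4] = {u0, u1, u1, u0};
    const float vs[4] = {v0, v0, v1, v1};
    for (int i = 0; i < 4; ++i) {
        Vertex v;
        v.pos[0] = x + dx[i]; v.pos[1] = y + dy[i]; v.pos[2] = z;
        v.uv[0] = us[i]; v.uv[1] = vs[i];
        for (int c = 0; c < 4; ++c) v.color[c] = rgba[c];
        out.push_back(v);
    }
}

// Each frame the whole vector is rebuilt, then uploaded and drawn:
//   glBufferSubData(GL_ARRAY_BUFFER, 0, out.size() * sizeof(Vertex), out.data());
//   glDrawArrays(GL_QUADS, 0, (GLsizei)out.size());
```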
Each particle can be a different size and colour, and the graphics come from a single sprite-sheet texture of particles.
Any improvements I could make?
You can save per-particle data (less data to upload) by using a geometry shader to generate the 4 particle vertices from a single position and colour. Make the position a vec4 and you can use the w coordinate as an index into a texture atlas (if you need that).
I don’t know how much of a performance improvement this can yield, but since you avoid a bit of CPU computation (no need to figure out the 4 corners of the quad) and PCIe bandwidth, it may help.
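Something along these lines, a rough sketch assuming the vertex shader passes the raw particle position (with the atlas index in w) through gl_Position and the colour via vColor; the uniform names, per-particle half-size handling and the 4x4 atlas layout are all assumptions:

```glsl
#version 150
layout(points) in;
layout(triangle_strip, max_vertices = 4) out;

uniform mat4 uProj;        // projection; positions assumed already in view space
uniform float uHalfSize;   // could instead be a per-particle attribute
in vec4 vColor[];          // colour passed through from the vertex shader
out vec4 gColor;
out vec2 gTexCoord;

void main() {
    vec4 center = gl_in[0].gl_Position;
    float atlasIndex = center.w;                 // w reused as atlas index
    float col = mod(atlasIndex, 4.0);            // assuming a 4x4 atlas
    float row = floor(atlasIndex / 4.0);
    vec2 uvBase = vec2(col, row) * 0.25;

    // Strip order: bottom-left, bottom-right, top-left, top-right.
    vec2 offs[4] = vec2[](vec2(-1,-1), vec2(1,-1), vec2(-1,1), vec2(1,1));
    for (int i = 0; i < 4; ++i) {
        gColor = vColor[0];
        gTexCoord = uvBase + (offs[i] * 0.5 + 0.5) * 0.25;
        gl_Position = uProj * vec4(center.xyz + vec3(offs[i] * uHalfSize, 0.0), 1.0);
        EmitVertex();
    }
    EndPrimitive();
}
```

Note this emits a triangle strip per point, so the draw call becomes glDrawArrays with GL_POINTS and one vertex per particle.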
Using two (or more) VBOs to avoid stalling the pipeline (ideally you want the previous render and the next upload to be parallel) may help too.
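The multi-VBO idea is just a round-robin: while the GPU may still be reading the buffer drawn last frame, you upload into a different one. A tiny illustrative selector (names mine, not from any real API):

```cpp
#include <cassert>
#include <cstddef>

// Round-robin over N VBOs: frame k uploads into buffer k % N, so the
// upload never touches the buffer the previous frame's draw may still
// be reading from.
struct VboRing {
    static const std::size_t kCount = 2;  // two VBOs, as suggested above
    unsigned ids[kCount];                 // filled by glGenBuffers at init
    std::size_t frame = 0;

    unsigned current() const { return ids[frame % kCount]; }
    void advance() { ++frame; }
};

// Per frame:
//   glBindBuffer(GL_ARRAY_BUFFER, ring.current());
//   glBufferSubData(...); glDrawArrays(...);
//   ring.advance();
```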
If you have no geometry shader (or it is too slow), you can still use a vertex shader: send 4 vertices with the same position, each with a different screen-space direction, and inflate the quad within the vertex shader.
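Roughly like this, a sketch in old-style GLSL for older cards; all attribute and uniform names are made up, and the per-particle size could equally be another attribute:

```glsl
#version 120
// All four vertices of a quad carry the same particle centre; a
// per-vertex "corner" attribute says which way to inflate it.
attribute vec3 aCenter;   // same value for all 4 vertices of one particle
attribute vec2 aCorner;   // (-1,-1), (1,-1), (1,1) or (-1,1)
attribute vec4 aColor;
uniform mat4 uModelView;
uniform mat4 uProj;
uniform float uHalfSize;
varying vec4 vColor;
varying vec2 vTexCoord;

void main() {
    // Inflate in view space so the quads stay screen-aligned (billboarding).
    vec4 eye = uModelView * vec4(aCenter, 1.0);
    eye.xy += aCorner * uHalfSize;
    gl_Position = uProj * eye;
    vTexCoord = aCorner * 0.5 + 0.5;  // plus an atlas offset if needed
    vColor = aColor;
}
```

You still upload 4 vertices per particle, but the CPU no longer computes the corners, and the position data is the same for all four so it stays cache-friendly to write.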
Zbuffer and neumann, isn’t it possible that the shader switch will take up more time than rendering the particles without the switch would?
This is one of the dilemmas I am trying to resolve right now: when do you adapt an existing shader to render something, and when do you write a new one? Maybe you write a shader for everything at first and then coalesce them together later as needed.
Well, to be honest, the original poster did not reveal how many particles are drawn.
Of course, for such low-level tuning, benchmarks must be done to verify whether any tangible gains exist. A shader-based approach can free up CPU work, so in the end “it depends”.
Thanks for the replies. I want to keep running on older cards, so the geometry shader is out, but I’ll try the vertex shader approach and do some benchmarking. I’ve only just got it to the “it seems to run” stage and haven’t had time to do any real testing.