Optimizing rendering of particle systems

Hello, fellow programmers. I was hoping you could help me with some advice on implementing particle systems in a game demo I'm doing as a school project. I plan to use particle systems for many effects, so it's imperative that this implementation is speed-efficient (a most… unusual priority, eh?).

I was planning on having a manager that sorts the individual systems into a tree hierarchy for rendering.
It would basically put all systems using the same texture in one branch, with sub-branches for each blending-mode variation. With this approach, I figured I could minimize the texture switches and state changes required to render all existing systems.

I have some issues that I would appreciate your thoughts, ideas or suggestions about:

I have plans for animated particle textures (since I'll use a lot of fire, which should look good with animation, creating a very “volatile” effect), and figured a good way of doing this without excessive texture switching would be the well-known method of “mosaicing” the different frames together on a single texture. Then I just modify the texcoord positions over time to reflect the animation.
Since the particles are contained in separate systems, I thought it would make sense to implement each system's rendering as a vertex array. Is there any problem with that idea? What alternatives exist? And lastly, if I do use it, what primitives should I draw: quads or triangles? I also figure glDrawArrays would be better than glDrawElements for particles, since there isn't a whole lot of vertex sharing going on? Correct?
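The “mosaicing” idea amounts to computing a texcoord rectangle per animation frame. As a sketch (the helper name and the row-major top-left layout are assumptions for illustration, not from any particular engine):

```c
#include <assert.h>

/* Texcoord rectangle of one tile in an atlas. */
typedef struct { float u0, v0, u1, v1; } UVRect;

/* Hypothetical helper: frames are packed row-major into a
 * cols x rows grid of equally sized tiles. */
static UVRect atlas_frame_uv(int frame, int cols, int rows)
{
    UVRect r;
    float du = 1.0f / (float)cols;
    float dv = 1.0f / (float)rows;
    int   cx = frame % cols;   /* column of the tile */
    int   cy = frame / cols;   /* row of the tile    */
    r.u0 = cx * du;
    r.v0 = cy * dv;
    r.u1 = r.u0 + du;
    r.v1 = r.v0 + dv;
    return r;
}
```

Each frame you would pick something like `frame = (int)(particle_age * fps) % (cols * rows)` and write the returned rectangle into the particle's four texcoords.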

By the way: I'm new to VBOs, having heard of them just recently. I've just printed the NVIDIA PDF detailing their usage, which should make some decent night-reading. :wink:

Hm, jwatte also has some interesting ideas here:

Thanks for your time, looking forward to your replies.
Martin Persson, Sweden

Building a vertex array is the fastest solution I know of. Whether to use a VBO or not depends on the performance gain you measure while testing.
For animation you could easily use a 3D texture, where the r value in the texture coordinates reflects the animation state. That way, shifting the s and t coordinates for “mosaicing” is not needed.

There is a little app on delphi3d which billboards particles in a vertex program; that could be faster than doing it on the CPU.

Minimizing texture state changes is important, but when mixing different blending modes (additive blending, alpha blending) you have to sort the particles by distance from the viewer and render them back to front. In a generalized system there is a chance that, as a result of sorting, every second particle has a different texture.

Optimizing the rendering is also partly the task of the artist or the designer of the particle emitters.

An idea I had, which I never implemented by the way, was to create a global particle-system map at runtime, where all the individual particle textures are merged together. The particle system would then generate correct texture coordinates for each particle according to the position of its texture in the global map.


I'm facing the same task; thanks for the tip about 3D textures.
So far I've been putting my thoughts on paper; now the implementation awaits.

I would also use a single vertex array to build the rendered particles in.

Originally posted by jabe:
There is a little app on delphi3d which billboards particles in a vertex program; that could be faster than doing it on the CPU.
Agree, though I'm 90% sure it would be faster on the CPU. On the other hand, instead of using billboarded quads, you'd better go for point sprites.

You might also look into using an efficient structure for culling occluded particle systems, instead of just storing them in a tree.

Just thoughts :wink: , regards,

Thanks for your suggestions! I'm a little reluctant to use too-advanced features, since our teacher in the graphics course has to run the demo, and at present he has a Dell laptop with a meager GF4 MX 420… We aren't required to be able to run it on that hardware, but it would make things easier at least… I'll check whether point sprites are supported on that platform or not.

Regarding the rendering of particles: could one have a fixed vertex array for each particle chunk (one per texture), process the particles during a “pseudo-rendering” pass into these vertex arrays to compute their depths, and put them in order? Then each array is weighted by the average depth of its particles, and the arrays are rendered back-to-front based on that weight. It would give incorrect ordering for some particles, but the bulk would be rendered correctly, no?
There does, however, seem to be a huge CPU overhead here… I'll probably just try the good old workaround with glDepthMask and see how it looks.
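That weighting scheme could be sketched like this (the `Chunk` struct and field names are invented for illustration): compute each chunk's mean view-space depth, then sort the chunks back to front on that key.

```c
#include <stdlib.h>

/* Hypothetical per-texture particle chunk. */
typedef struct {
    const float *depths;  /* view-space depth of each particle */
    int          count;
    float        weight;  /* mean depth, filled in below */
} Chunk;

static void compute_weight(Chunk *c)
{
    float sum = 0.0f;
    for (int i = 0; i < c->count; ++i)
        sum += c->depths[i];
    c->weight = (c->count > 0) ? sum / (float)c->count : 0.0f;
}

/* Back-to-front: larger depth (farther from the viewer) first. */
static int chunk_cmp(const void *a, const void *b)
{
    float wa = ((const Chunk *)a)->weight;
    float wb = ((const Chunk *)b)->weight;
    return (wa < wb) - (wa > wb);
}

static void sort_chunks_back_to_front(Chunk *chunks, int n)
{
    for (int i = 0; i < n; ++i)
        compute_weight(&chunks[i]);
    qsort(chunks, n, sizeof(Chunk), chunk_cmp);
}
```

After sorting, each chunk is rendered in array order with its own texture bound once, which keeps the texture-switch count at one per chunk.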

Or does someone know of any good way for particle rendering, solving this issue?

Thanks for your help, much appreciated.
Martin Persson, Sweden

Your weight-based approach might be a good solution when particles from different emitters do not intersect each other much. It is the same problem as sorting transparent objects by their center points only.

The CPU overhead of sorting particles is not that bad. Have a look at radix sort; it does its job very well. I benchmarked it on a 1 GHz machine and it took 17 ms to sort 100,000 entries.

I would hold the particles with all their metadata in a separate array. When sorting, just sort an array whose entries are structs {distance, index}. While traversing the sorted array, copy or generate the vertex positions and texture coordinates from the index-th particle into the vertex array used for rendering.
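A least-significant-digit radix sort on those {distance, index} structs might look like the following sketch. It assumes distances are non-negative, so their IEEE-754 bit patterns sort in the same order as the float values (the struct and function names are made up for illustration):

```c
#include <stdint.h>
#include <string.h>

typedef struct { float dist; uint32_t index; } SortEntry;

/* LSD radix sort on the distance key, one byte per pass (4 passes).
 * `tmp` is scratch space of the same size as `a`; after the four
 * passes (an even number of buffer swaps) the sorted result is
 * back in the caller's array `a`, in ascending distance order. */
static void radix_sort_entries(SortEntry *a, SortEntry *tmp, int n)
{
    for (int shift = 0; shift < 32; shift += 8) {
        int count[256] = {0};
        /* histogram of this byte */
        for (int i = 0; i < n; ++i) {
            uint32_t key; memcpy(&key, &a[i].dist, 4);
            count[(key >> shift) & 0xFF]++;
        }
        /* exclusive prefix sums -> output offsets */
        int sum = 0;
        for (int b = 0; b < 256; ++b) { int c = count[b]; count[b] = sum; sum += c; }
        /* stable scatter into the other buffer */
        for (int i = 0; i < n; ++i) {
            uint32_t key; memcpy(&key, &a[i].dist, 4);
            tmp[count[(key >> shift) & 0xFF]++] = a[i];
        }
        SortEntry *t = a; a = tmp; tmp = t;
    }
}
```

Ascending order is front-to-back; for transparent particles you would walk the sorted array in reverse to render back to front.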

Another approach is order-independent transparency, which uses “depth peeling”. There is an article on the NVIDIA page by Cass Everitt. But that's maybe too advanced :slight_smile:

The single most efficient way to render particles is to combine a vertex program with a VBO. This way, you render any number of particles using a single draw call.
You can keep the parallelism up if you use 3D textures for animation (and you gain free linear interpolation between keyframes). A texture atlas has a lot of issues regarding filtering and mipmapping; for this specific case you'll prefer 3D textures. (Note that you could have an atlas of 3D textures…)

If you go further into vertex program tutorials, you’ll even find some way to do particle collision detection and response.

If you NEED true alpha blending, then there's no choice other than sorting the particles back to front, and at that point you'll be using a vertex array (or mapping/unmapping your VBO index buffer). If additive blending can do the trick, then don't bother with a CPU pass over each particle.

Remember that each particle is a separate quad (or pair of triangles), so there's no way to make smart use of the post-TnL cache. On the contrary, you can make very clever use of the pre-TnL cache if you align your billboard vertices and render them sequentially (it works the same with a vertex array or a VBO). A CPU fallback will completely kill this optimisation.

Depending on the forces in your particle system, you may need some maths to solve the differential equations in the vertex program. If you don't go too deep into point/surface/volume forces, you won't need that, and you'll find an easy way to implement it in a vertex program. If you intend to have multiple constant or dynamic wind forces, gravity, and some attractive or repulsive zones, then you'll want to fall back on CPU maths.

In summary:

  • use VBO to store your particles,
  • use a vertex program to align your particles with the camera, to animate your particle position if the maths aren’t too hard for vertex programming,
  • use 3D textures to animate the texture of your particles.


Agree, though I'm 90% sure it would be faster on the CPU. On the other hand, instead of using billboarded quads, you'd better go for point sprites.
Depends on your hardware, I guess. I'm fairly confident that on an NV30/NV40 it'll be faster on the GPU, unless you're using SSE or 3DNow! for building your vertex positions. And even then, why waste CPU time doing it?

Hmm, I was planning NOT to get into these new GPU effects for this project, but… well, the sooner the better, I guess… I'll just have to check how much of it is actually supported on my card, a GF4 Ti 4200 128 MB; it seems it only supports some of this in hardware.

However, regarding point sprites: they are drawn as points, as I understand it, yes? What concerns me is the size of the particles. For some effects, like flames, I might actually need somewhat large particles… I'm not sure how large the particles can be with this method. I'll fiddle around and see, I guess. After all, I can query the maximum point size…

Again, thanks for your help.

99% of 3D apps are CPU-limited. This means that 99% of 3D apps make the GPU spend time idle, waiting for more data. Just imagine that… while you're calculating your quads, the GPU is sitting there waiting for the results, when it is 300x more proficient at doing that calculation in parallel with the CPU doing something that can't be done on a GPU.

I’ve been spending a lot of time implementing particle systems for a game engine I’ve made, let me share my experiences with you.

  • Consider using alpha test. Particles consume a lot of fill rate and using alpha test is a way of reducing this cost.

  • Consider disabling Z writes, since you usually don't want particles to hide other objects or other particles.

  • Rendering the particles with quad/triangle arrays and billboarding is not trivial, since you have to do a matrix change for each particle (the solution here may be using point sprites).

  • Regarding animation, 3D textures are not supported in hardware on GeForce MX cards, and mosaics can cause filtering problems at the border texels of each tile (I think there is no perfect solution here).

  • Unless you have a lot of particle systems, changing the texture for each one is not a noticeable cost.
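The first two bullets boil down to a few lines of GL state setup. This is a typical pattern rather than code from the thread, and the 0.05 alpha-test threshold is an arbitrary example value:

```c
#include <GL/gl.h>

/* Typical state for additive, non-occluding particles. */
static void begin_particle_state(void)
{
    glEnable(GL_BLEND);
    glBlendFunc(GL_SRC_ALPHA, GL_ONE);  /* additive blending */
    glEnable(GL_ALPHA_TEST);
    glAlphaFunc(GL_GREATER, 0.05f);     /* reject nearly transparent fragments */
    glDepthMask(GL_FALSE);              /* test depth, but don't write it */
}

static void end_particle_state(void)
{
    glDepthMask(GL_TRUE);
    glDisable(GL_ALPHA_TEST);
    glDisable(GL_BLEND);
}
```

The alpha test saves fill rate on the large transparent borders of particle textures, and disabling depth writes is the usual glDepthMask workaround mentioned earlier in the thread.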

As the 3D-texture idea sounded so nice, I looked into it, but it also has issues with mipmapping, so I will create a single baked image of the sequences instead.

A question about vertex programs: someone mentioned that even forces could be calculated for the particles, but wouldn't one be unable to get the results back?

I mean, a vertex program can basically only output to the “screen”; the data it creates cannot be retrieved. So one could only add a little noise to position/color… but not really let it calculate things you need back in RAM or a VBO.

Originally posted by martinho_:
- Rendering the particles with quad/triangle arrays and billboarding is not trivial, since you have to do a matrix change for each particle (the solution here may be using point sprites).
Say what? Looks like you've missed some obvious shortcuts, mate.

You definitely don't want to be doing a matrix change per particle! Per particle set is more sensible, with each particle just an offset from the root position of the particle set.

Hmm, rendering the particle systems on the GPU certainly seems like an attractive deal. However, I'm currently on a GF4 Ti 4200, and at the moment the OpenGL Shading Language isn't supported on it (I have yet to ascertain whether it's the driver or the hardware that needs updating). I'd rather not write the implementation to be hardware-specific (i.e., using Cg), so I might fall back on CPU-rendered systems for this project.
I'll head up to the university tomorrow to use their broadband to fetch those 10 MB PDFs detailing the OpenGL Shading Language support on current NV hardware.
Seems a hardware upgrade is in order for me… :slight_smile:

GLSL (for vertex shaders) should be enabled by using newer drivers, and failing that you can use ARB_vertex_program.

– Tom

Shoot me if you like, but unless we are talking about millions of particles I wouldn't waste time on using VBOs or even arrays, unless you really ARE limited by that.

I don't know any typical numbers, but simple immediate mode plus a vertex program for billboarding handles 250k particles at 20 fps on a Radeon 9800. And the problem isn't drawing them, but updating them (in a rather primitive way). So I might worry more about how to update large numbers of particles than about drawing them.
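A “rather primitive” update like the one mentioned is typically just an explicit Euler step per particle. A generic sketch (the `Particle` struct and field names are assumptions, not the poster's code):

```c
/* Minimal particle state for a CPU-side update. */
typedef struct { float pos[3], vel[3], life; } Particle;

/* One explicit-Euler step: constant acceleration (e.g. gravity),
 * fixed timestep dt. Velocity is integrated first, then position. */
static void update_particle(Particle *p, float dt, const float gravity[3])
{
    for (int i = 0; i < 3; ++i) {
        p->vel[i] += gravity[i] * dt;
        p->pos[i] += p->vel[i] * dt;
    }
    p->life -= dt;  /* kill or respawn the particle when life <= 0 */
}
```

The cost here is memory traffic over the whole particle array every frame, which is exactly why updating can dominate drawing at 250k particles.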

Basically it's just:
begin quads
for each particle:
    update the particle
    draw the particle
end quads

and the vertex program does nothing but:
transform the point
add size * corner sign

Point sprites would be nice, though clipping is a problem AND ATI STILL doesn't support them (OK, their driver claims to, but it's as broken as can be).

bits and pieces:

The size is passed as a texture coordinate, and the normal actually holds the position, while the vertex just specifies the corner (±0.5 instead of 1, as we just want half the size in each direction).

You can obviously save a few calls if they all have the same size and/or color.

!!ARBvp1.0
TEMP pos, res;
# the normal carries the particle centre; w must be 1 for the transform
MOV pos, vertex.normal;
MOV pos.w, 1;
# transform the centre by the modelview-projection matrix
DP4 res.x, state.matrix.mvp.row[0], pos;
DP4 res.y, state.matrix.mvp.row[1], pos;
DP4 res.z, state.matrix.mvp.row[2], pos;
DP4 res.w, state.matrix.mvp.row[3], pos;
MOV result.position, res;
# offset the corner in clip space: vertex.position is the corner sign, texcoord[0] the size
MAD result.position.xy, vertex.position, vertex.texcoord[0], res;
ADD result.texcoord[0], vertex.position, {.5,.5,0,0};
MOV result.color, vertex.color;
END

I admit the program could use a few ATTRIB statements to use less confusing names.

That's great!
I was always wondering how I could do billboarding in a VP. That normal-as-position idea is really a nice trick.


I posted a demo about doing this on these boards years ago, but searching through all the old posts, I think it's been lost in the transition to the new server.

Demo with source doing particles in a VP here:

4th from the bottom.

knackered: Say what? Looks like you've missed some obvious shortcuts, mate.

Nutty: You definitely don't want to be doing a matrix change per particle! Per particle set is more sensible, with each particle just an offset from the root position of the particle set.
A matrix change per set will work if the system is not big (in physical size); if it is, you are not doing real billboarding, since the position of each particle is different. As I said before, point sprites are the solution here (even though I tried them and didn't notice any performance increase). But if you say I'm missing some shortcuts, tell me which ones, and I'll only accept ones that produce the same results as billboarding each particle.