Particles as primitives

This is an idea I’ve been batting about for a while; can’t decide if it’s nifty or horrible. See what you think.


Particles are being used more and more as hardware triangle throughput increases. Lots of things (clouds, smoke etc) can’t feasibly be done any other way.

Particle systems are usually rendered in a standard way: as textured quads aligned with the screen.

Under current OpenGL this is horribly inefficient. For each particle you’re sending 4 vertices (3 floats each) + 4 texcoords (2 floats each) = 20 floats per particle. Added to this, the application will generally have to transform each particle position vector into eye space so that the quads are screen-aligned. With the CPU rapidly becoming the bottleneck in 3D this is a Bad Thing ™.


Introduce a new primitive type, GL_PARTICLES, alongside the existing ones. Each particle is specified by only 3 floats, its untransformed position vector.

OpenGL takes this vector and transforms it to eye space (with the possibility of GPU assist). It then generates the quad verts by just adding +/- 0.5 to the x and y coords of the transformed vert. (Alternatively, it could combine these translations with the eyespace transform, generating the four matrices at the start of the glBegin(GL_PARTICLES) block.) The size of each particle wouldn’t have to be 1x1; it could be set as global state (NOT per particle).

If texturing is enabled, each corner of the quad is automatically assigned a standard texcoord at the corresponding corner of the texture.

Overall you could save 17 floats per particle of bus bandwidth, a whole bunch of CPU calculations, and a fair bit of application complexity.


If we ever get 3D texturing in hardware, it would be fun to have a “glTexCel(GLfloat)” function which selects a “layer” of a 3D texture to which subsequent glTexCoord2 calls will refer. This would be nice for the above particle scheme because you could select a different image for each particle with only 1 float of memory bandwidth. As well as allowing systems of heterogeneous particles, you could do animated particles and get frame interpolation for free from the texture filter.

Note that with a couple of stock textures (square and round) you could implement most of the existing GL_POINTS primitive type in terms of this one, and avoid the annoying problem of large points being culled as soon as their centres go offscreen. (Textured points would be a bit of a pain though.)

Also, note that the texture cel concept could give you very fast texfont rendering, with only one glVertex2 call and one glTexCel call per character.

Thoughts? Worth doing, or do we need another primitive type like a hole in the head?

Don’t forget there’s an extension called EXT_point_parameters. I know this only draws a point, but that is more than sufficient in quite a few cases (sparks, lightpoints).

Well, the thing I really wanted to say is: in my engine, when I load a normal landscape, a few objects and a few particle systems, I make nearly 1.5 million (1,500,000) OpenGL calls each frame, and still I can keep a framerate of 50+ fps on a 300MHz Celeron with a TNT. What I mean is that I don’t think there’s that much performance loss from calling functions. All you need to do is push and pop the arguments to/from the stack, and load a new instruction pointer. And remember, even if you don’t have to specify certain values (like texture coordinates in your case), they still have to be specified in some way to get into the pipeline.

However, I do think these features would be cool, so don’t get me wrong.

Hi Bob,

Yeah, I’ve seen the point parameters extension, but a lot of the time you really do need textured particles. Also points are subject to implementation-specific size constraints, so using them for particle systems is a bit tricky. I know Quake2 used point_parameters for its particle effects, but Carmack switched to textured quads for Q3.

Re your performance comments - it’s not so much the function calls I’m trying to avoid - particle systems should probably be vertex arrays anyhow. It’s the amount of geometry data you’re bussing over to the graphics card every frame. Scene vertex counts are skyrocketing with the advent of T&L, and vertex data can choke your bus just as much as texture data. There’s an ARB working group looking into vertex array objects (like texture objects, cacheable in on-card memory) but particle system verts are often too dynamic to get much benefit from this. Hence this idea - send a minimal particle description over the bus, and expand it out to the full textured quad description on the card.

>Particle systems are usually rendered in a standard way; as textured quads aligned with the screen.

You can use one triangle instead of a quad (its angles don’t all have to be 90°).

>the application will generally have to transform each particle position vector into eye space so that the quads are screen-aligned.

Some SGI h/w has the GL_SGIX_sprite extension - it “provides support for viewpoint dependent alignment of geometry, in particular geometry that rotates about a point or a specified axis to face the eye point”.
I think this extension is quite useful for a lot of things - particles, trees, clouds, smoke, etc…

But a new special primitive type for particles could be much more efficient.

btw, maybe somebody knows how it works on the PlayStation 2?

(from the PlayStation 2 spec)
- Particle Drawing Rate: 150 million/sec

A particle with a single size, axis-aligned, is OK for a very basic particle system. But a lot of effects need rotation/scale animation to look good.
So after doing a particle system, I’m thinking:

-You must have sprite rotation to make good smoke effects.
-Each particle must have its own size to allow scale animation.
-Often it is useful to stretch the particle to give a feeling of high speed.


Good points.

The cel-animation bit was partly intended to allow rotation and other sorts of variation, but wouldn’t really be feasible for extreme scaling and/or stretching.

The scaling could maybe be addressed by another per-particle param analogous to the point_parameters extension, but I can’t see a solution to the stretching case. I guess you’d have to fall back to doing it the traditional way. Still, I think there’s enough cases where the view-aligned quads would work for the efficiency gains to justify an extension.