What is the best way to render millions of billboarded quads?

For my project I need to render a ton of billboarded quads, and I haven’t really found a good way to do it. The centers of the quads do not change; the only thing that changes is their orientation as the camera moves around the scene. The problem is that although each quad’s center is constant, the positions of its 4 vertices change whenever the camera moves. Here are a few of the solutions I’ve come up with:

Geometry shaders: probably the best solution, but off the table for me since they’re not available in OpenGL ES 2.0. I would render the center of each quad with GL_POINTS and have the geometry shader compute the 4 corners of the quad (topLeft, topRight, bottomLeft, and bottomRight) and assemble the quad from those.

Instancing: I would upload 2 arrays to the GPU: the base vertices for a quad ({0, 1, 1, 0, …}) and the world position of each quad. The latter would be an instanced array. In the vertex shader I would use the world position to compute where each corner of the quad goes. The problem is that this causes a lot of duplicate calculations, as I would be computing the same information 4 times, once for each vertex. Also, I’ve heard that instancing very small objects can cause performance issues on some hardware.

Are there any other options as to what I can do? Am I missing something?

Using GL_POINTS may be an option, provided that the upper bound of GL_ALIASED_POINT_SIZE_RANGE is large enough for the biggest quad you need to draw on screen.
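As a rough way to check whether points are big enough, the sketch below estimates how many pixels a quad would cover and compares that against the supported range. The function names and parameters are made up for illustration, and it assumes a standard perspective projection; on a real device the range would come from glGetFloatv(GL_ALIASED_POINT_SIZE_RANGE, range).

```cpp
// Approximate on-screen edge length, in pixels, of a camera-facing square.
// worldSize:      edge length of the quad in world units
// viewDepth:      distance in front of the camera (positive, view space)
// projScaleY:     element [1][1] of a standard perspective matrix, 1 / tan(fovY / 2)
// viewportHeight: viewport height in pixels
// (All names here are hypothetical; this is the value a vertex shader would
// write to gl_PointSize.)
inline float projectedPointSize(float worldSize, float viewDepth,
                                float projScaleY, float viewportHeight) {
    return worldSize * projScaleY * viewportHeight / (2.0f * viewDepth);
}

// range[0] and range[1] are the min and max point sizes, as filled in by
//   GLfloat range[2];
//   glGetFloatv(GL_ALIASED_POINT_SIZE_RANGE, range);
inline bool fitsPointSizeRange(float sizePixels, const float range[2]) {
    return sizePixels >= range[0] && sizePixels <= range[1];
}
```

If the nearest quads you expect to draw would exceed the upper bound, GL_POINTS alone won’t cover your case and those quads would need real geometry.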

There aren’t really many other options in ES 2.0. It doesn’t have instancing, and you can’t realistically even do fake instancing due to the lack of gl_VertexID, integer attributes, large uniform arrays, etc.

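About the only other option is to duplicate the centre coordinate for each of the four vertices, give each vertex an extra attribute saying which corner it is, and apply the offset in the vertex shader.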
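A minimal sketch of that approach, assuming an interleaved layout of centre plus corner offset and 16-bit indices (core ES 2.0 only guarantees GL_UNSIGNED_BYTE and GL_UNSIGNED_SHORT indices, so one batch tops out at 16384 quads). The struct, function, attribute and uniform names are all invented for illustration. The buffers are static, so they only need to be uploaded once; only the view matrix changes per frame.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// One vertex of a billboard: the shared quad centre plus this vertex's corner.
struct BillboardVertex {
    float cx, cy, cz;  // quad centre, identical for all four vertices
    float ox, oy;      // corner offset in quad units, each -0.5 or +0.5
};

struct BillboardBatch {
    std::vector<BillboardVertex> vertices;  // 4 per quad
    std::vector<std::uint16_t> indices;     // 6 per quad (two triangles)
};

// 16-bit indices can address 65536 vertices, i.e. 16384 quads per batch.
constexpr std::size_t kMaxQuadsPerBatch = 65536 / 4;

// centers holds x, y, z triples, one per quad.
BillboardBatch buildBillboardBatch(const std::vector<float>& centers) {
    if (centers.size() % 3 != 0)
        throw std::invalid_argument("centers must hold xyz triples");
    const std::size_t quadCount = centers.size() / 3;
    if (quadCount > kMaxQuadsPerBatch)
        throw std::length_error("too many quads for 16-bit indices");

    static const float kCorners[4][2] = {
        {-0.5f, -0.5f}, {0.5f, -0.5f}, {0.5f, 0.5f}, {-0.5f, 0.5f}};
    static const std::uint16_t kOffsets[6] = {0, 1, 2, 0, 2, 3};

    BillboardBatch batch;
    batch.vertices.reserve(quadCount * 4);
    batch.indices.reserve(quadCount * 6);
    for (std::size_t q = 0; q < quadCount; ++q) {
        const float* c = &centers[q * 3];
        // Same centre four times, each with a different corner offset.
        for (const auto& corner : kCorners)
            batch.vertices.push_back(
                BillboardVertex{c[0], c[1], c[2], corner[0], corner[1]});
        const std::size_t base = q * 4;
        for (std::uint16_t o : kOffsets)
            batch.indices.push_back(static_cast<std::uint16_t>(base + o));
    }
    return batch;
}

// GLSL ES 1.00 vertex shader: move the centre into view space, then push the
// vertex out to its corner along the view-space x/y axes so the quad always
// faces the camera.
static const char* const kBillboardVertexShader = R"(
attribute vec3 a_center;
attribute vec2 a_corner;
uniform mat4 u_view;
uniform mat4 u_proj;
uniform float u_size;
varying vec2 v_uv;
void main() {
    vec4 viewPos = u_view * vec4(a_center, 1.0);
    viewPos.xy += a_corner * u_size;
    v_uv = a_corner + 0.5;
    gl_Position = u_proj * viewPos;
}
)";
```

For millions of quads this means many batches (or the OES_element_index_uint extension, where available), but each batch is a plain glDrawElements call with no per-frame buffer updates.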

Note that even on implementations which do support instancing, rendering one quad per instance is inefficient: implementations generally don’t coalesce multiple instances into a workgroup, so with only four vertices per instance most of the cores will be idle when executing the vertex shader.