We have a web-based product that currently uses Anti-Grain Geometry (AGG) to render Google Map tiles. It takes some complex input shapes, renders them, compresses the result, and sends it back to the client as a PNG.
Currently we render most tiles offline, but we want to move to fully live/interactive on-demand rendering. As you can imagine, latency is important for AJAX web services, so we want to optimise the tile render as much as possible.
Current tests have shown that the AGG render takes more time than everything else (SQL data lookup, sorting, clipping, PNG compression and HTTP serving), so we want to speed it up. AGG is the second-fastest CPU 2D anti-aliased vector renderer I know of (the fastest is only 1.5x faster). We have thrown everything at AGG, including Intel compilers, SSE and multi-threading, but it is still not fast enough. Put simply, even the latest-generation Intel Nehalem server CPUs are too slow at rendering.
We recently got access to an Amazon EC2 machine for CUDA work, which got me wondering: can we use its two Tesla C2050 cards to render faster than the CPU? Our first attempt was to port some scanline antialiasing algorithms to CUDA, but it was not fast, as CUDA is not well suited to that kind of task.
So I am now turning to OpenGL so I can use the Teslas' fixed-function vertex units etc. to accelerate the render. (Yes, Teslas work fine with OpenGL; you just have to render to an offscreen buffer and read it back to the CPU, because there is no VGA output so you can't otherwise see your output.)
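For reference, the headless render-and-readback is just the usual FBO dance; a fragment along these lines (error checking omitted, and it assumes a current GL context already created via GLX/EGL, plus a hypothetical tile size of 256x256):

```c
/* Render one 256x256 tile offscreen and read it back (GL context assumed). */
GLuint fbo, color;
glGenTextures(1, &color);
glBindTexture(GL_TEXTURE_2D, color);
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA8, 256, 256, 0,
             GL_RGBA, GL_UNSIGNED_BYTE, NULL);
glGenFramebuffers(1, &fbo);
glBindFramebuffer(GL_FRAMEBUFFER, fbo);
glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0,
                       GL_TEXTURE_2D, color, 0);
glViewport(0, 0, 256, 256);
/* ... draw the tile ... */
unsigned char pixels[256 * 256 * 4];
glReadPixels(0, 0, 256, 256, GL_RGBA, GL_UNSIGNED_BYTE, pixels);
/* pixels then go to the PNG compressor */
```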
I have been trying to see what existing GPU AA techniques there are, and so far none of them is terribly good.
First up, MSAA and/or CSAA is just horrible. The quality is the worst I have seen as far as antialiasing goes (compared to the CPU, not to other GPUs). The problem is that with very thin lines (<1px) and small detail geometry, which are important when rendering maps (e.g. road outlines), the sample mask 'misses' some of the pixels and you get a stipple effect. At very shallow edge angles (relative to the vertical/horizontal pixel edges) the AA pattern is very low quality and blurry.
I next looked at Microsoft's Direct2D renderer. I have used it before in a desktop app; the quality of its AA is good and it seems fast. Unfortunately it is not available on Linux (which all our servers run), so I cannot use it directly. I decided to reverse engineer it with the Microsoft PIX debugger to see what tricks it uses and whether they are applicable to my project under OpenGL. I was rather disappointed at what I found: Direct2D precomputes all alpha coverage values on the CPU!
It generates the geometry with thin 1px-wide border strips in the areas of AA blend. It then uses the vertex shader to create 0.0f-to-1.0f varying outputs along these border strips, so the hardware interpolates the strips' coverage values. The pixel shader simply takes the varying and multiplies it with the shape's color (or a texture lookup if it is a gradient) to get the final AA result (with the correct alpha value, of course). It then uses the fixed-function alpha blend for the final blend to the buffer.
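As I understand the captures, the technique can be sketched in GLSL roughly like this (the names are my own and the real Direct2D shaders are of course HLSL, so treat this as an illustration of the idea, not the actual shaders):

```glsl
// Vertex shader: the CPU tessellator emits an inner (fully opaque) mesh
// plus 1px border strips whose vertices carry a precomputed coverage:
// 1.0 on the inner edge, 0.0 on the outer edge.
#version 150
in vec2  position;
in float coverage;        // 1.0 inner edge, 0.0 outer edge
uniform mat4 transform;   // rotate/scale/translate handled here
out float vCoverage;
void main() {
    vCoverage   = coverage;   // hardware interpolates this across the strip
    gl_Position = transform * vec4(position, 0.0, 1.0);
}
```

```glsl
// Fragment shader: modulate the fill color's alpha by the interpolated
// coverage; fixed-function alpha blending does the rest.
#version 150
in float vCoverage;
uniform vec4 fillColor;   // or a gradient texture lookup instead
out vec4 outColor;
void main() {
    outColor = vec4(fillColor.rgb, fillColor.a * vCoverage);
}
```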
While this is fast and simple (and works on D3D 9 class hardware coughIntelcough, which is very attractive to Microsoft), it uses way too much CPU processing to generate the alpha strip geometry. (Even the DirectWrite sub-pixel-accuracy text is rendered to a buffer on the CPU and uploaded as a texture to the GPU for blitting and ClearType blending.)
Microsoft assumes that you won't change geometry very often, so the vertex buffer only needs to be computed once and cached. (Simple rotate/scale/translate is handled by vertex shader matrix math.)
But in my case I generate totally new geometry every single frame (1 frame per tile), so the CPU overhead/stall would make it slower than AGG.
Thus what I want to ask is what ideas you guys have on how to do fast 2D antialiased vector rendering in OpenGL with decent AA. As I have a Direct3D 11 class card, I am happy to use any OpenGL 4.1 feature, new or old, that you think will make it faster.
The geometry side is rather easy to do in OpenGL 4, using a geometry shader to expand lines into thin/thick quads. I believe that using the tessellation engine for Bézier line/polygon curves would also be fast.
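A minimal sketch of that line expansion (ignoring joins, caps and the perspective w component; `halfWidth` is my own uniform name):

```glsl
// Geometry shader sketch: expand each line segment into a quad of the
// requested stroke width, offset along the segment's perpendicular.
#version 150
layout(lines) in;
layout(triangle_strip, max_vertices = 4) out;
uniform float halfWidth;   // half the stroke width, in clip-space units
void main() {
    vec2 a   = gl_in[0].gl_Position.xy;
    vec2 b   = gl_in[1].gl_Position.xy;
    vec2 dir = normalize(b - a);
    vec2 n   = vec2(-dir.y, dir.x) * halfWidth;  // perpendicular offset
    gl_Position = vec4(a + n, 0.0, 1.0); EmitVertex();
    gl_Position = vec4(a - n, 0.0, 1.0); EmitVertex();
    gl_Position = vec4(b + n, 0.0, 1.0); EmitVertex();
    gl_Position = vec4(b - n, 0.0, 1.0); EmitVertex();
    EndPrimitive();
}
```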
But I am stuck on the antialiasing bit. I figure I will need to use the pixel shader for this. I notice that OpenGL 4 has a lot of features for messing with the fixed-function MSAA sampling, which makes me wonder whether I can get better AA by having the pixel shader output a custom sample coverage.
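The sort of thing I mean, sketched in GLSL 4.00 (the coverage computation and the varying are hypothetical placeholders; `gl_SampleMask` is the real built-in that lets the fragment shader override which samples get lit):

```glsl
// Fragment shader sketch: compute an analytic coverage value yourself
// and either turn it into a custom sample mask or output it as alpha.
#version 400
in float vEdgeDistance;    // hypothetical varying: distance to the edge in px
uniform vec4 fillColor;
out vec4 outColor;
void main() {
    // Hypothetical analytic coverage from the signed edge distance.
    float coverage = clamp(0.5 + vEdgeDistance, 0.0, 1.0);
    // Option A: light the first N of 8 samples in proportion to coverage.
    int bits = int(coverage * 8.0 + 0.5);
    gl_SampleMask[0] = (1 << bits) - 1;
    // Option B: skip the mask and just fold coverage into alpha.
    outColor = vec4(fillColor.rgb, fillColor.a * coverage);
}
```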
Has anyone done anything like this before? And where can I find good information on how the MSAA pipeline works under OpenGL 4 with the coverage sample output stuff?