Efficient bitmap text writing using shaders, is it possible?

I’m experienced on the C++ side but a complete novice in GLSL.
Given a texture on the GPU, a VBO holding a vector of (dx, dy, w, h) entries, and a VBO of 8-bit integers, is it possible to do the following? Can anyone point out which functions I would have to learn to do this?

  1. start at location (x,y)
  2. for each 8-bit integer i, copy a rectangular region from the texture at location (0,32*i) with size (w,h) to (x,y+dy)
  3. x = x + dx

In other words, the texture and the vector of (dx, dy, w, h) entries describe a font, and the VBO of 8-bit integers is a string.

For a font of size 48 with Chinese characters, the texture could be 48 x 96000.
This of course exceeds 4k. Do modern cards support this? Obviously the font could be stored in a rectangular grid, but that is more complicated and possibly slower.

Also, could I support an if statement, i.e.

if (i == '\n') {
    x = 0; y = y + 32;
}

to move to the next line?

Two issues:

  1. If you want to supply one vertex per glyph, you’ll need to use either a geometry shader or instancing to convert that to a pair of triangles. Both have a performance cost relative to using 4 vertices per glyph.

  2. The x = x + dx part is a bit awkward for shaders. Shader invocations run in parallel, so you can’t pass values from one invocation to the next. It would be simpler (and possibly more efficient) to calculate the positions on the CPU. If you must do this on the GPU, look up “parallel prefix sum” (the simple parallel version takes about log(n) passes and O(n*log(n)) total work, whereas a sequential prefix sum is a single O(n) loop).
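To make the pass structure of a parallel prefix sum concrete, here is a rough sketch of the simple (Hillis-Steele style) variant, simulated on the CPU in C++. The names `scanInPasses` and `penPositions` are made up for illustration, and integer advances are an assumption. On a GPU each outer-loop pass would be a separate dispatch or draw, and the inner loop is the part that runs in parallel.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Inclusive prefix sum in the pass-by-pass style a GPU would use.
// Each pass is a pure "map": element i reads only the previous pass's data.
// Hypothetical helper; the passes run sequentially here.
std::vector<int> scanInPasses(std::vector<int> values, int* passCount = nullptr)
{
    int passes = 0;
    std::vector<int> next(values.size());
    for (std::size_t stride = 1; stride < values.size(); stride *= 2) {
        for (std::size_t i = 0; i < values.size(); ++i) {
            next[i] = (i >= stride) ? values[i] + values[i - stride] : values[i];
        }
        values.swap(next);  // output of this pass becomes input of the next
        ++passes;
    }
    if (passCount) *passCount = passes;
    return values;
}

// Pen x of glyph i = startX + sum of the advances of all glyphs before it,
// i.e. an exclusive scan, derived here from the inclusive one.
std::vector<int> penPositions(const std::vector<int>& advances, int startX)
{
    std::vector<int> inclusive = scanInPasses(advances);
    assert(inclusive.size() == advances.size());
    std::vector<int> x(advances.size());
    for (std::size_t i = 0; i < advances.size(); ++i)
        x[i] = startX + inclusive[i] - advances[i];
    return x;
}
```

For five glyphs this takes three passes (strides 1, 2 and 4), each touching every element, which is where the extra log(n) factor comes from. A plain running-sum loop on the CPU gets the same positions in one pass.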

Without using either a geometry shader or instancing, you can draw a set of quads (triangle pairs) which take the same integer attribute for each of the four vertices. The font data (i.e. the bounding rectangle of each glyph within the texture) can be stored in a texture, uniform buffer object (UBO) or shader storage buffer object (SSBO). UBOs have a size limit which isn’t required to be more than 16384 bytes, whereas SSBOs are limited only by available memory. A shader can’t perform random-access lookups in a VBO (data in VBOs is per-vertex or per-instance and is passed to vertex shader inputs automatically).
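As a rough sketch of the four-vertices-per-glyph approach, the CPU-side code below builds the vertex and index arrays for a string. The `GlyphVertex` layout, the `buildQuads` name and the fixed glyph size (w, h) are assumptions for illustration only; a real font would take the size from the per-glyph data, and the shader would use `code` to look up the glyph rectangle in the texture, UBO or SSBO.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical vertex layout: corner position plus the character code.
struct GlyphVertex {
    float x, y;           // position of this corner
    std::uint8_t corner;  // 0..3, which corner of the quad this is
    std::uint8_t code;    // character code, identical for all four corners
};

struct GlyphQuads {
    std::vector<GlyphVertex> vertices;   // 4 per glyph
    std::vector<std::uint32_t> indices;  // 6 per glyph (two triangles)
};

// penX[i] is the precomputed x position of glyph i (calculated on the CPU).
// Assumes every glyph has the same size (w, h), which is a simplification.
GlyphQuads buildQuads(const std::string& text, const std::vector<float>& penX,
                      float penY, float w, float h)
{
    assert(penX.size() == text.size());
    GlyphQuads quads;
    for (std::size_t i = 0; i < text.size(); ++i) {
        const std::uint8_t code = static_cast<std::uint8_t>(text[i]);
        const std::uint32_t base = static_cast<std::uint32_t>(quads.vertices.size());
        const float x0 = penX[i], y0 = penY;
        quads.vertices.push_back({x0,     y0,     0, code});
        quads.vertices.push_back({x0 + w, y0,     1, code});
        quads.vertices.push_back({x0 + w, y0 + h, 2, code});
        quads.vertices.push_back({x0,     y0 + h, 3, code});
        // Two triangles sharing the 0-2 diagonal.
        for (std::uint32_t k : {0u, 1u, 2u, 0u, 2u, 3u})
            quads.indices.push_back(base + k);
    }
    return quads;
}
```

The arrays would then be uploaded to a VBO and an element buffer and drawn as indexed triangles. The character code is repeated four times per glyph, which is the cost instancing would avoid.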

Instancing would allow you to use only one vertex per glyph, avoiding the need to specify each character code four times, but most implementations don’t handle such small instances efficiently.

On the texture size: I wouldn’t expect many (if any) cards to support a texture with a 96000-texel dimension. E.g. the one on this system (Radeon HD 7800) reports 16384 for GL_MAX_TEXTURE_SIZE. Using a rectangular grid instead isn’t going to have a noticeable impact on performance.
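The grid lookup is only a division and a remainder. Here is a minimal sketch, assuming fixed-size cells and a limit obtained from GL_MAX_TEXTURE_SIZE; the names `planAtlas` and `cellOrigin` are made up for illustration.

```cpp
#include <cassert>

struct AtlasLayout { int columns, rows, width, height; };
struct Cell { int x, y; };

// Pack glyphCount fixed-size cells into as few rows as the limit allows.
// maxTextureSize is whatever the driver reports for GL_MAX_TEXTURE_SIZE.
AtlasLayout planAtlas(int glyphCount, int cellW, int cellH, int maxTextureSize)
{
    assert(glyphCount > 0 && cellW > 0 && cellH > 0 && cellW <= maxTextureSize);
    int columns = maxTextureSize / cellW;
    if (columns > glyphCount) columns = glyphCount;
    const int rows = (glyphCount + columns - 1) / columns;  // round up
    assert(rows * cellH <= maxTextureSize);  // otherwise one texture is not enough
    return {columns, rows, columns * cellW, rows * cellH};
}

// Top-left texel of glyph i: replaces the (0, cellH * i) lookup of a single strip.
Cell cellOrigin(const AtlasLayout& atlas, int glyphIndex, int cellW, int cellH)
{
    return {(glyphIndex % atlas.columns) * cellW,
            (glyphIndex / atlas.columns) * cellH};
}
```

Assuming 2000 cells of 48 x 48 (what a 48 x 96000 strip would hold) and a limit of 16384, this gives 341 columns by 6 rows, i.e. a 16368 x 288 texture. A squarer layout works the same way with a smaller column count.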

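On the newline handling: again, this can’t be implemented directly in a shader, because each position depends on everything before it. The divide-and-conquer approach used for a parallel prefix sum could be adapted for this, but it would be more efficient on the CPU. In general terms, a CPU handles both a map and a running sum in a single O(n) pass, whereas a GPU does a map in one pass but needs about log(n) dependent passes for a running sum (O(n*log(n)) total work in the simple version).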
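For comparison, the CPU version with newline handling is a single loop. This is a minimal sketch: the `layoutText` name, the 256-entry advance table and integer coordinates are assumptions, and a newline returns to the starting x rather than to 0.

```cpp
#include <cassert>
#include <string>
#include <vector>

struct Pen { int x, y; };

// One pen position per drawn glyph; '\n' produces no glyph and starts a new line.
// advance must hold 256 entries, indexed by character code (hypothetical table).
std::vector<Pen> layoutText(const std::string& text, const std::vector<int>& advance,
                            int startX, int startY, int lineHeight)
{
    assert(advance.size() == 256);
    std::vector<Pen> pens;
    pens.reserve(text.size());
    int x = startX, y = startY;
    for (unsigned char c : text) {
        if (c == '\n') {
            x = startX;       // carriage return
            y += lineHeight;  // line feed
            continue;
        }
        pens.push_back({x, y});
        x += advance[c];      // running sum: depends on every glyph before it
    }
    return pens;
}
```

The resulting positions can go straight into the per-vertex data, so the shader only has to look up the glyph rectangle and never needs to know about line breaks.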