I’m trying to wrap my brain around what is being suggested, but I’m not quite certain I get it.
It seems like what you’re asking for is essentially a per-instance offset into the index array, so that each instance is rendered with a different index count and base index.
One problem with this is that the index counts and base indices all live in CPU memory, so I don’t see it helping performance much. Maybe, since GL 3.x hardware has the new BaseVertex draw functions, the driver can handle the per-instance increment for you. But, much like the glMultiDraw functions, that’s something you could do yourself.
Other examples abound as well.
OK, what are they?
Considering that one often draws LOTS of letters, this is not good.
“LOTS” is a relative term. Let’s look at the worst-case scenario: a 12pt font, rendered at 1920x1200, full screen, with all the text fully justified, flush left and right.
You might get 50 words per line. Given an average word length of perhaps 5 letters per word, that’s 250 letters. 1200 pixels of height might give you 75 lines. Total letter count on the screen: 18,750.
Assuming that one uses reasonable compression on the data (2D positions as shorts, 2D texcoords as shorts), each vertex will be 8 bytes in size. Let’s also assume the worst case: one is using the core profile, so there is no GL_QUADS (and no cheating with geometry shaders), and each quad takes 6 vertices. Therefore, each quad will take 48 bytes.
Total byte size for the worst-case screen text count: 900,000 bytes per frame. Less than 1MB.
If you change the entire screen’s text all at once, constantly, every frame, at 60fps, that will require ~52MB per second of transfer.
In any case, I think most video cards can handle ~52MB per second. Also, if you’re drawing that much text, then it’s likely that your application consists primarily of text drawing (for example, OpenGL-accelerating a web browser). So even if this pushed your graphics card to its limit, it’s not like you’re rendering a bunch of other complex stuff too.
So I’m not seeing a use case for this.
Yes, your method (if directly supported by hardware, which I highly doubt on modern hardware) would be faster and take up less memory. But as we see here, even in the worst case, you’re not even taking up a full MB of memory. And I simply don’t see drawing 18,750 quads as being particularly difficult for any GPU, even low-end ones.
Do you have a use case where this actually speeds up a real bottleneck, rather than making something faster that was never a bottleneck to begin with?