How to display Chinese in OpenGL

Robert_Osfield · June 8, 2003, 1:54am

Originally posted by henryj:
I don’t think anyone thinks that putting all your glyphs into one texture isn’t a good idea. In fact it’s essential to get anything remotely like good performance.

I’d like to reiterate one of my findings: one should only download the glyphs one actually requires for rendering.

One of the bottlenecks hitting an OSG user who needed Japanese fonts was that the original osgText implementation (which was based on FTGL) tried to load the whole Japanese font file at once. FreeType itself just ground to a halt trying to create all the glyphs - over 20,000 of them - let alone having to create texture space for them all.

The new osgText implementation just loads the glyphs that are needed, and then adds new ones on demand. This keeps the texture usage down, which in turn keeps the load and rendering performance up.

It does make for a more complex back end to the text implementation, though.
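As a rough illustration of the load-on-demand idea described above, here is a minimal sketch. It is not osgText's actual code: `GlyphCache` and `fake_rasterize` are hypothetical names, and the rasterizer is a stand-in for whatever a real font library would do.

```python
class GlyphCache:
    """Hypothetical on-demand cache: a glyph is created only on first use."""

    def __init__(self, rasterize):
        self._rasterize = rasterize  # assumed callable: character -> glyph record
        self._glyphs = {}            # character -> cached glyph record
        self.loads = 0               # number of glyphs actually created

    def get(self, char):
        # Rasterize only when this character has never been requested before.
        if char not in self._glyphs:
            self._glyphs[char] = self._rasterize(char)
            self.loads += 1
        return self._glyphs[char]

    def layout(self, text):
        # Fetch (and lazily create) the glyph for every character in the string.
        return [self.get(c) for c in text]


def fake_rasterize(char):
    # Stand-in for a real font rasterizer; returns a placeholder record.
    return {"char": char, "width": 16, "height": 16}


cache = GlyphCache(fake_rasterize)
cache.layout("你好世界")  # four new characters are rasterized
cache.layout("你好")      # both characters are already cached
print(cache.loads)
```

Only the characters that actually appear ever get created, so a 20,000-glyph font costs nothing up front.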

I have to agree with John though that ‘string’ caching is the fastest for the reasons stated as long as you don’t blow your texture budget.

For very low demands on the number of text labels, so that only a single texture is ever required, and the text doesn’t change, then and only then might rendering whole strings to a texture compete with the quad approach I’ve outlined.

Under these conditions state changes are equal between the two approaches, i.e. none, and we’re left with more quads vs. more fill to decide which will be faster. Now we’re only talking about hundreds of quads here, but most likely millions of pixels to fill, so which one is most likely to win…

Robert.

imported_jwatte · June 8, 2003, 8:44am

A 16x16 res glyph with luminance and alpha takes 16x16x2 = 512 bytes.

A single quad takes 4x12 (for the coords) + 4x8 (for the tex coords) = 80 bytes.

The difference is that the texture is uploaded ONCE, but the vertex has to be re-sent each frame. From a fill rate perspective, the approaches should be equivalent. From a main RAM bandwidth perspective, the pre-rendered text clearly wins unless you change the text more often than once every 10 frames or so. If you change the text that often, the user is unlikely to actually be able to read it.
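The byte counts above can be checked with a few lines of arithmetic. This is only a sketch of one reading of the figures: it assumes 4-byte floats, three position floats and two texture-coordinate floats per vertex, and that the quad data is re-sent every frame while the texture is re-uploaded only when the text changes.

```python
GLYPH_SIZE = 16        # glyph is 16x16 texels
BYTES_PER_TEXEL = 2    # luminance + alpha
FLOAT_BYTES = 4        # assumed 32-bit floats

# One-time texture cost for a glyph-sized area.
glyph_texture_bytes = GLYPH_SIZE * GLYPH_SIZE * BYTES_PER_TEXEL

# Per-frame cost of one quad: 4 vertices, each with xyz plus uv.
quad_bytes = 4 * (3 * FLOAT_BYTES) + 4 * (2 * FLOAT_BYTES)

# Number of frames of quad traffic that equals one texture upload.
break_even_frames = glyph_texture_bytes / quad_bytes

print(glyph_texture_bytes, quad_bytes, break_even_frames)
```

Under these assumptions the break-even point is a text change roughly every six to seven frames, the same order of magnitude as the "10 frames or so" estimate.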

But, as I said, your system works for you; good for you! We found a system that runs faster for us; good for us!

henryj · June 8, 2003, 1:43pm

One of the bottlenecks hitting an OSG user who needed Japanese fonts was that the original osgText implementation (which was based on FTGL) tried to load the whole Japanese font file at once. FreeType itself just ground to a halt trying to create all the glyphs - over 20,000 of them - let alone having to create texture space for them all.

The new osgText implementation just loads the glyphs that are needed, and then adds new ones on demand. This keeps the texture usage down, which in turn keeps the load and rendering performance up.

I wish you had mentioned this to me. All glyphs except textures had this functionality from the start, and this was fixed for textures a long time ago, in 1.3b3 (November 13, 2001).
In fact v2.0 has removed ‘pre caching glyphs’ altogether.

Robert_Osfield · June 9, 2003, 12:21am

Originally posted by jwatte:

The difference is that the texture is uploaded ONCE, but the vertex has to be re-sent each frame.

Have you ever heard of display lists???

From a fill rate perspective, the approaches should be equivalent.

I’m really impressed that you can’t see the wood for the trees.

When you render the whole string, the size of the quad has to be the total width of the glyphs together × the maximum height of any one of the glyphs. The maximum height is what gets you: you end up with lots of white space to fill around all those small letters.

When you render glyphs individually, the quads change size from glyph to glyph, so you end up rendering much less white space. Fill is quite clearly less.
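The fill argument can be put into numbers with a small sketch. The glyph sizes below are made up purely for illustration (two tall glyphs among four short ones); only the two area formulas come from the description above.

```python
# Hypothetical glyph bounding boxes as (width, height) in pixels.
glyphs = [(10, 14), (8, 8), (8, 8), (4, 14), (8, 8), (8, 8)]

# Whole-string quad: total width times the tallest glyph's height.
total_width = sum(w for w, _ in glyphs)
max_height = max(h for _, h in glyphs)
string_fill = total_width * max_height

# Per-glyph quads: each quad is only as large as its own glyph.
glyph_fill = sum(w * h for w, h in glyphs)

print(string_fill, glyph_fill)
```

With these made-up sizes the single quad covers 644 pixels against 452 for the individual quads; the gap depends entirely on how much the glyph heights vary.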

From a main RAM bandwidth perspective, the pre-rendered text clearly wins unless you change the text more often than once every 10 frames or so.

You’re courageously deluded. Imagine rendering this post. How many glyphs would you need? Compare this to how much texture space you’d need if you just used one image. Please do the sums that I presented in a previous post: the memory footprint is much, much smaller when rendering individual quads.

If you change the text that often, the user is unlikely to actually be able to read it.

The real strength of rendering individual glyphs comes in scalability. There are users of the OSG that require tens of thousands of unique labels, such as depths along a line, where each label is unique.

If you use a separate piece of texture for each label and each one is 32 pixels high and 128 wide, how much space will you need? You’ll need 40 textures of 1024x1024 to pack them all in. That’s one heck of a lot of texture usage, and if you’re lucky and the user doesn’t have any other requirements for the video card’s memory then you might just get away without any thrashing. But let’s face it, in the real world we have other things in our scene…

Contrast this with the needs of individual glyphs, subloaded on demand. Digits: well, there are only ten of them. Add a comma and a decimal point and we have 12 glyphs in total. 32x32x12 = 12,288 pixels. This is a hefty saving in texture memory. And please remember that the quad coords are only going to take up about a meg for all this geometry, nothing compared to the 80 MB you’ll need above.
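For anyone who wants to redo these sums, here is a minimal sketch. It assumes 10,000 labels (the low end of "tens of thousands") and 2 bytes per texel (luminance plus alpha, as in the earlier per-glyph estimate); both are assumptions, not figures stated in this post.

```python
import math

LABELS = 10_000              # assumed label count
LABEL_W, LABEL_H = 128, 32   # pixels per pre-rendered label
TEX = 1024                   # texture edge length in pixels
BYTES_PER_TEXEL = 2          # assumed luminance + alpha

# Whole-string approach: pack fixed-size labels into 1024x1024 textures.
labels_per_texture = (TEX // LABEL_W) * (TEX // LABEL_H)
textures_needed = math.ceil(LABELS / labels_per_texture)
string_bytes = textures_needed * TEX * TEX * BYTES_PER_TEXEL

# Per-glyph approach: ten digits plus comma and decimal point, 32x32 each.
GLYPH_COUNT = 12
glyph_pixels = 32 * 32 * GLYPH_COUNT
glyph_bytes = glyph_pixels * BYTES_PER_TEXEL

print(textures_needed, string_bytes // 2**20, glyph_pixels, glyph_bytes)
```

Under these assumptions the label textures come to 40 textures and 80 MiB, while the 12-glyph set works out to 12,288 pixels, or 24 KiB.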

But, as I said, your system works for you; good for you! We found a system that runs faster for us; good for us!

You say it runs faster for you, but have you really tested the alternatives? Were the alternatives properly implemented?

It really doesn’t take much to do the sums properly and realise that the individual glyph approach is going to be much more efficient. Once you really start pushing your text needs, there really isn’t any comparison.

I don’t have a problem with rendering whole text strings to textures. If your text needs are modest, then it is probably one of the easier implementations to go for whilst still retaining high image quality. For a small number of text labels performance will also be more than adequate; in fact the performance delta from the most efficient methods will be very small.

But PLEASE, don’t tell me or anyone else it’s the most efficient implementation once you start scaling things up, because that’s just plain misinformed claptrap.

Robert.

imported_jwatte · June 9, 2003, 3:31pm

I think you should use better language, as I don’t see any reason why you should insult me like that.

To sum up: I believe that display lists live in AGP memory, and thus compete with CPU memory and vertex transfer. You believe display lists live in local video memory and compete with texture fetch and fill rate.

Robert_Osfield · June 10, 2003, 12:30am

Originally posted by jwatte:
I think you should use better language, as I don’t see any reason why you should insult me like that.

I apologise if you feel insulted. It is my belief that some of the advice you have provided has been misleading and unsubstantiated. It’s difficult to put that politely, or in a cuddly form that will give you a warm glow.

Your suggestion of rendering a whole string with a single quad is good advice, given certain caveats: it works well for a modest number (tens to hundreds) of text labels, it works well for static text but not for highly dynamic text, and it works well when font resolution is modest.

The problem comes not because you haven’t enumerated the caveats, but when you try to punt the idea that rendering whole strings is optimal for performance and scales well. Even greater problems come when you try to claim that it performs better and scales better than the alternative of separate glyphs.

Please remember I didn’t invent the separate glyphs approach. When I implemented the latest version of osgText I looked at a whole set of alternatives, including the whole string approach. I did the sums, and the separate glyphs approach was clearly the best for performance and scalability. The figures in the posts above explain why.

To sum up: I believe that display lists live in AGP memory, and thus compete with CPU memory and vertex transfer. You believe display lists live in local video memory and compete with texture fetch and fill rate.

Actually this all depends on which machine, graphics card, and OGL drivers you’re using; you just can’t make generalizations, and some machines don’t even have AGP. However, one thing I can say is that one of the key points of display lists is that they are designed to be downloaded to memory local to the graphics card, so with a good OGL driver this is exactly what will happen.

Robert

Stephen_H · June 16, 2003, 5:35pm

Sorry to resurrect an old thread, but I was interested because I’m currently designing a full-fledged GUI in OpenGL and I do lots of text rendering.

Things are getting near completion now, and I’ve started profiling the GUI with Numega. Currently, I’m using immediate mode to render a textured quad per character. I’ve noticed that around 55% of the time in the GUI is spent rendering text and I’ve been thinking about ways to optimize this. Of the time rendering text, I’ve noticed a very large portion is used by glTexCoord2f() and glVertex2f().

Obviously, the function call overhead is killing performance. I was thinking about moving to vertex arrays of some kind, but I don’t know if that’s a good idea given that most of the strings I render average around 10 to 30 quads, and I’m not sure if this is enough to overcome the overhead of using vertex arrays. Some of the listboxes and textviews are larger, though, and I think they might benefit from this.

I was wondering what other people had tried, or if anyone has any opinions?

Best Regards,
Stephen

Edit - yes, I’m thinking about moving to cached strings for the small text widgets, but I was curious if anyone had encountered this problem before with single quads…


OneSadCookie · June 16, 2003, 6:10pm

You generally don’t need very large arrays before glDrawElements outperforms immediate mode.

Rendering the whole string to a texture will, of course, get you around this problem. This is what all word processors and text editors have effectively done for years, so there’s no reason it should suddenly become too slow now. Just remember only to redraw the parts that have changed.

Robert_Osfield · June 16, 2003, 11:56pm

Of the time rendering text, I’ve noticed a very large portion is used by glTexCoord2f() and glVertex2f().

The quick coding option would be to display list them, and then profile the results.

My own implementation uses vertex arrays + glDrawArrays, as they fit very naturally with rendering a list of quads, and it’s certainly fast enough - even with a million characters I’m still getting a very good frame rate.
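To give a feel for why quads map so naturally onto vertex arrays, here is a minimal sketch of building the arrays for one string. It is not osgText's code: `build_quad_arrays` is a hypothetical helper, and it assumes a fixed-width font stored in a 16x16-cell texture atlas indexed by character code. The resulting flat lists are the kind of data one would hand to glVertexPointer and glTexCoordPointer and then draw with a single glDrawArrays call using GL_QUADS, instead of one glTexCoord2f/glVertex2f pair per corner.

```python
def build_quad_arrays(text, glyph_w=16.0, glyph_h=16.0, atlas_cols=16):
    """Return flat (vertices, texcoords) lists with 4 corners per character.

    Assumes a hypothetical fixed-width atlas of atlas_cols x atlas_cols cells.
    """
    vertices, texcoords = [], []
    cell = 1.0 / atlas_cols          # size of one atlas cell in texture space
    x = 0.0                          # pen position along the baseline
    for ch in text:
        code = ord(ch) % (atlas_cols * atlas_cols)
        u = (code % atlas_cols) * cell
        v = (code // atlas_cols) * cell
        # Corners in counter-clockwise order: bottom-left, bottom-right,
        # top-right, top-left (2 floats each).
        vertices += [x, 0.0, x + glyph_w, 0.0,
                     x + glyph_w, glyph_h, x, glyph_h]
        texcoords += [u, v, u + cell, v,
                      u + cell, v + cell, u, v + cell]
        x += glyph_w
    return vertices, texcoords


verts, uvs = build_quad_arrays("Depth 12.5")
vertex_count = len(verts) // 2       # number of vertices to draw
print(vertex_count)
```

The ten-character string yields 40 vertices that can be submitted in one draw call, and the arrays only need rebuilding when the text changes.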

Robert.

Stephen_H · June 17, 2003, 8:47am

Thanks for the replies!

Sort of unrelated question… maybe this should go in a separate thread:

I know VAR and compiled arrays use post T&L vertex caches when using the glDrawElements() and glDrawRangeElements() functions.

I would guess that display lists also use the vertex cache? And VBOs would use this cache in all states, or just in certain states?

Stephen

Pop_N_Fresh1 · June 17, 2003, 11:28am

The vertex cache will only be used if you’re using DrawElements-type calls, i.e. you’re using indexed vertices. When using indexed vertices, display lists didn’t give any speed gains in my testing.

DrawElements and LockArrays were about 4 times faster than anything I could get out of display lists.
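As a small illustration of what "indexed vertices" means for text quads, the sketch below builds the index list one might pass to glDrawElements when each quad is drawn as two triangles. `build_quad_indices` is a hypothetical helper, and splitting quads into triangles is an assumption made for the example. The point is that some vertices are referenced more than once, which is the only situation in which a vertex cache has anything to reuse.

```python
def build_quad_indices(quad_count):
    """Index list for quad_count quads, each split into two triangles.

    Quad i is assumed to own vertices 4*i .. 4*i+3 in corner order.
    """
    indices = []
    for i in range(quad_count):
        base = 4 * i
        # Two triangles sharing the diagonal base -> base+2.
        indices += [base, base + 1, base + 2,
                    base, base + 2, base + 3]
    return indices


indices = build_quad_indices(2)
unique_vertices = len(set(indices))
reused_references = len(indices) - unique_vertices
print(indices, unique_vertices, reused_references)
```

For two quads this gives 12 index references to 8 unique vertices, so 4 references are repeats that a cache could serve.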
