Pre-Baked Resources

One feature I think would be good for OpenGL is the ability to bake resources into memory and then load them for later use. Let’s say, for instance, you want to load a PNG into a texture and use mipmapping and DXT1 compression, which is common. You would have to decode the PNG, then OpenGL would have to compress it as DXT1 and create mipmaps for the texture. In total this might take, say, 300 milliseconds for a 1024x1024 texture (a rough estimate). What if there were a function that took the texture as it is in memory, stored it as a memory stream, and let it be loaded later directly into memory as-is, without any preprocessing? That could bring the load down to around 3 milliseconds using glCompressedTexImage2D for each level, but you would still have to make additional calls to set the state for that texture. What if there were one call that loaded a texture with all its mipmap levels (if mipmapped) along with all its previous state? This would allow for 10-20 calls to be reduced to a single call. This can also be used to improve loading speeds for geometry.

Here’s an example for baking a resource.

GLuint tex;
glGenTextures(1, &tex);
glBindTexture(GL_TEXTURE_2D, tex);
glTexImage2D(GL_TEXTURE_2D, 0, format, width, height, 0, GL_BGRA, GL_UNSIGNED_BYTE, (GLvoid*)bitmap->bits());
if (generateMipmaps)
    glGenerateMipmap(GL_TEXTURE_2D);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_S, wrap ? GL_REPEAT : GL_CLAMP_TO_EDGE);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_T, wrap ? GL_REPEAT : GL_CLAMP_TO_EDGE);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, generateMipmaps ? GL_LINEAR_MIPMAP_LINEAR : GL_LINEAR);

GLbyte *bakedTexture;
GLuint size;
glBake(GL_TEXTURE_2D, bakedTexture, &size);

glDeleteTextures(1, &tex);

Here’s an example for loading a baked resource.

GLuint tex;
glGenTextures(1, &tex);
glBindTexture(GL_TEXTURE_2D, tex);

GLbyte *bakedTexture; // has a preloaded baked resource
glTexBaked(GL_TEXTURE_2D, bakedTexture);

The other benefit is that it would encourage the development of texture tools to handle the pre-baking of resources, which has multiple benefits for an application. Since some of the work is handled by the driver, these tools would be simpler to write. To list a few:

  • No need to use libraries like FreeImage in your application because resources are already decoded.
  • Simplifies texture creation for new users.

There are some problems, like other people stealing your resources since they’re in an easy-to-read format, but tool developers can throw garbage into a baked resource and strip it back out before loading. They can also apply additional compression to a pre-baked resource to reduce its size on disk.

This would benefit both high-end and low-end users: easy texture creation for low-end developers and fast resource loading for high-end developers. Making things easier for low-end developers means less time spent learning the API, which can help with the motivation to stick with it.

Let’s say, for instance, you want to load a PNG into a texture and use mipmapping and DXT1 compression, which is common.

This is not in fact common. The common thing to do is take that PNG and shove it through an off-line tool that will generate a DXT1 compressed texture with mipmaps in a .DDS file. Then you load that.

What you’re talking about is common for tech demos and the like. But most serious applications pre-bake their data whenever possible, for many reasons. Not the least of which is that mipmap generation and S3TC compression can be of higher quality since you aren’t relying on OpenGL implementations that were likely written to be fast, not good.
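
For concreteness, the load path for such a pre-baked .DDS is already trivial. A minimal sketch, assuming a DDS reader has filled the hypothetical mipData/mipSize arrays with the DXT1-compressed levels:

GLuint tex;
glGenTextures(1, &tex);
glBindTexture(GL_TEXTURE_2D, tex);

GLsizei w = width, h = height;
for (GLint level = 0; level < mipCount; ++level)
{
    // Upload each pre-compressed level as-is; no decoding or recompression.
    glCompressedTexImage2D(GL_TEXTURE_2D, level,
                           GL_COMPRESSED_RGB_S3TC_DXT1_EXT,
                           w, h, 0, mipSize[level], mipData[level]);
    w = (w > 1) ? w / 2 : 1;
    h = (h > 1) ? h / 2 : 1;
}
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAX_LEVEL, mipCount - 1);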

This would allow for 10-20 calls to be reduced to a single call.

Obviously the ARB shouldn’t have bothered with things like PBOs and asynchronous texture loading; no, the real bottleneck in texture uploading is calls to glTexParameter.

Why is there no sarcastic smileyface on this board?

This can also be used to improve loading speeds for geometry.

Oh, I’d love to know how that would be possible, since the format of geometry is defined by the user and told directly to OpenGL, and can therefore already be stored in binary images suitable for direct transfer to buffer objects.
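
To illustrate: a minimal sketch, assuming a hypothetical mesh.bin written out earlier in exactly the layout your vertex attributes expect:

FILE *f = fopen("mesh.bin", "rb");
fseek(f, 0, SEEK_END);
long size = ftell(f);
fseek(f, 0, SEEK_SET);
void *data = malloc(size);
fread(data, 1, size, f);
fclose(f);

// The raw bytes go straight into the buffer object; no parsing, no conversion.
GLuint vbo;
glGenBuffers(1, &vbo);
glBindBuffer(GL_ARRAY_BUFFER, vbo);
glBufferData(GL_ARRAY_BUFFER, size, data, GL_STATIC_DRAW);
free(data);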


GLbyte *bakedTexture;
GLuint size;
glBake(GL_TEXTURE_2D, bakedTexture, &size);

OK, so you’re having the OpenGL implementation allocate memory and it passes back that allocated memory pointer to you. There are many problems with this:

1: You would have to pass a GLubyte **. This is C, not C++ where you can just use a reference.

2: The returned pointer is owned by OpenGL. So… when is it deleted? When you call glDeleteTexture? Which you, in your sample code, do immediately?

3: It doesn’t work well with Pixel Buffer Objects. And if you’re going to have this idea, then there’s no reason to make it PBO-unfriendly.

The correct way to do this is to have a query for the size, and then you provide a buffer of that size, which can be a memory pointer or a buffer object + offset. See ARB_get_program_binary.
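
For reference, the query-then-provide pattern from ARB_get_program_binary looks roughly like this (the memory-pointer case):

GLint length = 0;
glGetProgramiv(program, GL_PROGRAM_BINARY_LENGTH, &length);

GLenum binaryFormat = 0;
void *binary = malloc(length);
// The caller owns the storage; the implementation just fills it in.
glGetProgramBinary(program, length, NULL, &binaryFormat, binary);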

  • No need to use libraries like FreeImage in your application because resources are already decoded.
  • Simplifies texture creation for new users.

You seem to have developed this idea under the belief that the “baked” format will be stable and cross-platform, defined by the OpenGL specification byte-for-byte.

If that is the case, then what you’re asking is for OpenGL to have a DDS or KTX loader built into it. It will do nothing for overall loading performance, because the format cannot be implementation dependent. If NVIDIA and ATI use different texture swizzling techniques or whatnot, the generic “baked” format would have to pick one side or neither. Someone would get screwed over. And picking one side would mean that the ARB is effectively offering an advantage to one implementation over another.

And God help you if you took this format to a big-endian machine. Now, that implementation has to do byte-swapping on top of everything else.

So no: to have any utility, this would have to be an implementation-dependent format. So you’d have to do it like get_program_binary. With each “baked” texture comes a format value. The user will have to cache the format with the “baked” texture; when attempting to load it, if the format has changed, they have to load the source data.
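
That’s exactly how program binaries handle it today, and a hypothetical baked-texture API would presumably mirror it. A minimal sketch of the reload side, where cachedFormat, cachedBinary, cachedLength, and compileAndLinkFromSource stand in for the user’s own cache:

glProgramBinary(program, cachedFormat, cachedBinary, cachedLength);

GLint status = 0;
glGetProgramiv(program, GL_LINK_STATUS, &status);
if (!status)
{
    // Driver or hardware changed; the cached binary is stale.
    // Fall back to building from the original source data.
    compileAndLinkFromSource(program);
}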

Or, to put it another way, it would buy you none of the advantages you suggest. It won’t help new users because they’ll still have to load the source format if the “baked” one has changed. It won’t help high-end users because the 12 calls that they don’t have to make aren’t a performance bottleneck, or even noticeable in any profiling data (outside of the pixel transfer itself).

The ARB does not exist to do your work for you; just write a DDS loader already. Or use one that already exists.

There are some problems, like other people stealing your resources since they’re in an easy-to-read format

Right. Because DDS, KTX, PNG, TGA, and others are such cryptic formats. It’s not like there are entire websites devoted to defining these formats byte-for-byte, as well as entire libraries solely devoted to reading data from them.

OK, so not a good idea; I wasn’t sure about the complications involved in implementing it. PhysX uses pre-baking and it is cross-platform, so I’m sure there are ways they get around some of these limitations.

I personally use offline tools; when I said “common” I was referring to the use of DXT1 and mipmapping, not so much the PNG part. I’m not a fan of DDS loader libraries since they’re not well written and incomplete. My point was to make a standardized method that would help newer developers.

Sorry about that small typo.

glBake(GL_TEXTURE_2D, &bakedTexture, &size);

OK, so now that you’ve said there are lots of challenges involved, are there any suggestions you can give to work around some of them? Or some good alternative solutions?

PhysX uses pre-baking and it is cross-platform

The library that NVIDIA owns and operates is “cross platform?” The one that performs terribly on GPUs that just so happen to be made by NVIDIA’s main competitor?

That’s a very broad definition of “cross platform”.

OK, so now that you’ve said there are lots of challenges involved, are there any suggestions you can give to work around some of them? Or some good alternative solutions?

No. There’s nothing to be gained here. There’s nothing wrong with texture loading that needs to be fixed. There is no problem in need of a solution here.

If you aren’t a fan of DDS loaders because you find them to be “not well written and incomplete”, then write your own. The format is readily available to anyone who wants to write a loader.

Honestly, the only thing I would change about texture loading (ignoring the API surrounding textures) would be to add a flag that lets you specify textures in top-left coordinates.

PhysX has had a complete rewrite; they’re moving more towards CPU than GPU physics processing. They’re also adding support for mobile platforms. PhysX is also used by UDK, which is cross-platform. If you were a PhysX developer you would have known this, which you are not, so don’t criticize a platform you aren’t developing with.

I’ve got something planned to help fellow developers with texture loading; it’s all HUSH HUSH for now. It’s not that I find DDS loaders to be badly written, it’s a fact that they are. They fail to load a lot of useful formats, making them not viable for 3D game development, and that’s my motivation to fix that problem.

PhysX has had a complete rewrite; they’re moving more towards CPU than GPU physics processing.

Then the ability of PhysX to bake objects, which you have just admitted is more about CPU than GPU processing, would have no bearing on the utility of “baking” textures on a primarily hardware-driven API.

Every library has its own formats. Ogre3D has its own mesh format. That’s not something special or unique. And it certainly doesn’t mean that OpenGL should have a mesh format built into its API.

I’ve got something planned to help fellow developers with texture loading; it’s all HUSH HUSH for now.

What is it with people having secret projects anyway? They act like if people know that they’re working on a texture loading library (which you effectively just admitted to), that someone will come along and steal their thunder.

Guess what? I’m writing a texture loader too (necessary to continue my tutorials). So is Groovounet, though that project may be defunct.

It’s not that I find DDS loaders to be badly written, it’s a fact that they are. They fail to load a lot of useful formats, making them not viable for 3D game development, and that’s my motivation to fix that problem.

What “useful formats” do they not load? Having looked through the DDS10 format, I can’t find a single one. And which DDS loaders are you talking about?

Oh, and DDS files are widely used by 3D game developers. I’m afraid I’ll have to take their opinions over yours.

For streaming textures in the background, through a second thread, and for most (all commonly used) texture formats:
Create a kinda big buffer object, enough to fit all mip levels of one or several textures. Bind it as a PBO. Map it for writing. In a second thread, ReadFile/fread into the mapped memory. When the second thread is done, and the first thread is about to start calculating game logic, do the glTexImage2D and glTexParameterf calls, with offsets into the buffer. Delete the buffer.
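
A rough sketch of that sequence, with thread synchronization and error handling omitted (totalSize and the mip offsets are assumed to come from your own file format):

GLuint pbo;
glGenBuffers(1, &pbo);
glBindBuffer(GL_PIXEL_UNPACK_BUFFER, pbo);
glBufferData(GL_PIXEL_UNPACK_BUFFER, totalSize, NULL, GL_STREAM_DRAW);

void *mapped = glMapBuffer(GL_PIXEL_UNPACK_BUFFER, GL_WRITE_ONLY);
// Hand 'mapped' to the second thread, which does: fread(mapped, 1, totalSize, file);

// Later, on the GL thread, once the loader thread signals completion:
glUnmapBuffer(GL_PIXEL_UNPACK_BUFFER);
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA8, width, height, 0,
             GL_BGRA, GL_UNSIGNED_BYTE, (GLvoid*)0); // source is an offset into the PBO
// ...repeat per mip level with the appropriate offsets...
glBindBuffer(GL_PIXEL_UNPACK_BUFFER, 0);
glDeleteBuffers(1, &pbo);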

The hardware usually has circuitry (or the driver has fast code paths to set up a special “shader”) that does this “baking”, provided that the texture format is supported (commonly used).
Still, for some drivers it might be faster not to map/remap new buffers again and again (the driver + firmware have to create virtual-memory maps on both the CPU and GPU sides); so reusing mappings, or just a simple glTexImage2D(…, cpuPointer), can be slightly faster on those.

Also, as Alfonse wrote, only the specific driver + hardware knows how to bake best; detect your resource-usage patterns and choose how to bake or re-bake at runtime.

In all cases, you’re properly covered. And for the last few percent of performance, you have to do some profiling on target devices, and ignore non-huge differences in performance (as those can change with new drivers for the same hardware).

DDS is great/preferable, as it generally encapsulates in a nice way all those “common” formats I mentioned before, which hardware can easily bake, and almost none of the formats that need some pre-conversion on the CPU side.

TL;DR: no baking API required, imho.