What was wrong with the old API?

Clipping is supported in the core profile: only gl_ClipVertex has been removed from core; gl_ClipDistance has not.

When it comes to clipping not being present, I was talking about OpenGL ES2 (not desktop OpenGL), and I brought it up as an example of doing an operation through programmability (clipping via discard in the fragment shader) versus having fixed function do it.

kRogue: How is something that may be IHV specific, like separate conversion hardware, relevant to the core API? I thought we weren’t talking about the low-level benefits of special-purpose hardware goodness, but how hardware features are exposed on the application level.

Keep in mind that most of my bile is directed firmly at OpenGL ES2. With that disclaimer in mind, let's look at one epic fail train that is in OpenGL ES2 and NOT OpenGL: the glTexImage family.

Under OpenGL ES2, essentially, the GL implementation is not supposed to do any format conversion for you. Thus if you made your texture GL_RGB565, you need to feed it GL_RGB565 data; the GL implementation is not supposed to convert it for you. This is stupid. Firstly, many GL implementations do not store their textures scan line by scan line, but rather twiddled, so the implementation has to walk the texels anyway. Secondly, it is a royal pain in the rear to write conversion code optimized for each freaking platform. Worse, it is pointless, since something like 99% of the time the bits need to be twiddled anyway. Some hardware has special bits to do that conversion, so if the freaking API said it would do the conversion, that hardware would be used by the GL implementation. Instead, we all write for the lowest common denominator, so 99% of the time the conversion is done by the application on the CPU… and since lots of SoCs do not have NEON, often not even using NEON… epic fail. Whereas if the specification said it would do the conversion for you, the implementation could use specialized hardware, special CPU instructions, and so on. Instead, a very fixed-function kind of thing that is done constantly is not done by the GL implementation because the GLES2 spec is brain-dead. For what it is worth, there are some extensions that will let you store image data as YUV and get RGB when sampling in a shader.
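
To make the constraint concrete, here is a minimal sketch (my own fragment, not from the spec) of what an ES2 upload looks like: the data handed to glTexImage2D must already match the texture's format, and the 565-ish storage is implied by the format/type pair rather than requested explicitly, so any conversion from, say, RGBA8888 source art has to happen in application code first. The helper convert_rgba8888_to_rgb565() and the width/height/rgba8888_pixels variables are hypothetical.

#include <stdint.h>
#include <stdlib.h>

/* ES2: internalformat must equal format, and the client data must already
 * match it, so the app does the RGBA8888 -> RGB565 squeeze itself. */
uint16_t *rgb565 = malloc((size_t)width * height * sizeof(uint16_t));
convert_rgba8888_to_rgb565(rgba8888_pixels, rgb565, width * height);  /* CPU work, every upload */
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGB, width, height, 0,
             GL_RGB, GL_UNSIGNED_SHORT_5_6_5, rgb565);
free(rgb565);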

Adding to my bile: using half floats under GLES2 just plain sucks. You need to do the conversion yourself, and it is a bit of a crap shoot guessing the endian order at times. I've seen the exact same code seg-fault in glTexSubImage2D on half floats (and floats) in one GLES2 implementation but work just fine in another… both platforms had the same endianness, and yes, both listed the float and half-float texture extensions…
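
For reference, this is the kind of conversion every GLES2 application ends up carrying around. A minimal sketch that handles normal values, flushes denormals to zero, and maps overflow to infinity; it is not a complete IEEE 754 implementation:

#include <stdint.h>
#include <string.h>

/* float -> IEEE 754 binary16, the conversion GLES2 makes the application do.
 * Denormals flush to zero, NaN payloads are not preserved. */
static uint16_t float_to_half(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);                              /* type-pun safely */

    uint16_t sign = (uint16_t)((bits >> 16) & 0x8000u);
    int32_t  e    = (int32_t)((bits >> 23) & 0xFFu) - 127 + 15;  /* rebias exponent */
    uint32_t mant = bits & 0x007FFFFFu;

    if (e <= 0)
        return sign;                                             /* underflow -> signed zero */
    if (e >= 31) {                                               /* Inf, NaN, or finite overflow */
        uint16_t nan_bit = (((bits >> 23) & 0xFFu) == 0xFFu && mant) ? 0x0200u : 0u;
        return (uint16_t)(sign | 0x7C00u | nan_bit);
    }
    return (uint16_t)(sign | ((uint32_t)e << 10) | (mant >> 13));
}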

What I am saying is that the very common operations graphics code does all the time, and usually in the same way, should be (mostly) fixed function: clipping, image conversion, some primitive types beyond triangles, lines and point sprites, alpha test, texture filtering and gather, and LOD computation.

Whereas if the specification said it would do the conversion for you, the implementation could use specialized hardware, special CPU instructions, and so on. Instead, a very fixed-function kind of thing that is done constantly is not done by the GL implementation because the GLES2 spec is brain-dead.

Alternatively, it could be broken. And then what do you do? If the implementation’s conversion for some particular color ordering you’re using doesn’t work, you have to implement it yourself.

The more stuff you force into implementations, the greater the chance they will be broken. And you've already pointed out that the handling of floats/halves in some implementations is suspect. And that's just straight copy uploading; adding conversion on top of that is just begging for trouble.

Desktop GL developers had to expend effort just to stop conversions from happening. We had to come up with lots of ad-hoc rules to figure out what the proper ordering would be to prevent conversion (beyond swizzling, which is just a fancy memcpy). This is necessary for getting maximum texture upload performance. If the implementation simply gave an error when a conversion would have happened, it would be a lot easier for everyone involved.
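
As an illustration of those ad-hoc rules (the exact fast path is driver-specific, so treat this as a commonly recommended convention rather than a guarantee): for a GL_RGBA8 texture, many desktop drivers take BGRA data with the reversed packed type as a straight memcpy, while plain RGBA/UNSIGNED_BYTE may go through a conversion pass.

/* A commonly cited "no conversion" pairing on desktop GL for GL_RGBA8.
 * Whether this actually hits the fast path is implementation-specific. */
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA8, width, height, 0,
             GL_BGRA, GL_UNSIGNED_INT_8_8_8_8_REV, pixels);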

Conversion is something somebody could stick in a library and just give it away. It is far from essential.

I think you guys are confused. This site is for OpenGL. If you want to make suggestions for GL ES, then khronos.org is what you want.

We are talking about old GL and the advantages/disadvantages of new GL in this thread.

Thing is, though, that there are some ES features it would be nice to see going into full GL, and explicit matching of format with internalformat is one of them. This has a bearing on one major thing that was wrong with old OpenGL: because of its heritage as “a software interface to graphics hardware … that may be accelerated”, any given OpenGL call must pass through various software layers, and one of those may be a conversion layer. That in itself is OK; what’s not OK is that you, as the programmer, have no way of knowing when it happens, aside from lots of profiling and educated guesswork after the fact.

So true, I just want to make sure that the idiocy of GLES2’s texture image specification does not infect OpenGL.

Shudders. In OpenGL you can set the precise internal format; you cannot do that with OpenGL ES2, where it is determined implicitly. If you want to make sure that there is no image-space conversion, you can do that in OpenGL: make sure the internal format you specify is exactly the same as what you feed GL. On the other hand, if one uses unsized formats like GL_RGB as the internal format in desktop GL, one is asking for trouble, because then the GL implementation chooses the internal format.
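
In code, the difference looks something like this (a sketch; w, h and pixels are placeholders):

/* Sized internal format: the storage is pinned down, so matching your upload
 * data against it is under your control. */
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA8, w, h, 0,
             GL_RGBA, GL_UNSIGNED_BYTE, pixels);

/* Unsized internal format: the driver picks the actual storage, so whether
 * your upload matches it is anyone's guess. */
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGB, w, h, 0,
             GL_RGB, GL_UNSIGNED_BYTE, pixels);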

Um, if an implementation’s color conversion is broken, that is a bug in the GL implementation! Mind you, in desktop GL we all, like 99.999% of the time, assume the GL implementation gets the color conversion right when we don’t make the internal and external formats match. We do this all the time with half-float textures, for example.

I just want to make sure that the idiocy of GLES2’s no-image-format-conversion does not invade desktop GL.

If we have to talk about GLES then I have a few words about this format conversion or no format conversion debate.

The reason this was left out of the GLES specification is that it involves a lot of GL implementation code and opens up the possibility of feeding in suboptimal inputs, as was already mentioned.

Please keep in mind that embedded hardware has limited resources compared to desktop, and even though embedded hardware has evolved a lot, this is still the case. Personally, I am happy to see that they don’t waste memory and run time on larger GL implementations on embedded hardware, and I’m also happy that they don’t allocate a potentially big temporary buffer just to do the format conversion. Even a few megabytes is a real premium on many embedded devices, so I think format conversions are something that should be done at build time, not at run time. This is actually true for desktop as well if you are doing something serious.

Thing is though that there are some ES features that it would be nice to see going into full GL, and explicit matching of format with internalformat is one of them.

Given that I work on embedded and GLES2 all freaking day, let me share some bits. Firstly, because most GLES2 implementations store power-of-two texture data twiddled, the tex image calls all need to do a conversion in terms of pixel ordering anyway. The image conversion could be done at the same time as the repacking of pixel order. As for needing a large buffer to do the conversions, that is not true either, as the color conversion can be done in place at the location in RAM where the texture data resides. Lastly, regardless of who does it, someone needs to do that conversion.

So we find that every freaking application/framework needs to provide the image conversion routines… i.e. write the same freaking thing over and over again. If a framework does the conversion for you, that WILL take up more memory to store the converted data before handing it to the GL implementation, which will in turn copy and twiddle it internally. When the GL implementation does the color conversion, it can perform it IN PLACE at the location where it stores the actual texture data. And because GL implementations are usually tightly tuned to the SoC, not only will a GL implementation providing the color conversion use less memory than a framework providing it, 99.99% of the time it will also run faster and use less CPU. And, oh yes, less work for most developers is always a good thing.
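
For anyone unfamiliar with “twiddled”, here is a rough sketch of the idea. The exact layout is IHV-specific; this is just the textbook Morton/Z-order interleave, shown only to make the point that uploads already involve a per-texel pass that a format conversion could ride along with.

#include <stdint.h>

/* Textbook Morton (Z-order) index for a power-of-two texture: interleave the
 * bits of x and y to get the texel's position in the twiddled buffer. Real
 * hardware layouts vary, but a plain scanline memcpy is never enough either way. */
static uint32_t morton_index(uint32_t x, uint32_t y)
{
    uint32_t index = 0u;
    for (uint32_t bit = 0u; bit < 16u; ++bit) {
        index |= ((x >> bit) & 1u) << (2u * bit);
        index |= ((y >> bit) & 1u) << (2u * bit + 1u);
    }
    return index;
}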

Secondly, if you really want to know where GLES2 implementations eat memory, you need to look at the tiled renderers… for these buggers each GL context can easily take several megabytes (I’ve seen as high as 32MB) NOT including the framebuffer, textures or buffer objects. That memory cost comes from the tile buffer allocated for each GL context.

Just two questions about something that I still don’t understand:

  1. Why don’t you do the image format conversion off-line, during build? In most cases this can be done.
  2. How do you do in-place conversion between formats of different size?

The first answer is kind of simple: when deciding what format to use on the gizmo, you need to check at run time what extensions and how much memory the gizmo has. Additionally, some data you simply do not have until the app is running (for example procedurally generated data and/or data fetched externally).

In-place conversion is not rocket science; you just make, for each pair (accepted input format, internal format), a function that does the job:


/* Convert pixels from some accepted input format at inpixels into the
 * texture's internal-format storage at outpixels. */
typedef void (*store_and_convert)(const void *inpixels, int bpp, int bytes_per_line,
                                  void *outpixels);

/* One conversion routine per (accepted input format, internal format) pair. */
store_and_convert funcs[number_input_types_supported][number_output_types_supported];

and then the teximage call becomes just a dereference into the function table and a call through the function pointer…
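
Something like this, as a sketch; the index helpers and variable names are mine, just for illustration:

/* Hypothetical dispatch inside a texImage-style entry point: pick the routine
 * for this (input format, internal format) pair and run it directly on the
 * texture's backing store, converting and storing in one pass. */
store_and_convert fn = funcs[input_type_index(format, type)][output_type_index(internalformat)];
fn(client_pixels, bytes_per_pixel(format, type), bytes_per_line, texture_backing_store);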

In OpenGL you can set the precise internal format; you cannot do that with OpenGL ES2.

That is really the problem with ES. Not that there’s no conversion, but that it doesn’t allow you to pick a specific internal format. So the format matching is done by convention rather than by explicit request (which makes it possible to mismatch).

Mind you, in desktop GL we all, like 99.999% of the time, assume the GL implementation gets the color conversion right when we don’t make the internal and external formats match. We do this all the time with half-float textures, for example.

We do? In general, you wouldn’t use a half-float internal format if you weren’t passing half-float data. Sure, you could, but no application that actually cared about performance ever would.

The image conversion could be done at the same time as the repacking of pixel order.

Assuming, of course, that the repacking is not done by a special DMA mode. Because if it is, you can’t really do in-place conversion.

Um, if an implementation’s color conversion is broken, that is a bug in the GL implementation!

Yes, but opening up more avenues for bugs isn’t exactly helping the “buggy OpenGL drivers” situation.

There are two sides to the argument going on here: performance and flexibility. OpenGL is traditionally on the flexibility side; you provide the data, specify its format and layout, tell GL what internal representation you’d like, and let the driver work the rest out.

That’s cool for some use cases but it’s not cool for others.

Sometimes you want the data to go up FAST. You’re going to be using that texture later in the current frame, or in the next frame at the absolute latest, you’re doing an upload, and you need to know that the format and layout you’re using is going to exactly match the internal representation OpenGL is going to give you. Passing through any kind of software conversion layer - whether your own or in the driver - just doesn’t cut it.

So how do you find that out? Because right now it’s trial and error; you have no way of knowing. Even worse, it can vary from implementation to implementation, so it’s not something you can do a bunch of tests on and then code the results into your program.

At the very least that creates a need for a “gimme gimme exact format baby” extension, but by now this is more appropriate for the Suggestions forum than here.

For the case of GL ES, I really do not see what the big deal is.

Isn’t there a format that is supported by all the GPUs? On the desktop, they all support BGRA 8888.
For the embedded world, there should be a commonly supported format, whether it is R5G6B5 or BGRA 8888 or BGR 888 and so on. Even if there is a GPU that doesn’t support the “common” format in question, surely it supports something.

I can imagine that it can be a huge problem if there were 100 different GPUs and each supports its own unique format and they are forcing you to write that format converter.

So, how big is the problem?

Yes, you are right, R5G6B5 and BGRA8 are generally supported by 99.99% of embedded devices.

Things only become more complicated in the case of compressed texture formats (e.g. PVRTC or ETC), but hey, you store the compressed images separately anyway; I mean, you don’t do on-the-fly texture compression on embedded hardware, do you? So what’s the problem? You can solve the selection with a single conditional in the client-side code, and there is really no need for format conversion.
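
That single conditional would look roughly like this (a sketch: has_gl_extension() is an assumed helper that scans glGetString(GL_EXTENSIONS), and the blob variables stand in for asset data pre-compressed at build time; the enums come from the real GL_IMG_texture_compression_pvrtc and GL_OES_compressed_ETC1_RGB8_texture extensions):

GLenum      fmt;
const void *blob;       /* pre-compressed at build time, one blob per format */
GLsizei     blob_size;

if (has_gl_extension("GL_IMG_texture_compression_pvrtc")) {
    fmt = GL_COMPRESSED_RGB_PVRTC_4BPPV1_IMG;
    blob = pvrtc_blob;  blob_size = pvrtc_blob_size;
} else {
    fmt = GL_ETC1_RGB8_OES;
    blob = etc1_blob;   blob_size = etc1_blob_size;
}
glCompressedTexImage2D(GL_TEXTURE_2D, 0, fmt, w, h, 0, blob_size, blob);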

Sometimes you want the data to go up FAST. You’re going to be using that texture later in the current frame, or in the next frame at the absolute latest, you’re doing an upload, and you need to know that the format and layout you’re using is going to exactly match the internal representation OpenGL is going to give you. Passing through any kind of software conversion layer - whether your own or in the driver - just doesn’t cut it.

So how do you find that out? Because right now it’s trial and error; you have no way of knowing. Even worse, it can vary from implementation to implementation, so it’s not something you can do a bunch of tests on and then code the results into your program.

Let’s talk conversions, OK? Lots of us are familiar with the RGBA vs BGRA vs ABGR fiascos for conversions. In the embedded world it is even richer and much more hilarious. The formats exposed by GLES2 for textures are essentially RGB565, RGBA4444, RGBA5551, RGBA8, RGB8, L8, A8, LA8… and that is mostly it. Mind you, plenty of platforms do not really support RGB8; they pad it to RGBA8. Also, in the embedded world, like 99.999999% of the time, it is a unified memory model. It gets richer: there are SoCs that let you write directly to a location in memory from which GL will source texture data. There are some pretty anal rules about the formats, but you can see that getting an actual location to directly write bytes of image data is orders of magnitude better than the idiotic guessing game of “will it convert or not”. If you need to stream texture data on an embedded platform, using glTexSubImage2D is never going to have a happy ending; you need something better, you need a real memory location to directly write the image data.

With that in mind, and bearing in mind that GL implementations will more often than not twiddle the texture data, the color conversion issue is not that big a deal on top of everything else, really it is not. Besides, I would wager that the folks making the GL implementation will do a much better job than me, or for that matter most developers, simply because that is core work for GL implementations: twiddling bits to make the hardware happy.

What is quite troubling, especially when you think about it, is that color conversions are done all the freaking time, every time you draw one frame of data (at texture lookup, from the internal texture format to the format used in shader arithmetic, and then again on writes to the framebuffer)… and yet no one cares. Looking in particular at texture lookups, there really is no excuse for color conversion not to be in a GL implementation.