Nvidia GL_UNSIGNED_INT_10_10_10_2 endianness

Jwatte, I agree that he shouldn’t get different results on two code paths; I said exactly this, so I hope you don’t think I disagree on that. The key question is which code path is broken, as I said in my post. I was frankly surprised to see in the manual that unpacking was explicitly only at the component level, so I was cautious in my remarks and even hinted that the spec may differ.

Now it looks like the manual disagrees with the spec, since Evan kindly looked it up for us, and the swizzle should happen on the packed type before the components are separated.

Am I reading you correctly, Evan? Is it clear that this means the packed type is swizzled, not just the components?

This means that there are two bugs, one on the hardware path and one in the manual page. Or at the very least the manual page should be a little clearer.

I dislike some of these discussions about explicit endianness, because people get confused over stuff that should be a non-issue and start attributing reasons to the wrong things, or making claims of explicit endianness for something or other. Most of the confusion over endian issues arises from this kind of loaded, misleading discussion, and I never seem to have trouble with it personally. I’ve even written anonymous endian handling code that doesn’t care what the file endianness is; it just knows whether it is opposite or equal to the current system and handles it. It doesn’t even know what the native endianness is. The only time you’d ever care would be a reckless cast from int to byte, for example.

0x000000FF is always your low order byte regardless of the endianness of your system. Only *((char *)foo + 3) is system dependent.

For me it seems clear that a 10_10_10_2 format should produce the two bits of alpha in the 2 LSBs of the format, just as 0x00000003 masks the low order bits. There is no endianness dependency to this; it is clear and unambiguous. It is NOT system dependent and it says nothing about endianness. The location of the LSB in a cast byte stream is system dependent, but good code should either not care or at least be absolutely clear on the reason for casting and the effects.
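A tiny standalone C fragment shows that distinction (the value here is arbitrary, just for illustration):

#include <stdio.h>

int main(void)
{
    unsigned int foo = 0x11223344u;

    /* Masking talks about numeric significance: the low order byte is
       0x44 on any system, big or little endian. */
    unsigned int low = foo & 0x000000FFu;

    /* Casting to bytes talks about memory layout: the first byte in
       memory is 0x44 on a little endian system and 0x11 on a big
       endian one, so this IS system dependent. */
    unsigned char first = *((unsigned char *)&foo);

    printf("masked low byte: 0x%02X, first byte in memory: 0x%02X\n",
           low, (unsigned)first);
    return 0;
}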

The Cineon issue was caused by a loader that created an erroneous in-memory representation, because it loaded a file written as packed binary on an opposite endian system without correction. Hopefully byte swizzling in OpenGL will fix that, we’ll see. The software swizzle I posted will work around it in the meantime.
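(I don’t have that earlier post handy to paste here, but the workaround amounts to reversing the bytes of each packed 32-bit pixel before handing the buffer to GL; a minimal sketch, the function name is mine:)

/* Reverse the byte order of every packed 32-bit pixel in place before
   the glTexImage/glDrawPixels call. */
void swap_packed_pixels(unsigned int *pixels, unsigned int count)
{
    unsigned int i;
    for (i = 0; i < count; ++i) {
        unsigned int v = pixels[i];
        pixels[i] = (v >> 24) | ((v >> 8) & 0x0000FF00u) |
                    ((v << 8) & 0x00FF0000u) | (v << 24);
    }
}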

On the other stuff, I disagree. But you defined your own file format there. The guy said Cineon was 10_10_10_2, not the reverse. I also strongly disagree with your comments “on a little endian machine do this, and on a big endian do that”. NO, it depends on whether the bytes are swizzled because of a problematic file read. If you natively create 2_10_10_10 with MSB alpha etc., it works on either system (using your example).

jwatte, sorry, you seem to have a fundamental misunderstanding about the MSB and endianness. The MSB is ALWAYS in the correct place. The byte holding the MSB on a big endian vs a little endian system is in a DIFFERENT PLACE, and there’s no need to swizzle if the MSB data is in the MSB location. They are DIFFERENT bytes. That only matters for I/O.

If you read a file as packed ints that was written as packed ints on another system, you need to swizzle depending on whether that OTHER SYSTEM matched the endianness of THIS SYSTEM. It has nothing to do with the native endianness, only with whether the unadulterated binary byte order written to the file on the other system and read on this system is the correct native order. If it is not, then the bytes wind up out of order for a native (in this case 10_10_10_2) representation, because they are in the byte order for an opposite endian system. This is exactly what happened with the Cineon file transfer from SGI, and it is what happens with all other endian related binary file reads.

If the in memory representation is correct on either a big or little endian system you don’t swizzle.
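To make the “anonymous” endian handling point above concrete, here is a hypothetical sketch of how a loader decides (FILE_MAGIC is a placeholder, not the real Cineon magic, and byte_reverse is the same operation as the swap in the workaround sketch above):

#define FILE_MAGIC 0x12345678u  /* placeholder: any known 32-bit header value */

static unsigned int byte_reverse(unsigned int v)
{
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
}

/* Returns 0 if the file's packed 32-bit data can be used as-is, 1 if it
   was written on an opposite endian system and needs swapping, -1 if the
   header doesn't match at all. Note it never asks what the native
   endianness actually is. */
int needs_swap(unsigned int magic_from_file)
{
    if (magic_from_file == FILE_MAGIC)
        return 0;
    if (magic_from_file == byte_reverse(FILE_MAGIC))
        return 1;
    return -1;
}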

[This message has been edited by dorbie (edited 04-01-2003).]

I decided to check the spec for myself, and it is ambiguous (or, more correctly, broken).

In section 3.6 the spec clearly talks about data swizzling of “elements” in table 3.7 and the accompanying text. Table 3.6, immediately preceding it, refers to “Element meaning and order” in the second column, and the text refers to the number of elements in a group. The implied meaning of “element” is an individual component, when it should be a GL Data Type array element.

I think the ambiguity arises because you have a single “GL Data Type” representing multiple “Elements” in table 3.6; these ‘special interpretation’ packed types weren’t around when “Element” = “GL Data Type” = “Component”, all singular. The different behaviour on two code paths in the same driver is a real world example of the consequences of this ambiguity.

I think the behaviour should obviously be to swizzle at the packed “GL Data Type” (array element) level, not at the component level. It needs some clarification though, and of course the manual should be updated to reflect this.
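To spell out the interpretation I’m arguing for, here is a sketch, assuming the usual 10_10_10_2 layout with the first component in the most significant bits (R = bits 31..22, G = 21..12, B = 11..2, A = 1..0); the function name is mine:

/* Unpack one GL_UNSIGNED_INT_10_10_10_2 pixel (RGBA), optionally applying
   SWAP_BYTES the way I'm arguing it should work. */
void unpack_10_10_10_2(unsigned int raw, int swap_bytes,
                       unsigned int *r, unsigned int *g,
                       unsigned int *b, unsigned int *a)
{
    unsigned int v = raw;

    /* The swap applies to the whole packed 32-bit element FIRST... */
    if (swap_bytes)
        v = (v >> 24) | ((v >> 8) & 0x0000FF00u) |
            ((v << 8) & 0x00FF0000u) | (v << 24);

    /* ...and only then are the components extracted. A per-component
       swap isn't even well defined for 10-bit fields that straddle
       byte boundaries. */
    *r = (v >> 22) & 0x3FFu;
    *g = (v >> 12) & 0x3FFu;
    *b = (v >>  2) & 0x3FFu;
    *a =  v        & 0x003u;
}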

To make this clear, life was simpler when parts of the spec were written and we had:

                /--> GL Data Type (element) ----> Component
               /
Pixel Data ---+---> GL Data Type (element) ----> Component
               \
                \--> GL Data Type (element) ----> Component

Now with packed formats we also have:

                                           /---> Component
                                          /
Pixel Data ---> GL Data Type (element) --+----> Component
                                          \
                                           \---> Component

The heart of the problem is table 3.6 (I’ve decided :slight_smile: ). RGBA packed formats really have one ‘Element’ per pixel and four components, and unpacked ‘traditional’ formats have multiple elements, so that table can no longer be created in its present form for format names. The manual is particularly misleading because of similar assumptions, but it makes really emphatic statements that are wrong (or will be when table 3.6 is fixed).

The spec needs to avoid associating element counts and separate components with particular format tokens.

[This message has been edited by dorbie (edited 04-02-2003).]

> jwatte, sorry, you seem to have a
> fundamental misunderstanding with MSB &
> endianness.

I suppose I’ll just go and re-implement those drivers and the linker/compiler I worked on for both x86 and PPC platforms, then. No, wait, they’ve been working fine for eight years, they didn’t suddenly break because of an Internet post! :wink:

When you find something at all in what I said that’s WRONG (as opposed to different from your own opinion on how best to do it), then please let me know.

Meanwhile, let me justify why I prefer to do it the way I recommended (which I claim is correct):

I prefer to read the file into memory as-is, and then tell the driver to deal with the data as it arrives from the file. Memory mapping files wouldn’t work at all if I did it your way (which means having the program touch the data before having OpenGL touch it). To me, it seems clearly superior to offload it all to the driver, because it’s likely to either do the same job I’d be doing, OR do a better job, so it’s either a wash, or a net win.
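For what it’s worth, the “let the driver deal with it” path is just pixel store state around the upload; a rough sketch (the texture target, sizes, data pointer and function name are placeholders of mine):

#include <GL/gl.h>  /* GL_UNSIGNED_INT_10_10_10_2 needs GL 1.2 headers,
                       or glext.h where the system gl.h is still 1.1 */

void upload_packed(const void *pixels, int width, int height,
                   int file_is_opposite_endian)
{
    /* Have the driver reverse the bytes of each packed 32-bit element
       while unpacking, instead of touching the data in the app. */
    glPixelStorei(GL_UNPACK_SWAP_BYTES,
                  file_is_opposite_endian ? GL_TRUE : GL_FALSE);

    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, width, height, 0,
                 GL_RGBA, GL_UNSIGNED_INT_10_10_10_2, pixels);

    glPixelStorei(GL_UNPACK_SWAP_BYTES, GL_FALSE);  /* back to default */
}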

Btw: when you say “swizzle” it’s somewhat unclear, as the spec doesn’t use that word. It uses the word “reverse” when it comes to component order, and the word “swap” when it comes to byte ordering of elements larger than a byte.

Now, that’s not a correctness issue, just a preference issue. Your way is good too, except for these cases where it isn’t.

I thought that clearly defining what the expectations for a file format might be, and then showing how it resolves through reading the spec, would be a good way of illustrating how it works. The original post did not actually specify what the exact file conventions were, so I couldn’t use those for the illustrative example, or I would have.

OK, looking back at your post I’ve tried to boil it down to the most objectionable part and why we disagreed, and in doing this I’ve realized we agree with each other and you’re not wrong. In the second line of the definition before your last paragraph you define your format as “- the two highest bits are A … lowest bits are B”. The reason I objected was that this does not define in any way the endianness of the data, or whether UNPACK_SWAP should be true. The location of the “highest” bits is different on big and little endian systems, and in files written on big and little endian systems. If you’d said the first and last bits it would be defined (as a big endian format); this is an important distinction, because it is THE definition of endianness.

I thought you were making the assumption that big endian is somehow preferred in a file or memory representation.

Sorry, but I’ve just noticed that after saying all this, in item 3 you state that the format is stored as little endian in memory. OK, so you’re not wrong. You do get it and we do both violently agree.

I find it slightly objectionable that anyone would define high and low bits in a format and then say it’s little endian in memory, potentially changing the programmatic location of those bits on any system until after a swizzle, but you are correct in what you wrote.

[This message has been edited by dorbie (edited 04-02-2003).]

Table 3.6 in the spec still needs to be reworked. It’s broken for packed types. The manual page is also wrong in this for packed types.

Thanks for all the discussion…

Btw,

GL_UNSIGNED_CHAR + GL_RGBA, GL_RGB and GL_ABGR_EXT all work without any endian issue across both x86 and Irix, i.e. the same raw data file will appear identical across platforms. The component is UNSIGNED_CHAR, so there’s no endian issue there (and it’s definitely not an argument that OpenGL is big endian).

With 10_10_10_2 packing the pixel is represented as a GL_UNSIGNED_INT and so is justifiably tied to the endianness of an INT.
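In code, the difference looks something like this (just an illustration, the variable names are mine):

/* GL_UNSIGNED_BYTE components: each component is its own byte, so the
   bytes sit in the same order in memory (and in a raw file) on any host. */
unsigned char rgba8[4] = { 0xFF, 0x80, 0x40, 0x20 };  /* R, G, B, A */

/* Packed 10_10_10_2: the whole pixel is one GL_UNSIGNED_INT, so the order
   of its bytes in memory follows the endianness of an int on that host,
   and a raw byte dump differs between x86 and Irix. */
unsigned int rgb10a2 = (0x3FFu << 22) | (0x200u << 12) | (0x100u << 2) | 0x1u;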

So I agree the software driver on Nvidia is correct; the hardware driver just appears to be ignoring GL_PACK_SWAP_BYTES, or is unaware of how to properly handle 10_10_10_2 in this regard.

Did you try the workaround I posted? What kind of performance do you get with that?

Nope, already had something similar, but I imagine it might be useful if translated into a register combiner or fragment shader.

Anyway, the point is “life is too short, DMA everything”.

My point would be: where is the image from, and at what data rate? You can still DMA after the swizzle and get on with the next frame. I doubt this workaround would be the performance limiting factor under the right circumstances.