max value with fp16 blending? :(

hi,

i have some problems with fp16 blending. i use GL_ONE, GL_ONE blending, depthtesting is disabled, fragment shader does just this:

gl_FragColor.r = 1.0;

after rendering to FBO, i get bad results. when overdraw is less than 2048, i get correct results, but when overdraw is higher, i get a value of only 2048! (for debug purposes i just read back all pixels from the FBO: glGetTexImage(GL_TEXTURE_2D, 0, GL_RED, GL_FLOAT, buffer);)

i use GL_RGBA_FLOAT16_ATI as render target. gf6600gt, win xp sp2, driver 81.98, and i render just lines.

thanks

I think 2048 seems about right for a 16-bit float. It is the max of 11 bits (2^11 = 2048): the 10 stored mantissa bits plus the implicit leading one. The other 6 bits are the sign and the 5-bit exponent.

To explain a bit more: in floating point math, when you add two numbers together and one is big enough to completely overshadow the other, the result is as if the little number had been ignored.
(I know this is a little rough of a description - google for floating point math)

Looks like you are going to have to switch to another FBO after 2048 passes or use 32-bit floats (probably no hardware blending support however)
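The stall at 2048 can be reproduced on the CPU. The following is only a sketch: it assumes the blend unit rounds each sum back to IEEE 754 half precision, and it borrows Python's `struct` format `"e"` (round to nearest even) as a stand-in for whatever rounding the hardware really uses. The helper names `to_half` and `accumulate` are made up for this illustration.

```python
import struct

def to_half(x: float) -> float:
    """Round a Python float to the nearest fp16 value (struct format 'e')."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def accumulate(increment: float, passes: int) -> float:
    """Mimic GL_ONE, GL_ONE blending: add `increment` once per pass,
    rounding the destination back to fp16 after every add (assumption)."""
    value = 0.0
    for _ in range(passes):
        value = to_half(value + increment)
    return value

print(accumulate(1.0, 2000))  # below 2048 every integer is representable
print(accumulate(1.0, 5000))  # from 2048 on, value + 1.0 rounds back to value
```

Above 2048 the spacing between neighbouring fp16 values is 2.0, so 2049 has no representation and the sum never moves, regardless of how many more lines are drawn.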

i know exactly what you mean. but when i read and write to the same texture adding 1.0, or 0.5, i get much higher values...

and i have another problem: sometimes the max value in the texture is 2x the real maximum value. how can this happen?

shelll,

You'll have to explain further. It sounds to me like (as sqrt[-1] said) you're just seeing the result of the exponent-dependent "epsilon" in floating point numbers. You'd hit a different maximum if you started from zero and added 0.25 each time, for example...

Thanks -
Cass

i bind the same texture that i am rendering to (i know, undefined behaviour...). then in the fragment shader:

float temp = texture2D(tex, vec2(gl_FragCoord.s / 2048.0, gl_FragCoord.t / 2048.0)).r;
gl_FragColor.r = temp + 1.0;

so i read the value from that texture, increment it and write it back. with this method i get values higher than 2048...

OK, but that doesn't really affect what Cass just explained and what sqrt hinted at. Specifically, if the exponent is large enough the 1.0 becomes insignificant, or unrepresentable, because the mantissa is (virtually) shifted left by the exponent and you simply run out of significant bits in the mantissa to cover the bit position where something in the range of 1.0 would be added. Basically X+1 = X once X becomes large enough. As Cass further explained, if you change the value of the increment, then the exponent at which your mantissa can no longer cover the increment changes too, i.e. if you add 2.0 you'll see the cutoff at a higher value than you do now, and if you add 0.5 the cutoff will happen sooner. It's just a quirk of floating point arithmetic, and it applies equally well to float32 arithmetic on the CPU.

At some point there is a real maxfloat value, but it's sure as heck not 2k. Think of 2^(exponent bits) and that's the kind of unused LSBs you'd have in a fixed point scheme, i.e. you have a crap load of range... you just don't have the same precision over all of that range.
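To make the "moving cutoff" concrete, here is a small CPU-side sketch. It assumes each blended sum is rounded to fp16 with round to nearest even (Python's `struct` format `"e"`); real blend hardware may round differently. `to_half` and `stall_point` are hypothetical helper names.

```python
import struct

def to_half(x: float) -> float:
    """Round a Python float to the nearest fp16 value (struct format 'e')."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def stall_point(increment: float, limit: int = 100_000):
    """Start at 0 and keep adding `increment` in fp16.
    Returns (passes, value) at the point where value + increment == value."""
    value, passes = 0.0, 0
    while passes < limit:
        nxt = to_half(value + increment)
        if nxt == value:  # the increment fell below the mantissa's reach
            break
        value, passes = nxt, passes + 1
    return passes, value

for inc in (0.25, 0.5, 1.0, 2.0, 16.0):
    print(inc, stall_point(inc))

# The largest finite fp16 value is far above 2048.
print(to_half(65504.0))
```

For these power-of-two increments the cutoff value scales with the increment (512 for 0.25, 4096 for 2.0), while the number of usable passes stays at 2048 in every case.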

i know why it happens, i was just confused why it did not happen with the "blending shader". and i realized just now that shader arithmetic is 32-bit...

i tried some different numbers. the best are higher than 0.25, and with all of them i get 2048 steps. is it impossible to get more than 2048 steps with fp16 blending? it is too low for me. i need at least up to 200k steps...

with the blending shader, there are problems :(

Well that's a bit odd.

Thinking more carefully, with 5 bits of exponent there's a shift range of 32 at least, probably +/- 16. So you've got 1.0 -> 1.999... (to the limit of mantissa precision) * 2^16; now that's an assumption. I don't know what the base shift for HALF is. They may have erred on the side of fractional values, but that wouldn't make sense.

It could be an fp bug; the exponent offset (nominal exponent for 1.0) could err towards precision rather than range (possibly due to human contrast sensitivity concerns); there could be some non-float jiggery-pokery going on internally; or maybe we're just missing something. You could try adding a significantly larger number: 2x should do it, 4x certainly should, but just for fun try incrementing by 16, for example, just to make sure.

Edit: hang on a minute... you're saying you don't get > 2048 steps... sigh. Of course you won't increase the iterations. That's the point: as you increase the added value you just move the goalposts (the epsilon changes in proportion to the value you add), but the maximum value where the iteration stops is increased.

There's nothing funny going on, thankfully.

I was gonna suggest some funky iterator that had no bearing on the kind of accumulation you're doing, but now that I see what you're doing, a multiplier may be the right approach. You know, *= 1.0001 instead of += 1.0. I'm not sure where your cutoff is in decimal, but this is driven by the mantissa precision. You probably want some wiggle room in there. Recursively divide it out for the final answer (precision?), or build a table and look it up. It should work nicely.

now i tried this:

first draw one line with GL_ONE, GL_ONE, and then draw the other lines with GL_SRC_ALPHA, GL_ZERO over the first line. shader changed to:

gl_FragColor = vec4(1.0, 0.0, 0.0, 1.01);

but i only get a value of 1.01953, which looks as if only the first three lines were used (1.0 * 1.01 * 1.01 = 1.0201) :(

I don't understand your blend modes.

First line: srcColor * GL_ONE + dstColor * GL_ONE

so after this it should just be (1.0, 0.0, 0.0, 1.01) + dstColor

Next lines are: srcColor * GL_SRC_ALPHA + dstColor * GL_ZERO

= (1.0, 0.0, 0.0, 1.01) * 1.01
= (1.01, 0.0, 0.0, 1.0201)

So no matter how many lines you render you will end up with the same value as you are not taking into account any of the destination color.
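The difference between the two factor orders can be checked with a tiny model of the blend equation, result = src * sfactor + dst * dfactor. This is a sketch only: it runs in double precision, ignores fp16 rounding and any clamping, supports just the three factors discussed here, and the names `blend` and `draw_lines` are invented for the example.

```python
def blend(src, dst, sfactor, dfactor):
    """Apply result = src * sfactor + dst * dfactor to each RGBA component."""
    def factor(name):
        if name == "ONE":
            return 1.0
        if name == "ZERO":
            return 0.0
        if name == "SRC_ALPHA":
            return src[3]  # alpha of the incoming fragment
        raise ValueError(name)
    s, d = factor(sfactor), factor(dfactor)
    return tuple(sc * s + dc * d for sc, dc in zip(src, dst))

def draw_lines(sfactor, dfactor, overdraw):
    """First line with ONE, ONE onto a cleared target, then `overdraw`
    more lines with the given factors. Returns the final RGBA value."""
    src = (1.0, 0.0, 0.0, 1.01)
    dst = blend(src, (0.0, 0.0, 0.0, 0.0), "ONE", "ONE")
    for _ in range(overdraw):
        dst = blend(src, dst, sfactor, dfactor)
    return dst

print(draw_lines("SRC_ALPHA", "ZERO", 1))   # destination is discarded
print(draw_lines("SRC_ALPHA", "ZERO", 50))  # same result, whatever the overdraw
print(draw_lines("ZERO", "SRC_ALPHA", 3))   # destination scaled by 1.01 per line
```

With SRC_ALPHA, ZERO every line overwrites the target with src * 1.01, so the overdraw count is lost; with ZERO, SRC_ALPHA the target is multiplied by 1.01 once per line.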

yes, you are right. it was my mistake, i was in a hurry before going to work :( it should be swapped :) now with GL_ZERO, GL_SRC_ALPHA i get better results, but 16bit does not have enough precision, and with 1.001 (maybe the smallest number greater than 1?) i can't get even 16k iterations...

i am thinking of improving the "fake blending" method, to get predictable results.
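The ceiling for the multiplicative approach can be estimated on the CPU. fp16 has 1024 values per power-of-two interval and 16 such intervals between 1.0 and its largest finite value 65504, so there are only 16384 distinct values to step through. The sketch below assumes the product is rounded to fp16 (round to nearest even, via Python's `struct` format `"e"`) after every pass and that the multiplier itself is applied unrounded; the hardware may behave differently. `to_half` and `multiplicative_steps` are hypothetical names.

```python
import math
import struct

def to_half(x: float) -> float:
    """Round to the nearest fp16 value; map overflow to infinity."""
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except (OverflowError, struct.error):
        return math.inf  # struct refuses values too large for fp16

def multiplicative_steps(factor: float, limit: int = 100_000):
    """Start at 1.0 and multiply by `factor` once per pass in fp16.
    Returns (passes, last finite value) when the value stalls or overflows."""
    value, passes = 1.0, 0
    while passes < limit:
        nxt = to_half(value * factor)
        if nxt == value or math.isinf(nxt):
            break
        value, passes = nxt, passes + 1
    return passes, value

print(multiplicative_steps(1.001))
```

Each pass has to advance by at least one representable value, so under these assumptions the pass count can never reach 16384, and recovering an exact count from the final value is further complicated by the rounding applied at every step.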

Using 16 bit floating point you can store exactly 4096 distinct values with constant spacing (should be 4097 but there's a bug). For instance, -2047...2048 incrementing by 1, or -255.75...256 incrementing by 0.25.

If you only have a single value to blend/accumulate, you can split it across the multiple RGBA channels of the blend target. The total number of effective bits is still disappointing though, given the number of bits actually used. Bring on integer support!

Originally posted by shelll:
is it impossible to get more than 2048 steps with fp16 blending? it is too low for me. i need at least up to 200k steps...
FP16 can't even represent 200k different values. 16 bits is 65536 values, of which some are lost due to NaNs, INF, +/- 0 and potentially denorms that are flushed to zero.
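The breakdown of the 65536 bit patterns can be counted directly. This sketch decodes every 16-bit pattern with Python's `struct` format `"e"`, assuming the render target follows the IEEE 754 half layout (1 sign, 5 exponent, 10 mantissa bits); `classify` is a made-up helper.

```python
import math
import struct
from collections import Counter

def classify(bits: int) -> str:
    """Decode one 16-bit pattern as fp16 and name its class."""
    value = struct.unpack("<e", struct.pack("<H", bits))[0]
    if math.isnan(value):
        return "nan"
    if math.isinf(value):
        return "inf"
    if value == 0.0:
        return "zero"
    exponent = (bits >> 10) & 0x1F  # 5-bit exponent field
    return "subnormal" if exponent == 0 else "normal"

counts = Counter(classify(bits) for bits in range(1 << 16))
for name, count in sorted(counts.items()):
    print(name, count)
```

Only the "normal" patterns (plus zero) survive if the hardware flushes denorms, which leaves well under 65536 usable values and nowhere near 200k.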

HalfCG max absolute value is 2048, I've seen it, I think, in the OpenEXR SDK or NVIDIA SDK, can't remember.

then i must use 32bit float textures, no blending, and "repair" the "fake blending".

one more question. when will there be an extension for FBO for single channel render targets? i can use arb_rects and nv_float_buffer, but then ATi cards won't work...