max value with fp16 blending? :(


i have some problems with fp16 blending. i use GL_ONE, GL_ONE blending, depth testing is disabled, and the fragment shader does just this:

gl_FragColor.r = 1.0;

after rendering to the FBO, i get bad results. when overdraw is less than 2048, i get correct results, but when overdraw is higher, i get a value of only 2048! (for debug purposes i just read back all pixels from the FBO: glGetTexImage(GL_TEXTURE_2D, 0, GL_RED, GL_FLOAT, buffer)) :wink:

i use GL_RGBA_FLOAT16_ATI as render target. gf6600gt, win xp sp2, 81.98 and i render just lines.


I think 2048 seems about right for a 16-bit float: the mantissa gives you 11 significant bits (10 stored plus the implicit leading 1), and the other 5 bits are the exponent, plus 1 bit for the sign.

To explain a bit more: in floating point math, when you add two numbers together and one is big enough that it completely overshadows the other, the result acts as if the little number was ignored.
(I know this is a little rough of a description - google for floating point math)

Looks like you are going to have to switch to another FBO after 2048 passes, or use 32-bit floats (probably no hardware blending support, however).
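You can reproduce the stall off-GPU. A minimal sketch, assuming the blend unit rounds each result to IEEE half precision (round to nearest, ties to even), using Python's `struct` module to do the same rounding:

```python
import struct

def fp16(x):
    # round x to the nearest IEEE half-precision value (ties to even),
    # mimicking what the fp16 blend target stores after every pass
    return struct.unpack('<e', struct.pack('<e', x))[0]

acc = 0.0
for _ in range(4000):          # "overdraw" of 4000 passes, +1.0 each
    acc = fp16(acc + 1.0)
print(acc)                     # 2048.0 -- the counter stalls there
```

Once the accumulator reaches 2048, the spacing between adjacent half values is 2, so 2048 + 1 rounds right back to 2048 and every further pass is lost.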

i know exactly what you mean. but when i read and write to the same texture, adding 1.0 or 0.5, i get much higher values…

and i have another problem: sometimes the max value in the texture is 2x the real maximum value. how can this happen?


You’ll have to explain further. It sounds to me like (as sqrt[-1] said), you’re just seeing the result of exponent-dependent “epsilon” in floating point numbers. You’d hit a different maximum if you started from zero and added 0.25 each time, for example…

Thanks -

i bind the same texture as i am rendering to (i know, undefined behaviour…). then in the fragment shader:

float temp = texture2D(tex, vec2(gl_FragCoord.s / 2048.0, gl_FragCoord.t / 2048.0)).r;
gl_FragColor.r = temp + 1.0;

so i read the value from that texture, increment it and write it back. with this method i get values higher than 2048…

OK, but that doesn’t really affect what Cass just explained and sqrt hinted at. Specifically, if the exponent is large enough, the 1.0 becomes insignificant, or rather unrepresentable: the mantissa is (virtually) shifted left by the exponent, and you run out of significant bits in the mantissa to cover the range where something like 1.0 would be added. Basically X + 1 = X once X becomes large enough. As Cass further explained, if you change the value of the increment, then the exponent at which the mantissa can no longer cover the increment changes too: if you add 2.0 you’ll see the cutoff higher than you do now, and if you add 0.5 the cutoff will happen sooner. It’s just a quirk of floating point arithmetic, and it applies equally well to float32 arithmetic on the CPU.

At some point there is a real max-float value, but it’s sure as heck not 2k. Think of 2^(exponent bits); that’s the kind of unused LSBs you’d have in a fixed point scheme. i.e. you have a crapload of range… you just don’t have the same precision over all of that range.
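The "cutoff scales with the increment" point can be checked directly. A small sketch, again simulating the per-pass fp16 rounding with Python's `struct` module (the specific increments below are just examples):

```python
import struct

def fp16(x):
    # round to the nearest half-precision value, as fp16 blending does per pass
    return struct.unpack('<e', struct.pack('<e', x))[0]

stall = {}
for inc in (0.25, 0.5, 1.0, 2.0, 16.0):
    acc = 0.0
    while fp16(acc + inc) != acc:   # keep adding until the add is absorbed
        acc = fp16(acc + inc)
    stall[inc] = acc
    print(inc, '->', acc)           # stalls at inc * 2**11 every time
```

With 11 significant bits, every increment stalls after the same 2048 distinct steps; only the value at which it stalls moves (512 for 0.25, 1024 for 0.5, 4096 for 2.0, and so on).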

i know why it happens, i was just confused about why it did not happen with the “blending shader”. and i realized just now: shader arithmetic is 32-bit…

i tried some different numbers. the best are higher than 0.25, and with all of them i get 2048 steps. is it impossible to get more than 2048 steps with fp16 blending? that is too low for me, i need at least 200k steps…

with the blending shader, there are problems :frowning:

Well that’s a bit odd.

Thinking more carefully: with 5 bits of exponent there’s a shift range of 32, probably +/-16. So you’ve got 1.0 -> 1.999… (to the limit of mantissa precision) * 2^16. Now, that’s an assumption; I don’t know what the base shift for HALF is. They may have erred on the side of fractional values, but that wouldn’t make sense.

It could be an fp bug; the exponent offset (the nominal exponent for 1.0) could err towards precision rather than range (possible, due to human contrast sensitivity concerns); there could be some non-float jiggery-pokery going on internally; or maybe we’re just missing something. You could try adding a significantly larger number, 2x should do it and 4x certainly should, but just for fun try incrementing by 16, just to make sure.

Edit: hang on a minute… you’re saying you don’t get > 2048 steps… sigh. Of course you won’t increase the iterations. That’s the point: as you increase the added value you just move the goalposts (the epsilon changes in proportion to the value you add), but the maximum value where iteration stops is increased.

There’s nothing funny going on, thankfully.

I was gonna suggest some funky iterator that had no bearing on the kind of accumulation you’re doing, but now that I see what you’re doing, a multiplier may be the right approach. You know, *= 1.0001 instead of += 1.0. I’m not sure where your cutoff is in decimal, but this is driven by the mantissa precision, so you probably want some wiggle room in there. Recursively divide it out for the final answer (precision?), or build a table and look it up. It should work nicely.
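A quick CPU sketch of the multiplicative-counter idea, with one caveat worth flagging up front: in half precision the machine epsilon is 2^-10 (about 0.000977), so a multiplier of 1.0001 itself rounds back to 1.0 and does nothing. The smallest usable multiplier is 1 + 2^-10:

```python
import struct

def fp16(x):
    # round to the nearest half-precision value after every operation
    return struct.unpack('<e', struct.pack('<e', x))[0]

print(fp16(1.0001))        # 1.0 -- this multiplier rounds away in half precision
m = 1.0 + 2**-10           # smallest half-precision value above 1.0
x, steps = 1.0, 0
while x < 32768.0:         # stay safely below the half max of 65504
    x = fp16(x * m)
    steps += 1
print(steps)               # roughly 8k-15k distinct steps before running out
```

The count of steps comes back via a log of the final value. It beats the 2048 additive steps, but it still tops out around 10-15k iterations, nowhere near 200k.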

now i tried this:

first draw one line with GL_ONE, GL_ONE, and then draw the other lines with GL_SRC_ALPHA, GL_ZERO over the first line. the shader changed to:

gl_FragColor = vec4(1.0, 0.0, 0.0, 1.01);

but i only get the value 1.01953, which looks as if only the first three lines were used (1.0 * 1.01 * 1.01 = 1.0201) :frowning:

I don’t understand your blend modes.

First line: srcColor * GL_ONE + dstColor * GL_ONE

so after this it should just be (1.0, 0.0, 0.0, 1.01) + dstColor

Next lines are: srcColor * GL_SRC_ALPHA + dstColor * GL_ZERO

= (1.0, 0.0, 0.0, 1.01) * 1.01
= (1.01, 0.0, 0.0, 1.0201)

So no matter how many lines you render you will end up with the same value as you are not taking into account any of the destination color.

yes, you are right, it was my mistake, i was in a hurry before going to work :frowning: it should be swapped :slight_smile: now with GL_ZERO, GL_SRC_ALPHA i get better results, but 16 bit does not have enough precision, and with 1.001 (maybe the smallest number greater than 1?) i can’t even get 16k iterations…

i am thinking of improving the “fake blending” method, to get predictable results.

Using 16 bit floating point you can store exactly 4096 distinct values with constant spacing (should be 4097 but there’s a bug). For instance, -2047…2048 incrementing by 1, or -255.75…256 incrementing by 0.25.

If you only have a single value to blend/accumulate, you can split it across the multiple RGBA channels of the blend target. The total number of effective bits is still disappointing though, given the number of bits actually used. Bring on integer support!
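One way to read the channel-splitting suggestion: rotate which channel receives the +1.0 from pass to pass (for instance by changing the source color between draws), then sum the four channels at readback. Each channel only has to count to 2048, so the combined counter reaches 8192 before any channel stalls. This scheme is an illustration, not something from the thread; a CPU simulation of it:

```python
import struct

def fp16(x):
    # round to the nearest half-precision value, like the fp16 blend target
    return struct.unpack('<e', struct.pack('<e', x))[0]

# rotate which channel receives the +1.0 on each pass; a single channel
# would stall at 2048, but no channel here ever has to count that high
rgba = [0.0, 0.0, 0.0, 0.0]
passes = 5000
for i in range(passes):
    c = i % 4
    rgba[c] = fp16(rgba[c] + 1.0)

print(sum(rgba))                    # 5000.0 -- no pass was lost
```

As the post says, the win is modest: four fp16 channels only quadruple the additive range, which is still nowhere near 200k.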

Originally posted by shelll:
is it impossible to get more then 2048 steps with fp16 blending? it is too low for me. i need at least up to 200k steps…
FP16 can’t even represent 200k different values. 16 bits gives 65536 encodings, and some of those are lost to NaNs, INFs, +/-0 and potentially denormals that are flushed to zero.
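For the record, the half layout is 1 sign bit, 5 exponent bits and 10 mantissa bits, and the largest finite half is 65504, not 2048; 2048 is only special as the last integer you can count to by ones. A quick check via Python's `struct` half support:

```python
import struct

# half layout: 1 sign bit, 5 exponent bits, 10 mantissa bits
top = struct.unpack('<H', struct.pack('<e', 65504.0))[0]
print(hex(top))   # 0x7bff -- largest finite half: exponent 11110, mantissa all ones

# 2048 is only special for counting by ones: 2049 is the first integer
# that no longer fits exactly, while 2050 is representable again
print(struct.unpack('<e', struct.pack('<e', 2049.0))[0])   # 2048.0
print(struct.unpack('<e', struct.pack('<e', 2050.0))[0])   # 2050.0
```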

Half’s max exactly-countable integer is 2048; I’ve seen it noted, I think, in the OpenEXR SDK or NVIDIA SDK, can’t remember.

then i must use 32-bit float textures, no blending, and “repair” the “fake blending”.

one more question: when will there be an FBO extension for single channel render targets? i can use arb_rects and nv_float_buffer, but then ATI cards won’t work…