Next gen hardware NEEDS this feature

The feature I find myself wanting the most is programmable blending that works with any precision output buffer.

It would be useful to be able to write a customizable blender in a little assembly language; call it GL_blend_program_ARB. Even better would be to have access to the frame buffer from within a fragment program.

That is less important, however, than being able to blend into a floating-point buffer. I’d wager that most people want floating-point buffers so they can accumulate multiple light sources without saturation, which requires an additive blend (or, currently, ping-ponging between two buffers).

Unfortunately for me, I want to accumulate so many things (particles) that the swapping takes too long.
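To make the request concrete, here is a minimal sketch of the state setup the wish amounts to (my own code, not Zeno’s; drawParticles() is a hypothetical helper). The blend calls are ordinary OpenGL; the missing piece is hardware that honors them when the color buffer is floating point, which is what currently forces the buffer swapping:

    #include <GL/gl.h>

    /* Hypothetical helper, assumed to issue the particle geometry. */
    extern void drawParticles(void);

    /* The wished-for path: plain additive blending, except that the color
     * buffer is floating point instead of 8 bits per channel. */
    void accumulateParticles(void)
    {
        glEnable(GL_BLEND);
        glBlendFunc(GL_ONE, GL_ONE);   /* dst = dst + src: sum the intensities */
        drawParticles();
        glDisable(GL_BLEND);
    }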

Hopefully some kind soul at ATI or NVIDIA will read this board and grant my wish.

[This message has been edited by Zeno (edited 07-11-2003).]

1: Why do people ask for features (be it looping vertex shaders, “texture accesses” in vertex programs, or floating-point blending) that hardware developers are already being pressured to provide? It’s not like they don’t already realize that it would be very useful to have blending work with fp buffers.

2: I’ve been meaning to ask you this. What effect are you trying to achieve that requires the accumulation of thousands of particles?

3: Since denormalized floats are not required to be supported by hardware, and even a 32-bit float only provides around 7-8 significant decimal digits without denorms, just how much precision do you expect to get when you’re rendering 10,000 overlapping particles?

1: Why do people ask for features (be it looping vertex shaders, “texture accesses” in vertex programs, or floating-point blending) that hardware developers are already being pressured to provide? It’s not like they don’t already realize that it would be very useful to have blending work with fp buffers.

Probably one of two reasons: 1) they don’t know that people are already pressuring for that feature, or, in my case, 2) they want to throw in another vote for the feature, so that if a company is sitting around trying to decide how to use an extra x million transistors, they can say “well, we’ve been getting a lot of demand for feature so-and-so…”.

2: I’ve been meaning to ask you this. What effect are you trying to achieve that requires the accumulation of thousands of particles?

Global lighting. I can’t say much more.

3: Since denormalized floats are not required to be supported by hardware, and even a 32-bit float only provides around 7-8 significant decimal digits without denorms, just how much precision do you expect to get when you’re rendering 10,000 overlapping particles?

32-bit floats are sufficient for my software implementation, so I’m quite confident they’ll be sufficient in hardware. The particles are spread around the screen and are relatively small. It’s not like I have 10k particles all sitting on top of one another, but even 10 can easily saturate 8-bit color.

Global lighting. I can’t say much more.

Um, how are particles involved in global lighting? Or are you referring to atmospheric/volumetric lighting (i.e., localized fog)?

32-bit floats are sufficient for my software implementation, so I’m quite confident they’ll be sufficient in hardware.

Your software uses a CPU that can handle denormalized floats. GPUs don’t handle these yet.

Korval,
if ten particles can saturate to 1.0, that means that any one of them has an intensity of roughly 0.1. Not too tough for floating-point hardware, even without denormals.
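A quick numeric sanity check on this (my own sketch, not anything from the thread): the snippet below prints the representable step of a 32-bit float near the relevant magnitudes and compares it with the 1/255 step of an 8-bit channel. Denormals only enter the picture below roughly 1.2e-38, far under any of these intensities.

    #include <math.h>
    #include <stdio.h>

    int main(void)
    {
        /* Spacing between adjacent 32-bit floats near a given value. */
        printf("step near 0.1:  %g\n", nextafterf(0.1f, 1.0f) - 0.1f);          /* ~7.5e-9 */
        printf("step near 1000: %g\n", nextafterf(1000.0f, 2000.0f) - 1000.0f); /* ~6.1e-5 */
        printf("8-bit step:     %g\n", 1.0 / 255.0);                            /* ~3.9e-3 */
        return 0;
    }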

Zeno,
Framebuffer access (at the position the currently processed fragment will write to) is part of GLslang and therefore should be part of hardware real soon now. You’ll be able to do any blending you want with that.

Framebuffer access (at the position the currently processed fragment will write to) is part of GLslang

No it isn’t. Observe the glslang 1.051 spec (apparently ARB-approved):

  1. Should the fragment shader be allowed to read the current location in the frame buffer?
    DISCUSSION: It may be difficult to specify this properly while taking into account multisampling. It
    also may be quite difficult for hardware implementors to implement this capability, at least with
    reasonable performance. But this was one of the top two requested items after the original release of
    the shading language white paper. ISVs continue to tell us that they need this capability, and that it
    must be high performance.
    RESOLUTION: Yes. This is allowed, with strong cautions as to performance impacts.
    REOPENED on December 10, 2002. There is too much concern about impact to performance and
    impracticality of implementation.
    CLOSED on December 10, 2002.

So, no framebuffer reads.

Ah, I see. I was reading the somewhat older OGL2_Shading_Language_1.2.pdf. I stand corrected.

Just to add another voice to the request (we should start a chorus soon).

In my situation I actually have a couple million samples distributed all over the screen. On average only a couple (let’s say < 100) useful ones will fall into any one pixel. Many have zero contribution, but the decision of how much a particle contributes is made in a vertex shader, so I can’t sort them out beforehand. The problem is that the contribution of each particle can range from 1e-6 to 1e6 and will be non-linearly mapped into the 0-1 range after the accumulation. Thus the need for FP blending.

The context is real lighting simulation. The eye has the damn ability to handle 12 orders of magnitude of dynamic range, so physical lighting calculations tend to cross the 8-bit boundary rather quickly.
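A quick aside on why fixed point cannot cover this (my own arithmetic, not part of the original post): the raw input range alone spans

    \frac{10^{6}}{10^{-6}} = 10^{12} \approx 2^{40},

so a fixed-point accumulator would need on the order of 40 bits per channel before any headroom for the sum, whereas a float’s exponent absorbs the range while keeping roughly 24 bits of relative precision at every scale.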

Originally posted by Zeno:

Global lighting. I can’t say much more.

Sounds like you’re trying to do something like photon mapping, but in screen space.

The context is real lighting simulation. The eye has the damn ability to handle 12 orders of magnitude of dynamic range, so physical lighting calculations tend to cross the 8-bit boundary rather quickly.

This is exactly the problem. The next step in real-time photorealism (after the current stage of bump maps and stencil shadows) is probably high dynamic range lighting. This can be done now with a local lighting model. Take a look at this program if you have the hardware, especially the porcelain skull: http://www.daionet.gr.jp/~masa/rthdribl/index.html .

But when trying to add in global lighting terms I keep coming back to needing to render into a float buffer with additive blending.

But when trying to add in global lighting terms I keep coming back to needing to render into a float buffer with additive blending.

That’s where I start getting confused. What do you mean by “global lighting”? Are you referring to atmospheric effects?

As far as I know, global illumination is the process of, when computing the illumination of a surface, taking into account all sources that emit or reflect light, including indirect sources (as well as taking occluding objects into account). This requires a lot more than merely having floating-point blending; it needs something like ray tracing (for specular) or radiosity (for diffuse). Interreflection is the name of the game in global illumination, and no amount of scan-converting particles can help.

It seems like what you’re trying to do is more like atmospheric fog (and the lighting effects on it) than true global illumination.

[This message has been edited by Korval (edited 07-15-2003).]

Originally posted by Korval:
That’s where I start getting confused. What do you mean by “global lighting”? Are you referring to atmospheric effects?

As far as I know, global illumination is the process of, when computing the illumination of a surface, taking into account all sources that emit or reflect light, including indirect sources (as well as taking occluding objects into account). This requires a lot more than merely having floating-point blending; it needs something like ray tracing (for specular) or radiosity (for diffuse). Interreflection is the name of the game in global illumination, and no amount of scan-converting particles can help.

It seems like what you’re trying to do is more like atmospheric fog (and the lighting effects on it) than true global illumination.

[This message has been edited by Korval (edited 07-15-2003).]

Look, Korval, I wish I could explain exactly what I’m doing but I can’t. There are several ways to get a global lighting solution, and once you have such a solution there are several ways that you can choose from to display it. At least one way in which particles could come into the display phase should be obvious from some other people’s replies to my posts. Why are you so intent on busting my balls on this issue? If you just assume for two seconds that I’m not an idiot and that the best solution to my problem is higher precision, you might be able to be helpful instead.

There are several ways to get a global lighting solution, and once you have such a solution there are several ways that you can choose from to display it. <section removed> If you just assume for two seconds that I’m not an idiot and that the best solution to my problem is higher precision, you might be able to be helpful instead.

As you stated, there is more than one solution to displaying a global lighting solution. Maybe the one you’re suggesting isn’t the best way to handle it. Of course, since you aren’t inclined to explain precisely what you’re trying to accomplish, no one can suggest alternative methods that might accomplish the same goal.

And, while you’re probably not an idiot, you certainly are not infallible. How do you know that, if you explained the problem, I, or someone else on this forum, wouldn’t be able to figure out an appropriate solution? This is why it is usually best for multiple people to be involved with a project; four people will catch something that one person missed.

Lastly, you do realize that, once floating-point buffer blending arrives, blending four 32-bit floating-point channels (128 bits per pixel instead of 32) is going to be at least 4x slower than regular blending simply on memory bandwidth, correct? As such, though your algorithm will become possible, it will not become particularly fast (though it will certainly be faster than flipping buffers).

Originally posted by Korval:
And, while you’re probably not an idiot, you certainly are not infallible. How do you know that, if you explained the problem, I, or someone else on this forum, wouldn’t be able to figure out an appropriate solution? This is why it is usually best for multiple people to be involved with a project; four people will catch something that one person missed.

Lastly, you do realize that, once floating-point buffer blending arrives, blending four 32-bit floating-point channels (128 bits per pixel instead of 32) is going to be at least 4x slower than regular blending simply on memory bandwidth, correct? As such, though your algorithm will become possible, it will not become particularly fast (though it will certainly be faster than flipping buffers).
The ability to dismiss anything and everything is not a prerequisite for participating in this section of the forums.
Zeno has a point.

Avoiding the copy alone could save lots of memory, not to mention time. The ‘extra computation’ part of the request is exactly zero; please ignore it.

What we’re arguing about here is the ability of hardware to do read-modify-write operations on a single memory location vs. “read here, modify, write somewhere else”. A waste of memory, most of the time at least. That’s why I second the motion. I’m not interested at all in better solutions to the particular problem that prompted the request.
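For concreteness, here is a rough sketch of the “read here, modify, write somewhere else” path being contrasted (my own illustration, assuming the framebuffer-to-texture copy is even supported for the buffer format in question; accumTex, width, height, numPasses and drawPassSamplingAccum() are hypothetical):

    #include <GL/gl.h>

    /* Hypothetical state and helper, for illustration only. */
    extern GLuint accumTex;                 /* accumulation texture              */
    extern int    width, height, numPasses;
    extern void   drawPassSamplingAccum(GLuint tex, int pass);

    /* Each pass reads the previous result from a texture inside a fragment
     * program, writes the updated sum to the framebuffer, then copies the
     * framebuffer back into the texture. The per-pass copy is the memory and
     * time cost that in-place blending would remove. */
    void accumulateByPingPong(void)
    {
        for (int pass = 0; pass < numPasses; ++pass) {
            drawPassSamplingAccum(accumTex, pass);
            glBindTexture(GL_TEXTURE_2D, accumTex);
            glCopyTexSubImage2D(GL_TEXTURE_2D, 0,
                                0, 0, 0, 0, width, height);
        }
    }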

What we’re arguing about here is the ability of hardware to do read-modify-write operations on a single memory location vs. “read here, modify, write somewhere else”.

No. Nobody here is arguing the merits of the requested functionality. The current lack of said functionality is clearly due to the infant state of current floating-point buffer implementation. It is inevitable that this functionality will arrive, and likely in the next hardware generation. And if it doesn’t, then no amount of argument from the masses would have made it happen.

They know the need for the functionality. Indeed, it is immediately obvious that lacking this ability limits the hardware’s power to perform various functions. There’s no question of that, and no one from any of the companies has argued against having the functionality for any reason other than cost.

My initial argument was that there is no need to push for functionality that is clearly already on its way. It would be a better use of time to push for functionality that could be iffy, like adding bindable memory to vertex programs (so that they can walk a memory buffer rather than having to rely on internal registers), adding programmability to the command processor, or adding some kind of fragment-program register that can be used to store information between invocations of the program (to determine the maximum/minimum dynamic range for HDR).

There’s little need to fight battles you’ve already won (or ones that were never really contested). Instead, fight the battles that are still in progress.

And I wanted to try to offer up an interim solution (even if it is not better than the blending-based one) that could function on current hardware.

[This message has been edited by Korval (edited 07-16-2003).]

Zeno, if your problem is accumulating something with high dynamic range and then applying exposure to it, maybe you could solve the issue by accumulating the particles with multiplicative blending into a normal framebuffer. With the intensities you have been talking about (around 0.1 per particle) this should work very well.

In case you don’t know the trick: start from white, draw the particles with multiplicative blending using the negatives (complements) of their colors, and at the end invert the framebuffer (again as a negative). The result is the exposed sum of the particles, assuming you’ve applied exposure to the individual particles’ intensities.

-Ilkka
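For anyone reading along, here is a minimal sketch of how I understand Ilkka’s trick (my own code, not his; particleCount, intensity[], drawParticle() and drawFullscreenQuad() are hypothetical helpers), using exposure(I) = 1 - exp(-I):

    #include <math.h>
    #include <GL/gl.h>

    /* Hypothetical data and helpers, for illustration only. */
    extern int   particleCount;
    extern float intensity[];              /* per-particle intensity I_i         */
    extern void  drawParticle(int i);
    extern void  drawFullscreenQuad(void);

    /* Drawing each particle with source color exp(-I_i) = 1 - exposure(I_i)
     * under multiplicative blending leaves the framebuffer holding
     * exp(-sum I_i); inverting at the end gives 1 - exp(-sum I_i), i.e. the
     * exposure of the accumulated intensity, inside an ordinary 8-bit buffer. */
    void accumulateWithExposure(void)
    {
        /* 1. Start from white. */
        glClearColor(1.0f, 1.0f, 1.0f, 1.0f);
        glClear(GL_COLOR_BUFFER_BIT);

        /* 2. Multiplicative blending: dst = dst * src. */
        glEnable(GL_BLEND);
        glBlendFunc(GL_ZERO, GL_SRC_COLOR);
        for (int i = 0; i < particleCount; ++i) {
            float c = expf(-intensity[i]); /* negative of the exposed color      */
            glColor3f(c, c, c);
            drawParticle(i);
        }

        /* 3. Invert the framebuffer: dst = 1 - dst (drawing white). */
        glBlendFunc(GL_ONE_MINUS_DST_COLOR, GL_ZERO);
        glColor3f(1.0f, 1.0f, 1.0f);
        drawFullscreenQuad();
        glDisable(GL_BLEND);
    }

The identity exp(-a) * exp(-b) = exp(-(a+b)) is what makes the multiplicative accumulation equivalent to an additive one; the remaining limitation is that each multiplication is quantized to 8 bits along the way.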