HW discourse: texture interpolation of color by average, alternatives?

First, sorry that this question isn’t really about GLSL; I’m uncertain which forum is appropriate for it.

I’m curious to know if anyone’s ever heard of the texture sampling stage interpolating differently from averaging components individually, or if interpolation is programmable nowadays.

Why ask? Well, I am looking at a test image that is slightly interpolated (minutely downsized) with red and green squares. This is a commonly understood problem for blending, but I don’t think I’ve ever heard of the same problem as it applies to naive interpolation on textures. That’s pretty much hardwired into the hardware.

Red+Green should be yellow, but averaging produces a brown that between the two blocks might as well be a black line. Just a curiosity: why did OpenGL/GLSL hardware not seek to remedy this by offering a luminosity-weighted average, or something like it? It could be expensive, but in that case doing it on chip would be the best way. If so, why is that not standard?

Presumably because it’s not common enough to warrant dedicated hardware. It took long enough for gamma-correct interpolation to be supported (and even now it’s not actually required).

If you want interpolation to preserve luminosity, store the texture in YUV/YCrCb/YIQ/whatever and convert in the shader. Or just use texelFetch/textureGather and perform the interpolation in the shader.

It’s a shader; you could do whatever you want. You can use textureGather to fetch the raw 2x2 quad of values (for a channel) and then interpolate them yourself in whatever way you desire.
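As a rough CPU-side illustration of what "interpolate them yourself" means, here is a minimal sketch of bilinear filtering over a 2x2 quad with a pluggable mixing function. The names `lerp` and `bilinear`, and the ordering of the quad, are assumptions made for this example; they do not mirror the component order that textureGather returns.

```python
def lerp(a, b, t):
    """Component-wise linear interpolation between two colors."""
    return tuple(x + (y - x) * t for x, y in zip(a, b))

def bilinear(quad, fx, fy, mix=lerp):
    """Filter a 2x2 quad given as (top_left, top_right, bottom_left, bottom_right).

    fx and fy are the fractional sample position inside the quad (0..1).
    Any function with the same signature as lerp can be passed as `mix`.
    """
    t00, t10, t01, t11 = quad
    top = mix(t00, t10, fx)        # blend along the top row
    bottom = mix(t01, t11, fx)     # blend along the bottom row
    return mix(top, bottom, fy)    # blend between the two rows

red = (1.0, 0.0, 0.0)
green = (0.0, 1.0, 0.0)
# Sampling halfway between a red column and a green column.
midpoint = bilinear((red, green, red, green), 0.5, 0.5)
```

With the default `lerp` this reproduces the plain component average, (0.5, 0.5, 0.0), which is the brownish value the question is about; swapping in a different `mix` is where a custom scheme would go.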

So long as you’re willing to pay the cost of implementing this stuff in code rather than letting the implementation/hardware do it.

And that’s why: it would be expensive. Plus, everyone has their own desires about how something should be interpolated.

You’d also need some way to say that a texture would need to use this interpolation. Not every texture contains “colors,” after all.

That’s the rub isn’t it. We see color based on preserving luminosity. Averaging is like mixing paint. It seems like most 3D applications are displaying light instead of mixing paint, so it seems like it would be beneficial to do it correctly, without degrading performance, unless preserving luminosity is undesirable.

It seems like a weird quirk of history. A few years ago I developed a technique that eliminates aliasing, has no performance overhead, and could have been used in the 90s, yet it is not used and may be unknown. For all the effort that goes into developing new techniques, it seems there remain blind spots in our collective vision for graphics.

A hardware implementation does not imply that there would be no performance loss. More math is going to cost more, period. And considering the amount of such math you have to do to sample a texture (3 averages per sample, plus the number of mipmap samples - 1 averages; anisotropic filtering can tap a lot of samples and require a lot of averaging), that’s not going to be cheap.
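To put rough numbers on that count, here is a back-of-the-envelope sketch. It assumes every bilinear tap costs 3 lerps per channel and that N taps are combined with N - 1 further blends, which is a simplification of whatever real hardware does; the function name `lerp_count` and the 16-tap figure are made up for illustration.

```python
def lerp_count(bilinear_taps, channels=4):
    """Count scalar lerps needed to blend `bilinear_taps` bilinear samples."""
    per_tap = 3                    # two horizontal lerps plus one vertical
    combine = bilinear_taps - 1    # blends that merge the taps together
    return (per_tap * bilinear_taps + combine) * channels

bilinear_cost = lerp_count(1)                  # one tap
trilinear_cost = lerp_count(2)                 # two mip levels
aniso_rgba = lerp_count(16, channels=4)        # hypothetical 16-tap footprint
aniso_rgbai = lerp_count(16, channels=5)       # same footprint, one extra channel
```

Under these assumptions the extra channel scales the whole count by 5/4, independent of the filter mode.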

No, it’s prioritizing performance, which is what matters in real-time graphics.

That’s odd, because Vulkan seems to require correct sRGB texel accesses. It explicitly puts the sRGB-to-linear conversion into section 15.3.2. Format Conversion, which happens per-texel. This explicitly comes before the filtering section: 15.9.3. Texel Filtering.

It’s surely (at least) an order of magnitude better than doing it manually (as you described) if that’s even possible to do as a drop-in replacement for regular sampling.

§8.24 says:

Even if that were true, it doesn’t change the economics. Adding extra cores (or registers or whatever) will yield greater returns (in terms of marketability) than using the same area of silicon to support hardware filtering in some other colour space.

Also: have you tried performing the test with an sRGB texture and framebuffer? That should produce a brighter yellow. Specifically, something with a perceived brightness between that of red and green; clearly (1,1,0) is going to be brighter than either (1,0,0) or (0,1,0).

I wasn’t saying you were wrong; I’m saying that it’s odd that OpenGL still says that.

That occurred to me but I didn’t want to derail the discussion. The question is more theoretical than that. EDITED: I don’t think the correct outcome would be (1,1,0), though it would be much closer to that than (0.5,0.5,0) is. And yes, I’m sure sRGB plays a big part in the distorted results. I can’t say with complete confidence whether linear sRGB would preserve brightness, but my intuition is that it would not, and that the loss has more to do with how our eyes perceive R/G/B.

I think the reason luminosity is lost is because of human vision. The “silicon” to do it would, I suspect, be not much. The difference is that you calculate the white value from 3 coefficients, which I think are very simple, but I am not going to look them up. Then the white value just becomes another value to be averaged alongside the 3 color components. In the final step the final color would be renormalized similarly, with like coefficients.
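One possible reading of that scheme, sketched on the CPU. Everything specific here is an assumption: the Rec. 709 weights stand in for the "3 coefficients", the stored values are treated as gamma-2.2 encoded, the white value is measured in linear light, and the helper names are invented for the example.

```python
GAMMA = 2.2                           # assumed display gamma
WEIGHTS = (0.2126, 0.7152, 0.0722)    # Rec. 709 luminance weights (assumed)

def white(encoded_rgb):
    """Linear-light luminance of a gamma-encoded color."""
    return sum(w * c ** GAMMA for w, c in zip(WEIGHTS, encoded_rgb))

def to_rgbi(encoded_rgb):
    """Append the white value as a fourth component."""
    return (*encoded_rgb, white(encoded_rgb))

def lerp(a, b, t):
    return tuple(x + (y - x) * t for x, y in zip(a, b))

def from_rgbi(rgbi):
    """Rescale the averaged color so its luminance matches the averaged white."""
    *rgb, target = rgbi
    current = white(rgb)
    if current == 0.0:
        return tuple(rgb)             # black stays black, nothing to rescale
    scale = (target / current) ** (1.0 / GAMMA)
    return tuple(min(1.0, c * scale) for c in rgb)   # clamp is a simplification

red, green = (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)
naive = lerp(red, green, 0.5)
preserved = from_rgbi(lerp(to_rgbi(red), to_rgbi(green), 0.5))
```

Under these assumptions the midpoint comes out near (0.73, 0.73, 0) instead of (0.5, 0.5, 0). Note that the renormalization only has an effect because the white value is measured in a different space from the one being averaged; if it were a plain dot product of the same values that get averaged, the averaged white would already equal the white of the averaged color and the scale factor would be 1.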

When you mentioned gamma-correction interpolation I could not tell if you meant linear sRGB in general, or some kind of twist on how it’s interpolated between texels or vertices (which I assume is the same, but don’t know.) But I just thought to add that the reason linear sRGB took so long to come into use is because it looks awful on low-poly meshes. It looks a little better with per-pixel lighting, but essentially because the light is so much more acute it needs finely tessellated vertices to work, which was impractical before it took off.

I have a hard time seeing that as “not much” “silicon”. In order to do this efficiently, you would need to convert these colors into color + luminance when you load them into the texture cache. sRGB linearization is just a 256-element lookup table; here, you need at least a vector dot product.
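For a sense of the difference in per-texel work, here is a small sketch contrasting the two conversions for 8-bit texels. The sRGB decode formula is the standard piecewise one; the Rec. 709 weights used for the dot product are an assumption about which coefficients such a scheme would pick, and the function names are illustrative.

```python
def srgb_to_linear(c):
    """Standard sRGB decode for a value in 0..1."""
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

# Linearization of 8-bit data: one precomputed table, one lookup per channel.
SRGB_LUT = [srgb_to_linear(i / 255.0) for i in range(256)]

def linearize_texel(rgb8):
    return tuple(SRGB_LUT[v] for v in rgb8)

# Luminance: three multiplies and two adds per texel, not a lookup.
def luminance(linear_rgb, weights=(0.2126, 0.7152, 0.0722)):
    return sum(w * c for w, c in zip(weights, linear_rgb))

texel = linearize_texel((255, 0, 0))
texel_luminance = luminance(texel)
```

The table only works because each channel is converted independently of the others; the luminance depends on all three channels at once, so it cannot be reduced to a 256-entry lookup.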

Then you do linear interpolation some number of times. Which brings up another point: would this allow alpha? If so, that means you would need a 5-element vector, which is a very unnatural vector size to do interpolation on.

And then you convert it back.

That all sounds pretty expensive to me. If alpha is part of the deal, then it takes up 25% more texture cache space (assuming the cache can handle 5-element vectors at all). It requires 5 channels of interpolation rather than 4. And the conversion at cache load time is going to be much more math heavy than sRGB.

I don’t know if it’s worth picking apart, but if implemented in hardware the number of components doesn’t matter, if the hardware is a dedicated part of a chip. If the difficulty is averaging over samples (anisotropic filtering), the samples can be summed so that the multiplication happens only at the very end. So it would not be incompatible with existing sampling/averaging pathways. The mode could be enabled or left on/ignored. (Yes, a dot product would probably calculate the white value of each sample. Yes, storing it in the texture, unless using the alpha component, would be irregular.)
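The "sum first, multiply once" idea can be sketched as follows. It assumes all samples carry equal weight (a plain box filter), which real anisotropic filtering may not do, and the function name is invented for the example.

```python
def average_deferred(samples):
    """Average equally weighted samples with a single scale at the end."""
    totals = [0.0] * len(samples[0])
    for sample in samples:
        for i, component in enumerate(sample):
            totals[i] += component          # additions only inside the loop
    inverse_count = 1.0 / len(samples)
    return tuple(t * inverse_count for t in totals)   # one multiply per component

# Works the same for 4-component or 5-component samples.
rgba = average_deferred([(1.0, 0.0, 0.0, 1.0), (0.0, 1.0, 0.0, 1.0)])
rgbai = average_deferred([(1.0, 0.0, 0.0, 1.0, 0.25), (0.0, 1.0, 0.0, 1.0, 0.75)])
```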

It’s purely hypothetical, but I think it would be worth studying, since it falls into a category of something that could produce better results by looking at a fundamental part of the pipeline in a new way, the way new effects can’t really do. What’s the point of all the effects work if the fundamentals are weak?

In my opinion, not doing interpolation correctly according to (approximating) human vision undermines the entire enterprise. So we don’t really know what a difference that could make. Maybe it’s a trivial way to make things better quality that is simply overlooked. There’s a lot of arrogance in fields like graphics. I think it’s often assumed things like this are not interesting, that they were all studied 30 years ago, but I think that’s unlikely to be the case, and we should be spending more time really thinking about fundamentals instead of pie-in-the-sky graphical effects.

Hardware isn’t magical; those things take up resources.

According to this paper, texture caches tend to store data in its native format. Thus, when the texture unit tries to read the data, the system will unpack/decompress/linearize the data into full floating-point values.

As such, if there is going to be an implementation of such a thing, this is where at least part of it would have to go. When the texture unit reads your texture, it will unpack it from RGB into an RGBI format. The filtering will all take place in RGBI space.

Now, just that operation might be practical, from a performance standpoint. It requires no extra conceptual hardware from the fetch unit. That is, you don’t have to build the infrastructure to process cached data; you’re effectively just adding another texture format, like R11_G11_B10 or RGB9_E5 or somesuch. I might even believe that such conversion wouldn’t be more complex than BPTC or ASTC decompression.

Of course, once the texture unit gets done with filtering, the data gets masked a bit and then goes straight into a shader register. So on the back end, there’s no semi-general purpose processing stage that you could add another format to. To handle this, you would need to add all new hardware processing infrastructure into a hardware pathway that had no such infrastructure.

Now, you could avoid such a thing by simply letting the user’s shader do it. That is, reading from such a texture would generate RGBI output, and the shader has to convert it back to RGB.

Of course, all of the above only applies if you don’t need alpha. Because everything in the texture unit itself is undoubtedly based on taking and manipulating 4-component data. If the data path is fixed to 4 components (and there is no reason it wouldn’t be), then there are only two things that could be done. You either increase the width of the data path by 25% (intermediate registers, number of computational units, etc), or you run the entire fetch process twice, on different parts of the data.

Neither of these is performance friendly or “compatible with existing sampling/averaging pathways.” So implementation might be reasonable only for RGB textures; once you need to interpolate 5 components, you start breaking the basic assumptions of the system.

Nobody suggested that it wasn’t worth studying. There’s just no point in expecting it to happen.

If you look at how GPUs have been evolving, it is abundantly clear that hardware evolution biases towards greater generality, not specificity. Broadly speaking, the way problems get solved in hardware nowadays is by making shaders more capable, not by adding dedicated hardware for a special case. Blending gets improved by implementing things like ARB_fragment_shader_interlock, which allows fragment shaders to do user-defined blending, or ARB_texture_barrier to allow for read/modify/writes to the framebuffer. Texture filtering gets improved by giving shaders access to textureGather so that shaders can implement custom filtering. And so forth.

At this point, hardware features are primarily judged based on how effectively they support general-purpose operations (or performance). That’s not to say that special-case operations are never done these days (KHR_blend_equation_advanced is a strong counter-example, but the fact that it isn’t core should say something). But broadly speaking, if a special-case feature is going to get into hardware, it needs to be really important.

So feel free to study the problem. Put in the work to prove that it’s doable. But until then, it’s simply antithetical to the direction hardware is going.

First, I find any claim that modifying hardware would/should be “trivial” to be at least somewhat questionable. ASTC (likely more useful than what you’re suggesting) remains largely unsupported by desktop hardware, and what you’re wanting would have to be playing in the same hardware arena.

Second, there’s a simple way to find out if there is any significant quality improvement: implement it yourself with textureGather. You don’t even have to go whole hog with anisotropic; if all you want is proof that quality can be significantly improved, a bilinear or trilinear comparison ought to be adequate to that task.

Alternatively, just upload your pixel data as RGBI to begin with, let the filtering do its job, then convert it back to RGB in the shader. Either way, it should be pretty easy to prove whether there’s any significant image quality improvement over linear interpolation.

FWIW, it should be (0.735, 0.735, 0). With a display gamma of 2.2, this would result in red and green pixels at 50% intensity (0.5085), while (0.5,0.5,0) would give linear intensities of 0.2176.
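Those figures can be checked with a couple of lines, assuming the display behaves as a pure 2.2 power law (a simplification of real displays); the function names are made up for the example.

```python
GAMMA = 2.2  # assumed pure power-law display response

def displayed_intensity(stored_value):
    """Linear light emitted for a stored framebuffer value."""
    return stored_value ** GAMMA

def encode(linear_intensity):
    """Stored value needed to emit a given linear intensity."""
    return linear_intensity ** (1.0 / GAMMA)

naive_intensity = displayed_intensity(0.5)       # about 0.218
corrected_intensity = displayed_intensity(0.735) # about 0.51
value_for_half = encode(0.5)                     # about 0.73
```

So writing 0.5 emits a bit over a fifth of full intensity, which is why the seam between the red and green squares reads as a dark line.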

I mean using an sRGB texture and sRGB framebuffer, so that the hardware (ideally) applies sRGB->linear conversion on the four texels, interpolates the linear intensities, then performs linear->sRGB conversion before writing the result to the framebuffer.

As opposed to the historical approach of treating everything as if it’s linear even though it’s actually sRGB (or something close to it).

But in this particular case it doesn’t matter whether the texture is linear or sRGB because the texture components are 0 or 1, which are unchanged by conversion. The main thing is to perform linear->sRGB conversion on the value written to the framebuffer. If you want to test this without having to either set up an sRGB FBO then blit to the default framebuffer or figure out how to make the default framebuffer sRGB (if that’s even possible), you can just use e.g. gl_FragColor=pow(color, vec4(0.4545)) in the fragment shader.
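A CPU-side sketch of that pipeline, for comparison with the pow shortcut. The sRGB transfer functions are the standard piecewise ones; the helper names are assumptions made for this example, and the code only imitates what the hardware path would ideally do.

```python
def srgb_to_linear(c):
    """Standard sRGB decode for a value in 0..1."""
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

def linear_to_srgb(l):
    """Standard sRGB encode for a linear value in 0..1."""
    return 12.92 * l if l <= 0.0031308 else 1.055 * l ** (1.0 / 2.4) - 0.055

def filter_srgb(a, b, t):
    """Decode two sRGB texels, interpolate in linear light, re-encode."""
    linear = tuple(
        srgb_to_linear(x) + (srgb_to_linear(y) - srgb_to_linear(x)) * t
        for x, y in zip(a, b)
    )
    return tuple(linear_to_srgb(v) for v in linear)

def gamma_shortcut(linear_rgb):
    """The pow(color, 0.4545) approximation of the sRGB encode."""
    return tuple(v ** 0.4545 for v in linear_rgb)

red, green = (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)
exact = filter_srgb(red, green, 0.5)             # roughly (0.735, 0.735, 0)
shortcut = gamma_shortcut((0.5, 0.5, 0.0))       # roughly (0.730, 0.730, 0)
```

The two results differ only in the third decimal place at this value, so the shortcut is adequate for a quick visual test.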

I don’t want to encourage you guys to overthink this any more than what we’ve already got here. Just to (try to) clarify… the benefit of interpolating closer to how light works would be to eliminate artifacts, that is, to produce a superior image, especially in particularly glaring cases. Quality offline upsampling/downsampling algorithms likely consider such things. Their improvement is slight, but deemed worthwhile.

Hardware isn’t “magical” but it doesn’t care if it has 5 components or 4 components. That’s the case for general purpose pathways because they have to settle on some number. Nothing is free, but if we could (I don’t know if it’s necessarily so) interpolate colors better, I think that would be a bigger selling point for some hardware than most anything else. Two colors shouldn’t interpolate by producing a darker or brighter strip of color between them. That looks like veins running through a field of light, and I think everyone can agree that’s undesirable, and might be worth solving one day. Maybe it’s not low-hanging fruit, but it strikes me as so, and I think it’s interesting… to “study”… since real-time images can be at least as nice as up/down sampled pictures too.

EDITED: As far as a cache goes, you can produce the white value from the 3 color components, so there is no reason to cache, unless the computation is going to slow down the fetch to the degree it’s a bottleneck. Reading memory may be much slower than computing.