Separate sampler state from texture state

Don’t mix up textures and texture units. What used to be called a “texture unit” is a “sampler” today. Even with fixed-function hardware, it was perfectly possible to bind the same texture to multiple texture units.

What we want to achieve is to sample the same image data in a different way with each sampler that accesses it.
For instance, I’d like to access a certain texture twice in a shader, once with GL_REPEAT and once with GL_CLAMP_TO_EDGE. Today this is only possible if I create two distinct texture objects (thereby replicating the image data, doubling the VRAM usage).
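For illustration, a minimal sketch of the duplication the current API forces (variable names are hypothetical): the same pixels must be uploaded twice just to get two wrap modes.

GLuint texRepeat, texClamp;

glGenTextures(1, &texRepeat);
glBindTexture(GL_TEXTURE_2D, texRepeat);
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA8, width, height, 0,
             GL_RGBA, GL_UNSIGNED_BYTE, pixels);            // first copy of the data
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_S, GL_REPEAT);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_T, GL_REPEAT);

glGenTextures(1, &texClamp);
glBindTexture(GL_TEXTURE_2D, texClamp);
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA8, width, height, 0,
             GL_RGBA, GL_UNSIGNED_BYTE, pixels);            // second copy of the very same data
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_S, GL_CLAMP_TO_EDGE);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_T, GL_CLAMP_TO_EDGE);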

And about the wrapping: couldn’t this be handled easily in the shader by something like (x % width, y % height)?

No, you cannot easily emulate this in a shader, because you also need to consider that one texture sample might tap the texture multiple times! Each tap needs to be wrapped independently.
The hardware does this very fast and efficiently today.
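A sketch of why the shader emulation falls apart (GLSL; the manual path is only valid assuming a GL_NEAREST filter):

// Manual GL_REPEAT: correct only when the lookup takes a single (nearest) tap.
vec2 wrapped = fract(uv);
vec4 color = texture2D(tex, wrapped);

// With GL_LINEAR the hardware blends up to four taps per sample; at a texture
// edge each of those taps must wrap independently, which wrapping the
// coordinate alone cannot reproduce.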

I have already encountered this problem, when I wanted to blend two YCbCr textures in 4:2:0 format on hardware that can only handle two texture units …

To add to what Skynet has said, if your hardware can only handle two texture accesses in one pass, then it can only handle two texture accesses in one pass.

What’s being discussed here is essentially API cleanup. It won’t magically allow hardware to do something that it could not before. But it will make it easier and more intuitive for the user to communicate their intentions to OpenGL.

Thanks, Skynet, for your response.

It’s the possibility of multiple samplers on the same texture unit that I wanted to talk about. (It’s true that I often make that mistake :( )

But technically, why can’t two texture units access the same texture data?
(cf. the replicated data mentioned above)

@+
Yannoo

Yes, Alphonse, it’s true.

I have only encountered this problem on very old hardware :)
(hardware that cannot handle shaders, so I resolved it at the source with a YCbCr to RGB conversion in software => it is slower, but it works :) )
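For reference, a sketch of such a software conversion, assuming full-range BT.601 coefficients (the exact matrix depends on the video standard in use):

/* YCbCr -> RGB for one pixel, full-range BT.601 (an assumption). */
static void ycbcr_to_rgb(unsigned char y, unsigned char cb, unsigned char cr,
                         unsigned char rgb[3])
{
    float Y  = (float)y;
    float Cb = (float)cb - 128.0f;
    float Cr = (float)cr - 128.0f;
    float r = Y + 1.402f    * Cr;
    float g = Y - 0.344136f * Cb - 0.714136f * Cr;
    float b = Y + 1.772f    * Cb;
    rgb[0] = (unsigned char)(r < 0.0f ? 0.0f : r > 255.0f ? 255.0f : r);
    rgb[1] = (unsigned char)(g < 0.0f ? 0.0f : g > 255.0f ? 255.0f : g);
    rgb[2] = (unsigned char)(b < 0.0f ? 0.0f : b > 255.0f ? 255.0f : b);
}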

But just because we have more texture units now than in the past doesn’t mean we are obliged to use them all :)

@+
Yannoo

I have reread this thread:

glActiveTexture(GL_TEXTURE0);
glBindTexture(GL_TEXTURE_2D, TextureName);
glBindSampler(0, SamplerName0); // takes a texture unit index, not a target

seems good to me … but it can only handle one sampler per texture unit :(
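For comparison, here is how sampling one texture with two different states would look under such an API (a sketch in the glGenSamplers/glSamplerParameteri style that ARB_sampler_objects later standardized; variable names are hypothetical):

// Two sampler objects, one shared texture: no duplicated image data.
GLuint samplerRepeat, samplerClamp;
glGenSamplers(1, &samplerRepeat);
glSamplerParameteri(samplerRepeat, GL_TEXTURE_WRAP_S, GL_REPEAT);
glGenSamplers(1, &samplerClamp);
glSamplerParameteri(samplerClamp, GL_TEXTURE_WRAP_S, GL_CLAMP_TO_EDGE);

glActiveTexture(GL_TEXTURE0);
glBindTexture(GL_TEXTURE_2D, TextureName);
glBindSampler(0, samplerRepeat);

glActiveTexture(GL_TEXTURE1);
glBindTexture(GL_TEXTURE_2D, TextureName);   // the same texture object again
glBindSampler(1, samplerClamp);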

I once saw something about centroid sampling in GLSL:

http://www.opengl.org/pipeline/article/vol003_6/

=> couldn’t the idea be extended to handle multiple sampler states directly in the shader, with something like this?

nearest/linear/bilinear/centroid  clamped/wrapped sampler2D texsample;

And when the nearest/linear/… clamped/wrapped prefix states are not given in the shader, it would use the texture’s own states as the default …

@+
Yannoo

it can only handle one sampler per texture unit

Right. Because the concept makes no sense. A sampler is a texture unit. It represents access to a specific texture with a specific set of parameters.

nearest/linear/bilinear/centroid clamped/wrapped sampler2D texsample;

And what does “bilinear” mean (especially for a 1D or 3D texture), and how does it differ from “linear”? And which directions are “clamped”?

Filter parameters in the shader should look like this:


sampler2D diffuseTexture(mag = linear, min = linear_mipmap_linear, wrap_s = repeat, wrap_t = clamp, max_aniso_ext = 4.0);

Thanks Alphonse, I now really understand that a sampler and a texture unit are exactly the same thing :)
(I understand quickly, but sometimes it has to be explained to me for a long time :) )

For a sampler2D, it’s true that “bilinear” is simply GL_LINEAR :)

And of course

sampler2D diffuseTexture(mag = linear, min = linear_mipmap_linear, wrap_s = repeat, wrap_t = clamp, max_aniso_ext = 4.0);

is really far better and simpler to work with … and a very good answer to the subject of this thread :)

But what about sharing the same texel data between multiple samplers
=> is it technically possible or not?

It’s for use with one interleaved and tiled texture where half of the data (cf. the RGB/YUV data) has to be interpolated, while the other half (cf. the indices) must not be interpolated
(something like a “multi-picture DXT” that interpolates each RGB/YUV component individually within a 4x4 or 8x8 block, rather than only a line between two colors in a 4x4 block as in DXTn)
=> I have to use multiple samplers/texture units to handle the inter-picture interpolations
==> and I want to use this to reduce the memory used, so having multiple copies of the same data isn’t really what I want :(

I can split the texture into multiple parts beforehand on the CPU, but I find that this spends CPU time for nothing, and it generates two parts per image x4 (an “IPBB semi-compressed GOP”)
=> 8 texture parts
==> 8 texture units/samplers … :(
===> but on the other hand, it’s true that I can instead group similar data chunks so as to use only two 3D samplers/texture units for my GOP of 4 pictures :)
(cf. one GL_LINEAR YCbCr 3D texture for the color data and one GL_NEAREST unsigned-byte 3D texture for the indices)
====> so in the end I don’t need “separate sampler state from texture state” to handle this (only two 3D textures, each with different sampler state) … but I’m sure it could really be a very good extension :) :)
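A sketch of that two-3D-texture setup with today’s per-texture state (texture names are hypothetical):

// Color data: interpolated across the GOP.
glActiveTexture(GL_TEXTURE0);
glBindTexture(GL_TEXTURE_3D, colorTex);
glTexParameteri(GL_TEXTURE_3D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
glTexParameteri(GL_TEXTURE_3D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);

// Index data: must never be interpolated.
glActiveTexture(GL_TEXTURE1);
glBindTexture(GL_TEXTURE_3D, indexTex);
glTexParameteri(GL_TEXTURE_3D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
glTexParameteri(GL_TEXTURE_3D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);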

@+
Yannoo

So… it’s settled then; we definitely need a separation of sampler and texture.

…and while we’re at it, get rid of the texture-unit nonsense. It is ridiculous to be forced to keep track of which texture was bound to which unit when all I want to do is bind the texture object to a sampler in a shader!

Yes, Chris, I think exactly the same thing :)

But I also think that “bind texture by unit” is good for maintaining compatibility with older versions of OpenGL, which used multiple chained texture units to handle multitexturing
=> so it isn’t as bad a thing as all that …

On the other hand, I also find that having such limited possibilities in fragment shaders because of this is really ridiculous in 2010 …

Such as the fact that in 2010 we still don’t have direct hardware JPEG/MPEG support in OpenGL textures … but that is another story :)
(a PocketPC can easily compress/decompress JPEG and decompress MPEG files in pure software on a very small processor, so there is no reason to talk about processing-power problems or other excuses)

And when I think that an M(J)PEG video is only successive JPEG pictures (which could easily be handled by chained texture units working with the standardized JPEG format) … it’s really a nightmare to see all the time lost for nothing over so many years …

@+
Yannoo

having such limited possibilities in fragment shaders because of this is really ridiculous in 2010

How? The hardware cannot use more than X textures, period. So you would gain nothing by having texture binding that does not deal with numbered texture units. Thus, nothing can be considered “very limited possibilities.”
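(Those limits are queryable from the API; a minimal sketch:)

GLint maxFragmentUnits, maxCombinedUnits;
glGetIntegerv(GL_MAX_TEXTURE_IMAGE_UNITS, &maxFragmentUnits);          // fragment stage
glGetIntegerv(GL_MAX_COMBINED_TEXTURE_IMAGE_UNITS, &maxCombinedUnits); // all stages combined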

(a PocketPC can easily compress/decompress JPEG and decompress MPEG files in pure software on a very small processor, so there is no reason to talk about processing-power problems or other excuses)

Really? Can they decompress a 4096x4096 texture 100,000 times per frame at 60+ FPS?

I didn’t think so.

Support for a 720x576 MPEG-2 (cf. DVD) video texture at 25/30/50/60 fps could be a good start, no???

And on the other hand, I’m all for 4096x4096 MPEG-4 video texturing at 100,000 fps :)
But that comes later …

And an NVIDIA GeForce GTX 285 has 80 texture units
(first link visited: http://hothardware.com/Articles/NVIDIA-GeForce-GTX-285-Unveiled/)
=> so really more than one second of video if we bind the successive MPEG pictures of a DVD to successive GPU texture units …
(or 80 “separate sampler states from texture state”, if you prefer …)

The hardware IS ALREADY HERE (and has been for many years …)

And this feature IS REALLY WANTED by a lot of users …

@+
Yannoo

Assuming we bound textures directly to samplers (which seem to be the ‘real’ texture units today), wouldn’t the shader compiler just warn about exceeding the hardware limits when too many samplers get accessed?

The only advantage of the texture-to-unit-to-sampler indirection I can think of is that by switching shaders, the currently bound textures ‘switch’ too. But on the other hand, I don’t mind this indirection. Usually I set the sampler uniforms once, right after the shader gets created, and then leave them that way. The actual binding of texture to shader then happens by binding the texture to its destined unit. So there’s not much confusion or state-fiddling going on.

The only downside is that when switching to a certain shader, you need to rebind all the textures particular to that shader, because in the meantime other code parts may have changed the texture-unit bindings. Binding textures to samplers would result in textures being bound to a sampler essentially ‘forever’, which might be good for samplers that only ever access one particular texture (for instance, lookup textures).
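A sketch of that usual pattern (variable names are hypothetical): the sampler uniform is set once to a unit number, and per-frame work only rebinds the texture on that unit.

// Once, right after linking: point the sampler uniform at texture unit 0.
GLint loc = glGetUniformLocation(program, "diffuseTexture");
glUseProgram(program);
glUniform1i(loc, 0);

// Every frame: bind whatever texture should feed unit 0.
glActiveTexture(GL_TEXTURE0);
glBindTexture(GL_TEXTURE_2D, currentDiffuse);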

Support for a 720x576 MPEG-2 (cf. DVD) video texture at 25/30/50/60 fps could be a good start, no???

No. Not for someone who’s actually interested in 3D rendering, rather than playing movies.

And an NVIDIA GeForce GTX 285 has 80 texture units …

It also has 240 processor cores. That doesn’t mean you get to bind 240 programs and run them all at the same time.

Not only playing movies … displaying/mapping numerous video streams onto numerous 3D shapes
=> that isn’t really the same thing …
(but OK, this could begin with playing/mapping 6 movies in parallel on a cube, for example)

And this needs only one IPBB chunk (so 4 linked texture units) per video stream :)
=> so 20 video streams in parallel with 80 texture units …

Is texturing multiple quads the same thing as watching a BD, for you???
For me, no …

And it’s for a 4D framework (X, Y, Z, T) that can handle/mix various input video streams (webcams, for example) onto various distinct animated 3D shapes, not only for a BD or for displaying individual 2D movies (I find that libavcodec/ffmpeg handles this very nicely, for example … I can fill the screen on my iMac with about ten video streams in parallel, each mapped onto a rotating 3D cube in its own OpenGL GLUT window, but the CPU usage is near 100% and I get stutters)

But the subject of this thread is “Separate sampler state from texture state” …
=> 20 separate quad-samplers (or 10 octo-samplers) could perhaps be a good start :)

I would prefer just 80 video textures at 1920x1080 and 60+ FPS (9,953,280,000 texels per second)
over a cosmological “4096x4096 texture sampled 100,000 times per frame at 60+ FPS” (100,663,296,000,000 texels per second)

=> the mathematics confirm that your version is about 10,000x bigger than mine :)
==> are there really more than 10,000 texture units in a GPU??? :) :) :)
(I have certainly made some mistakes in the computations, but in any case the factor is many powers of 10 … and I have only 5 fingers per hand)

@+
Yannoo

1920x1080 = 240x135 blocks of 8x8 texels.

That makes no more than 15x9 patches of 16x16 blocks of 8x8 texels.

So, “only” 16x16 = 256 “simple/little” texture units of 8x8 = 64 texels each
to handle one video texture chunk of 4 IPBB HD pictures at 1920x1080 …
(but OK, with 15x9 = 135 reloads of the 256 texture units)

=> I think this can and should be incorporated into future GPUs …

And regarding this thread, that makes 15x9 (x256?) separate sampler states per texture :)

@+
Yannoo

displaying/mapping numerous video streams onto numerous 3D shapes
=> that isn’t really the same thing …
(but OK, this could begin with playing/mapping 6 movies in parallel on a cube, for example)

And what application does this have to doing quality 3D graphics?

And this needs only one IPBB chunk (so 4 linked texture units) per video stream

Or you know, a single array texture.

If you’re not going to effectively use the features you currently have, there’s no reason to expect that you’ll effectively use what more powerful hardware will bring.
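For instance, a sketch of that array-texture suggestion (a GL 3.0-style sampler2DArray; names are hypothetical): the four IPBB pictures become layers of a single texture on a single unit.

// One unit holds the whole GOP: 4 layers, one per picture.
glActiveTexture(GL_TEXTURE0);
glBindTexture(GL_TEXTURE_2D_ARRAY, gopTex);
glTexImage3D(GL_TEXTURE_2D_ARRAY, 0, GL_RGBA8, width, height, 4,
             0, GL_RGBA, GL_UNSIGNED_BYTE, NULL);
for (int layer = 0; layer < 4; ++layer)
    glTexSubImage3D(GL_TEXTURE_2D_ARRAY, 0, 0, 0, layer, width, height, 1,
                    GL_RGBA, GL_UNSIGNED_BYTE, frames[layer]);

// In GLSL: uniform sampler2DArray gop;  texture(gop, vec3(uv, layer));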

But the subject of this thread is “Separate sampler state from texture state” …
=> 20 separate quad-samplers (or 10 octo-samplers) could perhaps be a good start

No, that has nothing to do with what is being discussed here. That’s something you want for your own, very limited needs.

Sampler state means exactly that: the set of state associated with sampling from a texture. It doesn’t mean “whatever Yann LE PETITCORPS wants it to mean.”

=> the mathematics confirm that your version is about 10,000x bigger than mine

And my version is also 100,000 times more generally useful. 100,000 samples per frame is at the low end of the number of samples that are used per frame for any modern game. Most games render a minimum of 800,000 (1024x768) pixels per frame, and each of those pixels requires sampling from at least one texture. Hence a minimum of 800,000 samples per frame.

A texture unit therefore must be fast. Burdening it with nonsense like accessing from 4 separate textures simultaneously just because it’s easier for you than using array textures is not conducive to keeping them fast.

For now, it’s only for my personal pleasure :)

If I follow your same confused reasoning, why invent color TV, VCD/DVD and other Blu-ray/VOD inventions when Eadweard Muybridge had already found the idea of cinema in 1878? :)
(cf. “Histoire du cinéma” on Wikipédia)

For millennia we have all been able to walk on our own feet, but for the last century or two we can also take the train or the car, or fly, for example …

And that has had a name for a very long time … the progress of science :)

And it’s true that 800,000 pixels per frame is really a minimum, because one HD picture at 1920x1080 is 2,073,600 pixels :)
(but an HD video uses something like 60 textures of 1920x1080 texels per second)

@+
Yannoo

If I follow your same confused reasoning, why invent color TV, VCD/DVD and other Blu-ray/VOD inventions when Eadweard Muybridge had already found the idea of cinema in 1878?

Um, no. My point is that you are the only one who wants to decode 20 MPEG streams simultaneously and display them as textures in 3D space. It is not something that is generally useful, and therefore it is not something that dedicated hardware should deal with.

All of those things you cite are generally useful, unlike what you’re proposing here.

But it’s useful because it’s already usable :)

I think the telephone was primarily used for the “théâtrophone”, no?

Now, in 2010, we have ADSL with a lot of TV channels arising from that invention, for example …

And I only want to use what the hardware can already do when it decodes an MPEG video stream into a window … but I want the decoding to be made directly into an OpenGL texture and not into a window … is it really so difficult to understand???
(the hardware has to store the pictures in buffers in video memory before they can be converted into a video signal for the screen, no?)

And along the way, I see that a lot of things aren’t perfect, that’s all …
Such as the fact that we don’t have sampler state separated from texture state :(
(the hardware can already make multiple video-memory accesses in parallel when it handles mipmapping, linear filtering or DXT decompression, no?)

Current GPUs use multiple SIMD processors in parallel, that’s clear
=> but can this be a little “segmented” or not?
==> something like “countries of numerous SIMD processors”

For example, DXT textures are made of 4x4 blocks of pixels
=> is it imaginable that a different DXT method could be used for each block (or for a bigger patch containing 8x8 or 16x16 similar DXT blocks, for example)?

For example, in this world we have lots of groups of people who work together (or not) on the same thing.
But luckily not everyone in this world does exactly the same thing at the same time :)

@+
Yannoo