Efficient transfer of planar images for rendering in OpenGL?

What is the most efficient way to transfer planar YUVA images for rendering in OpenGL?

Currently I’m using 4 separate textures (Y, U, V, A), which I upload to from 4 separate PBOs each frame. However, it seems to be much more efficient to transfer a lot of data in few textures: e.g. transferring YUV422 to a single packed texture is ~50% faster than transferring the same data to 3 separate (Y, U, V) textures.

One thought I’ve had on the matter: could I use 2 array textures, one for (Y, A) and one for (U, V), and would that be faster?

Another alternative I’ve considered is converting from planar to packed while copying the data into the PBO for transfer, though that adds some CPU overhead.

Any suggestions or insights?

NOTE: dim(Y) == dim(A) && dim(U) == dim(V) && dim(Y) != dim(U).
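For the planar-to-packed alternative, a minimal CPU-side sketch of the interleaving step (names are illustrative, not from any library): this combines two equally-sized planes, e.g. Y and A, into one two-channel buffer that could then be uploaded as a single GL_RG8 texture.

```c
#include <stddef.h>
#include <stdint.h>

/* Interleave two same-sized byte planes (e.g. Y and A) into one
 * two-channel buffer suitable for a GL_RG8 texture upload.
 * 'n' is the number of pixels in each plane. */
static void interleave2(const uint8_t *p0, const uint8_t *p1,
                        uint8_t *out, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        out[2 * i + 0] = p0[i];
        out[2 * i + 1] = p1[i];
    }
}
```

The same loop would be done once for (Y, A) and once for (U, V), trading one linear pass over the data for half the number of texture uploads.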

The biggest problem, I imagine, is that you use GL_LUMINANCE to transfer each single channel, and it seems that this format is not supported by the driver without being swizzled to RGBA. I have the same issue and I don’t know how to transfer a planar YUV420 frame into an RGBA texture the way I do with a packed YUV422 frame. For now it’s more efficient for me to send a YUV422 frame than a planar YUV420 one, even though the planar YUV420 data is smaller!

I’m using GL_R8 for the internal format, GL_RED for the format and GL_UNSIGNED_BYTE for the type.

I think it should certainly be possible to store two consecutive YCbCr 4:2:0 pictures in one RGB texture and use a fragment shader to retrieve the correct texel values
(cf. no texel data is left unused, because one RGB picture [24 bits/pixel] holds exactly the same amount of data as two YCbCr 4:2:0 pictures [12 bits/pixel each])
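The size arithmetic behind this can be checked directly (a small sketch; 4:2:0 means a full-resolution Y plane plus quarter-resolution Cb and Cr planes):

```c
#include <stddef.h>

/* Bytes in one planar YCbCr 4:2:0 picture: Y is w*h bytes,
 * Cb and Cr are (w/2)*(h/2) bytes each -> 1.5 bytes per pixel. */
static size_t yuv420_bytes(size_t w, size_t h)
{
    return w * h + 2 * ((w / 2) * (h / 2));
}

/* Bytes in one w x h RGB (24 bits/pixel) texture. */
static size_t rgb_bytes(size_t w, size_t h)
{
    return w * h * 3;
}
```

So two 4:2:0 pictures (2 × 12 bpp) fill a 24 bpp RGB texture of the same dimensions exactly, with no texels wasted.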

Note that with this method you can also handle four interlaced YCbCr 4:2:0 frames with only one RGB picture, if you adapt the shader to handle the odd/even interlacing scheme.

Along the same lines, you can add a supplementary odd/even horizontal interlacing scheme so as to handle eight interlaced YCbCr frames (interlaced vertically and horizontally, not only vertically) with only one RGB picture
=> this causes some loss of video quality, but it doesn’t look as bad as it sounds if you use a relatively high refresh rate
(because pixel values don’t change very much between two interlaced frames if there is no very fast motion in a video with a high refresh rate)

Note too that the alpha channel can be used for other things if you use an RGBA format
(for example, the alpha channel can be used to embed subtitles into the interlaced video stream)

Another idea is to store four YCbCr frames (not interlaced) in the four separate components (red, green, blue and alpha) of an RGBA texture, and to use the fragment shader to handle the conversion of each component at each frame

=> the idea can also be extended to handle eight interlaced frames if you use the standard odd/even vertical interlacing scheme, and possibly 16 bi-interlaced frames if you only have to handle relatively slow motion at a relatively high refresh rate
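A sketch of the CPU-side packing this idea implies (hypothetical helper, not an existing API): byte i of frame k lands in component k of texel i, so four whole frame buffers share one RGBA texel stream.

```c
#include <stddef.h>
#include <stdint.h>

/* Pack four equally-sized byte planes (e.g. four consecutive
 * YCbCr 4:2:0 frame buffers) into one interleaved RGBA stream:
 * byte i of frame k goes into component k of texel i. */
static void pack4(const uint8_t *frames[4], uint8_t *rgba, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        for (size_t k = 0; k < 4; ++k)
            rgba[4 * i + k] = frames[k][i];
}
```

The fragment shader would then select the .r, .g, .b or .a component according to which of the four frames is currently being displayed.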

==> in this case the “slow” texture transfer rate between the CPU and the GPU isn’t a problem, because the size of the data to transfer is divided by 8 (with bi-interlaced frames, “slow” motion and a “fast” refresh rate) compared to the initial RGB 4:2:2 format :slight_smile:

Here is the fragment shader that handles one non-interlaced YCbCr frame stored in a single OpenGL texture with the GL_LUMINANCE format and GL_UNSIGNED_BYTE type
(this perhaps works faster than the GL_RED format with the GL_BYTE type?)

uniform sampler2D tex;

void main(void)
{
    float nx, ny, r, g, b, y, u, v;
    float u1, u2, v1, v2;

    nx = gl_TexCoord[0].x;
    ny = gl_TexCoord[0].y;

    // The Y plane fills the top 4/6 of the texture; each row of the U and
    // V regions below holds two half-width chroma rows side by side,
    // which are averaged here.
    y  = texture2D(tex, vec2( nx,            ny*(4.0/6.0) )).r;
    u1 = texture2D(tex, vec2( nx/2.0,       (ny+4.0)/6.0  )).r;
    u2 = texture2D(tex, vec2( nx/2.0 + 0.5, (ny+4.0)/6.0  )).r;
    v1 = texture2D(tex, vec2( nx/2.0,       (ny+5.0)/6.0  )).r;
    v2 = texture2D(tex, vec2( nx/2.0 + 0.5, (ny+5.0)/6.0  )).r;

    y = 1.1643 * (y - 0.0625);
    u = (u1 + u2) / 2.0 - 0.5;
    v = (v1 + v2) / 2.0 - 0.5;

    r = y + 1.5958 * v;
    g = y - 0.39173 * u - 0.8129 * v;
    b = y + 2.017 * u;

    gl_FragColor = vec4(r, g, b, 1.0);
}

And the calls to create/bind/update the 4:2:0 YCbCr frame in a standard GL_LUMINANCE OpenGL texture:

        glGenTextures(1, &texID);
        glBindTexture(GL_TEXTURE_2D, texID);
        glTexEnvf(GL_TEXTURE_ENV, GL_TEXTURE_ENV_MODE, GL_REPLACE);

        glTexImage2D(GL_TEXTURE_2D, 0, GL_LUMINANCE, width, (height*3)/2, 0,
                     GL_LUMINANCE, GL_UNSIGNED_BYTE, pictures_queue[pictures_read++]);

The height*3/2 formula is used because the U and V planes immediately follow the Y plane and each has dimensions width/2 × height/2, so the total size of the three YUV planes is only 1.5× the size of the width×height grey Y plane.
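That layout can be made explicit with a small sketch (illustrative names): the byte offsets of the three planes inside the contiguous buffer passed to glTexImage2D.

```c
#include <stddef.h>

/* Byte offsets of the three planes inside the contiguous YCbCr 4:2:0
 * buffer that is uploaded as one width x (height*3/2) GL_LUMINANCE
 * texture: Y first, then U, then V. */
typedef struct { size_t y, u, v, total; } Yuv420Layout;

static Yuv420Layout yuv420_layout(size_t w, size_t h)
{
    Yuv420Layout l;
    l.y = 0;
    l.u = w * h;                    /* U starts right after Y       */
    l.v = l.u + (w / 2) * (h / 2);  /* V after the quarter-size U   */
    l.total = l.v + (w / 2) * (h / 2);
    return l;
}
```

The total comes out to w * (h*3/2) bytes, which is exactly what the single glTexImage2D call above consumes.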

=> I’ll try to adapt this code next week so it can handle eight interlaced YCbCr 4:2:0 frames in only one GL_RGBA texture, instead of a single non-interlaced YCbCr 4:2:0 frame mapped into one GL_LUMINANCE texture as in this “old” code
(and I plan to test just after whether 16 bi-interlaced frames are still visually correct)