Fast GL_RGB upload to texture: TBO?


I need the fastest possible way to stream 24-bit RGB data (a movie) to a texture. Uploading RGB24 through a PBO seems to be slow, but my data is 24-bit and I don't want to use the CPU to convert from RGB24 to RGB32. Is it possible to use a Texture Buffer Object instead: copy the RGB24 data as if it were RGB32 and unpack it correctly to 24 bits in a shader? Would that be as efficient as a PBO? Are there any limitations to using a TBO?

Why don't you want to convert from 24 to 32 bits on the CPU side?
(It's a very basic conversion that shouldn't need much CPU power.)
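
To show how small that conversion is, here is a minimal sketch in C. The function name rgb24_to_rgba32 and the choice of 255 for the alpha byte are only assumptions for illustration, nothing from a real library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Expand tightly packed RGB24 pixels to RGBA32 (hypothetical helper).
 * src holds 3 * pixel_count bytes, dst must hold 4 * pixel_count bytes,
 * and the two buffers must not overlap. Alpha is set to 255 (assumption). */
void rgb24_to_rgba32(const uint8_t *src, uint8_t *dst, size_t pixel_count)
{
    assert(src != NULL && dst != NULL);
    for (size_t i = 0; i < pixel_count; ++i) {
        dst[4 * i + 0] = src[3 * i + 0]; /* R */
        dst[4 * i + 1] = src[3 * i + 1]; /* G */
        dst[4 * i + 2] = src[3 * i + 2]; /* B */
        dst[4 * i + 3] = 255;            /* opaque alpha */
    }
}
```

It is one pass over the frame with no arithmetic, so the cost is mostly memory bandwidth. Whether that is acceptable depends on the frame size and rate, so it is worth measuring.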

Or perhaps the movie stream can be converted to a 32-bit format at the source?

Note that a YCbCr 4:2:2 or 4:1:1 format can be a good way to get a more compact input stream to handle
(these formats are used by a great many video formats and are more compact than RGB for about the same quality)

In that case you also save the time spent on the conversions from YCbCr to RGB24 at the input and from RGB24 to RGB32 at the end
=> it's a win on all sides, because you can now handle the entire chain in the YCbCr format …

Here is an example of the fragment shader needed to display YCbCr picture data stored in an OpenGL texture:

uniform sampler2D tex;

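void main(void)
{
    float nx, ny, r, g, b, y, u, v;
    float u1, u2, v1, v2;

    nx = gl_TexCoord[0].x;
    ny = gl_TexCoord[0].y;

    // Y plane: the first 4/6 of the texture height
    y  = texture2D(tex, vec2( (nx),         (ny)*(4.0/6.0) )).r;
    // Cb plane, then Cr plane: 1/6 of the texture height each,
    // with two half-width chroma rows packed into every texture row
    u1 = texture2D(tex, vec2( (nx/2.0),     (ny+4.0)/6.0   )).r;
    u2 = texture2D(tex, vec2( (nx/2.0)+0.5, (ny+4.0)/6.0   )).r;
    v1 = texture2D(tex, vec2( (nx/2.0),     (ny+5.0)/6.0   )).r;
    v2 = texture2D(tex, vec2( (nx/2.0)+0.5, (ny+5.0)/6.0   )).r;

    // Remove the video-range offsets (16/256 for Y, 0.5 for Cb and Cr)
    y =  1.1643 * (y - 0.0625);
    u = (u1+u2)/2.0 - 0.5;
    v = (v1+v2)/2.0 - 0.5;

    // YCbCr -> RGB
    r = y + 1.5958 * v;
    g = y - 0.39173 * u - 0.8129 * v;
    b = y + 2.017 * u;

    // Write the result (an opaque alpha of 1.0 is assumed here)
    gl_FragColor = vec4(r, g, b, 1.0);
}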
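
To check the constants used in the shader without a GL context, the same per-pixel arithmetic can be sketched on the CPU. This is only a reference sketch: the names clamp_to_byte and ycbcr_to_rgb are made up for the example, and rounding and clamping to 8 bits is an assumption about what the framebuffer ends up doing.

```c
#include <assert.h>
#include <stdint.h>

/* Round a value in 0..255 to the nearest byte, clamping out-of-range input. */
uint8_t clamp_to_byte(float x)
{
    if (x < 0.0f)   return 0;
    if (x > 255.0f) return 255;
    return (uint8_t)(x + 0.5f);
}

/* CPU version of the shader's conversion for one pixel (hypothetical helper).
 * Inputs are 8-bit video-range Y, Cb, Cr; output is 8-bit R, G, B. */
void ycbcr_to_rgb(uint8_t y8, uint8_t cb8, uint8_t cr8, uint8_t rgb[3])
{
    assert(rgb != 0);

    /* Normalise to 0..1 the way a GL_LUMINANCE texture fetch does. */
    float y = 1.1643f * (y8 / 255.0f - 0.0625f);
    float u = cb8 / 255.0f - 0.5f;
    float v = cr8 / 255.0f - 0.5f;

    float r = y + 1.5958f * v;
    float g = y - 0.39173f * u - 0.8129f * v;
    float b = y + 2.017f * u;

    rgb[0] = clamp_to_byte(r * 255.0f);
    rgb[1] = clamp_to_byte(g * 255.0f);
    rgb[2] = clamp_to_byte(b * 255.0f);
}
```

Feeding it video black (Y=16, Cb=Cr=128) and video white (Y=235, Cb=Cr=128) should give values very close to 0 and 255 respectively, which is a quick sanity check on the coefficients.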


You can use calls like these to create, bind, load and update the 4:1:1 YCbCr stream in a standard GL_LUMINANCE OpenGL texture:

    glGenTextures(1, &texID);               // generate the YCbCr 4:1:1 texture handle
    glBindTexture(GL_TEXTURE_2D, texID);    // and use it
    glTexEnvf(GL_TEXTURE_ENV, GL_TEXTURE_ENV_MODE, GL_REPLACE);        // the target here must be GL_TEXTURE_ENV; note that GL_REPLACE is certainly not the best choice if you want to do some video mixing with it ...
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);  // linear filtering seems a good compromise between speed and quality
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);  // same thing for magnification as for minification
    glPixelStorei(GL_UNPACK_ALIGNMENT, 1);  // rows are tightly packed bytes; needed when width is not a multiple of 4
    glBindTexture(GL_TEXTURE_2D, texID);    // bind the texture in order to replace its contents
    glTexImage2D(GL_TEXTURE_2D, 0, GL_LUMINANCE, width, (height*3/2), 0, GL_LUMINANCE, GL_UNSIGNED_BYTE, pictures_queue[pictures_read++]); // with a new frame/picture (generated by libavcodec and/or v4l for example)

Note that the texture height is height*3/2 because the Cb and Cr planes are stored immediately after the Y plane (the Y plane stores the intensity of each pixel, the Cb and Cr planes define the color) and each has dimensions of only width/2 by height/2 in the 4:1:1 format, so the three YCbCr planes together take only 1.5x the size of the width*height grey Y plane = 12 bits per pixel. (Strictly speaking, chroma planes of width/2 by height/2 are what is usually called 4:2:0; true 4:1:1 also takes 12 bits per pixel but stores chroma as width/4 by height.)
=> this is only half the size needed by a 24-bit RGB pixel
==> this halves the video data rate that OpenGL has to handle at the input and the output
(AND you no longer need any "slow" CPU RGB24 -> RGB32 conversion with this more direct scheme).
[the YCbCr 4:1:1 -> RGB32 4:4:4:4 conversion is handled directly in the fragment shader, and this is really very fast]
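
The size arithmetic above can be written out as a small C sketch. The struct and function names (PlanarLayout, planar_layout) are invented for the example. It assumes an even width and a height that is a multiple of 4, so that each chroma plane fills a whole number of texture rows, and it assumes Cb is stored before Cr, as the shader's sampling order suggests.

```c
#include <assert.h>
#include <stddef.h>

/* Byte layout of one planar frame packed into a width x (height*3/2)
 * single-channel texture (hypothetical helper type). */
typedef struct {
    size_t cb_offset;   /* first byte of the Cb plane (Y plane starts at 0) */
    size_t cr_offset;   /* first byte of the Cr plane */
    size_t total_bytes; /* Y + Cb + Cr */
    size_t tex_rows;    /* texture height passed to glTexImage2D */
    size_t rgb24_bytes; /* size of the same frame as packed RGB24 */
} PlanarLayout;

PlanarLayout planar_layout(size_t width, size_t height)
{
    /* Assumption: dimensions that keep every plane on whole texture rows. */
    assert(width % 2 == 0 && height % 4 == 0);

    size_t y_bytes      = width * height;             /* 8 bits per pixel */
    size_t chroma_bytes = (width / 2) * (height / 2); /* per chroma plane */

    PlanarLayout l;
    l.cb_offset   = y_bytes;
    l.cr_offset   = y_bytes + chroma_bytes;
    l.total_bytes = y_bytes + 2 * chroma_bytes;       /* 1.5 * width * height */
    l.tex_rows    = height * 3 / 2;
    l.rgb24_bytes = width * height * 3;
    return l;
}
```

For a 640x480 frame this gives a 640x720 texture, and the planar frame is exactly half the size of the RGB24 frame, which is where the "divide by two" above comes from.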