Performance of integer texture upload

I create an integer texture and use a PBO to update it, but I found the upload performance is only about 10% of a regular (normalized) texture. Why? Below is my texture create & upload code:

// Create :
glGenTextures(1,&m_texInteger);
glBindTexture(GL_TEXTURE_2D,m_texInteger);
glTexEnvf(GL_TEXTURE_ENV,GL_TEXTURE_ENV_MODE,GL_REPLACE);
glTexParameteri(GL_TEXTURE_2D,GL_TEXTURE_MAG_FILTER,GL_NEAREST);
glTexParameteri(GL_TEXTURE_2D,GL_TEXTURE_MIN_FILTER,GL_NEAREST);
glTexParameteri(GL_TEXTURE_2D,GL_TEXTURE_WRAP_S,GL_CLAMP);
glTexParameteri(GL_TEXTURE_2D,GL_TEXTURE_WRAP_T,GL_CLAMP);
glTexImage2D(GL_TEXTURE_2D,0,GL_RGBA32UI_EXT,TWidth,THeight,0,GL_BGRA_INTEGER_EXT,GL_UNSIGNED_BYTE,0);

// Upload :
glBindBuffer(GL_PIXEL_UNPACK_BUFFER,m_pbObject);
BYTE * pDmaAddr = (BYTE*)glMapBuffer(GL_PIXEL_UNPACK_BUFFER,GL_WRITE_ONLY);
CopyMemory(pDmaAddr,m_pImageData,TWidth*THeight*4);
glUnmapBuffer(GL_PIXEL_UNPACK_BUFFER);
glBindTexture(GL_TEXTURE_2D,m_texInteger);
glTexSubImage2D(GL_TEXTURE_2D,0,0,0,TWidth,THeight,GL_BGRA_INTEGER_EXT,GL_UNSIGNED_BYTE,NULL);
glBindBuffer(GL_PIXEL_UNPACK_BUFFER,0);

My card is a GF8800GTS, and the driver is 6.14.11.7519.

Try calling glBufferDataARB() with a null pointer before the glMapBuffer():
http://www.songho.ca/opengl/gl_pbo.html#unpack
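
A minimal sketch of that change applied to the upload code above (the GL_STREAM_DRAW usage hint is an assumption; use whatever usage you created the PBO with):

// Orphan the buffer storage so glMapBuffer() does not stall on the previous transfer
glBindBuffer(GL_PIXEL_UNPACK_BUFFER,m_pbObject);
glBufferData(GL_PIXEL_UNPACK_BUFFER,TWidth*THeight*4,NULL,GL_STREAM_DRAW);
BYTE * pDmaAddr = (BYTE*)glMapBuffer(GL_PIXEL_UNPACK_BUFFER,GL_WRITE_ONLY);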

Seems that the source and destination pixel sizes don't match, so the driver must convert each pixel.
GL_RGBA32UI_EXT means 32 bits per component, i.e. 128 bits per pixel or 16 bytes.
In glTexSubImage2D you are using BGRA_INTEGER with unsigned bytes, which is 8 bits per component, i.e. 32 bits per pixel or 4 bytes.

Try to use GL_RGBA8UI_EXT instead of GL_RGBA32UI_EXT.
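
Applied to the allocation above, that would look something like:

// 8 bits per integer component now matches the 4-byte BGRA source pixels
glTexImage2D(GL_TEXTURE_2D,0,GL_RGBA8UI_EXT,TWidth,THeight,0,GL_BGRA_INTEGER_EXT,GL_UNSIGNED_BYTE,0);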

No, I can't use the 8UI format, because it is 8 bits per component, but the V210 format is 10 bits per component, so I must use 32UI or 16UI.

RGBA takes 4 bytes, or 32 bits: 8-8-8-8.
v210 also takes 4 bytes, or 32 bits: 2-10-10-10.

My suggestion is to treat your v210 image as RGBA and use a fragment shader to repack the pixels. So…


// 4 yuv 10-bit DWORDs hold 6 pixels after unpacking
vec4i c1 = texture2D(picture, uv);
vec4i c2 = texture2D(picture, uv+vec2(1,0)/texwidth); 
vec4i c3 = texture2D(picture, uv+vec2(2,0)/texwidth);
vec4i c4 = texture2D(picture, uv+vec2(3,0)/texwidth);

vec3f r1 = repack(c1); // Cr0-Y0-Cb0
vec3f r2 = repack(c2); // Y2-Cb1-Y1
vec3f r3 = repack(c3); // Cb2-Y3-Cr1
vec3f r4 = repack(c4); // Y5-Cr2-Y4

// now let's convert to rgb. from 6 ycbcr "samples" we get 6 rgb pixels
vec3f rgb[6];
rgb[0] = ycbcr2rgb(r1.g, r1.b, r1.r); // y0-cb0-cr0
rgb[1] = ycbcr2rgb(r2.b, r1.b, r1.r); // y1-cb0-cr0
rgb[2] = ycbcr2rgb(r2.r, r2.g, r3.b); // y2-cb1-cr1
rgb[3] = ycbcr2rgb(r3.g, r2.g, r3.b); // y3-cb1-cr1
rgb[4] = ycbcr2rgb(r4.b, r3.r, r4.g); // y4-cb2-cr2
rgb[5] = ycbcr2rgb(r4.r, r3.r, r4.g); // y5-cb2-cr2

// depending on gl_FragCoord.x we choose one of the rgb's
int x = int(mod(floor(gl_FragCoord.x), 6.0));
gl_FragColor = vec4(rgb[x], 1.0);


vec3f repack(vec4i src)
{
 vec3i res(0,0,0);
 res.r = (src.r << 4) | (src.g >> 4);
 res.g = ((src.g & 0x0f) << 6) | (src.b >> 4);
 res.b = ((src.b & 0x3) << 8) | src.a;
 
 return vec3f(res.r/1024.0, res.g/1024.0, res.b/1024.0); 
}

ycbcr2rgb is your specific function, so I will not copy the code here.
The repack function is the integer version. If your hardware doesn't support integers, then use the math formulas from my other thread on this forum and write your own float repack function.
The code above is “straight from brain”, so I'm not sure it is correct… it just shows the idea of how you could use an RGBA 8-bit texture to deal with 10-bit colors. Keep in mind that newer hardware (from SM3.0 on) has full 32-bit FP precision in shaders, so expanding integers to floats will not lose bits.
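
For reference, here is a float-only repack sketch that mirrors the integer shifts above using floor()/mod() arithmetic (it assumes the channels arrive as normalized 0..1 values from an RGBA8 texture, and is likewise untested):

vec3 repack(vec4 src)
{
 vec4 b = floor(src * 255.0 + 0.5);                 // back to 0..255 byte values
 vec3 res;
 res.r = b.r * 16.0 + floor(b.g / 16.0);            // (src.r << 4) | (src.g >> 4)
 res.g = mod(b.g, 16.0) * 64.0 + floor(b.b / 16.0); // ((src.g & 0x0f) << 6) | (src.b >> 4)
 res.b = mod(b.b, 4.0) * 256.0 + b.a;               // ((src.b & 0x3) << 8) | src.a
 return res / 1024.0;                               // 10-bit values back to 0..1
}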

Mr yooyo:
I understood your suggestion and replaced the texture internal format with GL_RGBA8UI_EXT, but the upload performance is still poor: it only reaches 90.96 MB per second, only 16 percent of a normal texture.

I did some benchies…
Use GL_RGBA_INTEGER_EXT instead of GL_BGRA_INTEGER_EXT. Seems that BGRA is 100x slower than RGBA.

In the case of regular textures BGRA is much faster than RGBA.
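
A sketch of the changed upload call from the original post (note that if the source data stays in BGRA byte order, the channels will arrive swapped and you'd have to swizzle them in the shader):

// GL_RGBA_INTEGER_EXT avoids the slow driver-side BGRA conversion
glTexSubImage2D(GL_TEXTURE_2D,0,0,0,TWidth,THeight,GL_RGBA_INTEGER_EXT,GL_UNSIGNED_BYTE,NULL);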

Thanks Mr yooyo, I tested the GL_RGBA_INTEGER_EXT format and it reaches the maximum performance.

Hi everybody,

I know that the subject is a little bit old, but I'm trying to do the same thing using float formulas to repack pixels (the graphics card doesn't support integer textures). But I still have a question about how I should map the texture in the codec, since v210 is a 4:2:2 format and RGBA is a "4:4:4:4" format.

I mean we have information for 6 pixels in only 4 pixels. In the previous example, you say we choose one RGB depending on gl_FragCoord.x. Let's say we have a v210 buffer with a width of 720 and a height of 486. This means that the row pitch in bytes of the v210 buffer is (720*8/3) = 1920 bytes.

I'm a beginner with GLSL, but if I have understood properly, the fragment shader is called for each pixel. If I don't want to read outside the v210 texture, I should map the texture with a width of 480 and a height of 486 (because 480*4 bytes for RGBA = 1920 bytes per line). The problem is that the real width of the image is 720, but I can't map the texture as 720x486, because I'll probably have a bad access since 720 pixels in RGBA = 720 * 4 bytes/pixel = 2880 bytes per line.

So I'm not sure how I should map the texture to show a 720-pixel line while we only need 480 texels to hold the necessary information for those 720 pixels. I hope you understand what I mean.

Thanks a lot!

First, I don't understand what you want to achieve… to use v210 frames as a texture, or to export the framebuffer to v210?
In the first case, hardware doesn't support v210 natively, so you have to convert it to RGB. To do that you can write CPU code, or a faster GPU version.
The format of v210 is Y1-U-Y2-V. To make two RGB pixels from one macro pixel, use Y1-U-V and Y2-U-V. So… your source is 720x486. Create an RGBA texture of 360x486 (let's say yuyv_tex) and fill the texture data… each RGBA pixel holds one YUYV macro pixel (R=Y1; G=U; B=Y2; A=V).
In GL, create an FBO (RGB) rendertarget with size 720x486. Now bind yuyv_tex and render a screen-aligned quad to cover the whole FBO. Turn off image filtering (use the nearest filter). Because of nearest filtering, each texel in yuyv_tex will be stretched exactly 2x horizontally. Now… write a simple GLSL shader which takes a sample from yuyv_tex and, depending on gl_FragCoord.x (even or odd), uses the RGA or BGA channels to convert YUV to RGB. If your hardware doesn't support gl_FragCoord, then you can try with another texture (grayscale vertical bars 101010101…). Sample a pixel from bars… depending on its value (0 or 1), use RGA or BGA… something like:


 float g = texture2DRect(bars, texcoord.xy).r; // we need just one channel
 vec4 rgba = texture2DRect(yuyv_tex, texcoord.xy);
 vec3 finalcol; 
 if (g == 0.0) finalcol = convertYUVtoRGB(rgba.rga);
 else  finalcol = convertYUVtoRGB(rgba.bga);
 gl_FragColor = vec4(finalcol,1.0);

Now… your FBO has an RGB version of the original YUYV texture.
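
For completeness, a rough sketch of the gl_FragCoord.x even/odd variant described above (convertYUVtoRGB is whatever YUV-to-RGB conversion you already use, and gl_TexCoord[0] is assumed to address yuyv_tex in texel space, 0..360):

uniform sampler2DRect yuyv_tex;

void main()
{
 vec4 rgba = texture2DRect(yuyv_tex, gl_TexCoord[0].xy);
 vec3 finalcol;
 // even destination columns take Y1 (.r), odd columns take Y2 (.b)
 if (mod(floor(gl_FragCoord.x), 2.0) < 1.0) finalcol = convertYUVtoRGB(rgba.rga);
 else finalcol = convertYUVtoRGB(rgba.bga);
 gl_FragColor = vec4(finalcol, 1.0);
}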

If you want to convert an RGB buffer to YUYV, use the reverse path.
Create a 360x486 RGBA FBO, bind your 720x486 RGB image and render a screen-aligned quad. In the shader, fetch two texels from the RGB texture and make one YUYV pixel. To do this, use the following shader:


// vertex
void main(void)
{
	gl_TexCoord[0] = gl_MultiTexCoord0;
	gl_TexCoord[1] = gl_MultiTexCoord0 + vec4(1,0,0,0);
	gl_Position = ftransform();
}
///////////////////////////

// fragment
uniform sampler2DRect rgbsrc;

const vec3 y_const = vec3( 0.257,  0.504,  0.098);
const vec3 v_const = vec3( 0.439, -0.368, -0.071);
const vec3 u_const = vec3(-0.148, -0.291, +0.439);
 
void main()
{
  // read two rgb pixels
  vec3 rgb1 = texture2DRect(rgbsrc, gl_TexCoord[0].xy).rgb;
  vec3 rgb2 = texture2DRect(rgbsrc, gl_TexCoord[1].xy).rgb;
  vec3 rgb12 = (rgb1 + rgb2) * 0.5;
  
  vec4 yuyv; // yuyv -> rgba
  
  yuyv.b = dot (rgb1,  y_const) +  16.0/256.0; // Y1
  yuyv.g = dot (rgb12, u_const) + 128.0/256.0; // U
  yuyv.r = dot (rgb2,  y_const) +  16.0/256.0; // Y2
  yuyv.a = dot (rgb12, v_const) + 128.0/256.0; // V
  
  gl_FragColor = yuyv;
}
///////////////////////////

Now… you have YUYV image in your FBO encoded in RGBA.

This looks odd yooyo. Your previous post in this topic seemed about right, but I don’t think v210 has a direct 2:1 relationship with the final pixels, I think you need to do something different for every one of the 6 pixels. I think you may be thinking of 2vuy.

I haven’t crunched the shader myself yet, but it looks to me like (assuming an 8-bit RGBA texture):
Pixel 0 needs sample 0 (y0, b0, r0)
Pixel 1 needs sample 0, 1 (y1, b0, r0)
Pixel 2 needs sample 1, 2 (y2, b1, r1)
Pixel 3 needs sample 1, 2 (y3, b1, r1)
Pixel 4 needs sample 2, 3 (y4, b2, r2)
Pixel 5 needs sample 2, 3 (y5, b2, r2)

And then your math should work but needs to change based on where the components are in the 32-bit word.

Am I missing a clever short-cut? I hope so. The only thing I can think of is to use a 16-bit RGBA texture, and that should save samples.

Bruce

PS I hate v210. v216 rules.

I solved my problem. I use almost the same algorithm as yooyo (see yooyo's second post in this topic), except that I do it with floats because the graphics card doesn't support integer textures. In the OpenGL code I did:


glPixelStorei(GL_UNPACK_ROW_LENGTH, rowLength); // Take only one field
glTexImage2D(GL_TEXTURE_RECTANGLE_EXT, 0, GL_RGBA, 720, 486, 0,  m_textureFormat, m_textureType, m_pucBuffer);

where rowLength is 480. Then I use glTexSubImage2D in my drawing function to update the texture and redraw the quad for each video frame. What I was trying to do exactly was to display a v210 video frame in real time in a preview window. To do this, I had to convert a v210 texture to RGB. I did it without any FBO, but I think I'll rewrite a version using an FBO, because I'm using the GL_NEAREST filter and it's less beautiful than with GL_LINEAR. I think that if I convert the v210 texture to an RGB texture using an FBO (to render to a texture) and then map this newly created texture onto a quad using GL_LINEAR filtering, it should be better (the preview window isn't full size).

Thanks for your help!

Yes… you were right… I mixed up yuy2 with v210.
(headbang)