Direct GL_YUV, GL_YCbCr or YCoCg texture support?

For information, I began with a PC 1512 with an “extended CGA” at 640x200 in 16 colors, an 8 MHz CPU, 512 KB of RAM, one 5 1/4" floppy drive and no hard disk.

Now I have computers with gigabytes of RAM and terabytes of hard disk, with processors at more than 2 GHz, driving more than a million pixels.

=> the multiplication is about a factor of x1000
==> in terms of time, it’s as if hours of computing power in 1980 have become seconds in 2009 … :slight_smile:

Something like a millisecond isn’t really a short time for computers any more; their internal working time is now more like a nanosecond or less …

@+
Yannoo

I was never able to make a true 3D program that ran as fast as I wanted on the PC 1512 …

For only a quarter of that price, I now have an EeePC 701 that is really more compact, has a lot more RAM, disk and CPU power, and can easily do some basic 3D / video processing in real time …

@+
Yannoo

I can now handle multiple video-textured quads and mix them in real time with OpenGL, with very little CPU usage on my EeePC :slight_smile:

This can effectively be done without any modification to OpenGL, only with new functions that “use and upgrade” the various standard OpenGL glTex* funcs to add a new “pseudo-mipmap autogeneration with GL_YUYV support” :slight_smile:

When I see the attitude of many people in this thread who don’t want to see anything upgraded in OpenGL, I prefer to handle this personally and keep a slightly more detached view of things …

=> when I really want something, I can generally do what I want
(it’s only a question of time and a little work …)

==> but when we really don’t want something, of course we can’t …

@+
Yannoo

I have found on the net a nice fragment shader that does the YUV to RGB conversion (thanks to Peter Bengtsson).

I have modified it a little so it can directly use the 4:2:2 YUV images produced by various v4l2 and/or ffmpeg sources, using only one texture unit.

Now I can map video textures coming from a video file such as .avi or .mp4 onto various 3D objects (and so rotate/resize/compose multiple video streams onto each face of “a lot” of animated, spinning cubes) => my dream since a very long time is now a reality :slight_smile:

This works (very) well on Mac OS and Linux, and I am currently working on porting this feature to the EeePC world.

For the pleasure, here is the fragment shader:

uniform sampler2D tex;   // the planar YUV frame packed into one GL_LUMINANCE texture

void main(void)
{
    float nx, ny, r, g, b, y, u, v;
    float u1, u2, v1, v2;

    nx = gl_TexCoord[0].x;
    ny = gl_TexCoord[0].y;

    // the Y plane fills the top 4/6 of the texture; the U and V planes each
    // fill 1/6, with two half-width chroma rows packed per texture row
    y  = texture2D(tex, vec2( nx,            ny*(4.0/6.0) )).r;
    u1 = texture2D(tex, vec2( nx/2.0,       (ny+4.0)/6.0  )).r;
    u2 = texture2D(tex, vec2( nx/2.0 + 0.5, (ny+4.0)/6.0  )).r;
    v1 = texture2D(tex, vec2( nx/2.0,       (ny+5.0)/6.0  )).r;
    v2 = texture2D(tex, vec2( nx/2.0 + 0.5, (ny+5.0)/6.0  )).r;

    // BT.601-style range expansion and conversion
    y = 1.1643 * (y - 0.0625);
    u = (u1 + u2) / 2.0 - 0.5;
    v = (v1 + v2) / 2.0 - 0.5;

    r = y + 1.5958 * v;
    g = y - 0.39173 * u - 0.8129 * v;
    b = y + 2.017 * u;

    // note: written in b,g,r order here; use (r, g, b) if colors come out swapped
    gl_FragColor = vec4(b, g, r, 1.0);
}
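For completeness, here is a minimal sketch of how the shader above might be hooked up from the C side (assuming it is already compiled and linked into a program object named program; that name is my assumption, not from the original post):

    glUseProgram(program);
    glUniform1i(glGetUniformLocation(program, "tex"), 0); // the "tex" sampler reads texture unit 0
    glActiveTexture(GL_TEXTURE0);
    glBindTexture(GL_TEXTURE_2D, texID);                  // the GL_LUMINANCE texture holding the YUV frame
    // ... draw the textured quad(s) ...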

And here are the calls to create/bind/load/update the 4:2:2 YUV stream into a standard GL_LUMINANCE OpenGL texture:

    glGenTextures(1, &texID);                                          // generate the YUV texture handle
    glBindTexture(GL_TEXTURE_2D, texID);                               // and use it
    glTexEnvf(GL_TEXTURE_ENV, GL_TEXTURE_ENV_MODE, GL_REPLACE);        // note: GL_REPLACE is certainly not the best choice for video mixing ...
    glTexParameterf(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);  // linear filtering seems a good speed/quality compromise
    glTexParameterf(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);  // for both minification and magnification

    glBindTexture(GL_TEXTURE_2D, texID);                               // per frame: rebind the YUV texture
    glTexImage2D(GL_TEXTURE_2D, 0, GL_LUMINANCE, width, (height*3/2), 0,
                 GL_LUMINANCE, GL_UNSIGNED_BYTE,
                 pictures_queue[pictures_read++]);                     // and upload a new frame (generated by libavcodec and/or v4l, for example)

The height*3/2 formula is used because the U and V planes come immediately after the Y plane and each has dimensions of only width/2 by height/2, so the total size of the YUV planes is only 1.5x the size of the width*height grey Y plane.
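To make the layout concrete, here is a small sketch of the plane offsets in such a planar frame (frame, width and height are hypothetical names for the decoded buffer and its dimensions):

    unsigned char *y_plane = frame;                                 // width   x height   luma plane
    unsigned char *u_plane = frame + width * height;                // width/2 x height/2 chroma plane
    unsigned char *v_plane = u_plane + (width / 2) * (height / 2);  // width/2 x height/2 chroma plane
    size_t total_size = width * height * 3 / 2;                     // 1.5x the size of the Y plane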

I am also working on using the u2 and v2 texels to handle things such as alpha, stencil and multiple audio channels inside this YUV texture in the very near future (to optimize/standardize the memory accesses used by the alpha, stencil and YUV planes/channels, and because I think it is necessary in order to handle “artistic interpolations” between multiple audio/video textures in the fragment shader without sacrificing a lot of texture/sound memory/handles). I am also beginning to look at how to quickly/easily pack all this into a sort of compressed DXT texture.

=> I propose a new GL_LUMINANCE_CHROMINANCE_ALPHA_STENCIL_AUDIO_VIDEO_MIPMAP_42222_EXT token that could help to combine GL_LUMINANCE_ALPHA/GL_422_EXT with the ffmpeg/V4L2 and OpenAL/FluidSynth APIs :slight_smile:

@+
Yannoo

A 4:2:2 texture fits nicely into an RGBA or BGRA texture. Just create a w/2 x h RGBA texture and upload the YUV422 raw data as GL_BGRA. With this you have simpler fragment shader code:


uniform sampler2D yuyv_tex;
vec4 mp = texture2D(yuyv_tex, coords); // fetch one 4:2:2 macropixel YUYV
// now... mp.r-g-b-a is actually mp.y1-u-y2-v
// depending on even/odd fragment x position, choose y1-u-v or y2-u-v
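The upload side of this trick might look like the following sketch (texID and yuy2_data are placeholder names): since one RGBA/BGRA texel holds one whole Y1-U-Y2-V macropixel, the texture only needs to be w/2 wide:

    glBindTexture(GL_TEXTURE_2D, texID);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, width / 2, height, 0,
                 GL_BGRA, GL_UNSIGNED_BYTE, yuy2_data);   // raw YUY2 macropixels, one per texel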

I’m very sorry, I have made a mistake: it’s the 4:2:0 format that this shader supports (i.e. the U and V planes are just after the Y plane, and each is w/2 by h/2 the size of the Y plane).

But thanks for the tip, because this conversion seems effectively much simpler to me, and can easily solve my problem on the EeePC platform, which doesn’t seem to have shader support on Linux (since I only have to convert one YUYV “block of two texels” directly on the CPU to get two RGB(A) texels in the final texture).

OK, the picture size in the 4:2:2 format is bigger than the “same” picture in the 4:2:0 format, but this seems very simple to implement and gives me the possibility of not needing shaders on the EeePC platform.
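For reference, here is a minimal CPU-side sketch of that conversion, using the same BT.601-style coefficients as the shader above (the function and variable names are mine): each 4-byte YUYV macropixel produces two RGBA texels.

    static unsigned char clamp255(float x)
    {
        return (unsigned char)(x < 0.0f ? 0.0f : (x > 255.0f ? 255.0f : x));
    }

    void yuyv_to_rgba(const unsigned char *src, unsigned char *dst, int w, int h)
    {
        int i, n = (w / 2) * h;                  // number of YUYV macropixels
        for (i = 0; i < n; i++, src += 4, dst += 8) {
            float y1 = 1.1643f * (src[0] - 16.0f);
            float u  = src[1] - 128.0f;
            float y2 = 1.1643f * (src[2] - 16.0f);
            float v  = src[3] - 128.0f;
            float rc =  1.5958f * v;
            float gc = -0.39173f * u - 0.8129f * v;
            float bc =  2.017f * u;
            dst[0] = clamp255(y1 + rc);          // first texel
            dst[1] = clamp255(y1 + gc);
            dst[2] = clamp255(y1 + bc);
            dst[3] = 255;
            dst[4] = clamp255(y2 + rc);          // second texel shares the same chroma
            dst[5] = clamp255(y2 + gc);
            dst[6] = clamp255(y2 + bc);
            dst[7] = 255;
        }
    }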

So this is really good news, but it doesn’t really solve my real problem, which is true hardware shader support on the EeePC platform :frowning:

@+
Yannoo

I’m thinking of developing a library to handle all those YUV formats. The frontend will be simple: provide an input YUV image with proper sampling information (4:2:2, 4:2:0, 4:1:1, etc.), define a color conversion matrix and run the conversion. The result is a nice RGB texture. I’m planning to support transfers using PBOs and 10-bit images.

Btw, why do you use an EeePC? This little machine has miserable Intel integrated graphics. Atm only netbooks with NVidia Ion can be used for serious 3D graphics.

Because a big PC, Mac or UN*X workstation cannot be used in a car, bus or metro: it isn’t very portable, and in my country we don’t have electrical outlets in the street :slight_smile:

And I want this to be usable by everybody, so a very cheap EeePC is a good choice for me … and I think about battery life too (I want this to work for more than one or two hours without external power, to work/play with it while travelling or in a desert country, for example).

And perhaps also because, for me, the harder something is to do, the more I like it :slight_smile:
For example, I am thinking of adapting this to work on a very slow @café (a very cheap laptop at 199 euros in France) and on the PocketPC and iPhone platforms.

I’m also very interested in a small API (but fast and multi-platform, which is mainly why I particularly like the OpenGL API) that can do a very fast conversion between various YUV formats and RGB(A) :slight_smile:

And my problems with vertex/fragment shaders seem to be resolved on the EeePC platform, so I really think that an EeePC with 3D hardware acceleration can be much faster at 2D/3D animation than a big computer without it …

@+
Yannoo

The fragment shader approach works very nicely :slight_smile:

I have built a video player around it that uses avcodec and OpenGL, and that can display AVI/MPEG/V4L video streams in real time on various 3D surfaces, all with less than 10% CPU usage on my iMac.

This works fine with YUV 4:2:0 planar frames coming from libavcodec or various V4L device streams, for example (i.e. one width*height Y plane, followed by one (width/2)*(height/2) U plane and one (width/2)*(height/2) V plane).

I am now looking for a way to efficiently map/tunnel IPBBPBBPBB… GOPs (groups of pictures) into GPU texture units (i.e. successive frames in an MPEG file become successive texture units on the GPU).

Right now I read/decode each frame one by one, but I am starting to think about reading/decoding an entire GOP in one block (this is certainly much faster).

Something that would let the GPU load an entire GOP from a video stream into buffers that it can directly reuse as input to GPU texture units, to permit a lot of wonderful inter-frame special effects such as fade-in and fade-out, and to support multiple video streams with the least CPU % possible (i.e. the GPU does the most intensive computations).

What I have now:

- compressed MPEG/AVI or raw video frame input from a webcam

- decompression of a frame into system memory

- load this frame into the texture unit with glTexImage

- do the YUV to RGB transformation at the final fragment shader level

And what I want:

- compressed MPEG/AVI/RAW input

- fast decompression of an entire GOP into system/video memory

- load this GOP (or a part of it) into texture units
  (where pictures can be recompressed into DXT/S3TC/JPEG formats to limit the video memory used)

- the YUV to RGBA conversion is still done at the fragment shader level
  (but with one frame per texture unit, so I think this is certainly limited to I and P frames on common video cards)

I am also looking into more efficient GOP streaming from libavcodec/v4l to OpenGL texture units, and into permitting very fast (but accurate …) forward/backward playing and seeking in video streams, with the help of the IPBBPBBPBB… IPBB information that we have in GOPs.

@+
Yannoo

Current APIs (OpenGL or D3D) still don’t expose enough functionality to decode a frame on the GPU. GL and D3D are not designed for that. Shaders would require a lot of features in order to decode a stream.
You can do this using CUDA, but that is limited to NVidia only. CUDA video is really fast. My NV 8800GT can decode and display 720p at 150-170 fps.

Also, you can get some GPU hw decoding functionality using DXVA (on Windows) or VDPAU (on Linux).
http://http.download.nvidia.com/XFree86/vdpau/doxygen/html/index.html
http://www.phoronix.com/scan.php?page=article&item=xorg_vdpau_vaapi&num=1

You can use a hybrid solution… What you could do is use PBOs. Create several PBO buffers (2-4), map them all and mark them as ‘available’. The decoding thread picks one of the ‘available’ PBO buffers, decodes a frame directly into that buffer and marks it as ‘ready’. The renderer thread loops searching for ‘ready’ buffers, unmaps them, uploads their content to a texture and renders. Later, the buffer can be mapped again and reused.
If you provide enough PBO buffers you can stream and display multiple video streams at the same time.
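A rough sketch of this round-robin in GL 2.1-era calls (frame_size, next_avail, next_ready and decode_frame_into are simplified placeholders; a real implementation needs the available/ready bookkeeping between the two threads, and both threads need access to the GL context, e.g. via shared contexts):

    #define NUM_PBO 3
    GLuint pbo[NUM_PBO];
    glGenBuffers(NUM_PBO, pbo);
    for (int i = 0; i < NUM_PBO; i++) {
        glBindBuffer(GL_PIXEL_UNPACK_BUFFER, pbo[i]);
        glBufferData(GL_PIXEL_UNPACK_BUFFER, frame_size, NULL, GL_STREAM_DRAW);
    }

    // decoding thread: write one frame into an 'available' buffer
    glBindBuffer(GL_PIXEL_UNPACK_BUFFER, pbo[next_avail]);
    void *ptr = glMapBuffer(GL_PIXEL_UNPACK_BUFFER, GL_WRITE_ONLY);
    decode_frame_into(ptr);                   // hypothetical decoder call
    glUnmapBuffer(GL_PIXEL_UNPACK_BUFFER);    // buffer becomes 'ready'

    // renderer thread: upload from a 'ready' buffer (the data argument is
    // an offset into the bound PBO, not a pointer)
    glBindBuffer(GL_PIXEL_UNPACK_BUFFER, pbo[next_ready]);
    glBindTexture(GL_TEXTURE_2D, texID);
    glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, width, height * 3 / 2,
                    GL_LUMINANCE, GL_UNSIGNED_BYTE, (const GLvoid *)0);
    glBindBuffer(GL_PIXEL_UNPACK_BUFFER, 0);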

Now… a month ago I mentioned that I could develop a conversion library. I have spent some time on it designing the interface… So far it looks like:


// initialise Video4GL library:
// initialise gl extensions and register builtin conversion classes
GLint v4glInit();

// delete all stuff
GLint v4glShutdown();

enum v4glOutputPixelformat
{
	V4GL_RGB,
	V4GL_RGBA,
};

// convertor stuff
GLuint v4glCreateConvertor(unsigned int in_fourcc, v4glOutputPixelformat out_format); // input fourcc, output pixelformat; returns a convertor handle
void   v4glSetColorConversion(GLuint conv, GLfloat* matrix); // input: converter and matrix; set the conversion matrix or use the default
void   v4glSetColorConversionAlpha(GLuint conv, GLfloat alpha); // input: converter and alpha channel... only for RGBA output when the input doesn't have alpha; default alpha 1.0f
void   v4glDeleteConvertor(GLuint conv);
void   v4glProcessPendingConversions(GLuint conv);


// image stuff
GLuint v4glCreateImage(GLuint conv, GLuint width, GLuint height); // creates an image; returns an image handle
void   v4glSetImageData(GLuint image, void* data); // fill image data (from a decoded stream)
GLuint v4glGetOutputTexture(GLuint image); // returns the rgb texture id
void   v4glDeleteImage(GLuint image); // delete image


And a usage example:


// initialise gl extensions and register builtin conversion classes
v4glInit(); 

// create convertor YUY2 to RGB
GLuint conv = v4glCreateConvertor(FOURCC(YUY2), V4GL_RGB); 
// and two streams HD1080 and PAL
GLuint img1 = v4glCreateImage(conv, 1920, 1080);  
GLuint img2 = v4glCreateImage(conv, 720, 576);

...
// fill YUY2 data. It will mark the internal flag need_processing
v4glSetImageData(img1, pointer1);
v4glSetImageData(img2, pointer2);

// this call processes all pending work for the given converter
v4glProcessPendingConversions(conv);

// now it is safe to use the RGB versions. These are OpenGL texture ids
GLuint rgbimg1 = v4glGetOutputTexture(img1);
GLuint rgbimg2 = v4glGetOutputTexture(img2);

// render using rgbimg1 and rgbimg2

...
// this call will delete and invalidate all images
v4glDeleteConvertor(conv);

// at exit shutdown library
v4glShutdown();

Each fourcc format requires a conversion class and specific shaders. Most fourcc formats can share shader code and sampling, so with careful planning it is possible to write a generic converter that handles most fourcc formats.

I have resolved a lot of the problems with multi-picture and GOP support :slight_smile:
=> I now use a 3D texture where the slices are the pictures in the GOP
==> so I can now access all the pictures in a GOP while using only one texture unit (i.e. mixing between multiple video textures, and something like “temporal mip-mapping” in the fragment shader, become possible)
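For the record, here is a minimal sketch of how a GOP might be packed into such a 3D texture, one decoded YUV picture per slice (gopTexID, gop_len and decoded_picture are my placeholder names, assuming GL 1.2+ for glTexImage3D):

    glBindTexture(GL_TEXTURE_3D, gopTexID);
    glTexParameteri(GL_TEXTURE_3D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
    glTexParameteri(GL_TEXTURE_3D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
    // allocate width x (height*3/2) x gop_len, then fill one slice per picture
    glTexImage3D(GL_TEXTURE_3D, 0, GL_LUMINANCE, width, height * 3 / 2,
                 gop_len, 0, GL_LUMINANCE, GL_UNSIGNED_BYTE, NULL);
    for (int i = 0; i < gop_len; i++)
        glTexSubImage3D(GL_TEXTURE_3D, 0, 0, 0, i, width, height * 3 / 2, 1,
                        GL_LUMINANCE, GL_UNSIGNED_BYTE, decoded_picture[i]);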

I now have to see with the avcodec developers how I can get an entire decompressed GOP in memory as fast as possible (for the moment I decompress picture by picture and build a GOP after 8, 12 or 16 decompressed pictures). I am also thinking of quickly adding S3TC or DXT compression to limit the size of the “decompressed” pictures in VRAM (and/or seeing how to share data efficiently between video card and system memory, or how to work with very fast VLB/AGP/PCI memory transfers).
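One simple way to try the DXT/S3TC idea (assuming the EXT_texture_compression_s3tc extension is present) is to pass a compressed internal format and let the driver compress at upload time; note that this costs CPU on every upload, so it suits mostly static pictures better than per-frame video (rgb_frame is a placeholder for an already-converted RGB picture):

    glTexImage2D(GL_TEXTURE_2D, 0, GL_COMPRESSED_RGB_S3TC_DXT1_EXT,
                 width, height, 0, GL_RGB, GL_UNSIGNED_BYTE, rgb_frame);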

Yooyo, your v4gl API seems good, can I have access to it?

(It could certainly be interesting to include this work in the v4l_convert* functions provided by the V4L library, for example.)

Where can I find a good tutorial that explains precisely how to use pictures decompressed with VDPAU? (I only find things that seem to use the VDPAU API directly for display, not how to get the pictures out of the API.)

@+
Yannoo

one texture unit (i.e. mixing between multiple video textures in the fragment shader is possible)

A shader CAN access multiple textures, 32 on my GTX 275 card, even if the max multitexture count is only 8.

Yes, that’s true.

But with GOPs of more than 16 pictures, I cannot have 2 GOPs loaded into texture units at the same time if I use one texture unit per picture.

And I prefer not to lose the layered power of multiple texture units … so that I can handle a mix/fade-in/fade-out/overlay between 32 video streams, for example :slight_smile:

Plus, the fact that we only have to use one texture unit per video stream seems more natural and simple to me, and as user-friendly as possible (and it certainly permits a hardware company to implement/map this very easily).

@+
Yannoo

I see now, and your post is clearer.

==> so I can now access all the pictures in a GOP while using only one texture unit (i.e. mixing between multiple video textures, and something like “temporal mip-mapping” in the fragment shader, become possible)

Is this new?

I sometimes complete/correct my old posts to be “more precise”.

This is never to delete something, only to correct it or to attempt a better translation.

@+
Yannoo

Each GOP is transformed into a 3D texture composed of slices of width*height consecutive pictures.

=> we can map the frame at time timestamp onto a 2D quad from (x0,y0) to (x1,y1) at depth z using:

    glBegin(GL_QUADS);
      glTexCoord3f(0,0,timestamp);    glVertex3f(x0,y0,z);
      glTexCoord3f(1,0,timestamp);    glVertex3f(x1,y0,z);
      glTexCoord3f(1,1,timestamp);    glVertex3f(x1,y1,z);   // note: corners ordered around the quad,
      glTexCoord3f(0,1,timestamp);    glVertex3f(x0,y1,z);   // otherwise GL_QUADS draws a bowtie
    glEnd();

where the s,t positions in the glTexCoord3f(s,t,timestamp) calls and the (x0,y0) and (x1,y1) positions can be adapted to handle the screen format (4/3, 16/9, etc.) and an automatic zoom in/out,

and timestamp is used to index the slice in the 3D texture (which is an array of 2D pictures that forms a video).

=> we only have to increment the timestamp variable (and/or z) in this short code to display consecutive frames (and/or slices) from the GOP in 3D.
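One detail worth noting (my addition, not from the original post): to index a slice cleanly, the r coordinate should point at the center of the slice; with linear filtering, coordinates between two centers blend two consecutive frames (which may even be desirable for the “temporal” effects mentioned above):

    // map a frame index to the r texture coordinate of its slice center
    float slice_coord(int frame_index, int gop_len)
    {
        return (frame_index + 0.5f) / (float)gop_len;
    }
    // then: glTexCoord3f(s, t, slice_coord(frame, gop_len));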

With this vertex shader:

void main()
{
gl_FrontColor = gl_Color;
gl_TexCoord[0] = gl_MultiTexCoord0;
gl_Position = ftransform();
}

and this YUV 4:2:0 fragment shader:

uniform sampler2D tex;

void main(void)
{
float nx, ny, r, g, b, y, u, v;
float u1,u2,v1,v2;

    nx = gl_TexCoord[0].x;
    ny = gl_TexCoord[0].y;

    y  = texture2D(tex, vec2( (nx),         (ny)*(4.0/6.0)  )).r; 
    u1 = texture2D(tex, vec2( (nx/2.0),     (ny+4.0)/6.0    )).r;
    u2 = texture2D(tex, vec2( (nx/2.0)+0.5, (ny+4.0)/6.0    )).r;
    v1 = texture2D(tex, vec2( (nx/2.0),     (ny+5.0)/6.0    )).r;
    v2 = texture2D(tex, vec2( (nx/2.0)+0.5, (ny+5.0)/6.0    )).r;

    y =  1.1643 * (y - 0.0625);
    u = (u1+u2)/2.0 - 0.5;
    v = (v1+v2)/2.0 - 0.5;

    r = y + 1.5958 * v;
    g = y - 0.39173 * u - 0.8129 * v;
    b = y + 2.017 * u;

    gl_FragColor = vec4(b, g, r, 1.0);

}

I can map videos onto the surfaces of a lot of different animated 3D objects.

This version of the shaders doesn’t use the 3D texture approach yet; it only maps one YUV 4:2:0 picture into the RGBA color domain of OpenGL textures and displays it (I’m working on the 3D texture version and don’t have something that works well at the moment, but it’s only a question of hours or days).

The OpenGL output is really very impressive, but the video input seems “relatively slow” to me, so I want something to boost it.

On the other hand, GOPs can also be “simulated” by using glGenTextures …

ZBuffer, how do you make quotes?

@+
Yannoo

http://www.opengl.org/discussion_boards/ubbthreads.php?ubb=faq

and open “What UBBCode can I use in my posts?”

Regarding v4gl… it’s just a draft… I still need to work on it to be sure this interface is good enough.

Thanks Yooyo :slight_smile:

I have various problems with incompatibilities between the V4L(2) API and hardware versions :frowning:

But it’s exactly the same thing with ffmpeg/avcodec … :frowning:

They aren’t mature APIs, but they are really very nice …
(though I find them too complicated to use across different OSes/computers)

But they are here, and that’s a really good thing for the whole world … :slight_smile:

@+
Yannoo

To keep it simple, I also want something such as:

GLuint glTexGenVideoExt(int *numframes, void *GOP);

that returns a texture index to numframes consecutive frames, decompressed from IPBB packets into numframes consecutive texture handles,

and a glBindVideoExt(int video_texture) that binds the current frame from the video_texture array and increments an internal counter for the next glBindVideoExt call.
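A purely hypothetical usage sketch of these proposed (non-existent) entry points, just to illustrate the intent (gop_data and draw_textured_quad are imaginary placeholders):

    int numframes = 16;
    GLuint video_tex = glTexGenVideoExt(&numframes, gop_data); // decompress one GOP into textures (proposed, not real GL)
    for (int i = 0; i < numframes; i++) {
        glBindVideoExt(video_tex);   // binds the current frame, advances the internal counter (proposed, not real GL)
        draw_textured_quad();
    }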

On the other side, glGenTextures/glDeleteTextures/glBindTexture already seem to do a very good job with texture memory use and allocation

=> so this may be much simpler than I think :slight_smile:

And after reflection, I think the work lies more on the avcodec/ffmpeg/v4l side (audio/video and frame decompression techniques/optimisations) than on the more generic OpenGL side, where texture access is already really very fast.

But I really think that the various MPEG-1/2/4 hardware decompression engines in recent video cards could easily handle GOP decompression into successive OpenGL textures, no?

@+
Yannoo

Yooyo, what do you think about adding YUYV and YCbCr to this little API?

enum v4glOutputPixelformat
{
	V4GL_RGB,
	V4GL_RGBA,
	V4GL_YUYV,
	V4GL_YCbCr,
};

  • support for the 4:2:0 format
    (this is already done in the fragment shader, but it could be extended/discarded for specific shaders)

But what about time support for video texturing?

@+
Yannoo