Using multiple GL render contexts in a Windows application

Hi,
I'm pretty sure others have asked the same thing, but I have a tight deadline to meet.
I have a surveillance application that receives frames from several DVRs on a private network, compressed with DivX, and the application should be able to display them in near real time. Everything worked fine with DirectX 7 DirectDraw until we tried to port the application to Vista; apparently DirectX 9Ex and 10 no longer support those functions. So we tried textures in DirectX 9, but even with 2 cams the delays are unacceptable, and it has to work with 16 cams at the same time.
So we switched to OpenGL. Same setup: several DVRs on a private network, and the application should display their frames in real time. Even at a low frame rate of around 3 fps per channel, there is a delay in the drawing. We have been using glDrawPixels() to draw the decompressed frame into the render context, and gluScaleImage() to resize the bitmap before the actual drawing.
My question is:
Is there a faster way to draw bitmaps into a render context when I'm using at least 16 different contexts, and of course Windows?

Thanks,
Mihai

Use textures instead of glDrawPixels(); bilinear scaling on the GPU will be ultra fast.

gluScaleImage is expected to be slow.

You can use one texture per video stream and update it with glTexSubImage2D() whenever you receive a new frame.
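
A minimal sketch of that, assuming 8-bit BGRA frames of a placeholder size VID_W x VID_H and a driver that supports non-power-of-two GL_TEXTURE_2D (otherwise use GL_TEXTURE_RECTANGLE_ARB as in the code below); the GL_LINEAR filters give you the fast bilinear scaling:

// one-time setup per stream
GLuint tex;
glGenTextures(1, &tex);
glBindTexture(GL_TEXTURE_2D, tex);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA8, VID_W, VID_H, 0,
             GL_BGRA, GL_UNSIGNED_BYTE, NULL);

// on every decoded frame, replace the texture contents in place
glBindTexture(GL_TEXTURE_2D, tex);
glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, VID_W, VID_H,
                GL_BGRA, GL_UNSIGNED_BYTE, pDecodedFrame);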

Create main render loop…


while (!m_bQuit)
{
  ProcessVideoCamRefresh();
  Update();
  Render();
  SwapBuffers(hDC);
  Sleep(20);
}

// TODO: write a CImage class with basic members: xres, yres,
// bpp (bytes per pixel), format, camera_id, a bUsed flag
// and a pointer (ptr) to the raw pixel data

// Create a pool of image objects (reused by the different decoders)
CCritSect ImgPoolCS;
std::vector<CImage> ImgPool;

// create a queue of decoded frames waiting to be uploaded
std::queue<CImage*> videoqueue;

// critical section
CCritSect videoqueueCS;

// create 16 textures
#define NUM_TEX 16
GLuint textures[NUM_TEX];

void Init()
{
 glGenTextures(NUM_TEX, textures);

 // assume all streams have equal size
 for (int i=0; i<NUM_TEX; i++)
 {
   glBindTexture(GL_TEXTURE_RECTANGLE_ARB, textures[i]);
   glTexImage2D(GL_TEXTURE_RECTANGLE_ARB, ...., NULL);
 }
 glBindTexture(GL_TEXTURE_RECTANGLE_ARB, 0);
}
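
// One hypothetical fill-in for the elided glTexImage2D arguments above,
// assuming 8-bit BGRA frames of a placeholder size VID_W x VID_H:
//
//   glTexImage2D(GL_TEXTURE_RECTANGLE_ARB, 0, GL_RGBA8, VID_W, VID_H, 0,
//                GL_BGRA, GL_UNSIGNED_BYTE, NULL);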

void ProcessVideoCamRefresh()
{
  videoqueueCS.Lock();
  while (!videoqueue.empty())
  {
    CImage* pImage = videoqueue.front();
    videoqueue.pop();
    videoqueueCS.Unlock();
    UploadImage(pImage);
    videoqueueCS.Lock();
  }
  videoqueueCS.Unlock();
}

void UploadImage(CImage* pImage)
{
  // I assume rectangle texture (not power of two)
  // anyway.. you can try GL_TEXTURE_2D
  glBindTexture(GL_TEXTURE_RECTANGLE_ARB, textures[pImage->camera_id]);
  glTexSubImage2D(GL_TEXTURE_RECTANGLE_ARB, ...);
  glBindTexture(GL_TEXTURE_RECTANGLE_ARB, 0);

  free(pImage->ptr);
  pImage->ptr = NULL;
  pImage->bUsed = false;
}
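
// For reference, one way the elided glTexSubImage2D arguments could look,
// assuming pImage->ptr holds tightly packed 8-bit BGRA pixels
// (the format is an assumption, not from the original post):
//
//   glTexSubImage2D(GL_TEXTURE_RECTANGLE_ARB, 0,
//                   0, 0, pImage->xres, pImage->yres,
//                   GL_BGRA, GL_UNSIGNED_BYTE, pImage->ptr);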

// called from decoder thread
void AddJob(CImage* pImage)
{
  videoqueueCS.Lock();
  videoqueue.push(pImage);
  videoqueueCS.Unlock();
}

// in decoder thread.. assume you are using DirectShow
HRESULT DoRenderSample(IMediaSample* pSample)
{
  BYTE  *pBmpBuffer;     // Bitmap buffer, texture buffer
  pSample->GetPointer( &pBmpBuffer );
  long size = pSample->GetSize();

  CImage* image = FindUnusedImageObject();
  if (image != NULL)
  {
    image->bUsed = true;
    image->ptr = duplicate(pBmpBuffer, size); // copy sample
    image->xres = m_Width;
    image->yres = m_Height;
    image->bpp = m_Bytes_per_pixel;
    image->format = IMG_RGB;
    image->camera_id = m_CameraID;
 
    AddJob(image);
  }

 return S_OK;
}
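
// duplicate() above is not a standard call; a minimal sketch of it
// (assuming the sample is a plain byte array; needs <stdlib.h> and <string.h>):
BYTE* duplicate(const BYTE* pSrc, long size)
{
  BYTE* pDst = (BYTE*)malloc(size);   // freed later in UploadImage()
  if (pDst != NULL)
    memcpy(pDst, pSrc, size);
  return pDst;
}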

CImage* FindUnusedImageObject()
{
  CImage* pFound = NULL;
  ImgPoolCS.Lock();
  // return the first pool entry with bUsed == false, or NULL if all are in use
  for (size_t i = 0; i < ImgPool.size() && pFound == NULL; i++)
    if (!ImgPool[i].bUsed)
      pFound = &ImgPool[i];
  ImgPoolCS.Unlock();
  return pFound;
}

So… the idea is to have only one GL context and one render loop that renders the screen every 20 ms. Meanwhile, other thread(s) decode the video and call AddJob to hand decoded images to the renderer… Once per frame, the renderer checks videoqueue for incoming images, uploads the new frames, and renders all the images in a grid.
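
For the grid rendering, a minimal fixed-function Render() sketch could look like this (the 4x4 layout, CELL_W/CELL_H and VID_W/VID_H are assumptions, and it assumes an orthographic projection already set up in window pixels):

void Render()
{
  glClear(GL_COLOR_BUFFER_BIT);
  glEnable(GL_TEXTURE_RECTANGLE_ARB);

  for (int i = 0; i < NUM_TEX; i++)
  {
    int cx = (i % 4) * CELL_W;   // cell position in the 4x4 grid
    int cy = (i / 4) * CELL_H;

    // rectangle textures are addressed in texels, not 0..1
    glBindTexture(GL_TEXTURE_RECTANGLE_ARB, textures[i]);
    glBegin(GL_QUADS);
      glTexCoord2f(0.0f,  0.0f);  glVertex2i(cx,          cy);
      glTexCoord2f(VID_W, 0.0f);  glVertex2i(cx + CELL_W, cy);
      glTexCoord2f(VID_W, VID_H); glVertex2i(cx + CELL_W, cy + CELL_H);
      glTexCoord2f(0.0f,  VID_H); glVertex2i(cx,          cy + CELL_H);
    glEnd();
  }
  glBindTexture(GL_TEXTURE_RECTANGLE_ARB, 0);
  glDisable(GL_TEXTURE_RECTANGLE_ARB);
}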

There are two possible bottlenecks… The first is the DivX decoding (which must be done on the CPU side), the second is the upload (the transfer from system memory to the texture).
On a modern PC you can easily decode many DivX streams using a good decoder (like ffdshow).

Uploading images is a bit of a different story. Its performance depends on the hardware, the driver and your software. In hardware terms, it works better on PCI Express mainboards with a newer graphics board (from NVIDIA or ATI). Intel integrated graphics is slow for this kind of usage. Another advantage of NVIDIA or ATI hardware is that the drivers support asynchronous data transfers, which is ideal for video streaming and processing. See http://www.opengl.org/registry/specs/ARB/pixel_buffer_object.txt
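
A rough sketch of such an asynchronous upload with a pixel buffer object (assuming GL 1.5 / ARB_pixel_buffer_object entry points are available, e.g. through GLEW, BGRA frames, and a buffer object pbo created at init with glGenBuffers):

// hypothetical PBO upload for one frame; sizes assume 8-bit BGRA
GLsizeiptr bytes = pImage->xres * pImage->yres * 4;
glBindBuffer(GL_PIXEL_UNPACK_BUFFER_ARB, pbo);
glBufferData(GL_PIXEL_UNPACK_BUFFER_ARB, bytes, NULL, GL_STREAM_DRAW); // orphan old data
void* dst = glMapBuffer(GL_PIXEL_UNPACK_BUFFER_ARB, GL_WRITE_ONLY);
if (dst != NULL)
{
  memcpy(dst, pImage->ptr, bytes);        // CPU copy into driver-owned memory
  glUnmapBuffer(GL_PIXEL_UNPACK_BUFFER_ARB);
}
// with a PBO bound, the last argument is a byte offset into the buffer,
// so the texture transfer can proceed asynchronously
glBindTexture(GL_TEXTURE_RECTANGLE_ARB, textures[pImage->camera_id]);
glTexSubImage2D(GL_TEXTURE_RECTANGLE_ARB, 0, 0, 0, pImage->xres, pImage->yres,
                GL_BGRA, GL_UNSIGNED_BYTE, (const GLvoid*)0);
glBindBuffer(GL_PIXEL_UNPACK_BUFFER_ARB, 0);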

OpenGL usually doesn’t “like” YUYV pixels, so your decoder must deliver RGB or RGBA images. There is an extension that can handle YUY2, or, if you have the time, you can write your own hardware-accelerated YUV to RGB conversion using shaders.
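
If you go the shader route, a minimal GLSL fragment shader for planar YUV (three GL_LUMINANCE textures, BT.601 approximation; packed YUY2 takes a bit more unpacking work) might look like:

uniform sampler2D texY;
uniform sampler2D texU;
uniform sampler2D texV;

void main()
{
  // sample the three planes and recentre chroma around zero
  float y = texture2D(texY, gl_TexCoord[0].st).r;
  float u = texture2D(texU, gl_TexCoord[0].st).r - 0.5;
  float v = texture2D(texV, gl_TexCoord[0].st).r - 0.5;

  // BT.601 YUV -> RGB
  gl_FragColor = vec4(y + 1.402 * v,
                      y - 0.344 * u - 0.714 * v,
                      y + 1.772 * u,
                      1.0);
}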

Anyway… your task is trivial for an experienced OpenGL developer.

Thanks all for the advice, I successfully completed my task.

Mihai,
