Keeping the GPU and CPU both busy

pango · April 14, 2004, 8:58am

From some threads on this forum, I understand that the GPU and CPU can work at the same time, so the code below should perform poorly:
-some code on the CPU;
-glBegin(…);
…
-glEnd();
SwapBuffers();

Should I place the CPU code between glEnd() and SwapBuffers() instead?

If I run the code above in several threads, then while one thread is doing GPU work another thread can do CPU work. Would that give good performance?

All in all, I want to optimize my program. From books and forum threads, I gather that the key is to minimize the time the CPU spends waiting for the GPU and the time the GPU spends waiting for the CPU. So I plan to split my rendering code across different threads: while one thread is waiting for a result from the GPU, the other threads can do work on the CPU. Is my reasoning right? Would my program perform better if I optimized it this way?

imported_jwatte · April 14, 2004, 9:19am

I think you’re taking the begin/end too literally. I’d be surprised if the graphics card did anything before the glEnd() except copying your parameters into some buffer.

In fact, I know of some cards which do nothing except copy data (and possibly a bit of CPU-based transform) while you issue commands, and all the actual rendering gets kicked off when you call SwapBuffers, or when you ReadPixels or BindTexture with a texture that you previously CopyTexSubImage-ed with.

In general, for modern cards, you can assume they will buffer an entire frame. Your job is to issue the geometry using the most efficient method possible (NV_VAR or ARB_VBO), and to minimize stalls by not using readbacks, or deferring them until the end of the frame (or the start of the next frame, even). This will allow full CPU/GPU parallelism (although, in reality, you can’t achieve perfect balance).
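To put rough numbers on what "full CPU/GPU parallelism" could buy, here is a minimal sketch of an idealized timing model. It assumes each frame needs a fixed amount of CPU work and a fixed amount of GPU work, and it ignores synchronization overhead and readback stalls, so real results will be worse. The function names are made up for illustration.

```cpp
#include <algorithm>
#include <cassert>

// Idealized cost per frame when the CPU and GPU take turns:
// the CPU waits for the GPU to finish before starting its own work.
double serial_frame_ms(double cpu_ms, double gpu_ms) {
    assert(cpu_ms >= 0.0 && gpu_ms >= 0.0);
    return cpu_ms + gpu_ms;
}

// Idealized steady-state cost per frame when the CPU prepares frame N+1
// while the GPU is still rendering frame N: the slower side sets the pace.
double overlapped_frame_ms(double cpu_ms, double gpu_ms) {
    assert(cpu_ms >= 0.0 && gpu_ms >= 0.0);
    return std::max(cpu_ms, gpu_ms);
}
```

With hypothetical costs of 12 ms of CPU work and 8 ms of GPU work per frame, the serial model gives 20 ms and the overlapped model gives 12 ms. The faster side still idles for part of every frame, which is one way to see why perfect balance is out of reach.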

pango · April 14, 2004, 6:50pm

To Jwatte:
My program has some particular requirements: it is a video editing system that uses the GPU as its rendering device through OpenGL. Its workflow is as follows:
1. Receive a video image (on the CPU, little time)
2. Apply image filters (on the CPU, relatively long time)
3. Upload the image to the GPU (on the CPU)
4. Perform the image transition (on the GPU, for a time I can't estimate)
5. Read the transition result back from the GPU (on the CPU, a long time)
6. Send the result image for encoding (on the CPU, long time)

With such a workflow, if I use only one thread, then while it is executing step 4 (or step 5) the whole thread is blocked waiting for the rendering result, and during that time the CPU has no work to do; it just waits. So I would rather create several threads, so that while one thread is in step 4 (or 5) the other threads can run the CPU-side steps, keeping the CPU and GPU as busy as possible.
I'm not a good OpenGL programmer, but I'm familiar with multithreading techniques, which is how I came up with this optimization. Is my reasoning right?

pango · April 15, 2004, 12:53am

Thinking over my workflow, I found that the slowest operation is reading data back from the GPU. Because I don't know how to read from the rendering context (rc) in a separate thread (if anyone knows how, please tell me), I decided to create a single rendering thread and put the rendering and readback code in that thread. Here is my pseudocode:
//code 1
rp.makecurrent();//rp is a RTT pbuffer
glBegin();
… //some heavy rendering code
glEnd();
op1(op2).makecurrent();//op1 and op2 are pbuffers, used alternately
rp.bind();//bind the RTT pbuffer as a texture
//some code to draw the texture in op1(op2);

//code2
op2(op1).makecurrent();
glReadPixels();

//code3
op1(op2).makecurrent();
glFinish();

In the code above, I create two pbuffers, op1 and op2, because I want the GPU to render into op1 while the CPU reads from op2, and then to render into op2 while the CPU reads from op1. That is why I put code 2 between code 1 and code 3: I think it lets rendering and readback happen at the same time. But this is just my guess. Is my reasoning right? Please share your opinion, thanks!
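The alternation between op1 and op2 described above can be written down as a schedule. The sketch below only computes which buffer is the render target and which one is read back for each frame; it makes no OpenGL calls, and it does not show that a driver will actually overlap the two operations. `FrameStep` and `pingpong_schedule` are names invented for this illustration.

```cpp
#include <cassert>
#include <vector>

struct FrameStep {
    int frame;        // frame index, starting at 0
    int render_into;  // 0 means op1, 1 means op2
    int read_from;    // buffer holding the previous frame, or -1 if none yet
};

// Frame f renders into one buffer while the other buffer, which holds
// the result of frame f-1, is read back.
std::vector<FrameStep> pingpong_schedule(int frame_count) {
    assert(frame_count >= 0);
    std::vector<FrameStep> steps;
    for (int f = 0; f < frame_count; ++f) {
        int target = f % 2;                          // alternate op1, op2, op1, ...
        int previous = (f == 0) ? -1 : 1 - target;   // nothing to read on frame 0
        steps.push_back({f, target, previous});
    }
    return steps;
}
```

For three frames this yields: render op1 and read nothing, render op2 and read op1, render op1 and read op2. One extra readback is still needed after the last frame to collect its result. Note that glReadPixels into client memory does not return until the pixels are available, so how much overlap this ordering achieves depends on how much rendering the driver has already queued.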

zeckensack · April 15, 2004, 1:48am

sigh
Have another look at your previous thread on the exact same topic. Read it. Again.

In case you can’t be bothered to do that, I’ll just quote myself here for your convenience:

The driver must serialize rendering commands from different threads anyway. Using the same rendering context from two different threads is a completely pointless endeavour, even if you can get it to work (which isn’t easy, mind you).
Got it?
And no, it doesn’t get any different if it’s not the same context, I somehow just forgot to spell that out.
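One arrangement consistent with the replies in this thread is to keep every OpenGL call on a single thread and use extra threads only for the CPU-side steps (filtering and encoding). The sketch below is an assumption about how that could look, not anyone's actual code: the three stage functions are empty, hypothetical stand-ins, frames are represented by plain integers, and the queue is unbounded for brevity.

```cpp
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Minimal blocking queue of frame ids shared between two pipeline stages.
class FrameQueue {
public:
    void push(int frame) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            items_.push(frame);
        }
        ready_.notify_one();
    }
    int pop() {  // blocks until a frame id is available
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return !items_.empty(); });
        int frame = items_.front();
        items_.pop();
        return frame;
    }
private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<int> items_;
};

const int kEndOfStream = -1;  // marker pushed after the last frame

// Hypothetical stand-ins for the real work in each stage.
void filter_on_cpu(int) {}         // steps 1-2: receive and filter
void render_and_read_back(int) {}  // steps 3-5: upload, transition, readback
void encode_on_cpu(int) {}         // step 6: encode

// Runs frame_count frames through three stages and returns the ids
// in the order they were encoded.
std::vector<int> run_pipeline(int frame_count) {
    FrameQueue filtered, rendered;
    std::vector<int> encoded;

    std::thread filter_thread([&] {
        for (int f = 0; f < frame_count; ++f) {
            filter_on_cpu(f);
            filtered.push(f);
        }
        filtered.push(kEndOfStream);
    });

    // The only thread that would ever touch the OpenGL context.
    std::thread gl_thread([&] {
        for (int f = filtered.pop(); f != kEndOfStream; f = filtered.pop()) {
            render_and_read_back(f);
            rendered.push(f);
        }
        rendered.push(kEndOfStream);
    });

    // Encoding runs on the calling thread.
    for (int f = rendered.pop(); f != kEndOfStream; f = rendered.pop()) {
        encode_on_cpu(f);
        encoded.push_back(f);
    }

    filter_thread.join();
    gl_thread.join();
    return encoded;
}
```

While the GL thread is blocked in a readback for one frame, the filter thread can already be working on a later frame and the encoder on an earlier one, so the CPU stays busy without a second thread issuing rendering commands. A real version would bound the queues so the filter stage cannot run arbitrarily far ahead.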