PBuffer performance problems


I’ve just tried using PBuffers because I need to render to a texture. Unfortunately, changing the rendering context is extremely slow. I tried using one shared rendering context for both the framebuffer and the pbuffer, but calling wglMakeCurrent is no faster when only the device context changes. A few wglMakeCurrent calls cost several percent of performance, which is definitely too much…

So I was thinking about alternative ways to render to a texture. Would it be possible to make the framebuffer twice as big as the normal resolution and use the second half as a temporary buffer? Is that an option, or are there better ways?

Thanks in advance

If you’re using NVIDIA drivers, then I believe render-to-texture is slower than glCopyTexSubImage2D… don’t know why, but it just is. I still use glCopyTexSubImage2D anyway, as it’s widely supported, unlike ARB render-to-texture.
You don’t need to increase the size of your backbuffer. Render the stuff you want to end up in a texture into the backbuffer, then glCopyTexSubImage2D it into your PRECREATED texture, then clear the backbuffer (if you don’t render with 100% coverage) and render your scene as normal.

Thanks. Yes, rendering the texture before the rest of the scene is of course possible. The problem is that I need to render shadow textures for several light sources, and prerendering all of them would take quite a lot of memory. That’s why I wanted to overwrite a single shadow texture for each light source in turn.
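To put rough numbers on the memory concern, here is a small C sketch comparing one shadow texture per light with a single reused texture. The 512×512 size, 4 bytes per texel and 8 lights are illustrative assumptions, not figures from this thread, and a real driver may pad textures or keep extra copies.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes needed for `count` shadow textures of w x h texels, assuming
   `bytes_per_texel` bytes per texel and no driver padding (an assumption). */
size_t shadow_texture_bytes(size_t w, size_t h,
                            size_t bytes_per_texel, size_t count)
{
    assert(w > 0 && h > 0 && bytes_per_texel > 0);
    return w * h * bytes_per_texel * count;
}
```

With those assumptions, eight prerendered shadow maps take 8,388,608 bytes (8 MiB), while reusing one texture for every light takes 1,048,576 bytes (1 MiB).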

I think you could avoid the context switch by using two threads. Each thread can have its own rendering context.

The pbuffer thread would only have to wake up, render to the pbuffer, then go to sleep again.

The main thread would wake up the pbuffer thread then wait for it to go back to sleep.

I am sure that thread context switches are cheaper than rendering context switches.
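The wake/render/sleep handoff described above can be sketched with POSIX threads; a Win32 program would use the Win32 equivalents (for example events), and pthreads is used here only to keep the sketch portable. `render_pbuffer_pass` and `run_frames` are hypothetical names: in a real program the worker would create its pbuffer context and make it current once at startup, so neither thread calls wglMakeCurrent per frame. This shows only the synchronization; whether it actually beats wglMakeCurrent is exactly what is being debated in this thread.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* State shared by the main thread and the worker that would own the
   pbuffer rendering context. All fields are protected by `lock`. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  cond;          /* signalled in both directions */
    bool            work_pending;  /* main -> worker: render one pass */
    bool            quit;          /* main -> worker: exit when idle */
    int             passes_done;   /* stands in for the rendered texture */
} Handoff;

/* Hypothetical placeholder for rendering into the pbuffer. Called with the
   lock held, which is fine because the main thread is blocked anyway. */
static void render_pbuffer_pass(Handoff *h) { h->passes_done++; }

static void *pbuffer_thread(void *arg)
{
    Handoff *h = arg;
    pthread_mutex_lock(&h->lock);
    for (;;) {
        while (!h->work_pending && !h->quit)
            pthread_cond_wait(&h->cond, &h->lock);   /* go to sleep */
        if (!h->work_pending)
            break;                                   /* quit requested */
        render_pbuffer_pass(h);
        h->work_pending = false;
        pthread_cond_broadcast(&h->cond);            /* wake the main thread */
    }
    pthread_mutex_unlock(&h->lock);
    return NULL;
}

/* Main thread: wake the worker, then wait until it goes back to sleep. */
static void request_pass(Handoff *h)
{
    pthread_mutex_lock(&h->lock);
    h->work_pending = true;
    pthread_cond_broadcast(&h->cond);
    while (h->work_pending)
        pthread_cond_wait(&h->cond, &h->lock);
    pthread_mutex_unlock(&h->lock);
}

/* Runs `frames` handoffs and returns how many passes the worker rendered. */
int run_frames(int frames)
{
    Handoff h = { .work_pending = false, .quit = false, .passes_done = 0 };
    pthread_mutex_init(&h.lock, NULL);
    pthread_cond_init(&h.cond, NULL);

    pthread_t worker;
    int rc = pthread_create(&worker, NULL, pbuffer_thread, &h);
    assert(rc == 0);
    (void)rc;

    for (int i = 0; i < frames; i++)
        request_pass(&h);          /* one shadow pass per frame */

    pthread_mutex_lock(&h.lock);
    h.quit = true;
    pthread_cond_broadcast(&h.cond);
    pthread_mutex_unlock(&h.lock);
    pthread_join(worker, NULL);

    pthread_cond_destroy(&h.cond);
    pthread_mutex_destroy(&h.lock);
    return h.passes_done;
}
```

Using one condition variable with `while` loops on both sides keeps the handoff correct under spurious wakeups, at the cost of waking both threads on every broadcast.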

I’ve had problems making GL calls from multiple threads with the same context… Still, this solution certainly sounds interesting and could speed up render-to-texture significantly. Has anyone ever tried it?

The solution I was describing uses multiple threads and multiple contexts. The documentation clearly says that sharing a single context between multiple threads is not supposed to work: a context can only be current in one thread at a time.

[This message has been edited by Nakoruru (edited 08-30-2002).]

No, I would think that would be slower. OpenGL is a state machine; the rendering context stores the current state as a shadow of the driver’s/hardware’s internal state. By switching from one rendering context to another, you are effectively setting all the states to new values for the new context. I would have thought the same is true for thread switches: when the processor switches between threads, the states of each thread’s rendering context would have to be loaded. So instead of just changing the rendering context, you’re also adding the cost of the thread switch on top. So I would expect two threads with two RCs to be slower, unless you had two rendering pipes (two graphics cards, essentially).
Or maybe I’ve misunderstood things.


You could be right, you could be wrong. I think this is a case where someone needs to try it to find out.

I have mega doubts that a full blown rendering context switch takes place automatically when switching threads. My bet is that the Microsoft ICD mechanism knows which thread it is executing for and automatically uses the context for that thread without having to switch much of anything.

[This message has been edited by Nakoruru (edited 08-30-2002).]

Mmm, so your GeForce thinks it’s got back-face culling enabled, and then your CPU switches to another thread, and therefore to another rendering context which has culling disabled. How is the GeForce supposed to know that the new rendering context has back-face culling disabled if no states are changed?

Originally posted by Nakoruru:
I have mega doubts that a full blown rendering context switch takes place automatically when switching threads.

I’m pretty sure it does. The problem lies not in the software but in the hardware. The hardware can only be in one state at a time, and as soon as a context switch is made, the hardware must be updated to reflect the state stored in the software context, regardless of which thread it belongs to. This can be optimized at the driver level, though, by keeping track of the actual hardware state and only updating what’s needed. In my experience context switches aren’t that expensive, on ATi cards at least; I don’t know about others.

Oh yeah, I was thinking in software terms when I should have been thinking about the hardware. I am obviously wrong.

I guess the ICD switches contexts automatically when you call into it from a different thread with a different context than the one you last called it from. That would keep it from switching on every single time slice if you had, for example, two different programs open with two different OpenGL windows.
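The guess above (switch only when a different context calls in, not on every OS time slice) can be modelled in a few lines. This is a hypothetical model of the idea, not how any particular ICD is documented to work; `LazyDriver`, `driver_call` and the integer context IDs are made up for illustration.

```c
#include <assert.h>

/* Hypothetical model: the driver remembers which context issued the last
   GL call and only performs a switch when a different one calls in. */
typedef struct {
    int last_ctx;   /* 0 means "no context bound yet" */
    int switches;
} LazyDriver;

void driver_call(LazyDriver *d, int ctx_id)
{
    assert(ctx_id != 0);
    if (ctx_id != d->last_ctx) {
        d->switches++;          /* state upload would happen here */
        d->last_ctx = ctx_id;
    }
}
```

Calls from two contexts in the order 1, 1, 1, 2, 2, 1 cost three switches, including the initial bind; time slices in between that make no GL calls cost nothing in this model.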

The only reason two threads could be faster is if context switching is I/O bound and asynchronous (meaning the OS blocks the caller and runs other programs until the I/O call completes). If it is, you could do other things on the CPU while the context switch happens. If it is CPU bound, it will be slower. It might differ from card to card.

Other than that, I have no idea, sorry.

I’ve seen some OpenGL apps run on Oxygen systems. Opening the same animated program something like 15 times doesn’t seem to bring the performance down by much. PCs are no match.

I have noticed myself that context switching increases CPU usage.


Thanks for your comments!
About my first idea with the double-sized framebuffer: the problem is that the max viewport size is dependent on the window size, right? Is there a way around this?

Are you sure you need a double-sized window for this? Why don’t you just use the backbuffer for your texture, then clear it and re-render the scene that gets swapped to the front buffer? It would be like a dual-purpose backbuffer.

The benefits of a pbuffer are that its contents are not at risk and you can keep them around. Also, it’s better than rendering to a bitmap.

//edit// Just reread knackered’s post, so I guess you don’t want this. Another option is to keep another window around. Make it invisible and see if things work. I’ve never tried that trick.


[This message has been edited by V-man (edited 09-01-2002).]

Originally posted by V-man:
//edit// Just reread knackered’s post, so I guess you don’t want this. Another option is to keep another window around. Make it invisible and see if things work. I’ve never tried that trick.


As a general rule, if a portion of a window is occluded, it’s not safe to assume you can render to that portion and read the results back (the OpenGL driver will most probably clip the non-visible portion out; in OpenGL spec terms, those pixels fail the pixel ownership test and their contents are undefined). The same goes for non-visible windows.
Obviously this depends on the OS and the specific driver.
The reason is that in some OpenGL implementations a single backbuffer and local buffer (Z & stencil) is shared by all the windows on the desktop (it’s a desktop-sized buffer), so overlapped regions don’t have a valid area in those buffers. I think Apple, for example, has a separate backbuffer/local buffer for each window (as D3D does on MS Windows).
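Since only pixels in the visible part of the window are reliable, rendering into the backbuffer and copying into a texture limits the texture to what fits inside the visible client area. The helper below picks the largest power-of-two square that fits, assuming power-of-two texture dimensions (as plain OpenGL 1.x requires) and a fully visible, unobscured window; `largest_pow2_square` is a made-up name.

```c
#include <assert.h>

/* Largest power-of-two square texture edge that fits entirely inside a
   visible client area of win_w x win_h pixels (0 if nothing fits). */
unsigned largest_pow2_square(unsigned win_w, unsigned win_h)
{
    unsigned limit = win_w < win_h ? win_w : win_h;
    unsigned edge = 1;

    if (limit == 0)
        return 0;
    while (edge <= limit / 2)   /* doubling keeps edge <= limit */
        edge *= 2;
    assert(edge <= limit);
    return edge;
}
```

For a 640×480 window this gives 256, so a 512×512 shadow map would need a pbuffer or a larger window.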