I’m trying to use the NV_fence extension on my GeForce 256 DDR (Asus V6800) using the 21.81 drivers.
Basically the rendering loop hangs if I do:
Issue drawing commands
// do other interesting stuff
And the loop runs fine if I do:
// no time left to do other interesting stuff
Why is that? I thought TestFence could be used to query whether the GPU has finished drawing.
Am I running into some bug or is a fence not supposed to be TestFenced frequently?
BTW, the wglGetProcAddress stuff and all that operates fine, glGetError returns ‘GL_NO_ERROR’ after each and every OpenGL call so I am at a loss…
“Am I running into some bug or is a fence not supposed to be TestFenced frequently?”
fence tests are very expensive.
for every fence-test the gpu must be stopped/cpu must wait, until the gpu is able to respond.
that’s what i have read in some of nvidia’s opengl papers…(but can’t remember which one it was)
so be very careful when you are using fences.
A better way is to break up your buffer into smaller chunks, and use fences only to check which chunk can be reused (some kind of double/triple buffering).
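A minimal sketch of that chunking idea, using a tiny software stand-in for the GPU so it runs without a GL context. `MockGpu`, `ChunkRing`, `acquire_chunk` and `submit_chunk` are hypothetical names for illustration only; in a real program `submit_chunk` would issue the draw call and then `glSetFenceNV`, and the "is this chunk free?" check would be a `glTestFenceNV` on that chunk's fence.

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_CHUNKS 3

/* Hypothetical stand-in for the GPU: it counts submitted and completed work. */
typedef struct {
    unsigned submitted;  /* commands issued by the CPU so far */
    unsigned completed;  /* commands the mock GPU has finished */
} MockGpu;

/* One "fence ticket" per chunk; 0 means the chunk was never used. */
typedef struct {
    unsigned ticket[NUM_CHUNKS];
    int next;            /* chunk that will be handed out next */
} ChunkRing;

/* Stand-in for testing a chunk's fence: free once the GPU has passed its ticket. */
bool chunk_is_free(const MockGpu *gpu, const ChunkRing *ring, int i)
{
    assert(i >= 0 && i < NUM_CHUNKS);
    return ring->ticket[i] <= gpu->completed;
}

/* Returns the index of the chunk to fill, or -1 if it is still in use. */
int acquire_chunk(const MockGpu *gpu, ChunkRing *ring)
{
    int i = ring->next;
    if (!chunk_is_free(gpu, ring, i))
        return -1;
    ring->next = (i + 1) % NUM_CHUNKS;
    return i;
}

/* Stand-in for "draw from chunk i, then set a fence behind the draw". */
void submit_chunk(MockGpu *gpu, ChunkRing *ring, int i)
{
    assert(i >= 0 && i < NUM_CHUNKS);
    ring->ticket[i] = ++gpu->submitted;
}
```

With three chunks, only one fence is tested per chunk handed out, and the CPU only has to wait when it has run a full lap ahead of the GPU.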
Thanks for the advice, I will try to keep the number of TestFences as low as possible, but a few per frame shouldn’t hurt.
As for my original question, I’ve solved the TestFence problem: stupid me, I had placed the SetFence after <DUH!> the glFlush, so of course I can wait forever to see that fence finish…
Moved the SetFence before the glFlush and now the test-code runs like a charm, keeping the CPU available for almost 90% on any sort of frame (fillrate, geometry-wise etc.).
Right, this is one of those little tricks that can be non-obvious – since TestFence does not cause a flush, spinning on a fence can become an infinite loop! You can either flush in advance or test once, then if the test fails, flush once and then do future tests without flushes in between.
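To make the "test once, flush once, then keep testing" pattern concrete, here is a small sketch against a mock driver, so it runs without a GL context. Everything prefixed `mock_` and the `MockDriver` struct are assumptions made up for this example; in real code the three calls would be `glSetFenceNV`, `glFlush` and `glTestFenceNV`.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mock: commands sit in a buffer until a flush sends them on. */
typedef struct {
    bool fence_queued;   /* fence is in the unflushed command buffer */
    bool fence_sent;     /* fence has been handed to the mock GPU */
    int  gpu_ticks_left; /* polls needed before the mock GPU reaches the fence */
    int  flush_count;    /* how many flushes were issued */
} MockDriver;

void mock_set_fence(MockDriver *d, int work)
{
    assert(work >= 0);
    d->fence_queued = true;
    d->fence_sent = false;
    d->gpu_ticks_left = work;
}

void mock_flush(MockDriver *d)
{
    if (d->fence_queued) {
        d->fence_queued = false;
        d->fence_sent = true;
    }
    d->flush_count++;
}

/* Like TestFence, this never flushes: an unflushed fence never finishes. */
bool mock_test_fence(MockDriver *d)
{
    if (!d->fence_sent)
        return false;
    if (d->gpu_ticks_left > 0) {
        d->gpu_ticks_left--;
        return false;
    }
    return true;
}

/* Test first; on the first failure flush once; then poll without flushing.
   Returns the number of polls used, or -1 if max_polls was exceeded. */
int wait_for_fence(MockDriver *d, int max_polls)
{
    bool flushed = false;
    for (int polls = 1; polls <= max_polls; ++polls) {
        if (mock_test_fence(d))
            return polls;
        if (!flushed) {
            mock_flush(d);
            flushed = true;
        }
    }
    return -1;
}
```

A real loop would also put a short pause or some useful work between polls rather than spinning back to back.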
Don’t test too frequently. Tests suck up bus cycles. But I wouldn’t call a test “very” expensive. In any case, make sure to put some sort of intentional delay (on the order of perhaps a microsecond? maybe less, maybe more) between tests, so you don’t hog the AGP bus. Hogging the AGP bus takes away bus cycles from doing real work, so you’ll actually wait longer!
I’m just learning this at the moment as well so I thought I would include another question. Would it be possible, instead of looping and testing the fence, to perform the extra processing and then finish the fence using glFinishFenceNV?
So instead, the structure I have in mind would be:
// Issue rendering commands
glSetFenceNV( fence, GL_ALL_COMPLETED_NV );
// Perform some extra processing
glFinishFenceNV( fence );
SwapBuffers( hDC );
This is my current understanding of how to do things, but I have only found some very limited descriptions of fencing so far. I should look at the glFlush operation. I thought that the CPU was supposed to wait for completion from the GPU.
Polling is not necessary for all apps. Some apps can live without it. The reason you might not want to do all your work and then FinishFence is that the hardware might go totally idle if your work takes too long. By issuing more work, you could keep the hardware busy at all times.
Originally posted by witcomb:
Would it be possible, instead of looping and testing the fence, to perform the extra processing and then finish the fence using glFinishFenceNV?
Sure, that is possible. However, glFinishFenceNV will stall the CPU just like glFinish does (i.e. it returns only after the fence has finished and you are not able to perform any useful calculations in the meantime).
So if you want to keep control over the amount of time spent doing the ‘extra processing’, then a loop can be more useful.
For example, I try to use all remaining time right up to the next vertical retrace and then issue a SwapBuffers.
Whenever the TestFence shows the GPU isn’t done yet, I can use another refresh-interval of time for the ‘extra processing’, since the SwapBuffers would also stall at least until the next vertical retrace (assuming vsync enabled of course and that’s what I want to have in my app).
So basically the rendering loop looks like:
- Issue all drawing commands for this frame
- determine how much time is left before the next vertical retrace
- loop doing other interesting stuff while time remains
- perform a TestFence
- if the test is false: add a refresh-period to the remaining time and go back to the ‘other interesting stuff’ loop
- else issue a swapbuffers
My tests show that the time spent in the SwapBuffers step at the moment is typically less than 5% of a frame. I still have to tune the time between the call to SwapBuffers and the actual vertical retrace (currently I issue the SwapBuffers 0.5 ms before the estimated vertical retrace, but that time can quite likely still be shortened, reducing the 5% to even less I hope).
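The loop above can be sketched as a small simulation on a mock clock (all times in microseconds), so it runs without a GL context or a real timer. `Sim` and `run_frame` are hypothetical names for this illustration; the `now_us >= gpu_done_us` check stands in for `glTestFenceNV`, and the returned time is the moment at which `SwapBuffers` would be issued.

```c
#include <assert.h>

/* Hypothetical mock state: a clock, the GPU finish time, and a work counter. */
typedef struct {
    long now_us;       /* mock clock */
    long gpu_done_us;  /* when the mock GPU finishes this frame */
    int  work_units;   /* units of "other interesting stuff" performed */
} Sim;

/* Runs one frame and returns the time at which SwapBuffers would be called.
   margin_us is how long before the estimated retrace the swap is issued. */
long run_frame(Sim *s, long refresh_us, long margin_us, long unit_us)
{
    assert(unit_us > 0 && refresh_us > margin_us);

    /* Time budget: up to just before the next estimated retrace. */
    long deadline = s->now_us + refresh_us - margin_us;

    for (;;) {
        /* Do other work while a whole unit still fits in the budget. */
        while (s->now_us + unit_us <= deadline) {
            s->now_us += unit_us;
            s->work_units++;
        }
        /* Stand-in for the fence test: is the GPU done with the frame? */
        if (s->now_us >= s->gpu_done_us)
            break;
        /* Not done: the swap would stall a full refresh anyway, so use it. */
        deadline += refresh_us;
    }
    return s->now_us;
}
```

At 100 Hz with a 0.5 ms margin and 1 ms work units, a frame the GPU finishes after 4 ms leaves room for nine units of extra work before the swap; if the GPU needs 12 ms, the loop claims the next refresh interval as well.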
Would having a worker thread not help here?
The code in the worker thread would continue even while swapbuffers is waiting for a vertical retrace, wouldn’t it?
You can synchronise the 2 threads with an “I’m done” flag.
Or is context switching so expensive that people don’t use separate threads?
JML, ok, that makes sense. I would be using Vertex Array Range, so I can get into memory-limited situations where I will have to finish the fence, then reclaim the memory to fill it again. So, I would end up looping on issuing rendering, setting the fence, extra processing, and the glFinishFenceNV.
I was just curious how do you determine the amount of time before the next vert-retrace? Determine the monitor refresh rate at 70Hz for example and that would correspond to 1/70 seconds. Then just compute the amount of time from the start of the frame to find the amount of time left??
Knackered, I’m not quite sure. My intuition says that it would help some but not nearly the same as waiting for the correct time. This is something I will have to try out, since I think that my situation will have threads doing much of my extra processing stage.
I was just curious how do you determine the amount of time before the next vert-retrace
I don’t think Win32’s timer functions are going to make this practical…they’re not exactly the most accurate or reliable resources (even QueryPerformanceCounter).
Indeed I estimate the next retrace to occur at 1/monitorfrequency seconds after the return of SwapBuffers. At the moment the frequency is assumed (I’m only testing stuff at the moment), but it can easily be determined by the program by rendering an empty frame and measuring how long it takes for SwapBuffers to return.
I use QueryPerformanceCounter and QueryPerformanceFrequency to do the calculations and time measurements. As far as I can see this works fine for me at the moment; QPC looks precise enough to aim for time intervals on the order of 0.1 ms. That gives me a reasonable granularity within a frame that takes 10 to 17 milliseconds depending on the refresh rate (100 Hz down to 60 Hz).
Also: I only use relative timing calculations based on the finish of last frame, so inaccuracy of the counter over (very long) time is not relevant for me.
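The arithmetic behind that estimate, as a rough sketch. The function name `us_until_retrace` and its parameters are made up for this example; the tick values would come from `QueryPerformanceCounter` (read once when `SwapBuffers` returns and once "now") and `QueryPerformanceFrequency`, which are not called here so the snippet stays portable. It assumes the retrace really does happen one refresh period after `SwapBuffers` returns.

```c
#include <assert.h>
#include <stdint.h>

/* Microseconds left until the estimated next vertical retrace,
   or 0 if the estimate has already passed.
   swap_ticks:    counter value when the last SwapBuffers returned
   now_ticks:     current counter value
   ticks_per_sec: counter frequency
   refresh_hz:    monitor refresh rate (assumed or measured) */
int64_t us_until_retrace(int64_t swap_ticks, int64_t now_ticks,
                         int64_t ticks_per_sec, int refresh_hz)
{
    assert(ticks_per_sec > 0 && refresh_hz > 0 && now_ticks >= swap_ticks);

    /* One refresh period, e.g. 10000 us at 100 Hz. */
    int64_t period_us = 1000000 / refresh_hz;

    /* Only the difference since the last swap is used, so long-term
       drift of the counter does not accumulate. */
    int64_t elapsed_us = (now_ticks - swap_ticks) * 1000000 / ticks_per_sec;

    int64_t left_us = period_us - elapsed_us;
    return left_us > 0 ? left_us : 0;
}
```

For example, 4 ms after the swap on a 100 Hz display this reports 6000 us remaining; subtracting a safety margin (such as the 0.5 ms mentioned earlier) gives the budget for extra work.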
I was just wondering what the performance of glGenFencesNV and glDeleteFencesNV is like. It would obviously be best not to call these at run time, but would there be much of a hit for doing so? If you could direct me to some links that would be great.
You don’t need to call them at run time. Generate your fences once, and delete them when you are done (at the end of the program, I guess). It’s like texture objects.