To Map() or not to Map()?

All GL stuff should stay in one thread. Mapping buffers and passing pointers to another thread may not be safe because of driver differences.

It is much better to use asynchronous data transfer via PBO or VBO. When you design an app that has to do streaming work on the GPU, you must synchronize the GPU and CPU to achieve the best performance.
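A minimal sketch of that kind of PBO streaming upload (an illustration only, not yooyo’s actual code; the pbo and tex handles, WIDTH/HEIGHT, the decodedFrame pointer and the orphaning glBufferDataARB(…, NULL, …) call are all assumptions):

    // Per-frame upload of one decoded video frame through a pixel buffer object.
    const size_t frameBytes = WIDTH * HEIGHT * 4;            // RGBA, 4 bytes/pixel

    glBindBufferARB(GL_PIXEL_UNPACK_BUFFER_ARB, pbo);
    glBufferDataARB(GL_PIXEL_UNPACK_BUFFER_ARB, frameBytes,  // orphan the old storage
                    NULL, GL_STREAM_DRAW_ARB);               // so the GPU is not stalled
    void* dst = glMapBufferARB(GL_PIXEL_UNPACK_BUFFER_ARB, GL_WRITE_ONLY_ARB);
    memcpy(dst, decodedFrame, frameBytes);                   // CPU-side decoder output
    glUnmapBufferARB(GL_PIXEL_UNPACK_BUFFER_ARB);

    // The texture update now sources its pixels from the PBO (offset 0).
    glBindTexture(GL_TEXTURE_2D, tex);
    glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, WIDTH, HEIGHT,
                    GL_RGBA, GL_UNSIGNED_BYTE, (void*)0);
    glBindBufferARB(GL_PIXEL_UNPACK_BUFFER_ARB, 0);

The memcpy is the only per-frame CPU cost on the GL side; the transfer into the texture can then proceed asynchronously, which is the whole point of using a PBO here.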

It is OK to use threads, but one (render) thread should be dedicated to OpenGL and all the others should serve data to this render thread.

On the latest hw (P4 3.06 GHz/HT + PCI-X Nvidia 6600GT) I can play up to 6 full PAL (720x576) MPEG2 videos and CPU usage is ~90%. That is ~9.4 MB per frame. One rendering thread and 6 DirectShow filtergraphs with a bunch of filters inside (and each filter runs in its own thread).

yooyo

a) you’re working on a HyperThreaded P4, and
Nope, I’m developing on an Athlon… There is no HT…

[quote]So actually, our idle thread uses 90% of the CPU, it’s kinda funny… :slight_smile:
Please don’t tell me you’ve written your own idle thread :eek:[/QUOTE]
I meant our idle priority worker thread… It is the thread that does all the computing, but it’s really much more important to run the UI thread instead. The worker thread can wait until there’s nothing else to do.
Anyway, I’ll try without changing priorities and see what happens (as soon as I get my stuff compiling again :-P)

All GL stuff should stay in one thread. Mapping buffers and passing pointers to another thread may not be safe because of driver differences.
This is not true. Read the spec:
“The expectation is that an application might map a buffer and start filling it in a different thread, but continue to render in its main thread (using a different buffer or no buffer at all).”
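For reference, the pattern that sentence describes would look roughly like this (a sketch only; the vbo handle, dataSize, vertexCount, fillVertices() and the use of std::thread are invented for illustration):

    // Render (GL) thread: map the buffer, then hand the raw pointer to a worker.
    glBindBufferARB(GL_ARRAY_BUFFER_ARB, vbo);
    glBufferDataARB(GL_ARRAY_BUFFER_ARB, dataSize, NULL, GL_STREAM_DRAW_ARB);
    float* ptr = (float*)glMapBufferARB(GL_ARRAY_BUFFER_ARB, GL_WRITE_ONLY_ARB);

    // Worker thread (needs <thread>): no GL calls, only writes through the pointer.
    std::thread filler([=] { fillVertices(ptr, vertexCount); });

    // ... the render thread keeps drawing here, using other buffers or none ...

    filler.join();                               // worker has finished filling
    glBindBufferARB(GL_ARRAY_BUFFER_ARB, vbo);
    glUnmapBufferARB(GL_ARRAY_BUFFER_ARB);       // unmap from the GL thread
    // vbo is now safe to use as a vertex source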

yooyo,
That’s just horrible. That’s horribly slow performance given your h/w. Is that for real, or did you miss an order of magnitude somewhere?

That’s just horrible. That’s horribly slow performance given your h/w. Is that for real, or did you miss an order of magnitude somewhere?
Horrible? Why? I said 6 MPEG2 streams (full PAL resolution), and each one is decoded on the CPU side. Each stream takes ~10-15% CPU time just for decoding.

Maybe you didn’t understand me. Using PBO it is possible to stream ~1.8 GB/sec. In my case the 6 streams add up to ~9.4 MB per frame (6 × 720 × 576 × 4 bytes ≈ 9.5 MB), which is roughly 240 MB/sec at 25 fps, well below that limit.

yooyo

yooyo,
I’m sorry. I didn’t realize you saturated the CPU. My bad.

Hey, I thought I’d re-post this, as it seems like it got lost in the noise (either that, or nobody knows the answer :slight_smile: ). Anyway, here goes again; hope I’ll have more luck with it this time.

Originally posted by andras:
So, I was re-reading this nVidia document on VBO usage, and there is a section called “Avoid calling glVertexPointer() more than once per VBO”, where they say that all the actual setup happens on the glVertexPointer call! Now how exactly does this work in the shader era, when we have to bind attribute arrays to locations? For example, I have lots of different shaders, each shader has multiple attributes, and different attributes are stored in different VBOs. So for each attribute, I have to bind the corresponding VBO and then call VertexAttribPointer(location…) to attach the buffer to a location. And I’ll have to do this every time I change shaders, right? And of course every time I request new memory with glBufferData(NULL)! Or am I missing something? I have to admit that I feel a bit lost here. If someone could shed some light on how this works, it would be really appreciated! Thanks!

@andras:

In one of your VBOs you have stored vertex positions. When you bind that VBO, just call glVertexPointer once. The same applies to all the other glXXXPointer functions too. So… (a rough sketch follows the list)

  1. Activate the VBO.
  2. Just set up the pointers.
  3. Set up the vertex pointer last.
  4. Use glEnableClientState / glDisableClientState / glEnableVertexAttribArrayARB / glDisableVertexAttribArrayARB calls to enable or disable attributes (this is cheap).
  5. Draw your geometry using glDrawElements(Arrays) / glDrawRangeElements / glMultiDrawElements(Arrays).
  6. In GLSL, use glBindAttribLocationARB before linking and unify your attribute locations according to your VBO configuration.
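A rough sketch of that setup order (the attribute names, locations and buffer handles below are invented for illustration, not taken from anyone’s actual code):

    // Before linking: pin the attribute locations so they match your VBO layout.
    glBindAttribLocationARB(program, 0, "position");
    glBindAttribLocationARB(program, 3, "color");
    glLinkProgramARB(program);

    // Per draw: bind each VBO once and set its pointer, position attribute last.
    glBindBufferARB(GL_ARRAY_BUFFER_ARB, colorVBO);
    glVertexAttribPointerARB(3, 4, GL_UNSIGNED_BYTE, GL_TRUE, 0, (void*)0);
    glBindBufferARB(GL_ARRAY_BUFFER_ARB, positionVBO);
    glVertexAttribPointerARB(0, 3, GL_FLOAT, GL_FALSE, 0, (void*)0);

    glEnableVertexAttribArrayARB(0);             // enabling/disabling is cheap
    glEnableVertexAttribArrayARB(3);
    glDrawElements(GL_TRIANGLES, indexCount, GL_UNSIGNED_SHORT, indices);
    glDisableVertexAttribArrayARB(0);
    glDisableVertexAttribArrayARB(3);

Because the locations are bound before linking, the same pointer setup works for every shader that follows the same attribute-location convention, which is the point of step 6.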

If I’m wrong, let somebody correct me.

yooyo

Well, this is what I’m doing, with the exception that I never call glVertexPointer, as I only use custom vertex attributes (yes, even for position)… But I guess this should be OK.

Originally posted by andras:
This is not true. Read the spec:
“The expectation is that an application might map a buffer and start filling it in a different thread, but continue to render in its main thread (using a different buffer or no buffer at all).”

Hey, that’s a good find! Now we just need the driver writers to confirm that this is actually supported and I can start redesigning my stuff!

nVidia, ATI: Do your drivers support this behavior? On Linux, too? :wink:

Maybe I misunderstand here (I haven’t used VBOs yet), but a VBO is just filled by using ordinary assignments, right? Not by gl functions.

So if a buffer is just CPU-accessible, and provided that you don’t use the buffer for rendering while you’re still filling it (something that is up to your own program, typically prevented by means of a mutex), then there is absolutely no reason why this shouldn’t work on all operating systems capable of multithreading.

After all, that’s the difference between a thread and a process: a process has its own memory, open files etc. All threads of the same process share those resources. (And a GL context can only be active in one process)

Originally posted by T101:
Maybe I misunderstand here (I haven’t used VBOs yet), but a VBO is just filled by using ordinary assignments, right? Not by gl functions.

Yup.


So if a buffer is just CPU-accessible, and provided that you don’t use the buffer for rendering while you’re still filling it (something that is up to your own program, typically prevented by means of a mutex), then there is absolutely no reason why this shouldn’t work on all operating systems capable of multithreading.

That’s not something I would bet the house on (not that I own one). Given that VBOs can live in AGP or graphics card memory, I would really like to get some confirmation that these are consistent and accessible across all threads.


After all, that’s the difference between a thread and a process: a process has its own memory, open files etc. All threads of the same process share those resources. (And a GL context can only be active in one process)

I’m pretty sure that a GL context can only be active in one thread. I’ve written programs that use multiple threads, where each thread has a different OpenGL context, so it’s certainly not one context per process.

Any of the driver guys want to comment? Even saying “We don’t know yet, probably not” would be helpful (“We do know, and yes” would be more helpful, but hey, I take what I can get :wink: ).

Life is dangerous, you have to take some risks! :wink: Live on the edge!! Go ahead and use it! :smiley:

But a mapped VBO is just a pointer, and hence can be shared across threads. You would need some way for the memory thread (which doesn’t need its own rendering context) to communicate to the rendering thread that it is done (CPU-CPU synchronization is your own problem) so the render thread can call unmap.
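A minimal sketch of that hand-off, assuming a C++ worker thread, a hypothetical decompressInto() routine and an atomic flag as the “done” signal (none of this comes from the posts above):

    #include <atomic>
    #include <thread>

    std::atomic<bool> fillDone(false);

    // Render thread: map the buffer and hand the pointer to the memory thread,
    // which needs no GL context of its own.
    void startFilling(GLuint vbo)
    {
        glBindBufferARB(GL_ARRAY_BUFFER_ARB, vbo);
        void* ptr = glMapBufferARB(GL_ARRAY_BUFFER_ARB, GL_WRITE_ONLY_ARB);
        std::thread([ptr] {
            decompressInto(ptr);     // plain memory writes, no GL calls
            fillDone = true;         // the CPU-CPU synchronization is our problem
        }).detach();
    }

    // Called each frame from the render thread: unmap (and start drawing from
    // the buffer) only once the memory thread has signalled completion.
    bool tryFinishFilling(GLuint vbo)
    {
        if (!fillDone) return false;
        glBindBufferARB(GL_ARRAY_BUFFER_ARB, vbo);
        return glUnmapBufferARB(GL_ARRAY_BUFFER_ARB) == GL_TRUE;
    }

The worker never touches GL, so only the map and unmap calls run in the render thread.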

This could potentially speed things up (or make things more interactive, at least) if you need to process/decompress data, assuming that switching thread contexts isn’t too slow. I have not confirmed this, but I heard that NV contexts implicitly flush on thread switches.

-W

Originally posted by andras:
Life is dangerous, you have to take some risks! :wink: Live on the edge!! Go ahead and use it! :smiley:
With the current state of drivers I’m always on the edge and take all the risks I can handle. :wink:

I’m currently redesigning the Geometry part of my Open Source scenegraph (OpenSG), and if that doesn’t work reliably for all my users, my life will be pretty miserable. I’d really like to be sure before committing to it… :slight_smile:

Originally posted by Won:
But a mapped VBO is just a pointer, and hence can be shared across threads.
Given that the pointer can point to AGP or graphics card memory I’m not so sure about this.

You would need some way for the memory thread (which doesn’t need its own rendering context) to communicate to the rendering thread that it is done (CPU-CPU synchronization is your own problem) so the render thread can call unmap.
Yeah, we have the CPU part pretty well and flexibly covered; it’s the graphics side that I’m working on.

This could potentially speed things up (or make things more interactive, at least) if you need to process/decompress data, assuming that switching thread contexts isn’t too slow. I have not confirmed this, but I heard that NV contexts implicitly flush on thread switches.
Hm, what about true multi-processor (or multi-core) systems? There is no thread switch there; multiple threads are physically running at the same time, so there is no way to flush anything.

[quote]
[quote]Our main thread doesn’t use a lot of CPU (we make the GPU sweat instead ;P), but it has to be super responsive![/quote]
And you’re assuming assigning it a high priority will make your thread “super responsive”?
Well, yes, in some twisted way it will do that. A high-priority thread will starve all other threads; it will basically run all the time unless it waits on some object or its message queue.
But then again, if a thread waits on an object, it will be resumed immediately anyway, as long as it has the same priority as all other currently ready threads.

So there’s your responsiveness. Giving a thread high priority will not increase its responsiveness. It will instead make all lower-priority threads unresponsive.[/QUOTE]
Sorry, it took me a long time to test, but I finally got to try setting the priority back to normal, and it makes the framerate much more uneven; there are sudden spikes, which makes the overall feeling pretty bad. If I set this thread to idle priority, there is no noticeable performance loss, but everything becomes a lot smoother. So do you have any idea why that is, or how I could make it smooth with normal priority? :confused: