How fast can I upload a 1024*1024 texture to OpenGL? Are there any limitations? Is there a smart function for this?
Make sure to use glTexSubImage2D to update a texture instead of glTexImage2D. Despite its name, nothing keeps you from replacing the entire image. It is more efficient than destroying and re-creating the texture.
There are no limitations AFAIK. You can subload the texture every frame, but that's not going to be very fast, probably not fast enough.
Is there any way to DMA a texture so that it can be animated more efficiently? You can use vertex_buffer_object for geometry data, but what about texture data?
I've never used PDR (pixel data range) for anything other than glReadPixels, but the spec implies you can use it to speed up things like glTexSubImage2D. It is the pixel equivalent of VBO (or rather VAR). PDR is NV-specific, but there is talk of an ARB PBO (the ARB version of PDR). It may be some time before we see this, though.
Here's the PDR spec: http://www.nvidia.com/dev_content/nvopenglspecs/GL_NV_pixel_data_range.txt
You might find this post useful: http://www.opengl.org/discussion_boards/ubb/Forum3/HTML/004092.html
You probably also want to upload each frame one frame before it is used, so that you're double-buffering your uploads. This may give the hardware time to run the upload in parallel before the texture is used. However, there would still be at least one system memory copy unless you use pixel_data_range (although that copy might be into AGP memory, if your drivers are studly).
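One possible reading of the double-buffering idea, as a minimal sketch: keep two textures, upload new pixels into one while drawing the one that was filled on the previous frame. The helper names `upload_slot` and `draw_slot` and the two-texture layout are assumptions for illustration, not something from the spec.

```c
#include <assert.h>

/* Hypothetical helpers: with two textures, decide which one receives this
   frame's upload and which one is drawn. Names are illustrative only. */
enum { NUM_TEXTURES = 2 };

/* Slot that receives the new pixel data on frame `frame`. */
int upload_slot(unsigned frame) {
    return (int)(frame % NUM_TEXTURES);
}

/* Slot that is drawn on frame `frame`: the one uploaded on the previous frame. */
int draw_slot(unsigned frame) {
    assert(frame > 0); /* nothing has been uploaded before frame 0 */
    return (int)((frame - 1) % NUM_TEXTURES);
}
```

Each frame you would bind the texture at `upload_slot(frame)` and call glTexSubImage2D on it, then bind the texture at `draw_slot(frame)` and draw with it. The two slots are never the same on a given frame, so the upload has a whole frame to complete before the texture is sampled.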
The peak AGP bus bandwidth is roughly 512 MB/s for AGP 2x, 1 GB/s for AGP 4x and 2 GB/s for AGP 8x.
Peak main RAM bandwidth is roughly 512 MB/s for PC66 memory, 1 GB/s for PC133 memory, and 2 GB/s for DDR266 memory.
Note: if you use all your memory bandwidth for texture uploads, there’s nothing left for geometry or the CPU.
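To put the original 1024*1024 question next to these bus figures, here is a rough sketch of the arithmetic. The 4 bytes per pixel and 60 frames per second are assumptions of mine, not numbers from this thread, and the function names are made up for the example.

```c
#include <assert.h>

/* Bytes per second needed to re-upload a full texture every frame. */
double upload_bytes_per_second(int width, int height, int bytes_per_pixel,
                               double frames_per_second) {
    assert(width > 0 && height > 0 && bytes_per_pixel > 0);
    assert(frames_per_second > 0.0);
    return (double)width * height * bytes_per_pixel * frames_per_second;
}

/* Fraction of a bus's peak bandwidth that such an upload stream would use. */
double bus_fraction(double bytes_per_second, double bus_bytes_per_second) {
    assert(bus_bytes_per_second > 0.0);
    return bytes_per_second / bus_bytes_per_second;
}
```

With those assumed values, `upload_bytes_per_second(1024, 1024, 4, 60.0)` comes to 251,658,240 bytes per second, which is about a quarter of a 1 GB/s bus at its theoretical peak. Real transfers will not reach that peak, so treat this as a lower bound on how much of the bus you'd eat.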
Unfortunately, I can’t get GL_NV_pixel_data_range to have any effect at all on a GeForce4 or on a GeForceFX 5900 Ultra. The spec is pretty simple, but I suppose it’s still possible that I’m doing something wrong. I was unable to find sample code, so I had to work straight from the spec which is always harder…
My memory is PC133 and my bus is AGP 4x, so my theoretical speed should be about 1 GB/s. In reality, I'm able to get about 77 MB/s doing texture subloads and 123 MB/s doing glDrawPixels. My test case uses 640x480 frames. These data rates are calculated from average frame times of 12 ms and 7.5 ms, respectively.
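For anyone who wants to check these figures, a small sketch of the calculation follows. It assumes 3 bytes per pixel (RGB) and 1 MB = 10^6 bytes; neither is stated alongside the numbers, but with those assumptions the 12 ms frame time gives about 77 MB/s and the 7.5 ms frame time gives about 123 MB/s. The function name is hypothetical.

```c
#include <assert.h>

/* Throughput in MB/s (1 MB = 10^6 bytes) when one full frame is
   transferred in frame_ms milliseconds. */
double throughput_mb_per_s(int width, int height, int bytes_per_pixel,
                           double frame_ms) {
    assert(frame_ms > 0.0);
    double bytes = (double)width * height * bytes_per_pixel;
    return bytes / (frame_ms / 1000.0) / 1e6;
}
```

A 640x480 RGB frame is 921,600 bytes, so `throughput_mb_per_s(640, 480, 3, 12.0)` is 76.8 and `throughput_mb_per_s(640, 480, 3, 7.5)` is 122.88.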
So you don't get asynchronous behaviour with PDR? You should. I found the speed gain with PDR was small, only about 10% for readpixels. The biggest gain is from the asynchronous behaviour.
Nope, I didn’t see any speed gain. All I did for setting it up is this:
glPixelDataRangeNV(GL_WRITE_PIXEL_DATA_RANGE_NV, size, myData);
Then I tried subloading myData and sending myData to glDrawPixels, but I had no speed gain. I also tried some variations on the above, such as turning off PDR when it was unused, but nothing helped.
I also tried subloading one frame before setting up PDR like jwatte suggested, but that didn't help either. Although, I don't really understand what he was saying about double-buffering.
While we're on the subject of textures (that's just a wonderfully vague forum topic), I was thinking about something else. Ultimately, I want to send a video stream to my graphics card that I bring in from a TBD frame grabber board, which is why I'm doing all this experimenting. I've heard that some of these boards can DMA to system memory or frame buffer memory. Does anyone know how you get the address that points to frame buffer memory? Or, better yet, is it possible to get an address pointing to the texture data on your graphics card and then DMA straight to that? My guess is that your graphics drivers would know this info but wouldn't have any way to give it to you.
[This message has been edited by mogumbo (edited 11-04-2003).]
You have to use wglAllocateMemoryNV as well.
Also, after the texsubimage call you'll need to set a fence and flush,
then do some CPU work, then wait on the fence.
[This message has been edited by Adrian (edited 11-04-2003).]
You don’t need to flush after the setfence.
Also, if you’re only getting 77 MB/s, then you’re off the hardware accelerated path. I’d recommend using GL_BGRA in-memory pixel format and making sure that your data is aligned on at least 4 bytes (16 would be better).
Look at PixelStore() for how to specify things like stride, alignment, etc.
Also, try using VTune and see where you’re spending your time; then avoid spending time in that place.
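If the source frames arrive as tightly packed RGB, one way to try the BGRA suggestion is to expand them on the CPU first. This is only a sketch: the function name is made up, it assumes 8 bits per channel, and the conversion itself costs CPU time, so whether it is a net win would have to be measured.

```c
#include <assert.h>
#include <stddef.h>

/* Expand tightly packed 8-bit RGB pixels into 8-bit BGRA,
   forcing alpha to 255 (opaque). dst must hold 4 * pixel_count bytes. */
void rgb_to_bgra(const unsigned char *src, unsigned char *dst,
                 size_t pixel_count) {
    assert(src != NULL && dst != NULL);
    for (size_t i = 0; i < pixel_count; ++i) {
        dst[4 * i + 0] = src[3 * i + 2]; /* blue  */
        dst[4 * i + 1] = src[3 * i + 1]; /* green */
        dst[4 * i + 2] = src[3 * i + 0]; /* red   */
        dst[4 * i + 3] = 255;            /* alpha */
    }
}
```

The resulting buffer would then be passed to glTexSubImage2D with GL_BGRA as the format and GL_UNSIGNED_BYTE as the type. Every BGRA pixel is 4 bytes, so rows automatically satisfy a 4-byte alignment as long as the buffer itself starts on an aligned address.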
Originally posted by jwatte:
You don’t need to flush after the setfence…
I had a lot of problems getting async behaviour with readpixels until this response from mcraighead:
"You are not flushing after the ReadPixels. We will queue up the ReadPixels command, but until the next flush (either from app or driver), the HW will not execute the command. FlushPixelDataRange is effectively a Finish, so it will flush and then wait for the op to complete.
If you plan to go off and do some other work, you will want to call glFlush to ensure that the ReadPixels doesn’t sit in our command queue forever. In some cases, however, the glFlush is not really necessary."
In my case, this is a Linux app so I'm using the glx version of these commands. I had rejected glXAllocateMemoryNV yesterday after it gave me a terrible performance hit. I would expect to pass it a read frequency of 0.0, a write frequency of 1.0, and a priority of 1.0, but this slowed me waaaaaay down. Today I tried a read frequency of 1.0 and a write frequency of 0.0, which only gave me a slight performance hit over using new. But I'm doing writes, not reads.
Neither FlushPixelDataRangeNV nor fenceNV have helped my performance at all.
For glPixelStore, I need to set my PACK_ROW_LENGTH and UNPACK_ROW_LENGTH just to get glDrawPixels to work. I was using rgb textures originally, so I just tried playing with the PACK_ALIGNMENT and UNPACK_ALIGNMENT while using rgba textures. glDrawPixels seems to draw at the same speed with rgb or rgba. glTexSubImage2D scales approximately linearly with the number of bytes I'm subloading.
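For reference, here is a sketch of how the row length and alignment settings combine into the number of bytes OpenGL steps between rows, for 8-bit components. The function name is hypothetical; the rounding rule follows the pixel storage description in the OpenGL spec, where each row starts on a multiple of the alignment value.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes between the starts of consecutive rows for 8-bit components,
   given the row length in pixels (GL_UNPACK_ROW_LENGTH, or the image
   width when that is 0) and GL_UNPACK_ALIGNMENT (1, 2, 4 or 8). */
size_t unpack_row_stride(size_t row_length_pixels, size_t bytes_per_pixel,
                         size_t alignment) {
    assert(alignment == 1 || alignment == 2 ||
           alignment == 4 || alignment == 8);
    size_t row_bytes = row_length_pixels * bytes_per_pixel;
    /* Round up to the next multiple of the alignment. */
    return (row_bytes + alignment - 1) / alignment * alignment;
}
```

A 640-pixel RGB row is 1920 bytes, already a multiple of 4, so the default alignment of 4 adds no padding there. A 641-pixel RGB row is 1923 bytes and would be read as 1924 unless the alignment is set to 1, which is the kind of mismatch that shows up as skewed images.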
wglAllocateMemoryNV(…, 0.0, 0.0f, 1.0f);
i.e. a read and write frequency of zero.
Don’t use glFlushPixelDataRangeNV, it behaves like a finish. Use flush like I said.
Use BGRA and make sure your drawable has alpha, so if you’re using GLUT, then add GLUT_ALPHA when creating the window.
Everything I’ve said applies to PDR readpixels, I’m assuming the same applies to texsubimage2d.
Nope, glXAllocateMemoryNV(…, 0, 0, 1) gives me a really bad performance hit. And I’ve tried the other stuff you just mentioned with no luck.
I've been using glTexSubImage2D and glDrawPixels. Maybe this stuff only works with glReadPixels.
About glXAllocateMemoryNV: I tried using it with NV_VAR a long time ago, and I couldn't reach the same performance level as on the Win32 platform (with 1 as the priority). In my opinion, this extension does not manage 'graphics memory' on Linux when you ask for a priority of 1. I think you can at most get 'AGP memory' on Linux.
But maybe the driver returns 'graphics memory' now.