Fast PBO with NVIDIA


I need to speed up my PBO transfers on a Pentium 4 machine and a Xeon machine. Does anyone have an opinion about a fast memcpy we can use with PBOs on Windows, compiling with Visual C++ 2003? Is it a good idea to use memcpy_amd on an Intel architecture?



You can try using a glBufferSubData call instead of the Map/memcpy/Unmap sequence.

The driver can be optimized to detect the CPU and decide which internal memcpy to use for the memory transfer.

In my video player test app, switching from the Map/memcpy/Unmap sequence to glBufferSubData lowered average CPU time by 2-5%.
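A minimal C sketch of the two upload paths compared above. It assumes the GL 1.5 / ARB_pixel_buffer_object entry points have already been loaded into a function table (on Windows, opengl32.dll only exports GL 1.1 functions, so they would come from wglGetProcAddress). The struct `PboUploadFuncs`, the `PBO_*` macros, and the `upload_*` functions are hypothetical names for illustration, and both paths assume the PBO already has storage allocated with glBufferData.

```c
#include <stddef.h>
#include <string.h>

/* Win32 GL entry points use the stdcall convention (APIENTRY). */
#ifdef _WIN32
#define PBO_APIENTRY __stdcall
#else
#define PBO_APIENTRY
#endif

#define PBO_PIXEL_UNPACK_BUFFER 0x88EC  /* GL_PIXEL_UNPACK_BUFFER_ARB */
#define PBO_WRITE_ONLY          0x88B9  /* GL_WRITE_ONLY_ARB */

/* Hypothetical table of GL 1.5 entry points, assumed to be filled in by the
   caller (e.g. wglGetProcAddress("glBindBuffer") on Windows). */
typedef struct PboUploadFuncs {
    void          (PBO_APIENTRY *BindBuffer)(unsigned int target, unsigned int buffer);
    void          (PBO_APIENTRY *BufferSubData)(unsigned int target, ptrdiff_t offset,
                                                ptrdiff_t size, const void *data);
    void *        (PBO_APIENTRY *MapBuffer)(unsigned int target, unsigned int access);
    unsigned char (PBO_APIENTRY *UnmapBuffer)(unsigned int target);
} PboUploadFuncs;

/* Path 1: Map / memcpy / Unmap. The application's memcpy does the copy.
   Returns nonzero on success. */
int upload_map_copy(const PboUploadFuncs *gl, unsigned int pbo,
                    const void *src, size_t size)
{
    void *dst;
    int ok = 0;

    gl->BindBuffer(PBO_PIXEL_UNPACK_BUFFER, pbo);
    dst = gl->MapBuffer(PBO_PIXEL_UNPACK_BUFFER, PBO_WRITE_ONLY);
    if (dst != NULL) {
        memcpy(dst, src, size);
        /* UnmapBuffer returns GL_FALSE if the contents were lost. */
        ok = gl->UnmapBuffer(PBO_PIXEL_UNPACK_BUFFER) != 0;
    }
    gl->BindBuffer(PBO_PIXEL_UNPACK_BUFFER, 0);
    return ok;
}

/* Path 2: glBufferSubData. The driver receives the source pointer and
   performs the copy itself, so it may choose its own CPU-specific routine. */
void upload_subdata(const PboUploadFuncs *gl, unsigned int pbo,
                    const void *src, size_t size)
{
    gl->BindBuffer(PBO_PIXEL_UNPACK_BUFFER, pbo);
    gl->BufferSubData(PBO_PIXEL_UNPACK_BUFFER, 0, (ptrdiff_t)size, src);
    gl->BindBuffer(PBO_PIXEL_UNPACK_BUFFER, 0);
}
```

The only real difference is who performs the copy: in path 1 it is whatever memcpy your CRT provides, in path 2 it is the driver.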

For readback, you can use glGetBufferSubData instead of the Map/memcpy/Unmap sequence. It gives a small speedup (about 1.7% in the numbers below).
Benchmarks on my test machine (NVIDIA GeForce 6800 GT, ForceWare 76.45, Pentium 4 3.2 GHz with HT):
Map/memcpy/Unmap = 476 MB/sec
glGetBufferSubData = 484 MB/sec
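The two readback paths can be sketched the same way. As before, the function table `PboReadbackFuncs`, the `PBO_*` macros, and the `readback_*` names are hypothetical, the entry points are assumed to be loaded by the caller, and both paths assume an earlier glReadPixels has already written the data into the PBO. The throughput helper assumes 1 MB = 2^20 bytes, since the post does not say which unit it uses.

```c
#include <stddef.h>
#include <string.h>

#ifdef _WIN32
#define PBO_APIENTRY __stdcall   /* Win32 GL entry points use stdcall (APIENTRY) */
#else
#define PBO_APIENTRY
#endif

#define PBO_PIXEL_PACK_BUFFER 0x88EB  /* GL_PIXEL_PACK_BUFFER_ARB */
#define PBO_READ_ONLY         0x88B8  /* GL_READ_ONLY_ARB */

/* Hypothetical table of GL 1.5 entry points, filled in by the caller. */
typedef struct PboReadbackFuncs {
    void          (PBO_APIENTRY *BindBuffer)(unsigned int target, unsigned int buffer);
    void          (PBO_APIENTRY *GetBufferSubData)(unsigned int target, ptrdiff_t offset,
                                                   ptrdiff_t size, void *data);
    void *        (PBO_APIENTRY *MapBuffer)(unsigned int target, unsigned int access);
    unsigned char (PBO_APIENTRY *UnmapBuffer)(unsigned int target);
} PboReadbackFuncs;

/* Path 1: Map / memcpy / Unmap. Returns nonzero on success. */
int readback_map_copy(const PboReadbackFuncs *gl, unsigned int pbo,
                      void *dst, size_t size)
{
    const void *src;
    int ok = 0;

    gl->BindBuffer(PBO_PIXEL_PACK_BUFFER, pbo);
    src = gl->MapBuffer(PBO_PIXEL_PACK_BUFFER, PBO_READ_ONLY);
    if (src != NULL) {
        memcpy(dst, src, size);
        ok = gl->UnmapBuffer(PBO_PIXEL_PACK_BUFFER) != 0;
    }
    gl->BindBuffer(PBO_PIXEL_PACK_BUFFER, 0);
    return ok;
}

/* Path 2: glGetBufferSubData copies straight into dst. */
void readback_getsubdata(const PboReadbackFuncs *gl, unsigned int pbo,
                         void *dst, size_t size)
{
    gl->BindBuffer(PBO_PIXEL_PACK_BUFFER, pbo);
    gl->GetBufferSubData(PBO_PIXEL_PACK_BUFFER, 0, (ptrdiff_t)size, dst);
    gl->BindBuffer(PBO_PIXEL_PACK_BUFFER, 0);
}

/* Throughput for comparing the paths, assuming 1 MB = 2^20 bytes. */
double readback_mb_per_sec(double bytes, double seconds)
{
    return bytes / (1024.0 * 1024.0) / seconds;
}
```

Timing each path over many frames and passing the total bytes and elapsed seconds to `readback_mb_per_sec` gives figures in the same form as the benchmark above.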


If you want to perform a fast memory copy, use MMX instructions with prefetched reads and non-cached (non-temporal) writes (the MOVNTQ instruction, if I'm not mistaken). AMD has a paper on it. Note that this only makes sense for very large memory regions, at least 100 MB.
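A sketch of the technique described, prefetched reads plus non-temporal writes that bypass the cache. It swaps in SSE2's MOVNTDQ (`_mm_stream_si128`, 16 bytes per store) for MMX's MOVNTQ (`_mm_stream_pi`, 8 bytes), since the Pentium 4 and NetBurst-based Xeons support SSE2. The name `stream_copy` and the 512-byte prefetch distance are hypothetical choices, not values taken from AMD's paper, and the code falls back to plain memcpy when SSE2 intrinsics are not enabled at compile time.

```c
#include <stddef.h>
#include <string.h>

/* SSE2 path only when the compiler targets SSE2 (always true on x86-64). */
#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <emmintrin.h>   /* _mm_loadu_si128, _mm_stream_si128; pulls in xmmintrin.h */
#define HAVE_SSE2_STREAM 1
#endif

/* Hypothetical tuning value: how far ahead of the current read to prefetch. */
#define PREFETCH_DISTANCE 512

/* Copies size bytes like memcpy (regions must not overlap) and returns dst. */
void *stream_copy(void *dst, const void *src, size_t size)
{
    unsigned char *d = (unsigned char *)dst;
    const unsigned char *s = (const unsigned char *)src;
#ifdef HAVE_SSE2_STREAM
    /* Bytes needed to bring d to a 16-byte boundary; MOVNTDQ needs aligned stores. */
    size_t head = (16 - ((size_t)d & 15)) & 15;

    if (size >= head + 64) {
        memcpy(d, s, head);
        d += head;
        s += head;
        size -= head;

        /* 64 bytes per iteration: four unaligned loads, four streaming stores. */
        while (size >= 64) {
            __m128i r0 = _mm_loadu_si128((const __m128i *)(s + 0));
            __m128i r1 = _mm_loadu_si128((const __m128i *)(s + 16));
            __m128i r2 = _mm_loadu_si128((const __m128i *)(s + 32));
            __m128i r3 = _mm_loadu_si128((const __m128i *)(s + 48));

            /* Prefetch only while the target address is still inside src. */
            if (size >= 64 + PREFETCH_DISTANCE)
                _mm_prefetch((const char *)(s + PREFETCH_DISTANCE), _MM_HINT_NTA);

            _mm_stream_si128((__m128i *)(d + 0), r0);
            _mm_stream_si128((__m128i *)(d + 16), r1);
            _mm_stream_si128((__m128i *)(d + 32), r2);
            _mm_stream_si128((__m128i *)(d + 48), r3);

            s += 64;
            d += 64;
            size -= 64;
        }
        /* Non-temporal stores are weakly ordered; fence them before later stores. */
        _mm_sfence();
    }
#endif
    /* Remaining tail (or the whole copy without SSE2): plain memcpy. */
    if (size > 0)
        memcpy(d, s, size);
    return dst;
}
```

It is called exactly like memcpy, e.g. `stream_copy(dst, src, n)`.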

Do MMX and similar operations have any meaningful application for memory-mapped vertex/pixel buffers? I would assume they are only useful for system-memory transfers?