I have a fairly mature chunk of code that repeatedly updates lots of texture tiles, uploading them from client space with glTexSubImage2D into a large texture on the GPU.
As an experiment, and to reduce CPU stalling, I modified the code so that it wrote the new tiles directly into PBOs, and then used a call to glTexSubImage2D(…,…,…,…,…,…,NULL) to effect the texture update on the GPU internally.
I have always used “fenced” client space RAM in any case, so while I was aware that I would not save any memory copies, and that upload speed would always be at DMA speeds, I hoped that the stalling would be reduced.
All texture formats used are natively supported. The texture tiles are 4-channel, 17 x 17 texels.
I was surprised to see an overall speed drop of about 10% doing things this way (and, weirdly, did not notice any change in the CPU load at all), and I am wondering why.
My only theory is that now that the copies are taking place on the GPU, rather than with the CPU “assisting”, the GPU is having problems managing that as well as rendering, and the CPU is still ending up having to wait for the GPU elsewhere…
Can anyone else make any suggestions, or do I have it about right?
One further question for anyone that knows…
If I stop using textures and only use PBOs, is it worth considering glBufferSubDataARB instead of glTexSubImage2D, so that texture updating is a one-time copy from the CPU, directly from whatever algorithm generates the tile data?
Are there any pitfalls to this idea? Internet info is sparse on this.