best performance for texture upload?

Forget about GL_TEXTURE_RECTANGLE_ARB; its use is very specific.
If your card supports it, go for NPOT.

ok, so i'm using GL_TEXTURE_3D with a size of 2048*3072, which is NPOT, but it works.

  1. but does anyone know whether it takes the same time to upload GL_BGR_EXT data as it does GL_RGB?

  2. for the vertex shader i'm restricted to a floating point internal representation (GL_RGBA_FLOAT32_ATI).
    but for the fragment shader i just need 8-bit RGB data, so GL_UNSIGNED_BYTE with RGB is fine as internal representation. can i use that without a performance penalty, and which format is best for it?

thanks a lot,
chris

  1. Why would it take less time to upload the same amount of data? Whether it is BGR or RGB, it’s the same from my point of view.
    I didn’t read the whole topic, but maybe some compression would help.

thanks jide,

i ask because someone somewhere stated that bgr, unlike rgb, is not natively supported, so the data has to be uploaded (same amount as rgb) and then rearranged into rgb. the rearranging could be a performance penalty.

here i just want to verify whether it's really like this and what happens internally.
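For reference, here is a minimal sketch of what such a rearrangement amounts to if it were done on the CPU. Whether a given driver actually does this, and where, is not confirmed anywhere in this thread; the function name swap_red_blue is made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Swap the first and third byte of every tightly packed 3-byte pixel,
   in place. This turns BGR into RGB; applying it twice restores the
   original order. Hypothetical helper, not a driver API. */
void swap_red_blue(uint8_t *pixels, size_t pixel_count)
{
    for (size_t i = 0; i < pixel_count; ++i) {
        uint8_t *p = pixels + 3 * i;
        uint8_t tmp = p[0];
        p[0] = p[2];
        p[2] = tmp;
    }
}
```

Timing this loop over one frame of your own data, next to the time of the upload call itself, would give a rough upper bound on what the rearranging could cost.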

I might be wrong, unfortunately; I just guessed. But I also guess that the rearrangement is not such a heavy task, so the difference might be unnoticeable, especially with current high-speed data transfer rates.

Try S3 compression, it should help.

hey jide,
thanks for your help!

what is s3 compression?
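Presumably this refers to S3TC, the DXT block-compressed formats exposed through GL_EXT_texture_compression_s3tc and uploaded with glCompressedTexImage2D. These formats store each 4x4 pixel block in a fixed number of bytes (8 for DXT1, 16 for DXT3/DXT5), so the upload size is easy to estimate. A small sketch, with s3tc_image_size being a made-up helper name:

```c
#include <stddef.h>

/* Bytes needed for one mip level of an S3TC (DXT) compressed image.
   block_bytes is 8 for DXT1 and 16 for DXT3/DXT5.
   Partial blocks at the edges still occupy a full block. */
size_t s3tc_image_size(size_t width, size_t height, size_t block_bytes)
{
    size_t blocks_x = (width + 3) / 4;
    size_t blocks_y = (height + 3) / 4;
    return blocks_x * blocks_y * block_bytes;
}
```

For a 2048*3072 image this gives 3,145,728 bytes in DXT1, against 18,874,368 bytes for uncompressed 8-bit RGB. The saving only helps the upload if the data arrives already compressed; compressing each frame on the CPU first has its own cost.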

I suppose you have the following scenario:
input device -> system memory -> ogl texture -> render.

In this case you have 2 memcpy operations: from the input device to system memory, and from system memory to the GF7800. You may try to avoid this double copy by using a PBO.

  1. Create 2-4 PBOs, each sized for one 1024x768 texture.
  2. In a loop, obtain a pointer to the PBO memory buffer (with glMapBufferARB) and copy data from the input device into the mapped buffer, then unmap the buffer and call glTexSubImage2D(…). This starts an async transfer, so you may immediately use another PBO for the next step in the loop.

Why use several PBOs? Well, while one PBO is busy with an image data transfer, you can use another PBO to prepare, and even start, a new transfer.
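The two steps above could be sketched roughly like this. It is only an outline under several assumptions: a GL context is current, GL_ARB_pixel_buffer_object is available and its entry points are loaded through GLEW (glewInit already called), the texture was allocated beforehand with glTexImage2D, and the data is 8-bit BGRA. The names create_pbos, upload_frame, NUM_PBOS and src are invented for the example, and the memcpy stands in for whatever actually reads from the input device.

```c
#include <string.h>
#include <GL/glew.h>   /* assumption: GLEW supplies the ARB entry points */

#define NUM_PBOS 3                 /* the post suggests 2-4 */
#define TEX_W 1024
#define TEX_H 768
#define BYTES_PER_PIXEL 4          /* assumption: 8-bit BGRA */
#define FRAME_BYTES ((GLsizeiptrARB)TEX_W * TEX_H * BYTES_PER_PIXEL)

static GLuint pbos[NUM_PBOS];

/* Step 1: allocate the PBOs once, each big enough for one frame. */
void create_pbos(void)
{
    glGenBuffersARB(NUM_PBOS, pbos);
    for (int i = 0; i < NUM_PBOS; ++i) {
        glBindBufferARB(GL_PIXEL_UNPACK_BUFFER_ARB, pbos[i]);
        glBufferDataARB(GL_PIXEL_UNPACK_BUFFER_ARB, FRAME_BYTES,
                        NULL, GL_STREAM_DRAW_ARB);
    }
    glBindBufferARB(GL_PIXEL_UNPACK_BUFFER_ARB, 0);
}

/* Step 2: call once per frame; picks the next PBO round-robin.
   Returns 1 if the frame was handed to GL, 0 otherwise. */
int upload_frame(GLuint tex, const void *src, unsigned frame_index)
{
    int ok = 0;
    glBindBufferARB(GL_PIXEL_UNPACK_BUFFER_ARB,
                    pbos[frame_index % NUM_PBOS]);

    void *dst = glMapBufferARB(GL_PIXEL_UNPACK_BUFFER_ARB,
                               GL_WRITE_ONLY_ARB);
    if (dst != NULL) {
        /* stand-in for copying straight from the input device */
        memcpy(dst, src, (size_t)FRAME_BYTES);

        /* unmap can report that the buffer contents were lost */
        if (glUnmapBufferARB(GL_PIXEL_UNPACK_BUFFER_ARB)) {
            glBindTexture(GL_TEXTURE_2D, tex);
            /* with a PBO bound, the last argument is a byte offset
               into the buffer, not a client memory pointer */
            glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, TEX_W, TEX_H,
                            GL_BGRA, GL_UNSIGNED_BYTE, (const GLvoid *)0);
            ok = 1;
        }
    }
    glBindBufferARB(GL_PIXEL_UNPACK_BUFFER_ARB, 0);
    return ok;
}
```

Calling upload_frame(tex, data, n) with an increasing n cycles through the buffers, so mapping the next PBO does not have to wait for the one that was just submitted. How much of the transfer really overlaps depends on the driver.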

Theoretically, you can get up to 2 GB/sec in a very special case (no memcpy, just glTexSubImage2D from a PBO), but in real usage you may expect ~600 MB/sec.

Use BGRA textures. If you don’t need 32-bit precision (in floats), you may use the half type (16-bit float, faster) or regular 8-bit (fastest).
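To put rough numbers on the format choice, here is a small back-of-the-envelope sketch. It only computes bytes per frame and the transfer time at a given rate; the helper names frame_bytes and upload_ms are made up, and "MB" is taken as 10^6 bytes, which is an assumption.

```c
#include <stddef.h>

/* Size in bytes of one tightly packed frame. */
size_t frame_bytes(size_t width, size_t height,
                   size_t channels, size_t bytes_per_channel)
{
    return width * height * channels * bytes_per_channel;
}

/* Milliseconds needed to move `bytes` at `mb_per_sec` (1 MB = 10^6 bytes). */
double upload_ms(size_t bytes, double mb_per_sec)
{
    return (double)bytes / (mb_per_sec * 1e6) * 1000.0;
}
```

For a 1024x768 four-channel texture, frame_bytes gives 12,582,912 bytes with 32-bit floats, 6,291,456 with 16-bit halves and 3,145,728 with 8-bit channels. At the ~600 MB/sec mentioned above, the 8-bit frame takes about 5.2 ms, and the float frame four times as long.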

I'm wondering how you managed to “feed” 192 MB/sec into system memory. What input device (or HDD) do you have?