Strange texture performance/behaviour

I did a quick measurement to find out the truth about the padding issue, and here are my results:

3D Card: NV8500GT
Driver version: 169.21

I uploaded a texture with no mipmaps 10 times and measured how many seconds it took. I tried several texture sizes: 4095, 4096, 4097 and 8000.
What I wanted to find out was whether the driver really pads NPOT textures with zeroes.
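Roughly, the measurement boils down to something like this (a minimal sketch rather than my exact code; it assumes a current GL context already exists, the names are illustrative, and the glFinish is there so the clock includes the actual transfer):

```cpp
// Minimal sketch of the timing idea, not the exact benchmark code.
// Assumes a current OpenGL context has already been created elsewhere.
#include <GL/gl.h>
#include <chrono>
#include <cstddef>
#include <cstdio>
#include <vector>

double meanUploadTime(int width, int height, int iterations)
{
    std::vector<unsigned char> pixels(std::size_t(width) * height * 3); // dummy RGB data

    GLuint tex = 0;
    glGenTextures(1, &tex);
    glBindTexture(GL_TEXTURE_2D, tex);
    glPixelStorei(GL_UNPACK_ALIGNMENT, 1); // odd-width RGB rows aren't 4-byte aligned

    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i)
    {
        // Level 0 only, no mipmaps
        glTexImage2D(GL_TEXTURE_2D, 0, GL_RGB8, width, height, 0,
                     GL_RGB, GL_UNSIGNED_BYTE, pixels.data());
        glFinish(); // make sure the transfer has completed before stopping the clock
    }
    auto end = std::chrono::steady_clock::now();

    glDeleteTextures(1, &tex);
    return std::chrono::duration<double>(end - start).count() / iterations;
}

int main()
{
    // Context creation (GLUT/GLFW/etc.) omitted for brevity.
    const int sizes[] = { 4095, 4096, 4097, 8000 };
    for (int size : sizes)
        std::printf("%dx%d: %f s per upload\n", size, size, meanUploadTime(size, size, 10));
}
```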

For a 4095x4095x3 bytes texture, I got a mean time (per texture upload) of 0.126444 seconds

For a 4096x4096x3 bytes texture, I got a mean time (per texture upload) of 0.12778 seconds (almost the same as the 4095 version)

For a 4097x4097x3 bytes texture, I got a mean time (per texture upload) of 0.128251 seconds. It is in the same ‘range’ as the 4095-4096 versions, so if it were padded with zeroes the time should be much bigger, because it would effectively be uploading an 8192x8192 texture

For a 8000x8000x3 bytes texture, I got a mean time (per texture upload) of 0.489454 seconds, four times more than the 4097x4097, so my conclusion is that (at least without mipmaps) NPOT textures are not padded with zeroes up to the next power-of-two size, at least not in the upload process, but we can’t know how much VRAM the texture is actually using.

After these results, I’ve reached the conclusion that we don’t know anything new :/, so I’ll try some tests using the DX GetAvailableTextureMem call before and after uploading the textures, to see if I can reach any conclusion.
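In case it’s useful, the before/after check might look roughly like this (a sketch only; whether CreateTexture alone commits video memory before the texture is first used is itself driver-dependent, and GetAvailableTextureMem is rounded to the nearest MB, so it’s a rough estimate at best):

```cpp
#include <d3d9.h>
#include <cstdio>

// Sketch only: 'device' is assumed to be an already initialised IDirect3DDevice9*.
void measureTextureFootprint(IDirect3DDevice9* device)
{
    UINT before = device->GetAvailableTextureMem();

    IDirect3DTexture9* tex = NULL;
    device->CreateTexture(4097, 4097, 1, 0, D3DFMT_X8R8G8B8,
                          D3DPOOL_DEFAULT, &tex, NULL);
    // ... fill/use the texture here so the driver actually has to place it in VRAM ...

    UINT after = device->GetAvailableTextureMem();
    std::printf("texture appears to occupy roughly %u MB\n",
                (before - after) / (1024 * 1024));

    if (tex) tex->Release();
}
```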

I think it’s hard to track down what the driver is actually doing. It’s possible that it is converting the array internally, e.g. the RGBA texture format might be stored internally as BGRA, so the driver will have to convert an RGBA array to a BGRA array before upload. Furthermore, it seems reasonable to assume that, even if padding does happen, there is no sense in uploading all 8192 rows for a 4097-row texture, since the remaining rows are empty anyway. Looks to me like it’s hard to tell how much time is spent in the driver compared to actually uploading the texture.

Maybe it’s better to check the upload rate of a texture rectangle of 4097x4096 with BGRA data against its 2D counterpart.
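Something along these lines might do it (a sketch only, assuming a current GL context; GL_TEXTURE_RECTANGLE_ARB and GL_BGRA come from the extension headers, and the function would be called once with each target):

```cpp
#include <GL/gl.h>
#include <GL/glext.h>
#include <chrono>
#include <cstddef>
#include <vector>

// Sketch: time uploads of the same 4097x4096 BGRA data through a given target,
// e.g. timeBgraUpload(GL_TEXTURE_RECTANGLE_ARB, 4097, 4096, 10)
// vs.  timeBgraUpload(GL_TEXTURE_2D,            4097, 4096, 10)
double timeBgraUpload(GLenum target, int w, int h, int iterations)
{
    std::vector<unsigned char> pixels(std::size_t(w) * h * 4); // BGRA, 4 bytes/pixel

    GLuint tex = 0;
    glGenTextures(1, &tex);
    glBindTexture(target, tex);

    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i)
    {
        glTexImage2D(target, 0, GL_RGBA8, w, h, 0,
                     GL_BGRA, GL_UNSIGNED_BYTE, pixels.data());
        glFinish(); // include the actual transfer in the measurement
    }
    auto end = std::chrono::steady_clock::now();

    glDeleteTextures(1, &tex);
    return std::chrono::duration<double>(end - start).count() / iterations;
}
```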

There’s no reason at all to assume that anything I’m saying here is true. It’s just that I have no idea why else they would mention texture padding in chapter 7.1.2 of the GPU Programming Guide.

Cheers,
N.

The point was that not all of the original source pixels need to be resident at all times, just enough to fill the pixels on screen.
Check out the video on “Seadragon” which mentions in the beginning of the presentation that the performance is NOT limited by the data they view but by the onscreen pixels only!

http://www.youtube.com/watch?v=PKwTurQgiak

Cool stuff!
(The “PhotoSynth” technology seems to be even more interesting since it is 3D-ish.)

Well, our software is in some ways related to Photosynth, but oriented towards architecture and industry. The performance is good now (after the registry hack) and we’ve reached the number of high-res images the user could need to perform his tasks, so if I can convince my bosses, I’ll try the “performance is NOT limited by the data they view but by the onscreen pixels only” technique. This video really impressed me, and I think I’m able to do something similar; if my bosses consider it worth the effort, that way of handling texels will be in the application, I promise :slight_smile:

I’d be happy if someone would invent something like Seadragon for 3d-geometry :slight_smile:

Actually, obtaining a low-density point cloud from a set of photos of an object is not difficult (that’s what Photosynth does), and it isn’t even computationally expensive. The problem is trying to get a very dense point cloud (a point for each picture pixel) and then triangulating it, because traditional computer vision algorithms are very error-prone in a non-user-assisted environment. But it’s really possible with some relatively new techniques, reducing user intervention to a minimal part.

so the driver will have to convert an RGBA array to a BGRA array before upload

I believe most modern hardware can do that kind of conversion in the DMA transfer engine, so the driver doesn’t need to use the CPU to do it.

Also, when I suggested using a low-res image, I meant only until the high-res data has been loaded into RAM, then re-paint with the high-res data. The main point being to provide a smooth, high frame rate for interaction, and then make sure that you only put detail where it matters.
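In practice that can be as simple as uploading a small placeholder right away and overwriting the same texture object later; something like this sketch (all names are illustrative, not from any actual codebase):

```cpp
#include <GL/gl.h>

// Sketch only: show a cheap low-res placeholder immediately so interaction stays
// smooth, then repaint the same texture once the high-res pixels are in system RAM.
void showLowResNow(GLuint tex, int lowW, int lowH, const unsigned char* lowResPixels)
{
    glBindTexture(GL_TEXTURE_2D, tex);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGB8, lowW, lowH, 0,
                 GL_RGB, GL_UNSIGNED_BYTE, lowResPixels);
}

// Called later, e.g. when a background loader signals that the full image is ready.
void repaintHighRes(GLuint tex, int highW, int highH, const unsigned char* highResPixels)
{
    glBindTexture(GL_TEXTURE_2D, tex);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGB8, highW, highH, 0,
                 GL_RGB, GL_UNSIGNED_BYTE, highResPixels);
}
```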

I finally got the expected results :slight_smile: Thanks to the registry hack, the framerate in the application is fast and stable, and thanks to -Nico-, who pointed me to that NV PDF, the memory consumption is optimal. Now I can load the expected number of photos until reaching the true video memory limit. By setting the 3D slice dimensions to a tuned power-of-two size (128 - 256), the memory waste is minimal, so I can confirm that NPOT textures are padded in memory.
Before setting the dimensions to a POT size, I was only able to fill memory with about 350 MB of texture data. Now, with POT-sized slices, I can fill much more, about 500 MB on a 512 MB card, so the memory is much better used, and I can see a noticeable speed increase (maybe using ‘small’ slices benefits the cache?)
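For anyone curious, the slicing itself doesn’t need to be anything fancy. Whether you store the tiles as separate 2D textures or as slices of a 3D texture, the idea is the same; here is a plain 2D-texture sketch with made-up names, where edge tiles are simply clamped rather than handled cleverly:

```cpp
#include <GL/gl.h>
#include <algorithm>
#include <cstddef>
#include <vector>

const int TILE = 256; // power-of-two tile size, in the 128-256 range mentioned above

// Copy one TILE x TILE block out of a tightly packed RGB image, clamping at the edges.
static void copyTile(const unsigned char* src, int srcW, int srcH,
                     int x0, int y0, unsigned char* dst)
{
    for (int y = 0; y < TILE; ++y)
        for (int x = 0; x < TILE; ++x)
        {
            int sx = std::min(x0 + x, srcW - 1);
            int sy = std::min(y0 + y, srcH - 1);
            const unsigned char* p = src + (std::size_t(sy) * srcW + sx) * 3;
            unsigned char* q = dst + (std::size_t(y) * TILE + x) * 3;
            q[0] = p[0]; q[1] = p[1]; q[2] = p[2];
        }
}

// Upload a large image as a grid of POT-sized textures; returns the texture names.
std::vector<GLuint> uploadAsTiles(const unsigned char* image, int imageW, int imageH)
{
    int tilesX = (imageW + TILE - 1) / TILE;
    int tilesY = (imageH + TILE - 1) / TILE;

    std::vector<GLuint> tiles(std::size_t(tilesX) * tilesY);
    glGenTextures(GLsizei(tiles.size()), tiles.data());

    std::vector<unsigned char> tilePixels(TILE * TILE * 3);
    for (int ty = 0; ty < tilesY; ++ty)
        for (int tx = 0; tx < tilesX; ++tx)
        {
            copyTile(image, imageW, imageH, tx * TILE, ty * TILE, tilePixels.data());
            glBindTexture(GL_TEXTURE_2D, tiles[std::size_t(ty) * tilesX + tx]);
            glTexImage2D(GL_TEXTURE_2D, 0, GL_RGB8, TILE, TILE, 0,
                         GL_RGB, GL_UNSIGNED_BYTE, tilePixels.data());
        }
    return tiles;
}
```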

Thanks to all for your ideas :slight_smile:

By the way, only for information, current Seadragon version is written with OpenGL :slight_smile:

Yes, ironic, isn’t it!

BTW, I’m missing what is so new about Seadragon that they have stirred up such a patent storm. It is basically virtual mipmapping, which has been around forever. Seriously, Google Maps, meta texturing, even that tiny Flash gigapixel image viewer are all good examples of the concept already working, not just as a prototype.