Strange texture performance/behaviour


I’m working on an application that needs to load a lot of texture data. I’m talking about 40-50 4256x2848 RGB images in a typical worst-case scenario.

Well, I’ve always worked under the assumption that GL drivers were smart enough that we didn’t need to track video memory usage in our programs. I expected texture memory/disk swapping behaviour when an application exceeds the available video memory, causing a slowdown. That behaviour was fine with me until now, because it matches the GL spec (the driver tries to allocate texture objects in VRAM, i.e. resident textures, and if it can’t, it allocates them in main memory and swaps them to VRAM when needed, even if the process is slow).

With my current application (many 2D textures, tiled into 3D textures to work around the maximum 2D texture size limitation), strange things start to happen when I load, say, 350MB of texture data (on an 8500GT with 512MB). Let me describe the scenario: suppose I load 16 textures and draw them on the canvas as screen-aligned quads in a 4x4 matrix layout. I expected that if I exceeded the available video memory the only effect would be a horrible slowdown, but what actually happens is that textures jump from one position in the matrix to another, and some positions become invalid. It’s as if the driver went crazy and the internal structure that stores texture names and texture images got completely messed up.

This problem happens when I reach the ‘magic limit’ of ~350MB of texture memory; below that, everything works fine.
I’m sure I could write a repro application and file a bug report with NVIDIA, but I’d like to hear your opinions and experience on this matter (using huge amounts of texture memory).
I can work around this problem by loading low-resolution, grayscale versions of the images to stay below the magic limit, but I don’t want to put the application into production with this problem.

The other problem I’m having is this:
If I load many textures without reaching the magic limit, performance drops a lot (to about 0.5 fps). Well, that’s not strange. The strange thing is that if I zoom the scene way out (it’s currently a 2D scene; this part of the application is a kind of photo editor) and then restore the zoom to 1, performance improves a lot, to about 30 fps. I suspect this could be cache related, but I’m not sure, because I can’t work out the true reason for this behaviour.

There is another scenario with the same problem. The photos have many markers placed over them, say around 60-100. The markers are drawn correctly without performance drops (thanks to textured point sprites and vertex shaders :slight_smile: ), but if I enable the flag that turns on the markers’ labels, performance drops again until some seconds pass (around 20) or I zoom the scene out and back in. Then the fps becomes usable again.

I checked the drawing loop to see if some high-cost operation is performed in the first frames of the slowdown, and nothing strange happens. I also tested texture residence (the font texture and the photo textures), and the driver always reports them as resident, so I’m really lost with these two problems. Any help will be welcome :slight_smile:



I have seen problems with 3D textures on NVIDIA hardware, too. I had problems with a volume rendering application when using very large volume data (several hundred MB) on a Linux machine with the first GeForce 8 compatible drivers. When I exceeded a specific memory limit (800MB on an 8800GTS), OpenGL output got extremely slow and sometimes the system crashed; in addition, the framebuffer got corrupted when the window was resized (it looked like a memory management error). I submitted a bug report but haven’t verified yet whether it is really fixed; you should perhaps try the latest drivers and see what happens …

There was a thread recently where someone from NVIDIA stated that their hardware does not support 3D textures larger than 512 pixels (in any dimension), but that the driver has a bug and does not throw an error.

Apart from that, having HUGE textures is a problem in general. And with 3D textures that’s easy to achieve.


I have 4256x2848 2D textures mapped as tiled 3D textures. I use MAX_3D_TEXTURE_SIZE for the tile size and nearest interpolation to create them. For instance, a typical 4256x2848 2D texture is created as a 3D texture with 6 tiles, each tile being 1419x1424. So, are you saying that each tile must not exceed 512? That’s not a problem for me, except that I’ll waste a bit more memory (a few rows and columns that won’t be used), but I don’t care about that.

512 was up to GeForce 7.
GeForce 8 class HW has a max 3D texture size of 2048.

Instead of 3D textures you should look into 2D texture arrays for your tiling.
Actually GeForce 8 class HW doesn’t need to tile 4256x2848 at all, because the max 2D texture size is 8192 and it supports non-power-of-two textures.

Yes, I’m aware of texture arrays, but IIRC they are only available on the 8800GTX (correct me if I’m wrong), and we don’t want to make the current best card on the market the minimum requirement for the application. By the way, right now we are working with 4256x2848, but we plan to support very large textures (23,000 x 23,000).

About the performance drops - look here, this should help a bit:

hmm, the registry hack described in this link seems to have solved the problems. I’ll perform some tests and post again with the results.

Many thanks :slight_smile:

Actually, I believe you’re already wasting memory. If texture3D behaves like texture2D, you’re wasting a lot of memory, because the texture’s memory footprint is padded with zeros until it reaches the next power-of-two size in both dimensions. You should use texture rectangles if you don’t want to waste memory, but note that they don’t support mipmapping.

PS. Please correct me if I’m wrong, anyone :slight_smile:


I don’t know the internals of the NPOT extension implementation, but I really hope that if I create a 513x513 texture, the driver won’t create a 1024x1024 texture and fill the holes with zeroes. If it does, it could be a problem for our application, because it’s very texture-memory intensive.

Our algorithm uses a best fit for the tiles, so only a few rows and columns are wasted (~<9) (from the application programmer’s point of view). So, a comment from an NV person on how to waste the least amount of texture memory would be really appreciated :slight_smile: (currently we only support NV cards for this project)

Well, here’s one source of information that seems to confirm my suspicions. (Chapter 7.1.1-7.1.3)


Thanks Nico :slight_smile: I’ll consider switching to texture rectangles. This could explain the “magic limit” I was talking about :slight_smile:

By the way, the registry hack worked in all scenarios. Now the application runs faster than ever :slight_smile:

Many thanks,

I hope this padding thing doesn’t hold true for the 8800 line of cards. :frowning:

EDIT: I could’ve sworn that one of the specs gave the formulas used for computing the mipmap levels of NPOT textures… why would the driver have to pad for NPOT with mipmaps, but not for NPOT without mipmaps? I don’t understand.

EDIT2: I see now that they were talking about the texture rectangle thing… this is a real shame. I thought for sure it was possible to use the NPOT extension with mipmapping, and not have to suffer this padding nonsense… oh well.

EDIT3: Wait, is this padding only for getting it to the card? How is it that I’m able to still use normalized texture coordinates but with a potentially padded NPOT texture (with mipmaps) without accessing the padded portion? Seems odd.

I’m going to be performing some tests later to see if this is the case with the 8800. Perhaps timing the transfer of a 1025x1025 NPOT vs a 2048x2048 POT would be a good test.

Because your working set is larger than the working set of the target hardware, you need to do application layer optimizations.

For example, most displays are 2000x1500 pixels or smaller, so you never need all the pixels of a 4000x2000 texture. If you know what the geometry is that you’re drawing, you can upload only the textures that you know will be visible, and you can upload smaller versions, or sub-regions, of textures where you know either that they will be filtered, or that only a region of the texture will be visible.

High-performance custom visualization software does these things. The driver writers claim that we “shouldn’t have to,” but clearly, we do. In fact, when you have more textures than can fit in main RAM, you HAVE to go this route, no matter what. Might as well start now.

For example, most displays are 2000x1500 pixels or smaller, so you never need all the pixels of a 4000x2000 texture.

That depends entirely on how you use it. You may not need all of those pixels at any one time, but you do need them to be there.

You may not need all of those pixels at any one time, but you do need them to be there.

Yes – my statement lacked the qualifier “at one time.” In reality, you’ll probably pre-fetch enough of the data for whatever the user is navigating (with some prediction) and make sure that’s available in high res. If the user can hyper-space-jump around the data, then drawing pink while you’re demand-paging the data is likely a good idea, too. Or perhaps a lower-resolution version of the texture, if you’re not in a situation where it matters whether the user knows he’s seeing the full-resolution image or not (e.g., not in medical imaging).

With 23,000x23,000 pictures, no card will do that in RAM anyway, so you might as well start writing your paging and partial mipmapping functions now.

With our current working images (4000x2000) we need all the pixels at all times. We can’t give the user low-resolution images to work with, because that would introduce big inaccuracies into the application’s process. The application’s workflow demands that all texture pixels be presented to the user with nearest interpolation, letting him mark some pixels with subpixel accuracy.

So the user goes from a low-resolution view of the images (with all images laid out in a rectangular arrangement, for instance, over the canvas) to a highly detailed view in a few seconds (zooming into an image to be able to select a point inside a single pixel with at least 0.001 subpixel precision).

The only route we could follow is limiting the user’s working set of pictures with layers, setting a limit on the number of images per layer and loading only the pictures that belong to the visible layer.

It’s obvious that 23,000x23,000 pictures are too large for any current VRAM, but we haven’t reached that point yet. We are designing the algorithms and techniques with an eye on this future scenario, but when we truly support the very high-res pictures (aerial photographs, actually), we won’t be able to live with the current 2D texture limitations, and we will have to “tile the tiles” :slight_smile:

P.S.: This is a photogrammetric application, with metric quality in the measurements; that’s the reason for such large images and for the importance of subpixel precision and the fast interactivity required by the user.

Maybe there’s also some lossless texture compression you can use to save some memory. I haven’t tried it myself, but apparently it’s supported.