Drawing millions of points?

Hey guys,

I have a large number of points that need to be drawn on screen. We’re talking on the order of 6+ million. The issue I have is that I’m pushing the limits of the card and I’m running out of memory.

I tried doing culling, but I don’t think it’s working; I still get just a blank screen when I exceed a certain number of points. What would be a good way to deal with this issue?


Which card do you have? And what exactly is the problem? How do you render the points?

How are you rendering the points? Coordinates of 6 million points shouldn’t take too much memory if you just use a brute-force approach (and I love brute-force approaches :slight_smile: ), and should be handled easily by any reasonably modern hardware (provided you render them correctly, using VBOs and such). Now, if you want to optimize, some details about the nature of your data would be very helpful (e.g. are the positions of the points static or dynamic?).

I have a Quadro NVS 135M with 128 MB of on-board VRAM. Basically I draw the points using vertex arrays. Here is some pseudo code:

float vertices[]
populate vertices[] with x,y,z data
gl.glEnableClientState(GL.GL_VERTEX_ARRAY);
gl.glVertexPointer(3, GL.GL_FLOAT, 0, vertexBuffer);      // vertexBuffer wraps vertices[]
gl.glDrawArrays(GL.GL_POINTS, 0, vertices.length / 3);    // count is points, not floats

This method works just fine up to about 6 million points. If I try to render more than that, all I get is a blank screen. The “out of memory” error pops up after the glDrawArrays line.

I want to do culling because we will just be getting more and more points in the future, and there is no need to send the “invisible” points to the video card. Is there a way to cull vertex elements? I know it’s possible with spheres, boxes, etc., but how do you define a face for a single vertex? I tried GL_EXT_cull_vertex, but it’s not supported on my hardware, and from what I can tell not on any of the cards I could get my hands on, including a Quadro FX 550.

The points are static and all we do with them is simple viewing. We want to be able to view them (rotate, zoom in/out, …) and then overlay some data at a later time, but that’s minimal.

OK, first of all you should use the VBO extension (google it if you don’t know what it is).

I don’t have any practical experience with 3D graphics programming, but in your case I would use an octree (again, google it if you don’t know what it is). You could store individual VBOs as octree nodes. With the tree you can easily determine only the visible nodes; as an additional optimization you could replace nodes that are visible but far away with a single point (if a point cloud is so far away that it would occupy only a few pixels on the screen anyway) – but that depends on the granularity of your nodes.
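To make the octree suggestion concrete, here is a minimal sketch in plain Java. Everything in it is illustrative, not from any library: the class name `OctreeNode`, the leaf capacity of 5000 points, and the idea that each finished leaf would later own one VBO.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative octree: splits a cubic region recursively until each
// leaf holds at most CAPACITY points. Each leaf would later own one VBO.
class OctreeNode {
    static final int CAPACITY = 5000;  // assumed leaf size; tune for your data
    final float cx, cy, cz, half;      // cube center and half-extent
    List<float[]> points = new ArrayList<>();
    OctreeNode[] children;             // null while this node is a leaf

    OctreeNode(float cx, float cy, float cz, float half) {
        this.cx = cx; this.cy = cy; this.cz = cz; this.half = half;
    }

    void insert(float[] p) {
        if (children == null) {
            points.add(p);
            if (points.size() > CAPACITY) split();
            return;
        }
        child(p).insert(p);
    }

    private void split() {
        children = new OctreeNode[8];
        float h = half / 2f;
        for (int i = 0; i < 8; i++) {
            children[i] = new OctreeNode(
                cx + ((i & 1) == 0 ? -h : h),
                cy + ((i & 2) == 0 ? -h : h),
                cz + ((i & 4) == 0 ? -h : h), h);
        }
        for (float[] p : points) child(p).insert(p);
        points = null;  // interior nodes keep no points themselves
    }

    private OctreeNode child(float[] p) {
        int i = (p[0] >= cx ? 1 : 0) | (p[1] >= cy ? 2 : 0) | (p[2] >= cz ? 4 : 0);
        return children[i];
    }

    int leafCount() {
        if (children == null) return 1;
        int n = 0;
        for (OctreeNode c : children) n += c.leafCount();
        return n;
    }
}
```

At render time you would walk the tree, skip nodes whose cubes fail the frustum test, and draw one VBO per surviving leaf.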

Now, I guess you are using Java. Depending on your JRE and CPU, the multiple draw calls (which are inevitable if you use culling) could add JNI overhead. If this happens, you will have to experiment a bit with the size of your nodes or implement the rendering part in C (or similar)…

So I just tried VBOs and I still get the “out of memory” error. It makes sense, since all a VBO does is copy the data into VRAM instead of working from main memory.

I figured for 12 million points at about 12 bytes EACH (three floats per point), I would need around 144 MB minimum, assuming there is no overhead. I think I’ll have to look into culling with an octree to see if that can help.
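The arithmetic here checks out: three GL_FLOAT coordinates at 4 bytes each is 12 bytes per point, so 12 million points need 144,000,000 bytes of raw vertex data before any driver overhead. A quick sanity check (the class and method names are just for illustration):

```java
// Back-of-the-envelope estimate of raw vertex memory for a point cloud.
class PointMemory {
    static long bytesForPoints(long pointCount) {
        final int floatsPerPoint = 3;  // x, y, z
        final int bytesPerFloat = 4;   // GL_FLOAT
        return pointCount * floatsPerPoint * bytesPerFloat;
    }

    public static void main(String[] args) {
        long bytes = bytesForPoints(12_000_000L);
        System.out.printf("12M points need %.1f MB of raw vertex data%n", bytes / 1e6);
    }
}
```

Which is already more than the 128 MB of VRAM on the card in question, so the out-of-memory error is unsurprising.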

No, as far as I know VBOs manage memory dynamically, depending on the hints you give them. For example, when you hint that the geometry is static using GL_STATIC_DRAW, the driver will favor uploading the data into VRAM, because VRAM is very fast and the driver won’t have to update the data often, as your hint suggests. Still, for most applications it is not possible to enforce such a policy, so the rest of the data should remain in system memory, and then in swap, before crashing…
I think that is why Zengar suggested you use VBOs instead of vertex arrays.

But wouldn’t it send the data to VRAM eventually, and wouldn’t that cause an out-of-memory error? I would have to cull the points before putting them in a VBO, or at least before sending the VBO data to the video card, no?

I suggest you store all the points in your RAM and cull them, then render only the relevant points using normal vertex arrays. If this is still too slow for your purposes, you can implement some fancy async streaming to VBOs (like rendering some VBOs while loading data into other VBOs in a different thread – I hope this works?). Or you can use streaming VBOs from the beginning, but this will be more work…
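A minimal sketch of the CPU-side cull described above, assuming the points sit in one flat x,y,z array and the view frustum is given as plane equations (a point survives only if it is on the non-negative side of every plane). The class name and the plane convention are assumptions for illustration:

```java
import java.util.ArrayList;
import java.util.List;

// CPU-side point culling: keep only points inside a convex volume
// described by planes {a, b, c, d}, where a*x + b*y + c*z + d >= 0
// means "inside".
class PointCuller {
    static float[] cull(float[] xyz, float[][] planes) {
        List<Float> kept = new ArrayList<>();
        for (int i = 0; i < xyz.length; i += 3) {
            boolean inside = true;
            for (float[] p : planes) {
                if (p[0]*xyz[i] + p[1]*xyz[i+1] + p[2]*xyz[i+2] + p[3] < 0) {
                    inside = false;
                    break;
                }
            }
            if (inside) {
                kept.add(xyz[i]); kept.add(xyz[i+1]); kept.add(xyz[i+2]);
            }
        }
        float[] out = new float[kept.size()];
        for (int i = 0; i < out.length; i++) out[i] = kept.get(i);
        return out;  // feed this smaller array to glVertexPointer / a streaming VBO
    }
}
```

Testing every point individually like this is O(n) per frame; in practice you would combine it with the octree so whole boxes of points are accepted or rejected at once.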


How are your points organized?
Are they distributed around the viewer, or maybe in one block (e.g. 100 × 200 × 300 points)?

I assume that most of the time you only want to view the data, and sometimes perform viewing operations (rotating, zooming).
While operating (changing the viewing matrix) it is probably not necessary to show all 6 million points, but only some of them, or just the bounding box. Then you don’t need vertex arrays or VBOs and can send your data directly to the graphics card once the viewing operation is over.
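The “show only some of the points while interacting” idea can be as simple as drawing every Nth point during rotation/zoom and the full array once the interaction stops. A sketch (the stride value and class name are arbitrary illustrations):

```java
// Decimate a flat x,y,z array: keep every `stride`-th point. Draw the
// decimated copy while the user is rotating/zooming, and the full
// array again once interaction stops.
class Decimator {
    static float[] everyNth(float[] xyz, int stride) {
        int pointCount = xyz.length / 3;
        int kept = (pointCount + stride - 1) / stride;  // ceil division
        float[] out = new float[kept * 3];
        for (int i = 0, o = 0; i < pointCount; i += stride, o += 3) {
            out[o]     = xyz[3*i];
            out[o + 1] = xyz[3*i + 1];
            out[o + 2] = xyz[3*i + 2];
        }
        return out;
    }
}
```

With a stride of 100, the 6-million-point cloud shrinks to 60,000 points during interaction, which any card of that era can redraw comfortably.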

(sorry for my bad English)

Could you elaborate on what you mean by sending data directly to the card?

Simply with glVertex() :slight_smile:
OK, but if you are holding your data in memory anyway, vertex arrays or VBOs would be better.

glVertex does not send the data directly to the card; it constructs a kind of vertex array internally before sending it to the card, plus you get the setup and multiple-call overhead. This is the reason why immediate mode is so much slower compared to VBOs, where the data for rendering is already in server space.

I would suggest sorting your points into an octree by dividing the world into 8 sections recursively until each octant contains fewer than some constant number of points (say 5000). Then you create a vertex array for each octant. Now you can cull by checking each octant’s bounding box against your frustum and reject large numbers of points at once.
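The bounding-box-against-frustum check mentioned above can be sketched as follows. This assumes the same plane convention as before (non-negative side = inside); the class and method names are illustrative:

```java
// Conservative AABB-vs-frustum rejection: an octant can be skipped only
// if its box lies entirely on the negative side of some frustum plane.
class FrustumCull {
    // box given by min/max corners; plane is {a, b, c, d}
    static boolean boxOutsidePlane(float[] min, float[] max, float[] p) {
        // pick the box corner farthest along the plane normal; if even
        // that corner is outside, the whole box is outside
        float x = p[0] >= 0 ? max[0] : min[0];
        float y = p[1] >= 0 ? max[1] : min[1];
        float z = p[2] >= 0 ? max[2] : min[2];
        return p[0]*x + p[1]*y + p[2]*z + p[3] < 0;
    }

    static boolean boxVisible(float[] min, float[] max, float[][] planes) {
        for (float[] p : planes)
            if (boxOutsidePlane(min, max, p)) return false;
        return true;  // possibly visible: draw this octant's vertex array
    }
}
```

The test is conservative on purpose: a box straddling a plane is kept, so you never reject points that might be visible, only whole octants that definitely are not.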

Even if you do not cull, you can now draw each octant and get everything rendered. The issue you are seeing is because you try to shove all of those points into one array. If that array is larger than your memory, you are in trouble. If you break the points up into smaller portions, your card can discard older sets in favor of the latest one you rendered. It’ll be slow, but it will work.

Also, rendering more than about 5000 points at a time will be slow anyway.

Hello Zengar,

Oh, I didn’t know about the internal vertex array :eek: .
Would this internal array be flushed if you call glFlush() in between?
Then you could maybe call glFlush() every 500,000 points.


Dj3hut1, the golden rule is to avoid immediate mode (glVertex) altogether. Never use it if you want performance (unless you only need one quad or so): it is really SLOW.