Drawing millions of points?

metalac · July 21, 2008, 1:19pm

Hey guys,

I have a large number of points that need to be drawn on screen. We’re talking on the order of 6+ million. The issue is that I’m pushing the limits of the card and I’m running out of memory.

I tried culling, but I don’t think it’s working: I still get just a blank screen when I exceed a certain number of points. What would be a good way to deal with this issue?

Thanks.

babis · July 21, 2008, 2:04pm

Which card do you have? And what exactly is the problem? How do you render the points?

Zengar · July 21, 2008, 2:09pm

How are you rendering the points? The coordinates of 6 million points shouldn’t take too much memory. If you just use a brute-force approach (and I love brute-force approaches), they should be handled easily by any reasonably modern hardware, provided you render them correctly, using VBOs and such. Now, if you want to optimize, some details about the nature of your data would be very helpful (are the positions of the points static or dynamic, etc.).

metalac · July 21, 2008, 2:15pm

I have a Quadro NVS 135M with 128 MB of on-board VRAM. Basically, I draw the points using a vertex array. Here is some pseudocode:

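// Pseudocode (JOGL-style); gl is the current GL object
float[] vertices = new float[3 * numPoints];   // x, y, z per point
for (int i = 0; i < numPoints; i++) {
    // populate vertices[3*i], vertices[3*i + 1], vertices[3*i + 2] with x, y, z data
}
gl.glEnableClientState(GL.GL_VERTEX_ARRAY);
gl.glVertexPointer(3, GL.GL_FLOAT, 0, vertices);
// The count argument is the number of vertices, not the number of floats.
gl.glDrawArrays(GL.GL_POINTS, 0, vertices.length / 3);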

This method works just fine up to about 6 million points. If I try to render more than that, all I get is a blank screen. The “out of memory” error pops up after the glDrawArrays line.

I want to do culling because we will just be getting more and more points in the future, and there is no need to send the “invisible” points to the video card. Is there a way to cull vertex elements? I know it’s possible with spheres, boxes, etc., but how do you define a face of a vertex? I tried the cull_vertex_ext, but it’s not supported on my hardware, and from what I can tell not on any of the cards I could get my hands on, including a Quadro FX 550.

metalac · July 21, 2008, 2:17pm

The points are static and all we do with them is simple viewing. We want to be able to view them (rotate, zoom in/out, …) and then overlay some data at a later time, but that’s minimal.

Zengar · July 21, 2008, 2:27pm

Ok, first of all you should use the VBO extension (google if you don’t know what it is).

I don’t have any practical experience with 3D graphics programming, but in your case I would use an octree (again, google if you don’t know what it is). You could store individual VBOs as octree nodes. With the tree you can easily determine which nodes are visible. As an additional optimization, you could replace nodes that are visible but far away with a single point (if a point cloud is so far away that it would occupy only a few pixels on the screen anyway), but that depends on the granularity of your nodes.
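
The "replace a far-away node with a single point" idea above comes down to estimating how many pixels a node covers. Below is a minimal sketch in plain Java, assuming a symmetric perspective projection and a node approximated by its bounding sphere. The class name, method names and the pixel threshold are hypothetical, not something from the thread.

```java
// Sketch: decide whether an octree node is so far away that one point can stand in for it.
// Assumptions (not from the thread): symmetric perspective projection, node approximated
// by a bounding sphere, and a pixel threshold chosen by the caller.
final class NodeLod {
    // Approximate on-screen diameter, in pixels, of a sphere at the given eye distance.
    static double projectedDiameterPixels(double radius, double distance,
                                          double fovYRadians, int viewportHeight) {
        if (distance <= radius) {
            return Double.POSITIVE_INFINITY; // eye is inside or touching the sphere
        }
        // Visible height of the view volume at that distance is 2 * d * tan(fovY / 2).
        double viewHeightAtDistance = 2.0 * distance * Math.tan(fovYRadians / 2.0);
        return (2.0 * radius / viewHeightAtDistance) * viewportHeight;
    }

    // True when the node covers no more than maxPixels and could be drawn as one point.
    static boolean drawAsSinglePoint(double radius, double distance, double fovYRadians,
                                     int viewportHeight, double maxPixels) {
        return projectedDiameterPixels(radius, distance, fovYRadians, viewportHeight) <= maxPixels;
    }
}
```

A sensible threshold depends on the point size and on how coarse the nodes are, so it would need tuning against the real data.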

Now, I guess you are using Java. Depending on your JRE and CPU, multiple draw calls (which are inevitable if you use occlusion) could incur noticeable JNI overhead. If this happens, you will have to experiment a bit with the size of your nodes or implement the rendering part in C (or similar)…

metalac · July 21, 2008, 3:41pm

So I just tried VBOs and I still get the “out of memory” error. It makes sense, since all a VBO does is copy the data into VRAM instead of working from main memory.

I figured that for 12 million points I’m using about 12 bytes EACH, so I would need around 144 MB minimum, assuming there is no overhead. I think I’ll have to look into culling with an octree to see if that can help.
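
To make the arithmetic above concrete, here is a small sketch in plain Java (no OpenGL involved). It assumes three 4-byte floats per point and no padding or driver overhead. The class and method names are made up for illustration.

```java
// Back-of-the-envelope estimate of raw vertex data size.
// Assumption: each point is 3 floats (x, y, z) at 4 bytes each, with no padding
// and no driver overhead.
final class PointMemoryEstimate {
    static final int FLOATS_PER_POINT = 3;
    static final int BYTES_PER_FLOAT = 4;

    // Raw size in bytes of the position data for the given number of points.
    static long bytesForPoints(long pointCount) {
        return pointCount * FLOATS_PER_POINT * BYTES_PER_FLOAT;
    }

    // Largest number of points whose raw position data fits in the given byte budget.
    static long maxPointsForBudget(long budgetBytes) {
        return budgetBytes / (FLOATS_PER_POINT * BYTES_PER_FLOAT);
    }

    public static void main(String[] args) {
        System.out.println("12M points: " + bytesForPoints(12_000_000L) + " bytes");
        System.out.println(" 6M points: " + bytesForPoints(6_000_000L) + " bytes");
        // 128 MiB budget, ignoring framebuffers and everything else that lives in VRAM.
        System.out.println("Points in 128 MiB: " + maxPointsForBudget(128L * 1024 * 1024));
    }
}
```

By the same estimate, 6 million points need 72 MB of raw position data, which is at least consistent with 6 million working on a 128 MB card and 12 million failing, although the framebuffer and other resources also take up part of that memory.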

dletozeun · July 22, 2008, 12:22am

No, as far as I know a VBO manages memory dynamically, depending on the hints you have given it. For example, when you tell the VBO that the geometry is static using GL_STATIC_DRAW, the driver will favor uploading the data into VRAM, because VRAM is very fast and, as you have indicated, the data won’t need to be updated often. For most applications, though, it is not possible to enforce such a policy, so the rest of the data should remain in system memory, and then in swap, before crashing…
I thought that is why Zengar suggested you use VBOs instead of vertex arrays.

metalac · July 22, 2008, 9:21am

But wouldn’t it send the stuff to VRAM eventually? And wouldn’t this cause an out of memory error? I would have to cull the points before putting them in a VBO, or at least before sending the VBO data to the video card, no?

Zengar · July 22, 2008, 9:27am

I suggest you store all the points in your RAM and cull them; then render only the relevant points using normal vertex arrays. If this is still too slow for your purpose, you can implement some fancy async streaming to VBOs (like rendering some VBOs while loading data into other VBOs in a different thread; I hope this does work?). Or you can use streaming VBOs from the beginning, but this will result in more work…

dj3hut1 · July 22, 2008, 11:04am

Hello,

How are your points organized?
Are they distributed around the viewer, or maybe in one block (e.g. 100 x 200 x 300 points)?

I assume that most of the time you only want to view the data, and only sometimes perform viewing operations (rotating, zooming).
While operating (changing the viewing matrix) it is possibly not necessary to show all 6 million points, but only some of them or the bounding box, so you don’t need vertex arrays or VBOs and can send your data directly to the graphics card once the viewing operation is over.

dj3hut1
( sorry for my bad english )
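
One way to read the suggestion above of showing "only some of them" while the view is changing is to keep a thinned-out copy of the point set for interactive use. The sketch below keeps every n-th point. It assumes the points are packed as x, y, z triples. The class name and the stride-based selection are illustrative choices, not something prescribed in the thread.

```java
// Sketch: build a reduced copy of the point set for use while the view is changing.
// Assumption: points are packed as x, y, z triples; keeping every 'step'-th point is
// an illustrative choice, not something prescribed in the thread.
final class PointDecimator {
    static float[] everyNth(float[] xyz, int step) {
        if (step < 1) throw new IllegalArgumentException("step must be >= 1");
        if (xyz.length % 3 != 0) throw new IllegalArgumentException("expected x, y, z triples");
        int pointCount = xyz.length / 3;
        // Number of points kept: ceil(pointCount / step), written to avoid int overflow.
        int kept = pointCount == 0 ? 0 : (pointCount - 1) / step + 1;
        float[] out = new float[kept * 3];
        for (int dst = 0; dst < kept; dst++) {
            int src = dst * step;
            out[3 * dst] = xyz[3 * src];
            out[3 * dst + 1] = xyz[3 * src + 1];
            out[3 * dst + 2] = xyz[3 * src + 2];
        }
        return out;
    }
}
```

If the points are stored in some regular order (a grid, for instance), a fixed step can produce visible patterns. Shuffling the points once up front is one way around that. With plain vertex arrays, a similar effect should also be possible without copying, by passing a larger stride to glVertexPointer and a smaller count to glDrawArrays.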

Zengar · July 22, 2008, 11:07am

Could you elaborate on what you mean by sending data directly to the card?

dj3hut1 · July 22, 2008, 11:10am

Simply with glVertex().
OK, but if you are holding your data in memory anyway, vertex arrays or VBOs would be better.

Zengar · July 22, 2008, 11:47am

glVertex does not send the data directly to the card, but constructs a kind of vertex array internally before sending it to the card, plus you get the setup and multiple-call overhead. This is the reason why immediate mode is so much slower than VBOs, where the data for rendering is already in server space.

Keith_Z_Leonard1 · July 22, 2008, 1:19pm

I would suggest sorting your points into an octree by dividing the world into 8 sections recursively until each octant contains fewer than some constant number of points (say 5000). Then you create a vertex array for each octant. Now you can cull by checking each octant’s bounding box against your frustum, and reject large numbers of points.
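
The octree scheme described above can be sketched in plain Java, without any OpenGL calls. This is only an illustration under several assumptions: all points lie inside the root bounding box, the frustum is supplied as planes {a, b, c, d} with inward-facing normals (extracting them from the view and projection matrices is not shown), and the class name, depth limit and field layout are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of an octree over packed x, y, z triples with box-vs-plane culling.
// Assumptions (not from the thread): frustum planes are float[4] = {a, b, c, d} with
// inward-facing normals, so a point is inside a plane when a*x + b*y + c*z + d >= 0.
final class PointOctree {
    final float[] min, max;   // axis-aligned bounding box of this node
    float[] points;           // leaf payload (x, y, z triples); null for inner nodes
    PointOctree[] children;   // eight children for inner nodes; null for leaves

    PointOctree(float[] xyz, float[] min, float[] max, int maxPoints, int depthLeft) {
        this.min = min;
        this.max = max;
        // Stop when the node is small enough; depthLeft guards against endless
        // splitting when many points share the same coordinates.
        if (xyz.length / 3 <= maxPoints || depthLeft <= 0) {
            points = xyz;
            return;
        }
        float[] c = { (min[0] + max[0]) / 2, (min[1] + max[1]) / 2, (min[2] + max[2]) / 2 };
        // First pass counts points per octant, second pass copies them into place.
        int[] counts = new int[8];
        for (int i = 0; i < xyz.length; i += 3) counts[octant(xyz, i, c)]++;
        float[][] buckets = new float[8][];
        for (int o = 0; o < 8; o++) buckets[o] = new float[counts[o] * 3];
        int[] fill = new int[8];
        for (int i = 0; i < xyz.length; i += 3) {
            int o = octant(xyz, i, c);
            System.arraycopy(xyz, i, buckets[o], fill[o], 3);
            fill[o] += 3;
        }
        children = new PointOctree[8];
        for (int o = 0; o < 8; o++) {
            float[] cmin = new float[3], cmax = new float[3];
            for (int axis = 0; axis < 3; axis++) {
                boolean upper = ((o >> axis) & 1) == 1;
                cmin[axis] = upper ? c[axis] : min[axis];
                cmax[axis] = upper ? max[axis] : c[axis];
            }
            children[o] = new PointOctree(buckets[o], cmin, cmax, maxPoints, depthLeft - 1);
        }
    }

    // Octant index: bit 0 = upper x half, bit 1 = upper y half, bit 2 = upper z half.
    static int octant(float[] xyz, int i, float[] c) {
        return (xyz[i] >= c[0] ? 1 : 0) | (xyz[i + 1] >= c[1] ? 2 : 0) | (xyz[i + 2] >= c[2] ? 4 : 0);
    }

    // Conservative test: the box is rejected only if it lies entirely behind some plane.
    boolean intersects(float[][] planes) {
        for (float[] p : planes) {
            // Pick the box corner that lies farthest along the plane normal.
            float x = p[0] >= 0 ? max[0] : min[0];
            float y = p[1] >= 0 ? max[1] : min[1];
            float z = p[2] >= 0 ? max[2] : min[2];
            if (p[0] * x + p[1] * y + p[2] * z + p[3] < 0) return false;
        }
        return true;
    }

    // Appends the point arrays of all non-empty leaves that survive the plane test.
    void collectVisible(float[][] planes, List<float[]> out) {
        if (!intersects(planes)) return;
        if (children == null) {
            if (points.length > 0) out.add(points);
            return;
        }
        for (PointOctree child : children) child.collectVisible(planes, out);
    }
}
```

Each array returned by collectVisible would then be drawn with its own vertex array or VBO. Because a rejected inner node skips its whole subtree, large groups of invisible points are discarded with a handful of box tests.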

Even if you do not CULL, you can now draw each octant and get everything rendered. The issue you are seeing is because you try to shove all of those points into one array. If that array is larger than your memory, you are in trouble. If you break them up into smaller portions, your card can discard older sets for the latest one you rendered. It’ll be slow, but it will work.
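
Breaking one huge array into smaller portions does not strictly need an octree. As a minimal sketch of that step alone, the helper below splits packed x, y, z data into fixed-size batches. The class name and batch size are hypothetical, and no spatial sorting is attempted.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch: split one large x, y, z array into batches of at most pointsPerBatch points,
// so that each batch can be submitted with its own draw call.
// Assumption: the batches are plain copies in input order; no spatial sorting is done.
final class PointBatcher {
    static List<float[]> split(float[] xyz, int pointsPerBatch) {
        if (pointsPerBatch < 1) throw new IllegalArgumentException("pointsPerBatch must be >= 1");
        if (xyz.length % 3 != 0) throw new IllegalArgumentException("expected x, y, z triples");
        int floatsPerBatch = Math.multiplyExact(pointsPerBatch, 3);
        List<float[]> batches = new ArrayList<>();
        int start = 0;
        while (start < xyz.length) {
            // Compute the end index in long arithmetic so it cannot overflow.
            int end = (int) Math.min((long) start + floatsPerBatch, xyz.length);
            batches.add(Arrays.copyOfRange(xyz, start, end));
            start = end;
        }
        return batches;
    }
}
```

Each batch would then get its own glVertexPointer and glDrawArrays pair, with a vertex count of batch.length / 3.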

Also, rendering more than about 5000 at a time will be slow anyway.

dj3hut1 · July 22, 2008, 10:08pm

Hello Zengar,

Oh, I didn’t know about the internal vertex array :eek: .
Would this internal array be erased if you call glFlush() in between?
Then you could maybe call glFlush every 500000 points.

dj3hut1

Zengar · July 22, 2008, 10:11pm

Dj3hut1, the golden rule is to avoid immediate mode (glVertex) altogether. Never use it if you want performance (unless you only need one quad or such): it is really SLOW.