Using COLLADA with very large data sets?


I’m new to COLLADA (I’ve just read through the spec and several wiki/forum articles), but I hope my question is not completely stupid… :wink:

I want to use COLLADA with very large data sets (several GB of data), of which I only want to visualize a small part in my scene - it’s like having a very long road in a racing game and only wanting to draw the part of the road around the player’s car.

Is COLLADA feasible for such an application?

To be more precise: The main problem is that my input data is too large to be held in memory completely, so I would have to stream the “road” into memory while “driving” along. I don’t need to compute how far the “driver” would see; the part of the “road” I want to draw always has the same length.

I know that using an XML-based format like COLLADA is not the best idea for the application described above (a binary format would be much better, I think).
But I really need to know whether it would be possible to do this, not whether it is the most sophisticated or efficient way to use COLLADA…
btw: the data is a simple triangle mesh, but with lots of vertices.

Thanks in advance,

I’d probably advise against it, but Your Mileage May Vary. Specifically as an example of a possible problem, I found out that Google Earth can’t handle COLLADA files with more than 65536 vertices. It’s likely that you’ll find undocumented limitations like that if you try to create massive data files in COLLADA.

One thing you COULD possibly do though to get around limitations like that is to generate many smaller objects. Instead of storing a digital elevation model of the entire planet in a single COLLADA file for instance, you could store 1/8 degree sections, and make the app smart enough to load just the pieces it needs.
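To make the tiling idea a bit more concrete, here is a rough Python sketch of how an app could work out which 1/8 degree pieces it needs around a given position. The function names, the row/column layout and the `tile_RRRR_CCCC.dae` file naming are all made up for illustration; nothing in COLLADA prescribes them.

```python
import math

TILE_DEG = 0.125  # 1/8 degree per tile, as in the example above


def tile_index(lat, lon, tile_deg=TILE_DEG):
    """Return the (row, col) of the tile that contains a lat/lon point."""
    row = math.floor((lat + 90.0) / tile_deg)
    col = math.floor((lon + 180.0) / tile_deg)
    return row, col


def tiles_needed(lat, lon, radius_tiles=1, tile_deg=TILE_DEG):
    """Return the tiles in a square neighbourhood around the point."""
    rows = int(180 / tile_deg)
    cols = int(360 / tile_deg)
    row, col = tile_index(lat, lon, tile_deg)
    needed = set()
    for dr in range(-radius_tiles, radius_tiles + 1):
        for dc in range(-radius_tiles, radius_tiles + 1):
            r = min(max(row + dr, 0), rows - 1)  # clamp at the poles
            c = (col + dc) % cols                # wrap around in longitude
            needed.add((r, c))
    return needed


def tile_filename(row, col):
    """Hypothetical naming scheme: one small COLLADA file per tile."""
    return f"tile_{row:04d}_{col:04d}.dae"
```

The app would then load only the files returned for the current position, e.g. `[tile_filename(r, c) for r, c in tiles_needed(lat, lon)]`, and keeping each tile small also keeps it under per-file limits like the one mentioned above.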

Regardless of whether you use COLLADA directly in the application or not, you should plan for a large database using COLLADA’s external reference system.
Any large asset can be split into many COLLADA assets/files. A ‘master file’ simply references the external assets, rather than having all the data stored in it directly. Using this at several levels of hierarchy works really well, since you do not have to load/parse the data you do not need.
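As a small illustration of the master-file idea, the Python sketch below parses a tiny master document and lists which external files it points to, without opening any of them. The embedded document, the `segment_NNN.dae` file names and the `mesh_node` ids are invented for the example, and it assumes the references are made with `<instance_node url="file.dae#id"/>` in the COLLADA 1.4 namespace; a real database may use other instance elements as well.

```python
import xml.etree.ElementTree as ET

# COLLADA 1.4 schema namespace
NS = {"c": "http://www.collada.org/2005/11/COLLADASchema"}

# Hypothetical master file: each node only points at an external document.
MASTER = """<COLLADA xmlns="http://www.collada.org/2005/11/COLLADASchema" version="1.4.1">
  <library_visual_scenes>
    <visual_scene id="road">
      <node id="segment_000"><instance_node url="segment_000.dae#mesh_node"/></node>
      <node id="segment_001"><instance_node url="segment_001.dae#mesh_node"/></node>
    </visual_scene>
  </library_visual_scenes>
</COLLADA>"""


def external_references(master_xml):
    """Map node id -> (external file, target id) without loading the files."""
    root = ET.fromstring(master_xml)
    refs = {}
    for node in root.iterfind(".//c:node", NS):
        inst = node.find("c:instance_node", NS)
        if inst is None:
            continue
        file_part, _, fragment = inst.get("url", "").partition("#")
        if file_part:  # an empty file part would be a local reference
            refs[node.get("id")] = (file_part, fragment)
    return refs
```

Only the small master file is parsed up front; the application decides later which of the referenced files it actually opens.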

Movie production studios have databases that are too big even for Maya, so they use tools that query the database (stored in a real database system, not files) based on a camera, and return a file that contains only the visible objects. That file is then sent to the DCC tool. Once the work is done, the file is saved and then parsed by the asset manager, so the data is scattered back to where it belongs. This is the classical scatter/gather mechanism used in software optimization.

Would be nice to have such a tool in the game industry…

Thanks for your suggestions, the idea of storing the actual (very large) data in an external file(s) or a database is interesting. I also thought of this, but wasn’t sure if it would work as intended!

But I fear that using a database (or external files) with several levels of COLLADA files/assets for data management will slow the application down severely. I’m not sure the application would be able to render in real time (which is a crucial part of the visualization). What do you think?


I used to work with visual simulation, where we did terrain and texture paging from disk.
Depending on how strict your requirements are (hard real time? fidelity?), it is possible to some extent.

Have a look at OpenSceneGraph. It currently has the ability to load DAE.


What you have to do in order to implement database paging (regardless of what format the data is stored in) is to have a separate thread/process that handles the disk IO (DMA?) and waits for the disk. This thread then needs to process the input data to create the in-memory data structure that will be used by the real-time part of your application. Once this is done, the data (usually a scene graph) is added to the list of objects the real-time part has to process/draw.

OpenSceneGraph is one of the libraries implementing this architecture.
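For what it’s worth, the paging architecture described above can be sketched in a few lines of Python with a worker thread and two queues. This is only a toy illustration, not how OpenSceneGraph does it: `fake_load` is a placeholder for the real disk read plus parsing, and all names are invented for the example.

```python
import queue
import threading


def fake_load(tile_id):
    """Placeholder for the slow part: disk IO plus parsing into a mesh."""
    return f"mesh-{tile_id}"


def loader(requests, ready, load_tile):
    """Worker loop: blocks on IO so the render thread never has to."""
    while True:
        tile_id = requests.get()
        if tile_id is None:  # sentinel: shut the worker down
            break
        ready.put((tile_id, load_tile(tile_id)))


def start_loader(requests, ready, load_tile):
    """Start the background loader thread and return it."""
    worker = threading.Thread(
        target=loader, args=(requests, ready, load_tile), daemon=True
    )
    worker.start()
    return worker


def drain_ready(ready, scene):
    """Called once per frame by the render thread; never blocks."""
    while True:
        try:
            tile_id, mesh = ready.get_nowait()
        except queue.Empty:
            return
        scene[tile_id] = mesh  # hand the finished data to the renderer


if __name__ == "__main__":
    requests, ready, scene = queue.Queue(), queue.Queue(), {}
    worker = start_loader(requests, ready, fake_load)
    for tile in (3, 4):
        requests.put(tile)
    requests.put(None)
    worker.join()
    drain_ready(ready, scene)
    print(sorted(scene))
```

The important design point is that the render loop only ever calls the non-blocking `drain_ready`, so a slow disk delays when new geometry appears but never stalls a frame.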

Basically, IO (disk/network) is the main bottleneck for database paging, and it needs to be decoupled from the real-time rendering threads. For most 3D applications, the bandwidth goes to images/textures/movies and animation data; parsing geometry and other metadata does not account for much time. In other words, XML/COLLADA parsing works just fine for geometry, scene, camera… but it is best to use external binary/compressed files for texture and animation data.

I thought of a similar approach (using two threads: one for reading the data, one for rendering), so I would only have to “throw away” old data to avoid holding too much data in memory at the same time.
If I understand you correctly, OpenSceneGraph can read the data in the way you described? That really seems to be an appropriate solution for my problem, thanks a lot, even though the immense size of my data doesn’t result from textures (in fact, I don’t have any), but from the enormous number of vertices (it’s a scientific data set).
And since OSG even supports reading COLLADA files, the solution seems nearly perfect! :wink:

Sincerely yours,
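The “throw away old data” part with a fixed-length visible stretch of road could look roughly like the Python sketch below. It assumes the road is cut into equally long segments numbered from 0 along its length; the function names and the `load_segment` callback are hypothetical.

```python
def window_segments(position, segment_length, window_length):
    """Indices of the segments overlapping the window centred on position."""
    half = window_length / 2.0
    first = max(0, int((position - half) // segment_length))
    last = int((position + half) // segment_length)
    return set(range(first, last + 1))


def update_resident(resident, position, segment_length, window_length,
                    load_segment):
    """Evict segments that left the window, load the ones that entered it."""
    wanted = window_segments(position, segment_length, window_length)
    for idx in set(resident) - wanted:
        del resident[idx]                    # "throw away" old data
    for idx in sorted(wanted - set(resident)):
        resident[idx] = load_segment(idx)    # e.g. parse one small file
    return resident
```

Because the window always has the same length, the number of segments held in memory stays bounded no matter how long the whole data set is.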