Memory and Speed Issues

Referencing this data externally seems to be the best solution; it keeps the file clean, consistent, and easy to read, while still allowing massive amounts of data to be stored and recalled efficiently. If you decide to store binary data, then what is the point of wrapping it in XML tags? Imagine a file that contains five objects, each with 100,000 polys - it seems ridiculous to embed the binary data between XML tags when the file is predominantly binary. On the other hand, if a file holds a ton of smaller objects and the vert lists are relatively short, then the benefit of storing that data in binary form is not really apparent either.

Referencing data externally is not a new idea by any stretch of the imagination - #include <etc>. One of the great benefits of an intermediate format is that it can be abstract and flexible. This also comes into play when you consider how large files will be dealt with in the future. For one, it is generally not possible to edit all of the polygons in a complex scene in a meaningful way. An artist will typically work subsection by subsection, exporting each subsection separately. These exports can then be referenced by a top-level file that is really quite small. When an edit is subsequently made to one of the source files, only that area must be re-exported; nothing in the top-level file has changed. If the level grows, then two exports need to be done: 1) the new section and 2) the top-level file - which, if it contains only references, should take more time to think about than to export - in which case it might make more sense to edit it by hand ;)

-Judah

That is not true for COLLADA, and that is one of its major breakthroughs and design goals. One can create tools outside of the modeler, modify the COLLADA file, and still be able to load the files back into the modeler.
In other words, COLLADA is designed to be the source data. The asset tag is there to enable asset management to work on a per-element basis, and not on a per-file basis, as is the case today. So the importer can understand what has been changed outside of the modeler.

Yeah, and because of this, there will be a great need for intermediary tools to edit these exported files. For instance, once you have exported a number of files and defined some assets, you might want to try a substitution of sorts. Instead of re-exporting, I think a tool in the midground would be very helpful - something that would allow you to edit the flow of data, so to speak. This would not be a tool for editing vertex locations or values, or for performing other tasks best handled by art tools, but a tool that would let you edit data sources and destinations, such as external references. I'm thinking of something like the Maya Hypergraph editor - any takers?

This is once again, a great example of why not to flatten the exported data - once it’s flat, you can’t go back.

-Judah

[quote=“Panajev2001a”]

Which would mean that he would have to write a partial exporter again.

Am I right?

I fear that he would not be the only developer who would want to do that and then decide they do not need COLLADA.

Making everyone happy is not possible, but an industry standard should aim to please at least a large chunk of developers to ensure acceptance.[/quote]

This has nothing to do with the COLLADA specification. As I said, it is already possible today to reference external binary data with COLLADA.

You are right that this option would have to be added to the exporter, and if it is not supported by the exporter provided by the DCC tool you are using, you would have to do it yourself. The alternative would be to contact them and ask for this feature to be supported.

But in regard to this specific feature, exporting binary representations of floating-point numbers would require that the same encoding standard be used on every platform the data needs to be loaded back on. That may be true for all the platforms you are using, but it is not true in the general case, which means one would have to provide a conversion when loading the data.
For example, floating-point data used in shaders, or on smaller devices such as mobile-phone 3D hardware, is not IEEE compatible.
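To make that concrete, here is a minimal sketch (mine, not from the spec) of a reader that assumes little-endian IEEE-754 floats. On a platform whose native float is something else, the memcpy line is exactly where a real conversion routine would have to go:

[code]
// Hedged sketch: decoding a little-endian IEEE-754 float array. This only
// works where the native float type is itself IEEE-754.
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

static float read_le_ieee754(const unsigned char* p)
{
    uint32_t bits = (uint32_t)p[0]         | ((uint32_t)p[1] << 8)
                  | ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
    float f;
    std::memcpy(&f, &bits, sizeof f);  // non-IEEE targets would need a real
                                       // bit-level conversion here instead
    return f;
}

std::vector<float> decode_floats(const unsigned char* data, size_t count)
{
    std::vector<float> out(count);
    for (size_t i = 0; i < count; ++i)
        out[i] = read_le_ieee754(data + 4 * i);
    return out;
}
[/code]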

The COLLADA spec already explicitly states that COLLADA is not a game engine format. It's an intermediate format targeted at tools that will then generate game-ready formats. Therefore, having any of the data in it be binary-specific is not a problem, since the tools themselves can handle the conversion - and they can handle it much faster than they can handle parsing 2-5 times as much data as ASCII floats.

The external binary file idea in the COLLADA spec is not really even an option, and is ill-conceived: if the data is not in a format that the COLLADA spec specifies, then a COLLADA-compliant reader would not be able to read the file.

If you want to add the format of those external files to the spec, that would solve the problem, since then any COLLADA reader could read any file with external references. But in that case, all I'm suggesting is that instead of the data being external, it be allowed to live internally as CDATA.

Right, so we needn’t worry about the data sizes of a specific platform at this point. For now, a float is a float.

If I made it sound as though the binary specification should not be in the COLLADA spec, then that was a mistake. The whole idea here is to come up with something universal; I think that's why we're all participating. If it is possible to embed the data cleanly, then so be it, but it seems like there are some issues with representing this data properly within the XML file. Personally, I'd rather not have large blocks of binary data in these files, but that is my preference.

Okay, I’m coming into this a little late, but here’s my take on it:

  • I'd keep all scene data in a single file. .tif files can still be external because they're external in the source data as well. Trying to keep a bunch of binary files together with the .dae file seems ugly.
  • An optional faster, more compressed format for large data would be nice, for all the reasons greggman enumerates.
  • But CDATA seems like a bad idea, for reasons also already covered (see the sketch after this list for a concrete failure mode) – http://lists.xml.org/archives/xml-dev/1 … 00388.html
  • I think it's also legitimate to live within the limitations of an existing library like libxml2; I poked around a bit in the source and couldn't figure out if it has native base64 support.
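For completeness, a small sketch (mine, not from the linked thread) of why raw binary cannot go straight into a CDATA section - XML 1.0 forbids most control characters outright, and the byte sequence "]]>" terminates the section early, so arbitrary float data would have to be escaped or encoded anyway:

[code]
// Hedged illustration: a check that almost any raw float dump will fail.
#include <cstddef>

bool cdata_safe(const unsigned char* data, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        unsigned char c = data[i];
        // Control characters other than tab/LF/CR are illegal in XML 1.0,
        // CDATA or not (and bytes >= 0x80 must also form valid UTF-8).
        if (c < 0x20 && c != 0x09 && c != 0x0A && c != 0x0D)
            return false;
        // "]]>" inside the data would end the CDATA section prematurely.
        if (c == '>' && i >= 2 && data[i - 1] == ']' && data[i - 2] == ']')
            return false;
    }
    return true;
}
[/code]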

Why does libxml2 link with zlib? Does it support reading gzipped XML files directly? If so, perhaps that would be a good middle ground?
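If it does, the importer side would presumably be no more than this (an untested sketch, assuming a libxml2 build with zlib enabled - libxml2 is supposed to decompress gzip input transparently in that configuration):

[code]
// Speculative sketch: with zlib compiled in, xmlReadFile should accept a
// gzip-compressed document and hand back the same tree as for plain XML.
#include <libxml/parser.h>

int main()
{
    xmlDocPtr doc = xmlReadFile("scene.dae.gz", NULL, 0);
    if (doc) {
        /* ... walk the tree exactly as for an uncompressed file ... */
        xmlFreeDoc(doc);
    }
    xmlCleanupParser();
    return 0;
}
[/code]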

-Dave

I agree that storing raw binary data directly inside an XML file is not a solution. I also see the point in trying to work inside the XML constraints rather than inventing a new format.

I agree with danny that a 3-4 fold performance increase can be the difference between nice to use and unbearable, e.g. waiting 5 seconds versus 20 seconds for a file to load.

To solve the issue at hand, you could just base64 the binary data and put it into a CDATA section. The file size would increase by 33%, but base64 encoding/decoding can be quite fast - definitely a lot faster than parsing ASCII float values.
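The decoder for that path is only a few lines. A rough sketch of mine (note it still assumes IEEE-754 floats with matching byte order on both ends, per the earlier discussion):

[code]
// Hedged sketch: base64 text out of the document, straight into floats.
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

static std::vector<unsigned char> base64_decode(const std::string& in)
{
    static const std::string tab =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    std::vector<unsigned char> out;
    uint32_t buf = 0;
    int bits = 0;
    for (char c : in) {
        std::string::size_type v = tab.find(c);
        if (v == std::string::npos) continue;  // skip '=', whitespace
        buf = (buf << 6) | (uint32_t)v;
        bits += 6;
        if (bits >= 8) {
            bits -= 8;
            out.push_back((unsigned char)((buf >> bits) & 0xFF));
        }
    }
    return out;
}

std::vector<float> decode_float_array(const std::string& base64_text)
{
    std::vector<unsigned char> bytes = base64_decode(base64_text);
    std::vector<float> floats(bytes.size() / sizeof(float));
    // Assumes IEEE-754 floats and identical endianness on writer and reader.
    std::memcpy(floats.data(), bytes.data(), floats.size() * sizeof(float));
    return floats;
}
[/code]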

I see that this solution is ultimately not optimal, and that the only way to really solve the problem is by not using XML.

I don't think putting binary data in external files is a good solution either, since you are just shifting work from the programmer to the artist, who now has to keep two files around instead of one. And what happens if he wants to rename a file? It just doesn't cut it.

For the time being, base64'ing vertex and index data seems like a good solution to me.

I am sure I am not telling you (gabor) anything new here, so what do you think is wrong with base64'ing the binary data?

It’s not inherently wrong, but there are two issues:

  • as Remi mentioned, storing floats in a native binary (or base64-encoded binary) form breaks interchangeability, because not all platforms have IEEE-compliant floats
  • I'm still not convinced that it would be significantly faster than our current decimal-ASCII <-> float converters, but we should do some tests to determine that (see earlier message; a rough harness is sketched below)
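Something along these lines would do for a first measurement - a throwaway sketch, using std::strtof for the ASCII side and a plain memcpy as a stand-in for the binary path (a real binary load adds file I/O on top, so treat the numbers as a rough lower bound for the gap):

[code]
// Hedged benchmark sketch: decimal-ASCII parsing vs. raw binary copy.
#include <chrono>
#include <cstddef>
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <string>
#include <vector>

int main()
{
    const size_t N = 1000000;
    std::vector<float> src(N);
    for (size_t i = 0; i < N; ++i) src[i] = (float)i * 0.001f;

    // Build the ASCII form once, outside the timed region.
    std::string ascii;
    char tmp[32];
    for (size_t i = 0; i < N; ++i) {
        std::snprintf(tmp, sizeof tmp, "%g ", src[i]);
        ascii += tmp;
    }

    std::vector<float> dst(N);

    auto t0 = std::chrono::steady_clock::now();
    const char* p = ascii.c_str();
    for (size_t i = 0; i < N; ++i) {
        char* end;
        dst[i] = std::strtof(p, &end);  // ASCII -> float, one value at a time
        p = end;
    }
    auto t1 = std::chrono::steady_clock::now();
    std::memcpy(dst.data(), src.data(), N * sizeof(float));  // "binary load"
    auto t2 = std::chrono::steady_clock::now();

    using ms = std::chrono::milliseconds;
    std::printf("ascii: %lld ms   binary: %lld ms\n",
        (long long)std::chrono::duration_cast<ms>(t1 - t0).count(),
        (long long)std::chrono::duration_cast<ms>(t2 - t1).count());
    return 0;
}
[/code]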

I thought COLLADA is an interchange/intermediate format.

Consoles and mobile games might not have IEEE-compliant floats, but since COLLADA is an intermediate/interchange format, it will not be used to load meshes directly into a game running on a platform without IEEE-compliant floats.
Rather, the format will be used in a conversion process that converts COLLADA files to the native format of that game/platform. That's my assumption anyway - is anyone planning to use COLLADA as the final format that ships with their game to the user?
Even if someone does, it's not very hard to convert to non-IEEE floats anyway.

It sounds like a very good idea to do some speed tests of ASCII vs. binary. Please post the results.

I think external referencing is the best approach for using binary data with XML. The CDATA tag doesn't live up to expectations, and base64 encoding just makes things 33% bigger.

Just as with texture image data, a URL to the file is preferred over embedding it in COLLADA documents. I envision that large chunks of binary data can live in well-known formats, just as image data is stored in .png, .tif, .jpg, .bmp and so forth. None of those binary formats is within the scope of the COLLADA specification, and I think everyone understands that that is not necessary.

In COLLADA, a URL from an <accessor> element expects to resolve to an <array> element. The <array> element has the metadata to describe the block of data. If that URL refers to an external binary file, we can expect the file extension to indicate the format of the data, just as with imagery. If we want to read .png data, then everyone needs a library of code for that. It's the same with external data: if the data is in .xls format, for example, then we need those routines, and so on.
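As a sketch of that dispatch (the function names here are mine and the loaders are stubs; real ones need per-format libraries, just as reading .png pixels needs an image library):

[code]
// Hedged sketch: resolve an <accessor> URL either to an in-document
// <array> element or to an external file chosen by extension.
#include <string>
#include <vector>

// Hypothetical stub loaders - each would wrap a format-specific library.
static std::vector<unsigned char> load_xls(const std::string&) { return {}; }
static std::vector<unsigned char> load_raw(const std::string&) { return {}; }

std::vector<unsigned char> resolve_array_url(const std::string& url)
{
    if (!url.empty() && url[0] == '#') {
        // Fragment identifier: an <array> element inside this document;
        // parse its text content as usual.
        return {};
    }
    // External file: let the extension pick the reader, as with imagery.
    std::string ext = url.substr(url.rfind('.') + 1);
    if (ext == "xls") return load_xls(url);  // needs .xls routines
    return load_raw(url);                    // some other well-known format
}
[/code]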

We can use the <array> schema definition to create a binary form of that storage. In that case, I agree that COLLADA should define the schema. I think that is what some of you have been saying, while overlooking the more general well-known-binary-format use case that is prevalent with image data.

I totally agree about the external binary data issue. It should be left as external files.

There are two issues with this, though:

<issues>
I am not aware of a simple binary format that could be used to specify <array>-like data in a file. As such, I think it would be fairly important to specify a recommended binary format, so that we don't make COLLADA much harder to deal with as an intermediate format by having a variety of different external file formats. So yeah, I agree with the assessment.

EDIT: After looking it over, you might want to add the <p> type (primitive index list) to the external binary representation as well, because that can also add up, though less so. It seems to be similar to <array> except with no name, an implied count, and int as the type - so it would be a very simple extension to the binary format.

Also, the only issue I can think of with referencing binary data from separate files is that you'd end up with lots of little extra files all over the place. If you could specify a block format for storing a bunch of different <array> elements, that would leave you with one extra file (or as few as you want) - a possible record layout is sketched just below. One possible solution is to make an archive that contains all the necessary info be what is imported or exported, although with only one extra file it doesn't seem terribly necessary to archive them. Also, that could break things like source control if you wanted to use it for content files (though the binary data would break anyway…).
</issues>
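To make the one-extra-file idea concrete, a hypothetical directory layout might look like the following (every name here is invented; nothing like it is in the spec, and the type field also covers the int/<p> case from the EDIT above):

[code]
// Hedged sketch: one side file holding many named blobs, so every
// <array> in the scene can reference into a single external file.
#include <cstdint>
#include <string>

struct ArrayRecord {
    std::string name;   // matches the array's id in the .dae document
                        // (would be length-prefixed on disk)
    uint32_t    type;   // e.g. 0 = float array, 1 = int (<p> index list)
    uint64_t    offset; // byte offset of the raw data within this file
    uint64_t    count;  // element count (the "count" attribute, implied)
};

struct ArrayFileHeader {
    char     magic[8];     // e.g. "DAEBLOBS" - made up, not in any spec
    uint32_t version;
    uint32_t recordCount;  // followed by the records, then the raw data
};
[/code]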

Cool stuff though.

Adruab

I disagree with external referencing being a good approach for using binary data with XML.
It shifts work from the programmer to the artist.
The mesh data of a scene is stored inside the same file as the scene by default (in every 3D package I am aware of).
This is done because it is nice to have single files which you can just move around as one coherent unit. That is not robustly possible with multiple files.

COLLADA shouldn't change how artists normally work. Textures are stored externally; scenes, including all mesh data, are stored in a single file. This is how every 3D package I am aware of handles it, so it should be how COLLADA handles it.

Reasons why multiple files are bad:

  • Robustness.
  1. What happens if the user moves the COLLADA file and forgets to move the binary data as well? (I can assure you every artist will do that at least once.)
  2. What happens if I check the binary data into version control but not the scene data?
  • Clutter. Having two files instead of one means browsing folders takes twice as long for the eye.

COLLADA is an interchange format, so those issues actually matter. They wouldn't matter as much if it were just an intermediate format for importing 3D data into a game engine.

If you want the full speed of binary data, use a format which supports embedding binary data. XML does not, so if you are interested in the speed improvement, the only solution that is not a pain for the user is to make COLLADA files non-XML.

One way to do that is, instead of embedding binary data in the XML file, to embed the XML data in a binary format.

For example, a very simple binary format would:

  • store the number of bytes the XML data occupies,
  • store the XML data itself,
  • store a lookup table so the XML data can index into the binary data,
  • store all the binary data raw.
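Read literally, that layout might come out something like this (field names are mine; a sketch, not a proposal for the spec):

[code]
// Hedged sketch of the wrapper described above.
#include <cstdint>

struct BinaryDaeHeader {
    uint64_t xmlByteCount;     // 1) size of the embedded XML document;
                               // 2) that many bytes of UTF-8 XML follow
    uint32_t tableEntryCount;  // 3) lookup table, one entry per data block
};

struct LookupEntry {
    uint64_t offset;     // where this block starts in the raw data section
    uint64_t byteCount;  // its size; the XML refers to blocks by table
                         // index instead of embedding the data itself
};
// 4) ...followed by all the raw binary data, back to back.
[/code]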

Specifying such a binary format that embeds the XML file is not any harder than using external references to binary files.

Kind of an aside: I just got a new phone (K700i) and was downloading some themes for it. Out of curiosity I opened a theme file in a text editor, and lo and behold, it seemed to be some kind of combined binary/XML format. Anyone know the details?

OK, well, what do you think of the possibility of a .dae file being a gzipped archive with all the relevant files contained within? This is exactly what FX Composer does, and it seems to work well there.

It wouldn't complicate importers/exporters that much, since you could just include zlib in the distribution.
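On the importer side, that might be as little as the following sketch, assuming plain gzip (a single compressed stream - a true multi-file archive would need zip rather than gzip):

[code]
// Hedged sketch: inflate a gzipped .dae into memory with zlib's gz* API,
// then hand the buffer to the XML parser as usual.
#include <zlib.h>
#include <vector>

std::vector<char> read_gzipped_dae(const char* path)
{
    std::vector<char> xml;
    gzFile f = gzopen(path, "rb");
    if (!f) return xml;
    char buf[65536];
    int n;
    while ((n = gzread(f, buf, sizeof buf)) > 0)
        xml.insert(xml.end(), buf, buf + n);
    gzclose(f);
    return xml;  // parse this in-memory buffer as the XML document
}
[/code]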

As for a non-XML format… I think it would be a bad idea, considering the entire spec is currently built on XML. The farthest I would go is a block region of the file for the XML and a block for the binary data, as you mentioned. Even then, that's pretty much the same as archiving it…

Adruab

I encourage you all to think less about files and their limitations in large projects. Consider instead COLLADA integration with a database system such as Oracle, MS SQL, MySQL, or PostgreSQL, to name a few.

COLLADA is an XML schema for (database) transactions as much as it is a “file format”.

Hmmm… interesting. Obviously, many (all?) of the elements can be referenced externally. Does your comment imply that there will be an implementation of COLLADA that works by exporting/importing data directly to/from a database? It's certainly true that this could eliminate the extra memory overhead required by the string representation. If it doesn't export directly, however, you'd still have the file-size problem for the XML representation.

I'm still not sure that makes a lot of sense, though… I suppose the XML schema specification could be translated to a database system easily enough. That leads to two questions:

Was having a database interface for COLLADA information part of the design to begin with? And if so, will a default table layout/interface be specified to keep things consistent?

So, did you do some ASCII vs. binary speed tests? What are the results?

Joachim Ante

Yes that is a milestone we are planning to achieve next year.

Several database tool sets automate this process more or less (e.g. Altova's XMLSPY), and they are getting better at importing schemas (something they are weak at now). Related to this, and interesting as well, is XDB (XML database) technology, which is where large-scale business data is heading.

Yes - as COLLADA is (also) a research project, SCEA R&D has been exploring database generation and tool integration for several months.

I think there will be defaults like SQL, with XPath and XQuery and the like as emerging standards for access and queries. Is COLLADA data-centric or document-centric? At this point I think it is data-centric, but will it remain so? Time will tell. Database systems are evolving, and COLLADA is exploring their application in the digital media markets.

I understand the sentiment, but the premise of COLLADA is that the existing art pipeline is an expensive problem for most game developers. Developing games and movies with gigabytes and terabytes of data is a real and growing expense. Companies that are experiencing this first hand do not use files as primary storage even now; they use Oracle or similar. 3D packages need to catch up to their customers' demands for data storage and asset management too. It's a challenge for everyone in our business, as rising quality expectations require us to store (and process) incredible amounts of data.