Topographic shader and better indexing

One of my hobbies is writing mesh exporters for 3D programs like 3dsmax, Maya, XSI, etc…
In both OpenGL and DirectX I keep running into something that exasperates me… Imagine the following TEXTURED box:

As you can see, the cube has 8 vertex positions, but when you export it, it has 24 different vertices… Why? Simple: the vertices share positions, but not normals or UVs (notice the "Hi" texture applied to each face).


For the front face, v2's texture coords are (1,1), while for the right face v2's texture coords are (0,1), so you need to DUPLICATE the vertex as:

pos=(1,1,0); UVs=(1,1)
pos=(1,1,0); UVs=(0,1)

This makes the mesh take up more VRAM and also causes shared vertex positions to be transformed two or more times, which degrades performance…
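
The 8 versus 24 count can be checked on the CPU with a small sketch. This is only an illustration: the names (`countCubeVertices`, `CubeCounts`) are made up here, and the UV layout is an assumed plain 0..1 square per face, not necessarily the one an exporter would produce. Even if two faces happened to agree on a UV at a shared corner, the flat per-face normals alone would still force the split.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <set>
#include <tuple>

// One fully expanded vertex, as a conventional single-index pipeline sees it.
using Position = std::array<int, 3>;
using Normal   = std::array<int, 3>;
using UV       = std::array<int, 2>;
using Vertex   = std::tuple<Position, Normal, UV>;

struct CubeCounts {
    std::size_t uniquePositions;  // distinct corner positions
    std::size_t uniqueVertices;   // distinct (position, normal, uv) combinations
};

// Builds a unit cube with a flat normal and a full 0..1 UV square on each
// face, then counts distinct positions and distinct expanded vertices.
CubeCounts countCubeVertices() {
    std::set<Position> positions;
    std::set<Vertex> vertices;
    for (int axis = 0; axis < 3; ++axis) {
        for (int side = 0; side < 2; ++side) {
            Normal n = {0, 0, 0};
            n[axis] = side ? 1 : -1;  // face normal points along +/- axis
            for (int u = 0; u < 2; ++u) {
                for (int v = 0; v < 2; ++v) {
                    Position p = {0, 0, 0};
                    p[axis] = side;
                    p[(axis + 1) % 3] = u;
                    p[(axis + 2) % 3] = v;
                    positions.insert(p);
                    vertices.insert(Vertex{p, n, UV{u, v}});
                }
            }
        }
    }
    return {positions.size(), vertices.size()};
}
```

Calling `countCubeVertices()` gives 8 unique positions and 24 unique expanded vertices, which is the duplication described above.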

Vertices are treated this way because graphics chips are typically very fast at running linear algorithms without pointers, branching, random access, etc… But now we can do some of these things on the GPU.

For the next shader generation this should change, so that triangles are processed in a different way. Imagine… In one call you do

glPositionsArray(...) //Send all vertex positions
glNormalsArray(...) //Send all normals to GPU
glTextureCoordsArray(...) //Send all UVs to GPU

struct sINPUT_TRIANGLE
{
   int idx; //Index of this triangle in the triangle list

   int vIndex[3]; //vertex indices into glPositionsArray
   int normalIndex[3]; //normal indices into glNormalsArray
   int uvIndex[32][3]; //UV indices into glTextureCoordsArray, 32 uv channels for multitexture

   int adjIndex[3]; //indices of the 3 neighboring triangles that surround this triangle. -1 if none.
};

glTriangleIndices(sINPUT_TRIANGLE*, int nTris);

struct sRASTER_PIXEL
{
   vec3 pos;
   vec2 uv[32];
   vec3 norm;
   vec4 customData[32]; //other user-defined extra data, Gouraud/flat interpolated across the triangle
};

struct sRASTER_TRIANGLE
{
   sRASTER_PIXEL v[3];
};

Internally, the GPU does:

foreach ( Mesh mesh in vram.MeshesToDraw )
{
   //Execute the user-defined topographic shader and draw resulting triangles
   foreach ( sRASTER_TRIANGLE tri in unifiedShader::TopographicShade(mesh.Triangles) )
      foreach ( sRASTER_PIXEL pix in renderer::drawTri(tri) )
         unifiedShader::PixelShade(pix); //Execute the user-defined pixel shader for each rasterized pixel
}

The user can define the following custom shader:

//User defined shader constants ( call glSetShaderConstant(string name, void* value) to set them)
mat4x4 g_objinvViewProjTM, g_objTM;
vec3 camDirNeg; //Camera direction negated for face culling

const vec3 inputPositions[]; //The GPU will put here the data from glPositionsArray
const vec3 inputNormals[];   //The GPU will put here the data from glNormalsArray
const vec2 inputUVs[32][];   //The GPU will put here the data from glTextureCoordsArray

sampler2D textureSampler[32]; //32 multitextures allowed

sRASTER_TRIANGLE[] unifiedShader::TopographicShade(sINPUT_TRIANGLE[] tris )
{
   vec3 triNormals[tris.Count];
   vec3 vNormals[inputNormals.Count] = {0};
   vec3 v20, v10, triN;

   //Compute face normals and accumulate them per shared normal index
   foreach ( sINPUT_TRIANGLE t in tris )
   {
      v20 = normalize ( inputPositions[t.vIndex[2]] - inputPositions[t.vIndex[0]] );
      v10 = normalize ( inputPositions[t.vIndex[1]] - inputPositions[t.vIndex[0]] );

      triN = normalize(cross(v10,v20)); //assumes counter-clockwise winding
      triNormals[t.idx] = triN;

      for ( int i=0; i<3; i++ )
         vNormals[t.normalIndex[i]] += triN;
   }

   //Transform positions to clip space, normals to world space
   vec4 tPositions[inputPositions.Count];
   for ( int i=0; i<inputPositions.Count; i++ )
      tPositions[i] = mul(g_objinvViewProjTM,vec4(inputPositions[i],1.0));

   vec3 tNormals[vNormals.Count];
   for ( int i=0; i<vNormals.Count; i++ )
      tNormals[i] = normalize(mul(g_objTM,vec4(vNormals[i],0.0)).xyz); //normalizing averages the accumulated face normals

   //Do back-face culling (assumes camDirNeg is expressed in the same space as triNormals)
   sRASTER_TRIANGLE[] finalTriangleList;

   foreach ( sINPUT_TRIANGLE t in tris )
   {
      if ( dot(triNormals[t.idx],camDirNeg) < 0.0 )
         continue; //Back-face found, skip it

      //Output raster triangle
      sRASTER_TRIANGLE l_sTri;
      for ( int i=0; i<3; i++ )
      {
         l_sTri.v[i].pos = tPositions[t.vIndex[i]];
         l_sTri.v[i].norm = tNormals[t.normalIndex[i]];
         l_sTri.v[i].uv[0] = inputUVs[0][t.uvIndex[0][i]];
      }
      finalTriangleList.Add(l_sTri);
   }

   return finalTriangleList;
}
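
The normal accumulation and back-face test in the shader above can be tried on the CPU today. The following is a minimal sketch in plain C++, not part of any real API: the `Vec3` helpers, `Triangle`, `faceNormal`, `accumulateNormals` and `isBackFace` are all names invented for this illustration, and it assumes counter-clockwise winding and that `camDirNeg` is given in the same space as the positions.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Vec3 { float x, y, z; };

static Vec3 sub(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
static Vec3 add(Vec3 a, Vec3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
static float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
static Vec3 cross(Vec3 a, Vec3 b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}
static Vec3 normalize(Vec3 v) {
    float len = std::sqrt(dot(v, v));
    return len > 0.0f ? Vec3{v.x / len, v.y / len, v.z / len} : v;
}

// Hypothetical CPU mirror of the position/normal part of sINPUT_TRIANGLE.
struct Triangle {
    std::array<int, 3> vIndex;       // indices into the position array
    std::array<int, 3> normalIndex;  // indices into the normal array
};

// Face normal from two edges; assumes counter-clockwise winding.
Vec3 faceNormal(const std::vector<Vec3>& positions, const Triangle& t) {
    Vec3 v10 = sub(positions[t.vIndex[1]], positions[t.vIndex[0]]);
    Vec3 v20 = sub(positions[t.vIndex[2]], positions[t.vIndex[0]]);
    return normalize(cross(v10, v20));
}

// Sums each face normal into the slots its corners reference, then
// normalizes every slot so it holds the averaged direction.
std::vector<Vec3> accumulateNormals(const std::vector<Vec3>& positions,
                                    const std::vector<Triangle>& tris,
                                    std::size_t normalCount) {
    std::vector<Vec3> out(normalCount, Vec3{0.0f, 0.0f, 0.0f});
    for (const Triangle& t : tris) {
        Vec3 n = faceNormal(positions, t);
        for (int i = 0; i < 3; ++i) {
            assert(static_cast<std::size_t>(t.normalIndex[i]) < normalCount);
            out[t.normalIndex[i]] = add(out[t.normalIndex[i]], n);
        }
    }
    for (Vec3& n : out) n = normalize(n);
    return out;
}

// Same test as in the pseudocode: a negative dot product means back-facing.
bool isBackFace(Vec3 faceN, Vec3 camDirNeg) { return dot(faceN, camDirNeg) < 0.0f; }
```

Because the normals are indexed separately from the positions, a hard edge simply uses different `normalIndex` values on each side while sharing the same `vIndex`.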

vec4[6] unifiedShader::PixelShade ( sRASTER_PIXEL pix )
{
   //A pixel shader like the ones we use at the moment BUT with unlimited length, REAL early-out/break/continue/return
   //branching, constant/texture sampler indexing, etc... The unified shader lets us use ANY instruction: the GPU
   //would be like a BIG SIMD calculator, allowing normalize(), sincos(), constant[index],
   //for/do/while... with NO LIMIT at all (except pointers perhaps).
   //Notice we return a vec4[6] color array, which lets us write cube-map faces too, and lets
   //us specify a FLT_INFINITE value if we don't want to write anything to a given MRT.
   vec4 outColor[6];

   outColor[0] = texture2D(textureSampler[0],pix.uv[0]); //color for multiple render target 0

   outColor[1] = outColor[2] = outColor[3] = outColor[4] = outColor[5] = FLT_INFINITE; //infinite=don't write

   return outColor;
}


Also, it would be GOOD to keep in a VRAM cache the “raster triangles” that the “Topographic shader” outputs, so multipass algorithms don't need to transform the vertices again. For example:

int cachedTransformedTrianglesHandler = glDrawArrays(....); //this "shades" the triangles, draws the mesh using the indexed triangle list and "caches" the raster triangles in an internal VRAM buffer...

glRedrawMultipassArrays(cachedTransformedTrianglesHandler); //this re-uses the previously drawn raster triangles, skipping the need to transform all the vertices again...

glEraseCache(cachedTransformedTrianglesHandler); //free the transformed triangles cache
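
The draw / redraw / erase lifecycle can be mimicked on the CPU with a handle-keyed store. This is only a sketch of the bookkeeping, under the assumption that a handle is a plain integer; `RasterTriangleCache`, `RasterTriangle` and their members are hypothetical names, not part of OpenGL or any driver.

```cpp
#include <cassert>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for one already-transformed triangle.
struct RasterTriangle {
    float clipPos[3][4];  // clip-space position of each corner
};

// Keeps transformed triangles alive between passes, keyed by an integer handle.
class RasterTriangleCache {
public:
    // First pass: keep the shader output and hand back a handle to it.
    int store(std::vector<RasterTriangle> tris) {
        int handle = nextHandle_++;
        cache_.emplace(handle, std::move(tris));
        return handle;
    }

    // Later passes: fetch the cached triangles, or nullptr if the handle is stale.
    const std::vector<RasterTriangle>* find(int handle) const {
        auto it = cache_.find(handle);
        return it == cache_.end() ? nullptr : &it->second;
    }

    // Release the cached triangles; returns false if the handle was unknown.
    bool erase(int handle) { return cache_.erase(handle) > 0; }

private:
    int nextHandle_ = 1;
    std::unordered_map<int, std::vector<RasterTriangle>> cache_;
};
```

Returning `nullptr` for a stale handle is one possible design choice; a real driver would also have to decide when cached triangles become invalid, for example after the shader constants change.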

Notice that this way of storing/processing data is very efficient and customizable, skips the need to implement a post-transform vertex cache, and allows tons of things we cannot do at the moment…
What do you think about all this?

Yes, it would be nice to have separately indexed normals / texcoords since many modelling packages store meshes in this format.

However the additional bandwidth required by the extra indices would probably negate any benefit from saving transformations.
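
A rough back-of-the-envelope comparison of the two layouts can be sketched as follows. It only counts stored bytes, not actual fetch bandwidth or cache behaviour, and it assumes 4-byte floats, 16-bit indices and a single UV channel; `MeshStats`, `singleIndexBytes` and `multiIndexBytes` are names invented for this example.

```cpp
#include <cassert>
#include <cstddef>

// Element counts for one mesh (hypothetical helper struct).
struct MeshStats {
    std::size_t positions;         // unique positions
    std::size_t normals;           // unique normals
    std::size_t uvs;               // unique UVs (one channel)
    std::size_t expandedVertices;  // unique (pos, normal, uv) combinations
    std::size_t triangles;
};

constexpr std::size_t kFloatBytes = 4;  // assumed 32-bit floats
constexpr std::size_t kIndexBytes = 2;  // assumed 16-bit indices

// Conventional layout: one index per corner into fully expanded vertices.
constexpr std::size_t singleIndexBytes(const MeshStats& m) {
    return m.expandedVertices * (3 + 3 + 2) * kFloatBytes
         + m.triangles * 3 * kIndexBytes;
}

// Proposed layout: separate arrays, three indices (pos, normal, uv) per corner.
constexpr std::size_t multiIndexBytes(const MeshStats& m) {
    return (m.positions * 3 + m.normals * 3 + m.uvs * 2) * kFloatBytes
         + m.triangles * 3 * 3 * kIndexBytes;
}
```

For the textured cube (8 positions, 6 normals, 4 UVs, 24 expanded vertices, 12 triangles) this gives 840 bytes single-indexed against 416 bytes multi-indexed. For a smooth mesh where nothing needs duplicating, say 1000 of each attribute and 2000 triangles, it gives 44000 against 68000 bytes, so under these assumptions the extra indices only pay off when a mesh has many hard edges or UV seams.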