I am doing some pretty extreme performance tests with 100 million+ polys, and I want to optimize the speed at which my vertex shaders are running, primarily for shadow rendering.
This is my vertex structure:
struct Vertex
{
    Vec4 position;
    std::array<signed char, 3> normal;
    signed char displacement;
    std::array<short, 4> texcoords;
    std::array<signed char, 4> tangent;
    std::array<unsigned char, 4> color;
    std::array<unsigned char, 4> boneweights;
    std::array<unsigned char, 4> boneindices;
    uint32_t index;
};
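To make the per-vertex cost concrete, here is a minimal sketch that checks the stride of this structure at compile time. It assumes Vec4 is four tightly packed 32-bit floats (the post does not define it) and that std::array adds no padding, which holds on mainstream compilers but is not guaranteed by the standard.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Assumption: Vec4 is four packed 32-bit floats; it is not defined in the post.
struct Vec4 { float x, y, z, w; };

struct Vertex
{
    Vec4 position;                            // 16 bytes
    std::array<signed char, 3> normal;        //  3 bytes
    signed char displacement;                 //  1 byte
    std::array<short, 4> texcoords;           //  8 bytes
    std::array<signed char, 4> tangent;       //  4 bytes
    std::array<unsigned char, 4> color;       //  4 bytes
    std::array<unsigned char, 4> boneweights; //  4 bytes
    std::array<unsigned char, 4> boneindices; //  4 bytes
    std::uint32_t index;                      //  4 bytes
};

// Stride of the full vertex versus a position-only vertex.
constexpr std::size_t kFullStride = sizeof(Vertex);
constexpr std::size_t kPositionStride = sizeof(Vec4);

static_assert(kFullStride == 48, "expected a 48-byte vertex with no padding");
static_assert(kPositionStride == 16, "expected a 16-byte position");
```

Under those assumptions the position is one third of each vertex, so a pass that only needs positions still strides over 32 bytes of unused attributes per vertex when it reads from this buffer.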
On the shader side, vertices are defined as follows:
//Vertex layout
layout(location = 0) in vec4 inPosition;
layout(location = 1) in vec4 inNormal;
layout(location = 2) in vec4 inTexCoords;
layout(location = 3) in vec4 inTangent;
layout(location = 4) in vec4 inColor;
layout(location = 5) in vec4 inBoneWeights;
layout(location = 6) in uvec4 inBoneIndices;
layout(location = 7) in uint inVertexID;
Do you see anything here that would be suboptimal on common PC hardware?
I know this was the case a few years ago, but should we still expect 32-bit unsigned integer vertex indices to be slower on modern hardware than 16-bit unsigned short indices?
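For context on the index question, the part of the tradeoff that does not depend on the hardware can be sketched as below. Both function names are hypothetical, and the sketch says nothing about actual GPU speed: it only shows that 16-bit indices halve the index buffer size and cap a mesh at 65536 addressable vertices (one fewer in APIs that reserve 0xFFFF for primitive restart).

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: true if every vertex can be addressed by a 16-bit index.
// Ignores primitive restart, which reserves the top index value when enabled.
constexpr bool fitsIn16BitIndices(std::size_t vertexCount)
{
    return vertexCount <= 65536;
}

// Hypothetical helper: size in bytes of an index buffer for the given index width.
constexpr std::size_t indexBufferBytes(std::size_t indexCount, bool use16Bit)
{
    return indexCount * (use16Bit ? sizeof(std::uint16_t) : sizeof(std::uint32_t));
}
```

A mesh with more vertices than that would have to be split into several draws, or use a per-draw base vertex offset, to stay on 16-bit indices.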
Currently I am using the same shader layout for render and shadow polygons. Is this a mistake? Can I make shadow polys faster by omitting everything but the vertex position, or by copying the vertex positions into a second tightly packed shadow mesh? Or is that a waste of time?
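The second option, a separate tightly packed shadow mesh, could look roughly like the sketch below. The helper name extractShadowPositions is hypothetical, and Vec4 is again assumed to be four packed 32-bit floats. Whether this is actually faster would need to be measured.

```cpp
#include <cstddef>
#include <vector>

// Assumption: Vec4 is four packed 32-bit floats; it is not defined in the post.
struct Vec4 { float x, y, z, w; };

// Hypothetical helper: copies only the positions out of any vertex type that
// has a Vec4 `position` member, preserving vertex order.
template <typename VertexT>
std::vector<Vec4> extractShadowPositions(const std::vector<VertexT>& vertices)
{
    std::vector<Vec4> positions;
    positions.reserve(vertices.size());
    for (const VertexT& v : vertices)
        positions.push_back(v.position);
    return positions;
}
```

Because the vertex order is unchanged, the existing index buffer can be reused for the shadow mesh. The resulting array would be uploaded as its own vertex buffer and paired with a shadow shader that declares only inPosition.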
Any tips you can offer are appreciated.