NV_vertex_program3 and textures

I have been working on a water renderer using fragment_program and vertex_program, and I want to displace the water vertices using a texture. The problem is that running a vertex program that accesses a texture on just 4 vertices drops the frame rate from 90 fps to about 0.5 fps.

The NVIDIA test program for this feature works, although it seems quite fragile: changing the texture type from GL_RGBA_FLOAT32_ATI to GL_RGBA16F_ARB hangs it (and making that change doesn't help my program either). Does anyone know what limitations there are on using this extension?

I'm running a GeForce 6600 GT with ForceWare 81.98.

Vertex program used:

!!ARBvp1.0
OPTION NV_vertex_program3;

PARAM mat[4] = { state.matrix.mvp };
OUTPUT opos = result.position;
TEMP temp, pos;

# Build the lookup coordinate from the vertex x/z position, scaled by env[10].x.
MOV temp, {0,0,0,0};
MUL temp.x, vertex.position.x, program.env[10].x;
MUL temp.y, vertex.position.z, program.env[10].x;
# Fetch the height from texture unit 2 and displace the vertex along y.
TEX temp, temp, texture[2], 2D;
MOV pos, vertex.position;
ADD pos.y, pos.y, temp.x;

# Transform the displaced position to clip space.
DP4 opos.x, pos, mat[0];
DP4 opos.y, pos, mat[1];
DP4 opos.z, pos, mat[2];
DP4 opos.w, pos, mat[3];

MOV result.texcoord, pos;
MOV result.color, vertex.color;
END

The fragment program just passes the color through.

The only types of textures you can read in the vertex program are 32-bit float with one or four components. Each texture read takes about the same time as 20 ordinary instructions.
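As a rough sanity check on that figure, here is a small C sketch that turns the "about 20 ordinary instructions per texture read" estimate into a per-vertex cost. The function name and the constant are hypothetical, and the 20x factor is only the estimate quoted above, not a measured value.

```c
/* Assumed cost of one vertex texture read, in "ordinary instruction"
   units. This is the rough estimate from the reply, not a measurement. */
#define TEX_READ_COST 20

/* Hypothetical helper: estimated per-vertex cost of a vertex program,
   counted in ordinary-instruction equivalents. */
static int estimated_cost(int ordinary_instructions, int texture_reads)
{
    return ordinary_instructions + TEX_READ_COST * texture_reads;
}
```

The posted program has 11 ordinary instructions and one TEX, so estimated_cost(11, 1) gives 31 instruction-equivalents per vertex, or 124 for the 4 vertices. If the estimate is in the right ballpark, the fetch cost on its own seems unlikely to account for a drop from 90 fps to 0.5 fps.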

IIRC, only GL_LUMINANCE_FLOAT32_ATI and GL_RGBA_FLOAT32_ATI are hardware-accelerated for vertex texture reads. You may find that your app hasn't hung but has fallen back to software mode.
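One way to guard against an accidental software fallback is to check the internal format before binding a texture for vertex program use. The C sketch below assumes the two formats named in the replies are the only accelerated ones on this hardware; the helper name is hypothetical, and the token values are the ones defined by the ATI_texture_float and ARB_texture_float extensions.

```c
#include <stdbool.h>

/* Token values from ATI_texture_float / ARB_texture_float, guarded in
   case a GL extension header has already defined them. */
#ifndef GL_RGBA_FLOAT32_ATI
#define GL_RGBA_FLOAT32_ATI      0x8814
#endif
#ifndef GL_LUMINANCE_FLOAT32_ATI
#define GL_LUMINANCE_FLOAT32_ATI 0x8818
#endif
#ifndef GL_RGBA16F_ARB
#define GL_RGBA16F_ARB           0x881A
#endif

/* Hypothetical helper: returns true if the internal format is one of
   the two formats this thread reports as usable for vertex texture
   reads (assumption based on the replies, not on a driver query). */
static bool is_vertex_texture_format(unsigned int internal_format)
{
    return internal_format == GL_RGBA_FLOAT32_ATI ||
           internal_format == GL_LUMINANCE_FLOAT32_ATI;
}
```

With this check, is_vertex_texture_format(GL_RGBA16F_ARB) returns false, which matches the format that caused trouble in the NVIDIA test program. For a single height channel like the displacement above, GL_LUMINANCE_FLOAT32_ATI would be the smaller of the two accepted formats.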