Tessellation shader without patches

Hi everyone!

A long time has passed since I last programmed with OpenGL and posted to this forum (it was opengl.org back then), so long that I had to open a new account. I’m sorry for this introduction, but I need to justify why a newbie is posting in the advanced coding section. :)

Well, I would like to use tessellation shaders (TS) for something I have already implemented in the vertex shader (VS), in order to compare performance. The experiment will continue with mesh shaders later.

The first question is: What is the most efficient way to invoke the tessellation control shader (TCS) without having to define a VBO with patch vertices?

Motivation: I do not actually need control vertices. It is common knowledge that the TCS executes once for each vertex in the output patch. So, theoretically, I could use a dummy VBO with just one vertex for all patches. Furthermore, that vertex could have only one coordinate, just to provoke a TCS execution. If so, is there a way to avoid even that dummy VBO with a single floating-point value? The VS can execute without attributes. Is the same somehow possible for the TS?

The second question is performance-oriented: Is it better to use a trivial TCS, or to set the default tessellation levels from the application (and the number of patch vertices to 1) and remove the TCS from the pipeline?

Thank you very much in advance!

Just don’t enable any attribute arrays. You still need to define the number of vertices per patch and draw with GL_PATCHES; the vertex shader just won’t have any input variables.
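To illustrate why no attribute data is needed, here is a minimal CPU-side sketch of what the tessellation evaluation shader could compute from built-in values alone. The row-major patch grid, the `patchPoint` and `vertexCountForDraw` helpers and all parameter names are assumptions made for this example, not something from the posts above. `primitiveId` stands in for `gl_PrimitiveID` and `(u, v)` for `gl_TessCoord.xy` in a quad domain.

```cpp
#include <cassert>

struct Vec2 { float x, y; };

// Hypothetical layout: patches form a row-major grid that is patchesPerRow
// wide, and each patch covers patchSize x patchSize world units.
// primitiveId models gl_PrimitiveID; (u, v) models gl_TessCoord.xy.
Vec2 patchPoint(int primitiveId, int patchesPerRow, float patchSize,
                float u, float v) {
    assert(primitiveId >= 0 && patchesPerRow > 0);
    const int col = primitiveId % patchesPerRow;
    const int row = primitiveId / patchesPerRow;
    return { (col + u) * patchSize, (row + v) * patchSize };
}

// Vertex count to pass to the draw call. On the GL side this would pair with
// glPatchParameteri(GL_PATCH_VERTICES, verticesPerPatch) and
// glDrawArrays(GL_PATCHES, 0, count), with no attribute arrays enabled.
int vertexCountForDraw(int patchCount, int verticesPerPatch) {
    assert(patchCount >= 0 && verticesPerPatch >= 1);
    return patchCount * verticesPerPatch;
}
```

For example, `patchPoint(5, 4, 10.0f, 0.5f, 0.5f)` is the centre of the patch in column 1, row 1, i.e. (15, 15). With one vertex per patch, the draw count equals the number of patches.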

In case you haven’t figured it out, TCS output “vertices” aren’t actually vertices in any meaningful sense of the word. They’re just a mechanism to split the work of the TCS across multiple invocations. Each TCS invocation gets access to the data for all vertices for the patch, and can write some subset of the data used by the TES: any per-patch outputs, plus the per-“vertex” (i.e. per-invocation) outputs for that invocation.

If the computation performed by the TCS is parallelisable, there may be some performance gain from using multiple invocations. E.g. if the input is a 4x4 array of control points and the output is a 4x4 array of coefficients for a bicubic polynomial, you might have each invocation calculate one coefficient.
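A rough CPU-side sketch of that work split, assuming the 4x4 control points are Bezier control points (scalar values, for brevity) and the coefficients are for the power basis f(u, v) = sum of C[i][j] * u^i * v^j. The function names are made up for this example. Each call to `coefficientForInvocation` models one TCS invocation: it reads the whole patch but writes only its own output.

```cpp
#include <array>
#include <cassert>

using Grid4 = std::array<std::array<float, 4>, 4>;

// kM[i][k] is the coefficient of t^i in the k-th cubic Bernstein polynomial
// (assumes the control points are Bezier control points).
const float kM[4][4] = {
    { 1.f,  0.f,  0.f, 0.f},
    {-3.f,  3.f,  0.f, 0.f},
    { 3.f, -6.f,  3.f, 0.f},
    {-1.f,  3.f, -3.f, 1.f},
};

// One "invocation": sees all 16 control points, produces one coefficient.
// invocationId models gl_InvocationID for a 16-vertex output patch.
float coefficientForInvocation(const Grid4& controlPoints, int invocationId) {
    assert(invocationId >= 0 && invocationId < 16);
    const int i = invocationId / 4;
    const int j = invocationId % 4;
    float c = 0.f;
    for (int k = 0; k < 4; ++k)
        for (int l = 0; l < 4; ++l)
            c += kM[i][k] * controlPoints[k][l] * kM[j][l];
    return c;
}

// Serial stand-in for the 16 invocations the GPU would run in parallel.
Grid4 runAllInvocations(const Grid4& controlPoints) {
    Grid4 coeffs{};
    for (int id = 0; id < 16; ++id)
        coeffs[id / 4][id % 4] = coefficientForInvocation(controlPoints, id);
    return coeffs;
}

// What the TES would then do per vertex: evaluate the polynomial (Horner).
float evaluate(const Grid4& c, float u, float v) {
    float result = 0.f;
    for (int i = 3; i >= 0; --i) {
        float row = 0.f;
        for (int j = 3; j >= 0; --j) row = row * v + c[i][j];
        result = result * u + row;
    }
    return result;
}
```

At the corners the polynomial reproduces the corner control points, which is a quick sanity check for the conversion.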

If the tessellation levels are fixed and you don’t need to process the patch inputs, there’s no need for a TCS. The primary purpose of the TCS is to set the tessellation levels dynamically based upon the inputs (i.e. more subdivisions the closer the patch is to the viewpoint). A secondary purpose is to pre-process the data for the patch into a form which minimises the amount of work which needs to be done for each vertex by the TES.
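As a sketch of the "dynamic levels" case, here is one possible distance-to-level mapping a TCS might apply. The linear falloff, the function name and the `nearDist`/`farDist`/`maxLevel` parameters are all assumptions for illustration. When the levels are fixed instead, the TCS can be dropped and the defaults set from the application with `glPatchParameterfv` (`GL_PATCH_DEFAULT_OUTER_LEVEL` and `GL_PATCH_DEFAULT_INNER_LEVEL`).

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical mapping: maxLevel at or below nearDist, falling linearly to
// level 1 at or beyond farDist. A real TCS would write the result to
// gl_TessLevelOuter / gl_TessLevelInner.
float tessLevelForDistance(float distance, float nearDist, float farDist,
                           float maxLevel) {
    assert(nearDist < farDist && maxLevel >= 1.0f);
    float t = (distance - nearDist) / (farDist - nearDist);
    t = std::max(0.0f, std::min(1.0f, t));   // clamp to [0, 1]
    return maxLevel + t * (1.0f - maxLevel);
}
```

With `nearDist = 10`, `farDist = 100` and `maxLevel = 64`, a patch at distance 55 sits halfway through the range and gets level 32.5.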

Thank you very much, GClements! Fortunately, my project already had TS support, written more than 9 years ago, which I had totally forgotten about. So the implementation was quick and easy.
Interestingly, the TS-based implementation is about 10% faster than the VS-based one with the same grid size (on an NVIDIA GTX 850M). Furthermore, the TS version does not need the index buffer that the VS version requires to connect the vertices. However, a potential disadvantage is the limit on the maximum tessellation level. A VS implementation with bigger blocks performs better than the TS one: for example, VS with 77x77 blocks (and there is no limit on their size) performs 3% better than TS with 65x65 blocks. The scene consists of about 6.5M triangles.
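For a sense of scale, the block sizes can be turned into triangle and block counts. This is only back-of-the-envelope arithmetic, assuming "NxN blocks" means N x N vertices per block and that the VS path draws indexed triangles; the helper names are made up. A 65x65 block presumably corresponds to a tessellation level of 64, which is the minimum value implementations must support for `GL_MAX_TESS_GEN_LEVEL`.

```cpp
#include <cassert>
#include <cstdint>

// An n x n vertex block has (n-1)^2 quads, two triangles each.
std::int64_t trianglesPerBlock(std::int64_t n) {
    assert(n >= 2);
    return 2 * (n - 1) * (n - 1);
}

// Index count if the block is drawn as indexed GL_TRIANGLES (assumption;
// triangle strips would need fewer indices). The TS path needs none.
std::int64_t indicesPerBlock(std::int64_t n) {
    return 3 * trianglesPerBlock(n);
}

// Number of blocks needed to reach at least totalTriangles (rounded up).
std::int64_t blocksFor(std::int64_t totalTriangles, std::int64_t n) {
    assert(totalTriangles >= 0);
    const std::int64_t perBlock = trianglesPerBlock(n);
    return (totalTriangles + perBlock - 1) / perBlock;
}
```

A 65x65 block holds 8192 triangles and a 77x77 block holds 11552, so a scene of roughly 6.5M triangles needs on the order of 794 of the former versus 563 of the latter.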
It would be interesting to see how mesh shaders perform. The maximum meshlet size is extremely small (the current HW limit is 256 vertices and 512 primitives, while C. Kubisch recommends only 64 vertices and 126 primitives in the NVIDIA developer blog). So, I guess, there will be a lot of culling workload on the CPU, or a need to move it to the GPU somehow.