Task shader threads are not executing according to the work group size

I wrote a very simple shader program with a task and mesh shader.
I defined the task work group size with:

layout(local_size_x = 32) in;

I launch N task work groups by calling:

glDrawMeshTasksNV(0, N);
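For reference, here is a minimal CPU-side sketch in C (a model, not shader code) of how a 1D dispatch numbers its invocations. With local_size_x = 32, glDrawMeshTasksNV(0, N) runs 32 * N task invocations whose gl_GlobalInvocationID.x values cover 0 to 32 * N - 1. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* CPU-side model, not shader code. GLSL defines
 *   gl_GlobalInvocationID.x = gl_WorkGroupID.x * gl_WorkGroupSize.x
 *                             + gl_LocalInvocationID.x
 * so each task work group contributes local_size_x consecutive IDs. */
static uint32_t global_invocation_id(uint32_t work_group_id,
                                     uint32_t local_id,
                                     uint32_t local_size_x)
{
    assert(local_id < local_size_x); /* local IDs run 0 .. local_size_x - 1 */
    return work_group_id * local_size_x + local_id;
}

/* glDrawMeshTasksNV(0, n) launches n task work groups, each running
 * local_size_x invocations. */
static uint32_t task_invocation_count(uint32_t n, uint32_t local_size_x)
{
    return n * local_size_x;
}
```

For example, with N = 4 the last invocation (work group 3, local ID 31) gets global ID 127.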

I pass the global invocation ID from the task shader to the mesh shader by declaring:

taskNV out Task
{
    uint instanced_id;
} OUT;

and adding this line to ‘main’ in the task shader:

OUT.instanced_id = gl_GlobalInvocationID.x;

and reading this value in the mesh shader:

int instance = int(IN.instanced_id);

In the mesh shader I render a square at a different location for each value of instance.

What I get is only N squares, not local_size_x * N.

It seems that the task shader executes only N times: the local invocation ID is always 0 and never changes.
I would expect it to execute local_size_x * N times.
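A related point, again as a hypothetical C model: the number of mesh work groups, and hence of squares if each mesh work group draws one square (an assumption), is the sum of gl_TaskCountNV over all task work groups, not the number of task invocations. Assuming the task shader, which is not shown in the post, sets gl_TaskCountNV = 1, N task work groups yield N mesh work groups whatever local_size_x is.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Model of the NV_mesh_shader launch rule: each task work group writes
 * gl_TaskCountNV once, and that many mesh work groups are launched for
 * it. The task work group size (local_size_x) does not appear here. */
static uint64_t mesh_work_groups_launched(const uint32_t *task_counts,
                                          size_t task_work_groups)
{
    assert(task_counts != NULL || task_work_groups == 0);
    uint64_t total = 0;
    for (size_t i = 0; i < task_work_groups; ++i) {
        total += task_counts[i]; /* gl_TaskCountNV of work group i */
    }
    return total;
}
```

With three task work groups that each set gl_TaskCountNV = 1, the result is 3; raising local_size_x alone does not change it.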

To check myself, I called the task shader with

glDrawMeshTasksNV(0, (local_size_x - 1) * N);

with the task work group size set to 1, and I got (local_size_x - 1) * N squares, as expected.
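That experiment works because each task work group then has a single invocation. Below is a sketch of one way to keep local_size_x = 32, written as a C model with the corresponding GLSL in comments: make instanced_id an array with one slot per invocation, let invocation 0 set gl_TaskCountNV = 32, and have each mesh work group read the slot at gl_WorkGroupID.x. The array size, the names, and the one-square-per-mesh-work-group layout are assumptions, not taken from the post.

```c
#include <assert.h>
#include <stdint.h>

#define LOCAL_SIZE_X 32u

/* Hypothetical fixed block; in GLSL:
 *   taskNV out Task { uint instanced_id[32]; } OUT;   (task shader)
 *   taskNV in  Task { uint instanced_id[32]; } IN;    (mesh shader) */
typedef struct {
    uint32_t instanced_id[LOCAL_SIZE_X];
    uint32_t task_count; /* stands in for gl_TaskCountNV */
} TaskOutArray;

/* Task work group: each invocation writes ITS OWN slot.
 *   OUT.instanced_id[gl_LocalInvocationID.x] = gl_GlobalInvocationID.x;
 *   if (gl_LocalInvocationID.x == 0) gl_TaskCountNV = 32;              */
static void run_task_group_array(uint32_t work_group_id, TaskOutArray *out)
{
    for (uint32_t local = 0; local < LOCAL_SIZE_X; ++local) {
        out->instanced_id[local] = work_group_id * LOCAL_SIZE_X + local;
        if (local == 0) {
            out->task_count = LOCAL_SIZE_X;
        }
    }
}

/* Mesh stage: task_count mesh work groups run per task work group, and
 * each selects its slot with gl_WorkGroupID.x (0 .. gl_TaskCountNV - 1):
 *   int instance = int(IN.instanced_id[gl_WorkGroupID.x]);
 * Records every instance drawn in `drawn` and returns how many. */
static uint32_t draw_all_instances(uint32_t n, uint32_t *drawn)
{
    uint32_t count = 0;
    for (uint32_t wg = 0; wg < n; ++wg) {
        TaskOutArray out = {{0}, 0};
        run_task_group_array(wg, &out);
        for (uint32_t mesh_wg = 0; mesh_wg < out.task_count; ++mesh_wg) {
            drawn[count++] = out.instanced_id[mesh_wg];
        }
    }
    return count;
}
```

With N task work groups this yields 32 * N mesh work groups, and the instances drawn are exactly 0 to 32 * N - 1.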

I read the GLSL mesh shader spec and asked Bing, and my assumptions seem to be correct.
I also changed the driver, following one of Bing's recommendations.

Am I missing something?
Many thanks