Making sense of the CL_DEVICE_MAX_WORK_GROUP_SIZE value returned by a particular card such as the Vega 56

The function clGetDeviceInfo() returned a value of 256 for CL_DEVICE_MAX_WORK_GROUP_SIZE when run against the MSI Vega 56 card that I own. I am trying to make sense of this value and have been consulting the OpenCL programming guide. By definition, this parameter is:

  • Maximum number of work-items in a work-group executing a kernel using the data parallel execution model.

According to the section titled "Key differences between pre-GCN and GCN devices" in the OpenCL programming guide, on GCN devices (which include the Vega 10 chip used in my MSI Vega 56 card), each CU (compute unit) consists of 4 vector SIMD units, each containing 16 processing elements, and one scalar unit.

Another passage, in the section titled "Work-Item Processing", says the following:
"All processing elements within a vector unit execute the same instruction in each cycle. For a typical instruction, 16 processing elements execute one instruction for 64 work items over 4 cycles. The block of work-items that are executed together is called a wavefront."

So I tried to come up with the following relation:

CL_DEVICE_MAX_WORK_GROUP_SIZE = wavefront size * 4 vector SIMDs = 64 * 4 = 256, where the wavefront size of 64 = 16 processing elements * 4 cycles (so 256 = 16 * 4 * 4).
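
To make the arithmetic I have in mind concrete, here is a minimal C sketch. It only encodes the figures quoted from the guide (16 processing elements per vector SIMD, 4 cycles per instruction, 4 vector SIMDs per CU). The function and constant names are my own, and the idea that the limit equals one wavefront per vector SIMD is my guess, not something the guide states.

```c
/* Figures quoted from the programming guide for GCN devices. */
enum {
    PROCESSING_ELEMENTS_PER_SIMD = 16, /* lanes in one vector SIMD */
    CYCLES_PER_INSTRUCTION       = 4,  /* cycles to issue one instruction */
    VECTOR_SIMDS_PER_CU          = 4   /* vector SIMD units in one compute unit */
};

/* Work-items in one wavefront: 16 lanes, each handling 4 work-items over 4 cycles. */
unsigned wavefront_size(void)
{
    return PROCESSING_ELEMENTS_PER_SIMD * CYCLES_PER_INSTRUCTION;
}

/* ASSUMPTION (my guess, not from the guide): the maximum work-group size
   corresponds to one wavefront resident on each vector SIMD of a CU. */
unsigned guessed_max_work_group_size(void)
{
    return wavefront_size() * VECTOR_SIMDS_PER_CU;
}
```

Calling guessed_max_work_group_size() from a main() gives 256, which matches what clGetDeviceInfo() reported. That could still be a coincidence, since nothing here shows how the driver actually derives the limit.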

Is my thinking correct?