That is correct. The global size is the total number of threads (work-items, in OpenCL terms) you want, 256 in this case. Setting localSize to 16 splits these 256 threads into 256/16 = 16 work-groups of 16 threads each.
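As a rough illustration of that split, here is a plain-C sketch of the arithmetic for a one-dimensional range. It assumes no global offset and a local size that divides the global size evenly. The struct and function names are made up for this example and are not part of the OpenCL API; inside a real kernel you would read these values with get_group_id(0) and get_local_id(0).

```c
#include <stddef.h>

/* Hypothetical helper type for this sketch, not an OpenCL type. */
typedef struct {
    size_t group_id;  /* corresponds to get_group_id(0) in a kernel */
    size_t local_id;  /* corresponds to get_local_id(0) in a kernel */
} work_item_pos;

/* Number of work-groups; assumes local_size divides global_size evenly. */
static size_t num_groups_1d(size_t global_size, size_t local_size)
{
    return global_size / local_size;
}

/* Where a given global id lands, assuming a global offset of 0. */
static work_item_pos locate_1d(size_t global_id, size_t local_size)
{
    work_item_pos p;
    p.group_id = global_id / local_size;  /* which work-group */
    p.local_id = global_id % local_size;  /* position inside that group */
    return p;
}
```

With a global size of 256 and a local size of 16, num_groups_1d(256, 16) gives 16, and the thread with global id 37 ends up in group 2 at local id 5.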
Note that both of these sizes may consist of several values, one for each dimension of the NDRange. So setting the global size to 256 and work_dim to 1 gives you a consecutive range of thread ids from 0 up to, but not including, 256. Your question mentioned 16*16, which hints at a two-dimensional problem. If that is the case, you can set the global size to [16,16] and work_dim to 2, which will spawn 256 threads in a 16-by-16 grid.
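To make the one-dimensional and two-dimensional cases concrete, the sketch below computes the total thread count from a global size array, laid out the same way as the array you would pass as global_work_size to clEnqueueNDRangeKernel. It is plain C and does not call OpenCL. Both helper names are hypothetical, and the row-major flattening is just a common convention for indexing a buffer from a 2D id, not something OpenCL imposes.

```c
#include <stddef.h>

/* Total number of threads: the product of the global size over all
 * work_dim dimensions. */
static size_t total_work_items(unsigned work_dim, const size_t *global_size)
{
    size_t total = 1;
    for (unsigned d = 0; d < work_dim; ++d)
        total *= global_size[d];
    return total;
}

/* Row-major linear index for a 2D thread, where x would come from
 * get_global_id(0) and y from get_global_id(1). This layout is an
 * assumption made for this example. */
static size_t flatten_2d(size_t x, size_t y, size_t width)
{
    return y * width + x;
}
```

A global size of {256} with work_dim 1 and a global size of {16, 16} with work_dim 2 both yield 256 threads. The difference is only in how each thread is identified: a single id from 0 to 255 in the first case, a pair of ids each from 0 to 15 in the second.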