Hello,

I am trying to pass a set of 4x4 matrices to an OpenCL kernel. I tried two representations:

- float16
- struct { float4 col[4]; }

I load the matrices into an OpenCL buffer from the host side. The struct definition is shared between host and kernel code:

```
#ifndef __OPENCL_VERSION__
// Host side: map the OpenCL type names onto the host math library types.
#include <core/math.h>
typedef math::vec4f float4;
typedef math::mat4f float4x4;
#else
// Device side: pick one of the two matrix representations.
#if 1
// Variant that works: a struct of four float4 columns.
struct mat4f
{
    float4 col[4];
};
typedef struct mat4f float4x4;
#else
// Variant that fails: a single float16.
typedef float16 float4x4;
#endif
#endif

struct volume_uniform_data
{
    float4   _volume_extends;     // w unused
    float4   _scale_obj_to_tex;   // w unused
    float4   _sampling_distance;  // yzw unused
    float4   _os_camera_position;
    float4   _value_range;

    float4x4 _m_matrix;
    float4x4 _m_matrix_inverse;
    float4x4 _m_matrix_inverse_transpose;

    float4x4 _mv_matrix;
    float4x4 _mv_matrix_inverse;
    float4x4 _mv_matrix_inverse_transpose;

    float4x4 _mvp_matrix;
    float4x4 _mvp_matrix_inverse;
}; // struct volume_uniform_data
```

The kernel interface looks as follows:

```
__kernel
void
main_vrc(__write_only image2d_t output_image,
         __read_only image3d_t volume_image,
         __read_only image2d_t colormap_image,
         __constant struct volume_uniform_data* volume_data)
{
    [...]
}
```

The problem I am facing is that the first version, using a float16 for the matrices, fails: the float16 variables contain wrong data. The struct version, on the other hand, works perfectly.

Why is that? As far as I understand, every member of the volume_uniform_data struct should be 16-byte aligned with either solution.

I am trying this on Nvidia GeForce 480/580 hardware using r285 drivers.
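
A possible explanation for the alignment question, offered as a sketch rather than a confirmed diagnosis: OpenCL C aligns each built-in vector type to its own size, so a `float16` is 64-byte aligned, while a struct of four `float4` is only 16-byte aligned. With five `float4` members (80 bytes) in front of it, the first matrix would sit at offset 128 on the device in the `float16` variant, but at offset 80 if the host matrix type is 16-byte aligned. The host layout of `math::mat4f` is not shown here, so that part is an assumption. The C11 snippet below mimics both layouts with hypothetical stand-in types (`vec4_host`, `mat4_as_struct`, `mat4_as_f16`) to compare the offsets:

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdalign.h> /* alignas (C11) */
#include <stddef.h>   /* offsetof, size_t */

/* Hypothetical stand-ins, NOT the real types from core/math.h. */

/* Mimics float4: 16 bytes, 16-byte aligned. */
typedef struct { alignas(16) float v[4]; } vec4_host;

/* Mimics struct mat4f { float4 col[4]; }: 64 bytes, 16-byte aligned. */
typedef struct { vec4_host col[4]; } mat4_as_struct;

/* Mimics float16: 64 bytes, aligned to its own size (64 bytes). */
typedef struct { alignas(64) float v[16]; } mat4_as_f16;

static_assert(sizeof(mat4_as_struct) == 64, "struct matrix is 64 bytes");
static_assert(sizeof(mat4_as_f16) == 64, "float16-like matrix is 64 bytes");

/* Layout when the matrix is a struct of four float4 columns. */
struct uniforms_struct_variant {
    vec4_host      head[5]; /* the five leading float4 members: 80 bytes */
    mat4_as_struct m;       /* first matrix */
};

/* Layout when the matrix is a float16-like, 64-byte aligned type. */
struct uniforms_f16_variant {
    vec4_host   head[5];    /* the same 80 bytes */
    mat4_as_f16 m;          /* first matrix, padded up to a 64-byte boundary */
};

/* Byte offset of the first matrix in each layout. */
static size_t first_matrix_offset_struct(void)
{
    return offsetof(struct uniforms_struct_variant, m);
}

static size_t first_matrix_offset_f16(void)
{
    return offsetof(struct uniforms_f16_variant, m);
}
```

If the host and device offsets differ in this way, the kernel would read every matrix 48 bytes away from where the host wrote it. Comparing `offsetof` for each member on the host with the expected device layout, or padding the leading members to a multiple of 64 bytes, would be reasonable things to try.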