I am using OpenCL to accelerate the updateWeight function (for a neural network) on an FPGA. I use the following kernel:
__kernel void layerUpdateWeights_kernel(__global const double* inputs,
                                        __global const double* delta,
                                        __global double* weightsMat,
                                        const int inputsSize,
                                        const double learningRate) {
    int k = get_global_id(0);  // neuron index (row of weightsMat)
    int j = get_global_id(1);  // input index (column of weightsMat)
    // Each row of weightsMat has inputsSize + 1 entries.
    weightsMat[k * (inputsSize + 1) + j] += learningRate * delta[k] * inputs[j];
}
and this is what I wrote in the host code:
cl::NDRange globalSize(nbNeuronal, inputsSize+1);
err = q.enqueueNDRangeKernel(updateWeightsKernel, cl::NullRange, globalSize, cl::NullRange);
After implementing this on the FPGA, I got correct results, and I determined the resource utilization using the Vivado tools.
I know that this code will create nbNeuronal*(inputsSize+1) work-items on the FPGA, so I expected the FPGA resource usage to depend on these two values.
But when I run this code with, for example, nbNeuronal = 10 and nbNeuronal = 200, I find the same resource utilization on the FPGA (LUT, DSP, BRAM, ...).
Why is that? Any help would be appreciated.