My input data is a matrix of N rows x M columns. Each cell is a floating-point number.
The first stage:
Subtract each row from its previous one.
The output data is (N-1) rows X M columns.
For the subtraction, I think (but I am not sure) that I have to keep the input matrix intact and write the output to a new matrix.
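To make the first stage concrete, here is a minimal host-side reference sketch in plain Python. It is not device code. The function name `row_differences` is made up for illustration, and it assumes "subtract each row from its previous one" means previous row minus current row (flip the sign if the opposite is intended).

```python
def row_differences(matrix):
    """Return a new (N-1) x M matrix; the input matrix is left untouched."""
    # Assumption: out[i] = matrix[i] - matrix[i + 1],
    # i.e. each row is subtracted from the row before it.
    return [
        [prev - cur for prev, cur in zip(matrix[i], matrix[i + 1])]
        for i in range(len(matrix) - 1)
    ]


# Small example: N = 3 rows, M = 4 columns.
data = [
    [1.0, 2.0, 3.0, 4.0],
    [0.5, 1.0, 1.5, 2.0],
    [2.0, 2.0, 2.0, 2.0],
]
diff = row_differences(data)  # 2 rows x 4 columns
```

Writing into a separate matrix sidesteps one hazard of an in-place version: if rows are processed in parallel, one work item could overwrite a row that another work item still has to read.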
The second stage:
FFT on each row. The output is (N-1) rows X M columns.
For the FFT process, each work item is a butterfly. For M items in a row I have M/4 butterflies.
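For reference, the per-row FFT can be sketched on the host in plain Python as below. This is only an illustration for checking results, not the device kernel. It uses a textbook radix-2 butterfly (M/2 butterflies per pass), which is a swapped-in simplification and not the M/4 scheme described above (that count would be consistent with a radix-4 butterfly). The names `fft_radix2` and `fft_rows` are made up, and M is assumed to be a power of two.

```python
import cmath


def fft_radix2(row):
    """Iterative radix-2 decimation-in-time FFT of one row."""
    m = len(row)
    assert m > 0 and m & (m - 1) == 0, "row length must be a power of two"
    bits = m.bit_length() - 1

    # Bit-reversal permutation into a new list of complex values.
    out = [0j] * m
    for i, x in enumerate(row):
        j = int(format(i, f"0{bits}b")[::-1], 2) if bits else 0
        out[j] = complex(x)

    size = 2
    while size <= m:
        half = size // 2
        step = cmath.exp(-2j * cmath.pi / size)  # twiddle increment for this pass
        for start in range(0, m, size):
            w = 1 + 0j
            for k in range(half):
                # One butterfly: the unit of work a single work item would handle.
                a = out[start + k]
                b = out[start + k + half] * w
                out[start + k] = a + b
                out[start + k + half] = a - b
                w *= step
        size *= 2
    return out


def fft_rows(matrix):
    """Apply the FFT to every row independently; the shape is preserved."""
    return [fft_radix2(row) for row in matrix]


# Example input: a 2 x 4 matrix, such as the output of a row-difference stage.
spectra = fft_rows([[0.5, 1.0, 1.5, 2.0], [-1.5, -1.0, -0.5, 0.0]])
```

Each row is transformed independently of the others, so the only data the second stage needs is the (N-1) x M matrix produced by the first stage.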
Is it possible to do both operations without returning to the host after the first stage?