How to use AlexNet example

I can’t wrap my head around your (very poor, btw) AlexNet example, which is part of the OpenVX sample implementation that I downloaded from your GitHub.

First of all, the code is absolute spaghetti; it’s very difficult to read and to work with.

Second, the NN example in the sample implementation is quite different from the one shown on your YouTube channel and from how it was used there. For instance, in the video Tomer used the “-r” switch, which is not even parsed by the example in your repo.

Next, who on earth invented the magic constants in the main function?
if ((data[i] == NULL) || (avgSize / sizeof(int16_t) != 196608)) { …
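
For reference, both hard-coded numbers in this thread (196608 here and 60965224 in the weights check of the modified main) can be reproduced from standard AlexNet sizes. The sketch below assumes the Caffe reference AlexNet layer shapes, with two-group convolutions in conv2, conv4 and conv5, and assumes that 196608 is the element count of a 3x256x256 mean image. The sample does not document either, and the names kMeanElements, layerParams and kAlexnetParams are made up for illustration.

```cpp
#include <cstddef>

// Element count of a 3-channel 256x256 image (assumed to be the mean image).
constexpr std::size_t kMeanElements = 3u * 256u * 256u;

// Weights plus biases of one layer: out * (in / groups) * k * k + out.
// A fully connected layer is treated as a 1x1 kernel over a flat input.
constexpr std::size_t layerParams(std::size_t out, std::size_t in,
                                  std::size_t k, std::size_t groups) {
    return out * (in / groups) * k * k + out;
}

// Caffe reference AlexNet shapes (assumption: the sample uses the same ones).
constexpr std::size_t kAlexnetParams =
    layerParams(96, 3, 11, 1) +      // conv1
    layerParams(256, 96, 5, 2) +     // conv2, 2 groups
    layerParams(384, 256, 3, 1) +    // conv3
    layerParams(384, 384, 3, 2) +    // conv4, 2 groups
    layerParams(256, 384, 3, 2) +    // conv5, 2 groups
    layerParams(4096, 9216, 1, 1) +  // fc6, input is 256 * 6 * 6
    layerParams(4096, 4096, 1, 1) +  // fc7
    layerParams(1000, 4096, 1, 1);   // fc8

static_assert(kMeanElements == 196608, "matches the constant in main");
static_assert(kAlexnetParams == 60965224, "matches the weights check");
```

Named constants like these would at least make the size checks self-explanatory.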

My question is about the input of the net: how do I feed it an image?
I’ve managed to convert the hummingbird .q78 file to OpenCV and show it, but why does the hummingbird appear 9 times in the image?
If I want to use my own image, say the collie or the snake that is in the repo, how do I do it? Convert to q78? Resize to 227x227? I’ve tried that, but no luck.
I’ve tried converting to grayscale using OpenCV, to the 227x227 CV_32FC3 format and then to q78, and I’ve tried using cv::subtract to subtract
float meanValues[] = { 104.f, 117.f, 123.f };
found in the CTS tests, but none of it works.
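
A 3x3 grid of shrunken copies is the typical picture you get when a planar buffer (all values of channel 0, then channel 1, then channel 2) is displayed as if it were interleaved, which is the layout cv::Mat uses. Assuming that is what happens here (the layout of the .q78 file is not documented in the sample), a minimal sketch of the reordering needed before display could look like this. planarToInterleaved is a hypothetical helper, not part of the sample.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Reorders a planar buffer (channel 0 plane, channel 1 plane, ...) into an
// interleaved one (all channels of pixel 0, all channels of pixel 1, ...).
// Returns an empty vector if the buffer size does not match the dimensions.
std::vector<int16_t> planarToInterleaved(const std::vector<int16_t>& planar,
                                         std::size_t height, std::size_t width,
                                         std::size_t channels) {
    const std::size_t planeSize = height * width;
    if (planar.size() != planeSize * channels) {
        return {};
    }
    std::vector<int16_t> interleaved(planar.size());
    for (std::size_t c = 0; c < channels; ++c) {
        for (std::size_t p = 0; p < planeSize; ++p) {
            interleaved[p * channels + c] = planar[c * planeSize + p];
        }
    }
    return interleaved;
}
```

The reordered data can then be wrapped in a 227x227 CV_16SC3 cv::Mat and scaled down to 8 bits for display.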

So how do I use my own image with AlexNet? My idea is to read it with OpenCV as a 3-channel RGB image with 1 byte per channel, convert it to q78 at the 1D array level, load it into a tensor and run the net.
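
As a rough sketch of that idea without any OpenCV dependency: the function below takes an interleaved 8-bit, 3-channel buffer that has already been resized to the network input size, subtracts a per-channel mean, converts to Q7.8 and writes the result in planar order. The planar output order, the channel order of the mean and the absence of any extra scaling are all assumptions, since the sample does not document what the network expects; floatToQ78 and toPlanarQ78 are hypothetical names.

```cpp
#include <algorithm>
#include <array>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Converts a float to Q7.8 (value * 256), rounding to nearest and
// saturating at the int16_t limits.
int16_t floatToQ78(float value) {
    const float scaled = std::round(value * 256.0f);
    return static_cast<int16_t>(std::max(-32768.0f, std::min(32767.0f, scaled)));
}

// Interleaved 8-bit pixels -> mean-subtracted planar Q7.8.
// Returns an empty vector if the buffer size does not match height * width * 3.
std::vector<int16_t> toPlanarQ78(const std::vector<uint8_t>& interleaved,
                                 std::size_t height, std::size_t width,
                                 const std::array<float, 3>& mean) {
    const std::size_t planeSize = height * width;
    if (interleaved.size() != planeSize * 3) {
        return {};
    }
    std::vector<int16_t> planar(planeSize * 3);
    for (std::size_t p = 0; p < planeSize; ++p) {
        for (std::size_t c = 0; c < 3; ++c) {
            const float centered = static_cast<float>(interleaved[p * 3 + c]) - mean[c];
            planar[c * planeSize + p] = floatToQ78(centered);
        }
    }
    return planar;
}
```

Note that Q7.8 can only represent values up to about 127.996, so a bright pixel such as 255 - 104 = 151 saturates in this sketch.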

Let me attach a version of your main that I’ve modified:

int main() {
    int16_t *input = nullptr;
    int16_t *weights = nullptr;

    int16_t *output;
    std::string inputFileName = "/home/dev/openvx_sample/sample/cnn/resources/hummingbird_227x227.q78";
    std::string weightsFileName = "/home/dev/openvx_sample/sample/cnn/resources/alexnet_weights_q8_d442mw2";
    size_t inputSize = 0, weightSize = 0/*, avgSize = 0*/;

    output = (int16_t *)malloc(1000 * sizeof(int16_t));

    // Q7.8 -> float: divide by 2^8.
    const auto q78ToFloatConvertFunction = [](int16_t val) -> float {
        return static_cast<float>(val) / 256.0f;
    };

    // float -> Q7.8: scale by 2^8, round half away from zero, saturate to int16_t.
    const auto floatToQ78ConvertFunction = [](float val) -> int16_t {
        float r = val < 0.0f ? -0.5f : 0.5f;
        int tmpValue = static_cast<int>(val * 256.0f + r);
        int16_t value = tmpValue > SHRT_MAX ? SHRT_MAX : (tmpValue < SHRT_MIN ? SHRT_MIN : (int16_t)tmpValue);
        return value;
    };

    cv::Mat q = cv::imread("/home/dev/openvx_sample/sample/cnn/resources/Hummingbird.png");
    q.convertTo(q, CV_32FC3);
    cv::subtract(q, cv::Scalar(104.f, 117.f, 123.f), q);

    const unsigned size = 227 * 227 * 3;
    input = new int16_t[size];

    float *fp = (float *)q.data;
    for (unsigned j = 0; j < size; j++) {
        input[j] = floatToQ78ConvertFunction(fp[j]);
    }

    weights = MapFile(weightsFileName.c_str(), &weightSize);
    unsigned int nParameters = weightSize / (sizeof(int16_t));
    if ((weights == NULL) || (nParameters != 60965224)) {
        printf("Invalid weights file\n");
        return -1;
    }

    if (Alexnet(input, weights, output) == 0) {
        return verifyResult("alexnet", VX_TYPE_INT16, output, 1000);
    }

    return 0;
}

The API of Alexnet() is also modified to accept input and weights, because using an array of pointers in C++ code is just … wrong.

Thanks

Not sure if this would help, but there is another open source project that converts pre-trained models to OpenVX graphs: MIVisionX. The MIVisionX toolkit is a set of comprehensive computer vision and machine intelligence libraries, utilities, and applications bundled into a single toolkit. AMD MIVisionX also delivers a highly optimized open-source implementation of the Khronos OpenVX™ and OpenVX™ Extensions.

I created a sample using MIVisionX, kiritigowda/mivisionx-inference-analyzer on GitHub. The MIVisionX Python Inference Analyzer uses pre-trained ONNX/NNEF/Caffe models to analyze inference results and summarize individual image results.
