
WASI-NN should not apply input quantization #2611

@CIPop

Description


Currently, the TFLite wasi-nn implementation quantizes the input tensor whenever a quantization scale and zero-point are present (https://github.com/bytecodealliance/wasm-micro-runtime/blob/main/core/iwasm/libraries/wasi-nn/src/wasi_nn_tensorflowlite.cpp#L323).
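
Roughly, expressed as NumPy rather than the backend's C++ (the helper name below is made up for illustration, not taken from the WAMR sources), the quantization applied on the input path amounts to:

import numpy as np

# Illustrative sketch only, not the actual wasi_nn_tensorflowlite.cpp code:
# every float element is mapped to uint8 via value / scale + zero_point.
def wamr_style_quantize(input_tensor_f, scale, zero_point):
    return (input_tensor_f / scale + zero_point).astype(np.uint8)

# For this model the input reports scale=0.0078125 and zero_point=128 (see the
# input details dump below).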

This produces very poor detection results with ssd_mobilenet_v1_1_metadata_1.tflite (direct download link).


The SSD MobileNet v1.1 model has the following input details:

import numpy as np
import tensorflow as tf
i = tf.lite.Interpreter(model_path="ssd_mobilenet_v1_1_metadata_1.tflite")
i.allocate_tensors()
input_details = i.get_input_details()[0]
input_details
{'name': 'normalized_input_image_tensor',
 'index': 175,
 'shape': array([  1, 300, 300,   3], dtype=int32),
 'shape_signature': array([  1, 300, 300,   3], dtype=int32),
 'dtype': numpy.uint8,               <--------------------------------------------------------
 'quantization': (0.0078125, 128),
 'quantization_parameters': {'scales': array([0.0078125], dtype=float32),
  'zero_points': array([128], dtype=int32),
  'quantized_dimension': 0},
 'sparsity_parameters': {}}

The model works well without the RGB input (300x300x3 uint8_t) being quantized. (See my bug report at joonb14/TFLiteDetection#1 for a full Jupyter Notebook example.) When I apply quantization (either in Python or by running the input through wasi-nn), I get very poor results.
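
For reference, a minimal Python sketch of the working (unquantized) path, with a zero array standing in for a real RGB frame:

import numpy as np
import tensorflow as tf

interp = tf.lite.Interpreter(model_path="ssd_mobilenet_v1_1_metadata_1.tflite")
interp.allocate_tensors()
input_details = interp.get_input_details()[0]

# Stand-in for a real 300x300 RGB image; the raw uint8 pixels are passed
# straight to the interpreter, with no scale / zero-point applied.
rgb = np.zeros((1, 300, 300, 3), dtype=np.uint8)
interp.set_tensor(input_details['index'], rgb)
interp.invoke()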

To work around this issue, I had to apply the inverse function when creating the input tensor:

// Taken from the model's input_details:
#define QUANTIZATION_SCALE 0.007812
#define QUANTIZATION_ZERO_POINT 128.0

// in create_input(...)
    
    for (int i = 0; i < input.elements; ++i)
    {
        // WAMR / wasi-nn bug: the model does not expect quantized data, but
        // quantization is applied internally regardless:
        //     it[i] = (uint8_t)(input_tensor_f[i] / scale + zero_point);
        // Pre-apply the inverse so that the internal quantization cancels out.
        input.input_tensor[i] = (data[i] - QUANTIZATION_ZERO_POINT) * QUANTIZATION_SCALE;
    }

    return input;
}

With the above workaround, I get exactly the same (good) results both in Python and when running with iwasm (wasi-nn enabled).

I'm confused by https://www.tensorflow.org/lite/performance/post_training_integer_quant#run_the_tensorflow_lite_models, which states that when input_details['dtype'] == np.uint8, quantization should be applied to the input (which is exactly what wasi-nn does)...
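
For reference, the step that page describes looks roughly like this (paraphrased from memory, not copied verbatim), and it is essentially what wasi-nn does internally:

import numpy as np

# 'image' stands for a float input in the model's expected range; 'input_details'
# is the dict shown earlier in this issue.
if input_details['dtype'] == np.uint8:
    input_scale, input_zero_point = input_details['quantization']
    image = image / input_scale + input_zero_point
    image = image.astype(np.uint8)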
