Description
Currently, the TFLite wasi-nn implementation performs quantization whenever a quantization scale and zero point are present (https://github.com/bytecodealliance/wasm-micro-runtime/blob/main/core/iwasm/libraries/wasi-nn/src/wasi_nn_tensorflowlite.cpp#L323).
This leads to very poor detection results with ssd_mobilenet_v1_1_metadata_1.tflite (Direct download link).
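For context, the conversion wasi-nn applies is the standard TFLite affine quantization; a minimal Python sketch of that mapping (the function and argument names below are illustrative, not taken from the WAMR sources) would be:

```python
import numpy as np

# Rough equivalent of what wasi-nn does when the input tensor reports a
# quantization scale/zero point (see the linked C++ source line):
#     it[i] = (uint8_t)(input_tensor_f[i] / scale + zero_point);
# (the NumPy cast wraps rather than truncating, but the idea is the same)
def quantize_like_wasi_nn(input_tensor_f, scale=0.0078125, zero_point=128):
    f = np.asarray(input_tensor_f, dtype=np.float32)
    return (f / scale + zero_point).astype(np.uint8)
```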
The SSD mobilenet v1.1 model has the following input details:
import numpy as np
import tensorflow as tf
i = tf.lite.Interpreter(model_path="ssd_mobilenet_v1_1_metadata_1.tflite")
i.allocate_tensors()
input_details = i.get_input_details()[0]
input_details

{'name': 'normalized_input_image_tensor',
'index': 175,
'shape': array([ 1, 300, 300, 3], dtype=int32),
'shape_signature': array([ 1, 300, 300, 3], dtype=int32),
'dtype': numpy.uint8, <--------------------------------------------------------
'quantization': (0.0078125, 128),
'quantization_parameters': {'scales': array([0.0078125], dtype=float32),
'zero_points': array([128], dtype=int32),
'quantized_dimension': 0},
'sparsity_parameters': {}}
The model works well without the RGB input (300x300x3 uint8_t) being quantized (see my bug report at joonb14/TFLiteDetection#1 for a full Jupyter Notebook example). When I try to apply quantization (either in Python or by running the input through wasi-nn), I get very poor results.
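For reference, this is roughly how I invoke the model from Python with the raw uint8 input (the random image below is just a placeholder for a real 300x300 RGB frame):

```python
import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="ssd_mobilenet_v1_1_metadata_1.tflite")
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()[0]

# Raw RGB bytes in [0, 255]; note that no quantization is applied here.
image = np.random.randint(0, 256, size=(1, 300, 300, 3), dtype=np.uint8)

interpreter.set_tensor(input_details['index'], image)
interpreter.invoke()

# First output tensor (the detection boxes for this model).
boxes = interpreter.get_tensor(interpreter.get_output_details()[0]['index'])
```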
To work around this issue, I had to apply an inverse function when creating the input tensor:
// Taken from the model's input_details:
#define QUANTIZATION_SCALE 0.0078125
#define QUANTIZATION_ZERO_POINT 128.0

// in create_input(...)
for (int i = 0; i < input.elements; ++i)
{
    input.input_tensor[i] = data[i];
    // WAMR / wasi-nn bug: the model does not expect quantized data, but wasi-nn
    // quantizes it internally regardless:
    //     it[i] = (uint8_t)(input_tensor_f[i] / scale + zero_point);
    // Pre-apply the inverse so the internal quantization restores the raw value:
    input.input_tensor[i] = (input.input_tensor[i] - QUANTIZATION_ZERO_POINT) * QUANTIZATION_SCALE;
}
return input;
}

With the above workaround, I get exactly the same (good) results in Python and when running under iwasm (wasi-nn enabled).
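As a sanity check on the inverse: for a raw pixel value of 200, the workaround stores (200 - 128) * 0.0078125 = 0.5625, and wasi-nn's internal step then computes 0.5625 / 0.0078125 + 128 = 200, so the model ends up seeing the original byte.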
I'm confused by https://www.tensorflow.org/lite/performance/post_training_integer_quant#run_the_tensorflow_lite_models, which states that when input_details['dtype'] == np.uint8, quantization should be applied to the input (which is exactly what wasi-nn does)...
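For reference, the pattern from that guide looks roughly like the following (paraphrased, not copied verbatim; the random test_image is a placeholder for a real float image in [0, 255]):

```python
import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="ssd_mobilenet_v1_1_metadata_1.tflite")
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()[0]

# Placeholder float image in [0, 255].
test_image = np.random.rand(300, 300, 3).astype(np.float32) * 255.0

# Paraphrase of the guide: quantize only if the model expects a uint8 input.
if input_details['dtype'] == np.uint8:
    input_scale, input_zero_point = input_details['quantization']
    test_image = test_image / input_scale + input_zero_point

test_image = np.expand_dims(test_image, axis=0).astype(input_details['dtype'])
interpreter.set_tensor(input_details['index'], test_image)
```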