Description
Currently, the TFLite wasi-nn implementation performs quantization whenever a quantization scale and zero point are present (https://github.com/bytecodealliance/wasm-micro-runtime/blob/main/core/iwasm/libraries/wasi-nn/src/wasi_nn_tensorflowlite.cpp#L323).
This leads to very poor detection results with ssd_mobilenet_v1_1_metadata_1.tflite (Direct download link).
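For context, the conversion wasi-nn applies is the standard TFLite affine quantization; a minimal Python sketch of that mapping (the function and argument names below are illustrative, not taken from the WAMR sources) would be:

```python
import numpy as np

# Rough equivalent of what wasi-nn does when the input tensor reports a
# quantization scale/zero point (see the linked C++ source line):
#     it[i] = (uint8_t)(input_tensor_f[i] / scale + zero_point);
# (the NumPy cast wraps rather than truncating, but the idea is the same)
def quantize_like_wasi_nn(input_tensor_f, scale=0.0078125, zero_point=128):
    f = np.asarray(input_tensor_f, dtype=np.float32)
    return (f / scale + zero_point).astype(np.uint8)
```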
The SSD mobilenet v1.1 model has the following input details:
import numpy as np
import tensorflow as tf
i = tf.lite.Interpreter(model_path="ssd_mobilenet_v1_1_metadata_1.tflite")
i.allocate_tensors()
input_details = i.get_input_details()[0]
input_details

{'name': 'normalized_input_image_tensor',
'index': 175,
'shape': array([ 1, 300, 300, 3], dtype=int32),
'shape_signature': array([ 1, 300, 300, 3], dtype=int32),
'dtype': numpy.uint8, <--------------------------------------------------------
'quantization': (0.0078125, 128),
'quantization_parameters': {'scales': array([0.0078125], dtype=float32),
'zero_points': array([128], dtype=int32),
'quantized_dimension': 0},
'sparsity_parameters': {}}
The model works well without the RGB input (300x300x3 uint8_t) being quantized (see my bug report at joonb14/TFLiteDetection#1 for a full Jupyter Notebook example). When I try to apply quantization (either in Python or by running the input through wasi-nn), I get very poor results.
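For reference, this is roughly how I invoke the model from Python with the raw uint8 input (the random image below is just a placeholder for a real 300x300 RGB frame):

```python
import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="ssd_mobilenet_v1_1_metadata_1.tflite")
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()[0]

# Raw RGB bytes in [0, 255]; note that no quantization is applied here.
image = np.random.randint(0, 256, size=(1, 300, 300, 3), dtype=np.uint8)

interpreter.set_tensor(input_details['index'], image)
interpreter.invoke()

# First output tensor (the detection boxes for this model).
boxes = interpreter.get_tensor(interpreter.get_output_details()[0]['index'])
```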
To work around this issue, I had to apply an inverse function when creating the input tensor:
// Taken from the model's input_details:
#define QUANTIZATION_SCALE 0.0078125
#define QUANTIZATION_ZERO_POINT 128.0

// in create_input(...)
for (int i = 0; i < input.elements; ++i)
{
    input.input_tensor[i] = data[i];
    // WAMR / wasi-nn bug: the model does not expect quantized data, but wasi-nn
    // quantizes it internally regardless:
    //     it[i] = (uint8_t)(input_tensor_f[i] / scale + zero_point);
    // Pre-apply the inverse so the internal quantization restores the raw value:
    input.input_tensor[i] = (input.input_tensor[i] - QUANTIZATION_ZERO_POINT) * QUANTIZATION_SCALE;
}
return input;
}

With the above workaround, I get exactly the same (good) results in Python and when running under iwasm (wasi-nn enabled).
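As a sanity check on the inverse: for a raw pixel value of 200, the workaround stores (200 - 128) * 0.0078125 = 0.5625, and wasi-nn's internal step then computes 0.5625 / 0.0078125 + 128 = 200, so the model ends up seeing the original byte.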
I'm confused by https://www.tensorflow.org/lite/performance/post_training_integer_quant#run_the_tensorflow_lite_models, which states that when input_details['dtype'] == np.uint8, quantization should be applied to the input (which is exactly what wasi-nn does)...
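For reference, the pattern from that guide looks roughly like the following (paraphrased, not copied verbatim; the random test_image is a placeholder for a real float image in [0, 255]):

```python
import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="ssd_mobilenet_v1_1_metadata_1.tflite")
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()[0]

# Placeholder float image in [0, 255].
test_image = np.random.rand(300, 300, 3).astype(np.float32) * 255.0

# Paraphrase of the guide: quantize only if the model expects a uint8 input.
if input_details['dtype'] == np.uint8:
    input_scale, input_zero_point = input_details['quantization']
    test_image = test_image / input_scale + input_zero_point

test_image = np.expand_dims(test_image, axis=0).astype(input_details['dtype'])
interpreter.set_tensor(input_details['index'], test_image)
```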