1+ %% Convert librosa Audio Feature Extraction To MATLAB
2+ %% This Example Shows How to:
3+ %%
4+ % * Convert librosa Python feature extraction code to MATLAB.
5+ % * Use the MATLAB feature extraction code to translate a Python speech command
6+ % recognition system to a MATLAB system that does not require Python.
7+ %% Overview
8+ % Audio and speech AI systems often include feature extraction. Importing AI
9+ % audio models trained in non-MATLAB frameworks into MATLAB usually consists of
10+ % two steps:
11+ %%
12+ % * Import the pretrained network to MATLAB.
13+ % * Translate the feature extraction performed in the non-MATLAB framework to
14+ % MATLAB code.
15+ %%
16+ % This example focuses on the second step of this process. In particular, you
17+ % learn how to translate librosa feature extraction functions to their MATLAB
18+ % equivalents.
19+ %
20+ % The example covers three of the most popular audio feature extraction algorithms:
21+ %%
22+ % * Short-time Fourier transform (STFT) and its inverse (ISTFT).
23+ % * Mel spectrogram.
24+ % * Mel-frequency cepstral coefficients (MFCC).
25+ %%
26+ % You also leverage the converted feature extraction code to translate a Python
27+ % deep learning speech command recognition system to MATLAB. The Python system
28+ % uses PyTorch for the pretrained network, and librosa for mel spectrogram feature
29+ % extraction.
30+ %% Requirements
31+ %%
32+ % * <https://www.mathworks.com/ MATLAB®> R2021b or later
33+ % * <https://www.mathworks.com/products/audio.html Audio Toolbox™>
34+ % * <https://www.mathworks.com/products/deep-learning.html Deep Learning Toolbox™>
35+ %%
36+ % The Python code uses the following packages:
37+ %%
38+ % * librosa version 0.9.2
39+ % * PyTorch version 1.10.2
40+ %% Mapping librosa Code to MATLAB Code
41+ % STFT and ISTFT
42+ % Execute librosa Code
43+ % You start by translating the librosa STFT and ISTFT code to MATLAB.
44+ %
45+ % The Python script <./PythonCode\librosastft.py |librosastft.py|> uses the
46+ % librosa functions |stft| and |istft|.
47+ %
48+ % Inspect the contents of the script.
49+
50+ addpath("PythonCode")
51+ pythonScript = fullfile(pwd,"PythonCode","librosastft.py");
52+ type(pythonScript)
53+ %%
54+ % You can execute Python scripts and commands from MATLAB. For more information
55+ % about this functionality, see <https://www.mathworks.com/help/matlab/call-python-libraries.html
56+ % Call Python from MATLAB> in the documentation. In this example, you use <https://www.mathworks.com/help/matlab/ref/pyrunfile.html
57+ % pyrunfile> to run a Python script in MATLAB.
58+ %
59+ % Use |pyrunfile| to call the Python script. Pass the name of the test audio file
60+ % as an input argument. Return variables computed in the Python script to MATLAB
61+ % by specifying them as output arguments.
62+
63+ filename = fullfile(pwd,"samples","yes.flac");
64+ [stftOut1,istftOut1] = pyrunfile(pythonScript,["stftOut","istftOut"],filename=filename);
65+ stftOut1 = single(stftOut1);
66+ istftOut1 = single(istftOut1);
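%%
% The name-value mechanism used above can be tried in isolation. The following
% is a minimal sketch, not part of the original example: it assumes a Python
% environment is already configured (see |pyenv|) and uses |pyrun|, the
% inline-statement counterpart of |pyrunfile|. The variable names |a|, |b|, and
% |c| are illustrative only.

pe = pyenv;                                % query the configured interpreter
fprintf("Python version: %s\n",pe.Version)
c = pyrun("c = a + b","c",a=2,b=3);        % a and b become Python variables
fprintf("a + b computed in Python: %g\n",double(c))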
67+ % Implement Equivalent MATLAB Code
68+ % To perform the equivalent STFT and ISTFT computations in MATLAB, you use the
69+ % MATLAB functions <./HelperFiles\+librosa\stft.m librosa.stft> and <./HelperFiles\+librosa\istft.m
70+ % librosa.istft>. The name-value arguments of these functions match the name-value
71+ % arguments of their librosa counterparts.
72+ %
73+ % Load the sample audio signal in MATLAB.
74+
75+ addpath(fullfile(pwd,"HelperFiles"))
76+ [samples,fs] = audioread(filename);
77+ samples = single(samples);
78+ %%
79+ % Now compute the STFT. Use the same name-value arguments as in the Python script.
80+
81+ stftOut2 = librosa.stft(samples,FFTLength=512,HopLength=160, ...
82+     WindowLength=512,Window="hann", ...
83+     Center=true);
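%%
% As a quick sanity check, the expected STFT dimensions can be worked out from
% the arguments. This sketch assumes a mono signal and librosa's framing
% convention for |center=True| (the signal is padded by half the FFT length on
% each side), and that the helper function follows it.

numBins = 512/2 + 1;                          % one-sided spectrum: 257 bins
numFrames = 1 + floor(numel(samples)/160);    % frames after centered padding
fprintf("Expected %d-by-%d, got %d-by-%d\n",numBins,numFrames, ...
    size(stftOut2,1),size(stftOut2,2))
fprintf("Hop: %.1f ms, window: %.1f ms\n",1e3*160/fs,1e3*512/fs)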
84+ %%
85+ % Compare the Python and MATLAB STFT values by computing the error.
86+
87+ fprintf("STFT error: %f\n",norm(stftOut1(:)-stftOut2(:)));
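%%
% The norm above is an absolute quantity, so its size depends on the signal
% level and length. A relative error is often easier to interpret; the sketch
% below (an addition, not part of the original example) normalizes by the
% librosa result. For single-precision data, values within a few orders of
% magnitude of |eps("single")| would suggest that the two implementations agree.

relErr = norm(stftOut1(:)-stftOut2(:))/norm(stftOut1(:));
fprintf("STFT relative error: %g (eps(""single"") = %g)\n",relErr,eps("single"))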
88+ %%
89+ % Note that calling librosa.stft with no output arguments plots the magnitude
90+ % of the STFT.
91+
92+ figure;
93+ librosa.stft(samples,FFTLength=512,HopLength=160, ...
94+     WindowLength=512,Window="hann", ...
95+     Center=false);
96+ %%
97+ % Now compute the ISTFT in MATLAB by using the same name-value arguments as
98+ % the Python script.
99+
100+ istftOut2 = librosa.istft(stftOut2,FFTLength=512,HopLength=160, ...
101+     WindowLength=512,Window="hann", ...
102+     Center=true);
103+ %%
104+ % Compare the MATLAB and librosa ISTFT values.
105+
106+ figure
107+ subplot(2,1,1)
108+ L = length(istftOut1);
109+ t = (0:L-1)/fs;
110+ plot(t,istftOut1)
111+ grid on
112+ xlabel("Time (s)")
113+ title("librosa")
114+ subplot(2,1,2)
115+ t = (0:length(istftOut2)-1)/fs; % time axis sized to the MATLAB output
116+ plot(t,istftOut2)
117+ grid on
118+ xlabel("Time (s)")
119+ title("MATLAB")
120+ %%
121+ % Compute the error.
122+
123+ fprintf("ISTFT error: %f\n",norm(istftOut1(:)-istftOut2(:)));
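%%
% Because the ISTFT inverts the STFT, the reconstruction can also be compared
% with the original audio. This is a hedged sketch that assumes a mono signal;
% it trims both signals to a common length because the inverse transform may
% return slightly fewer samples than the input.

n = min(numel(samples),numel(istftOut2));
x = reshape(samples(1:n),[],1);        % original audio as a column
y = reshape(istftOut2(1:n),[],1);      % reconstruction as a column
fprintf("Round-trip relative error: %g\n",norm(x-y)/norm(x))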
124+ % Generate MATLAB Code from librosa.stft and librosa.istft
125+ % To generate MATLAB code that implements librosa's STFT with documented MATLAB
126+ % functions, specify |GenerateMATLABCode=true| in the call to |librosa.stft|. In
127+ % this case, the generated MATLAB code uses the function <https://www.mathworks.com/help/signal/ref/stft.html
128+ % stft>.
129+
130+ out = librosa.stft(samples,FFTLength=512,HopLength=160, ...
131+     WindowLength=512,Window="hann", ...
132+     Center=true,GenerateMATLABCode=true);
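%%
% For orientation, the following sketch shows how the same computation might
% look with the documented |stft| function. It is an illustration written for
% this walkthrough, not the code that |GenerateMATLABCode=true| emits. It
% assumes a mono column vector and reflect padding, which is the default
% |pad_mode| for |center=True| in librosa 0.9.2. The names |nfft|, |hop|,
% |half|, |win|, |padded|, and |stftSketch| are introduced here for illustration.

nfft = 512;
hop = 160;
half = nfft/2;
win = cast(hann(nfft,"periodic"),"like",samples);     % periodic Hann window
padded = [flipud(samples(2:half+1)); samples; ...
    flipud(samples(end-half:end-1))];                 % reflect-pad both ends
stftSketch = stft(padded,Window=win,OverlapLength=nfft-hop, ...
    FFTLength=nfft,FrequencyRange="onesided");
fprintf("Sketch vs. helper error: %f\n",norm(stftSketch(:)-stftOut2(:)))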
133+ % Mel Filter Bank
134+ % Next, you map librosa's mel filter bank function to MATLAB. Mel filter banks
135+ % are integral to mel spectrograms and MFCC computations.
136+ %
137+ % Inspect the Python script that builds the filter bank.
138+
139+ pythonScript = fullfile(pwd,"PythonCode","librosamel.py");
140+ type(pythonScript)
141+ %%
142+ % Execute the script.
143+
144+ melOut1 = pyrunfile(pythonScript,"melOut");
145+ melOut1 = single(melOut1);
146+ %%
147+ % Use <./HelperFiles\+librosa\mel.m librosa.mel> to construct the same filter
148+ % bank in MATLAB.
149+
150+ melOut2 = librosa.mel(SampleRate=fs,FFTLength=512,NumBands=50, ...
151+     Normalization="Slaney",HTK=true);
152+ %%
153+ % Plot and compare the librosa and MATLAB filter banks.
154+
155+ Fc = mel2hz(linspace(hz2mel(0),hz2mel(fs/2),50)); % mel-spaced frequencies in Hz
156+ figure;
157+ subplot(2,1,1)
158+ plot(melOut1.')
159+ grid on
160+ title("librosa Mel Filter Bank")
161+ xlabel("Frequency Bin #")
162+ subplot(2,1,2)
163+ plot(melOut2.')
164+ grid on
165+ title("MATLAB Mel Filter Bank")
166+ xlabel("Frequency Bin #")
167+ %%
168+ % Compute the error.
169+
170+ fprintf("Mel filter bank error: %f\n",norm(melOut1(:)-melOut2(:)))
171+ %%
172+ % Similar to |librosa.stft| and |librosa.istft|, specify |GenerateMATLABCode=true|
173+ % to generate MATLAB code that uses documented functions. In this case, the generated
174+ % code uses <https://www.mathworks.com/help/audio/ref/designauditoryfilterbank.html
175+ % designAuditoryFilterBank>.
176+
177+ librosa.mel(SampleRate=fs,FFTLength=512,NumBands=50, ...
178+     Normalization="Slaney",HTK=true, ...
179+     GenerateMATLABCode=true);
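%%
% As a rough guide to what such code involves, the sketch below calls
% |designAuditoryFilterBank| directly. The argument mapping is an assumption
% made for illustration: librosa's |htk=True| is taken to correspond to the
% function's default mel formula, and Slaney normalization to
% |Normalization="bandwidth"|. Compare the result against |melOut2| before
% relying on it. The name |fbSketch| is introduced here.

fbSketch = designAuditoryFilterBank(fs,FrequencyScale="mel", ...
    FFTLength=512,NumBands=50,FrequencyRange=[0 fs/2], ...
    Normalization="bandwidth",OneSided=true);
disp(size(fbSketch))    % bands-by-bins, 50-by-257 for these settings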
180+ % Mel Spectrogram
181+ % Next, you map librosa's mel spectrogram function to MATLAB.
182+ %
183+ % Inspect the Python script that computes the mel spectrogram.
184+
185+ pythonScript = fullfile(pwd,"PythonCode","librosamelspectrogram.py");
186+ type(pythonScript)
187+ %%
188+ % Execute the script.
189+
190+ melSpectrogramOut1 = pyrunfile(pythonScript,"melSpectrogramOut",filename=filename);
191+ melSpectrogramOut1 = single(melSpectrogramOut1);
192+ %%
193+ % Use <./HelperFiles\+librosa\melSpectrogram.m librosa.melSpectrogram> to compute
194+ % the same mel spectrogram in MATLAB.
195+
196+ melSpectrogramOut2 = librosa.melSpectrogram(samples,SampleRate=fs,FFTLength=512,NumBands=50, ...
197+     Center=false,HopLength=160,WindowLength=512,Window="hann", ...
198+     Normalization="Slaney",HTK=true,Power=2);
199+ %%
200+ % Compute the error.
201+
202+ fprintf("Mel spectrogram error: %f\n",norm(melSpectrogramOut1(:)-melSpectrogramOut2(:)))
203+ %%
204+ % Similar to other functions, specify |GenerateMATLABCode=true| to generate
205+ % MATLAB code that uses documented MATLAB functions. In this case, the generated
206+ % code uses <https://www.mathworks.com/help/signal/ref/stft.html stft> and <https://www.mathworks.com/help/audio/ref/designauditoryfilterbank.html
207+ % designAuditoryFilterBank>.
208+
209+ librosa.melSpectrogram(samples,SampleRate=fs,FFTLength=512,NumBands=50, ...
210+     Center=false,HopLength=160,WindowLength=512,Window="hann", ...
211+     Normalization="Slaney",HTK=true,Power=2, ...
212+     GenerateMATLABCode=true);
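%%
% Conceptually, a mel spectrogram is the mel filter bank applied to the power
% spectrum of each STFT frame. The sketch below illustrates that relationship
% with the helper functions used earlier. It assumes |melOut2| is arranged as
% bands-by-bins and the STFT as bins-by-frames, as in librosa. The names |S|,
% |powerSpec|, and |melSketch| are introduced here for illustration.

S = librosa.stft(samples,FFTLength=512,HopLength=160, ...
    WindowLength=512,Window="hann",Center=false);
powerSpec = abs(S).^2;                 % Power=2 corresponds to squared magnitude
melSketch = melOut2*powerSpec;         % (bands x bins) * (bins x frames)
fprintf("Manual vs. helper error: %f\n",norm(melSketch(:)-melSpectrogramOut2(:)))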
213+ % MFCC
214+ % Finally, you map librosa's MFCC computation function to MATLAB.
215+ %
216+ % Inspect the Python script that computes MFCC.
217+
218+ pythonScript = fullfile(pwd,"PythonCode","librosamfcc.py");
219+ type(pythonScript)
220+ %%
221+ % Execute the script.
222+
223+ mfccOut1 = pyrunfile(pythonScript,"mfccOut",filename=filename);
224+ mfccOut1 = single(mfccOut1);
225+ %%
226+ % Use <./HelperFiles\+librosa\mfcc.m librosa.mfcc> to compute the same MFCC
227+ % in MATLAB.
228+
229+ mfccOut2 = librosa.mfcc(samples,SampleRate=fs,FFTLength=512,NumBands=50,FMin=10, ...
230+     HopLength=160,WindowLength=512,Window="hann", ...
231+     HTK=true,Power=2,DCTType=2,Lifter=0.2);
232+ %%
233+ % Compute the error.
234+
235+ fprintf("MFCC error: %f\n",norm(mfccOut1(:)-mfccOut2(:)))
236+ %%
237+ % Similar to other functions, specify |GenerateMATLABCode=true| to generate
238+ % MATLAB code that uses documented functions. In this case, the generated code
239+ % uses <https://www.mathworks.com/help/signal/ref/stft.html stft>, <https://www.mathworks.com/help/signal/ref/dct.html
240+ % dct>, and <https://www.mathworks.com/help/audio/ref/designauditoryfilterbank.html
241+ % designAuditoryFilterBank>.
242+
243+ librosa.mfcc(samples,SampleRate=fs,FFTLength=512,NumBands=50,FMin=10, ...
244+     HopLength=160,WindowLength=512,Window="hann", ...
245+     HTK=true,Power=2,DCTType=2,Lifter=0.2, ...
246+     GenerateMATLABCode=true);
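%%
% The MFCC pipeline can be summarized as: mel spectrogram, conversion to
% decibels, then a DCT along the band dimension. The following sketch walks
% through those steps on |melSpectrogramOut2|. It is illustrative only: it
% assumes librosa's default |power_to_db| settings (|amin=1e-10|, |top_db=80|)
% and 20 coefficients, omits liftering, and starts from a mel spectrogram
% computed with different arguments than the |librosa.mfcc| call above, so it
% is not expected to reproduce |mfccOut2|. The names |melDb|, |coeffs|, and
% |mfccSketch| are introduced here.

melDb = 10*log10(max(melSpectrogramOut2,1e-10));   % power to dB with a floor
melDb = max(melDb,max(melDb(:))-80);               % limit dynamic range to 80 dB
coeffs = dct(melDb,[],1,Type=2);                   % orthonormal DCT-II over bands
mfccSketch = coeffs(1:20,:);                       % keep the first 20 coefficients
disp(size(mfccSketch))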
247+ %% Import Python Speech Command System to MATLAB
248+ % You now use the feature extraction mapping functionality to translate a
249+ % pretrained Python speech command recognition system to MATLAB.
250+ % System Description
251+ % The deep learning speech command recognition system was trained in Python.
252+ %
253+ % The system recognizes the following commands:
254+ %%
255+ % * "yes"
256+ % * "no"
257+ % * "up"
258+ % * "down"
259+ % * "left"
260+ % * "right"
261+ % * "on"
262+ % * "off"
263+ % * "stop"
264+ % * "go"
265+ %%
266+ % The system consists of a convolutional neural network. The network accepts
267+ % mel spectrograms as input.
268+ %
269+ % Training follows a supervised learning approach, in which mel spectrograms
270+ % labeled with commands are fed to the network.
271+ %
274+ % The following were used to train the command recognition system:
275+ %%
276+ % * PyTorch to design and train the model.
277+ % * librosa to perform feature extraction (mel spectrogram computation).
278+ %%
279+ % You perform speech recognition in Python by first extracting a mel spectrogram
280+ % from an audio signal, and then feeding the spectrogram to the trained convolutional
281+ % network.
282+ %
286+ % Perform Speech Command Recognition in Python
287+ % The Python script <./PythonCode\InferSpeechCommands.py |InferSpeechCommands.py|>
288+ % performs speech command recognition.
289+ %
290+ % Execute Python inference in MATLAB. The Python script prints out the recognized
291+ % keyword. Return the network activations.
292+
293+ cd("PythonCode")
294+ pythonScript = "InferSpeechCommands.py";
295+ [pytorchActivations,mm] = pyrunfile(pythonScript,["activations","z"],filename=filename);
296+ cd ..
297+ % Convert the Pretrained Network to MATLAB
298+ % You first import the PyTorch pretrained network to MATLAB using MATLAB's <https://www.mathworks.com/help/deeplearning/deep-learning-import-and-export.html?s_tid=CRUX_lftnav
299+ % model import-export functionality>. In this example, you use <https://www.mathworks.com/help/deeplearning/ref/importonnxnetwork.html
300+ % importONNXNetwork>. The function imports a version of the network that was saved
301+ % to the Open Neural Network Exchange (ONNX) format. To see how the PyTorch model
302+ % can be saved to an ONNX format, refer to <./PythonCode\convertModelToONNX.py
303+ % convertModelToONNX.py>.
304+
305+ onnxFile = "cmdRecognitionPyTorch.onnx";
306+ %%
307+ % Import the network to MATLAB.
308+
309+ net = importONNXNetwork(onnxFile)
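%%
% Before running inference, it can help to confirm what input the imported
% network expects. This is a minimal sketch that assumes the first layer of
% the imported network is its input layer; |inputLayer| is a name introduced
% here for illustration.

inputLayer = net.Layers(1);
fprintf("Input layer class: %s\n",class(inputLayer))
if isprop(inputLayer,"InputSize")
    disp(inputLayer.InputSize)     % compare with the size of the feature array
end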
310+ % Perform Speech Command Recognition in MATLAB
311+ % Use |librosa.melSpectrogram| to perform feature extraction. Call the function
312+ % with the same name-value arguments as the Python inference.
313+
314+ spect = librosa.melSpectrogram(samples,SampleRate=fs,FFTLength=512,NumBands=50, ...
315+     Center=false,HopLength=160,WindowLength=512,Window="hann", ...
316+     Normalization="Slaney",HTK=true,Power=2);
317+ spect = log10(spect + 1e-6);
318+ MATLABActivations = predict(net,spect.');
319+ %%
320+ % Compare MATLAB and PyTorch activations.
321+
322+ figure
323+ plot(MATLABActivations,"b*-")
324+ hold on
325+ grid on
326+ plot(pytorchActivations,"ro-")
327+ xlabel("Activation #")
328+ legend("MATLAB","Python")
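%%
% A numeric summary complements the plot. The sketch below is an addition to
% the example and assumes |pytorchActivations| is numeric or a NumPy array
% that |single| can convert. The names |pyAct| and |maxDiff| are introduced
% here for illustration.

pyAct = single(pytorchActivations);
maxDiff = max(abs(MATLABActivations(:)-pyAct(:)));
fprintf("Largest activation difference: %g\n",maxDiff)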
329+ %%
330+ % Verify the spoken command in MATLAB.
331+
332+ CLASSES = ["unknown" "yes" "no" "up" "down" "left" "right" "on" "off" "stop" "go"];
333+ [~,ind] = max(MATLABActivations);
334+ fprintf("Recognized command: %s\n",CLASSES(ind))
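%%
% To see how decisively the network chose, you can list the highest-scoring
% classes. This sketch is an addition to the example and assumes one
% activation per entry in |CLASSES|.

[sortedAct,order] = sort(MATLABActivations(:),"descend");
for k = 1:min(3,numel(order))
    fprintf("%d. %s (%.3f)\n",k,CLASSES(order(k)),sortedAct(k))
end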
335+ %%
336+ % _Copyright 2022 The MathWorks, Inc._