[TTS] MagpieTTS: Implement Frechet Codec Distance metric + some minor inference bugfixes #15223

rfejgin · 2025-12-23T06:47:36Z

What does this PR do ?

Adds the Frechet Codec Distance metric and integrates it in MagpieTTS inference scripts. Also fixes some minor MagpieTTS inference bugs.

Collection: TTS

Changelog

The Frechet Distance (FD) is commonly used to evaluate generative models (e.g. Frechet Inception Distance, Frechet Audio Distance). In this PR we implement FD in the embedding space of a neural codec. The resulting metric measures how closely the distributions of real and generated codec frames match, at the level of individual frames.
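For readers unfamiliar with the metric, here is a minimal sketch of the closed-form Frechet distance between two Gaussians fitted to sets of frame embeddings: `||mu1 - mu2||^2 + Tr(S1 + S2 - 2 (S1 S2)^(1/2))`. It is not the code from this PR (which builds on TorchMetrics' FID implementation). It uses NumPy and random stand-in data, and the function names `gaussian_stats` and `frechet_distance` are hypothetical.

```python
import numpy as np


def gaussian_stats(frames: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Return the mean vector and covariance matrix of frames shaped (N, D)."""
    mu = frames.mean(axis=0)
    sigma = np.atleast_2d(np.cov(frames, rowvar=False))
    return mu, sigma


def frechet_distance(mu1, sigma1, mu2, sigma2) -> float:
    """Frechet distance between N(mu1, sigma1) and N(mu2, sigma2)."""
    diff = mu1 - mu2
    # Tr((S1 S2)^(1/2)) equals the sum of the square roots of the eigenvalues of S1 @ S2.
    eigvals = np.linalg.eigvals(sigma1 @ sigma2)
    # Clip tiny negative real parts caused by floating-point error before the square root.
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    return float(diff @ diff + np.trace(sigma1) + np.trace(sigma2) - 2.0 * tr_sqrt)


# Stand-in data: each row plays the role of one codec frame embedding (assumed D = 4).
rng = np.random.default_rng(0)
real = rng.normal(size=(2000, 4))
fake_close = rng.normal(size=(2000, 4))               # same distribution as `real`
fake_shifted = rng.normal(loc=1.0, size=(2000, 4))    # mean shifted by 1 in every dimension

fd_same = frechet_distance(*gaussian_stats(real), *gaussian_stats(real))
fd_close = frechet_distance(*gaussian_stats(real), *gaussian_stats(fake_close))
fd_shifted = frechet_distance(*gaussian_stats(real), *gaussian_stats(fake_shifted))
```

A set compared with itself gives a distance of approximately zero, and the shifted set scores clearly worse than the matching one. Lower values indicate a closer match.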

Changes:

frechet_codec_distance.py: An implementation of FD in codec embedding space. Builds on TorchMetrics' FID implementation. We provide the audio codec as a custom feature extractor.
test_frechet_codec_distance.py: Unit test
Integration of the FCD in MagpieTTS inference scripts. If desired, FCD calculation can be disabled using the --disable_fcd command line argument to magpietts_inference.py
Inference bugfixes
- Fix a logging statement that was reporting errors due to incorrect formatting syntax.
- Disable logging of thousands of messages while loading the titanet_small speaker representation model. This suppression was present in earlier versions of the inference scripts and appears to have been accidentally lost in recent refactorings.
- Fix an issue where filewise metrics were not being filtered to a specified subset as intended.
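As an illustration of the log-suppression bugfix, one common pattern is to raise a logger's level temporarily while a noisy load runs and restore it afterwards. The sketch below uses only the standard-library `logging` module. It is an assumption about the general technique, not the mechanism used in the NeMo inference scripts, and `quiet_logger`, `load_noisy_model`, and the logger name `demo.noisy` are hypothetical.

```python
import logging
from contextlib import contextmanager


@contextmanager
def quiet_logger(name: str, level: int = logging.ERROR):
    """Temporarily raise the level of the named logger, restoring it on exit."""
    logger = logging.getLogger(name)
    previous = logger.level
    logger.setLevel(level)
    try:
        yield logger
    finally:
        # Restore the original level even if the wrapped code raises.
        logger.setLevel(previous)


def load_noisy_model(logger_name: str) -> str:
    """Hypothetical stand-in for a model load that emits one INFO message per parameter."""
    log = logging.getLogger(logger_name)
    for i in range(1000):
        # Note the %d placeholder: passing arguments without a matching placeholder
        # is a typical cause of "Logging error" reports at emit time.
        log.info("loading parameter %d", i)
    return "model"
```

Usage would look like `with quiet_logger("demo.noisy"): model = load_noisy_model("demo.noisy")`. Messages at `ERROR` and above still get through, so genuine load failures remain visible.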

PR Type:

New Feature
Bugfix
Documentation

nemo/collections/tts/modules/magpietts_inference/evaluate_generated_audio.py

Signed-off-by: Fejgin, Roy <[email protected]>

Instead of taking a codec instance, accept a codec name: local path or HF/NGC name. This simplifies the metric's integration in calling code. Signed-off-by: Fejgin, Roy <[email protected]>

Signed-off-by: Fejgin, Roy <[email protected]>

* address some CI linting issues * include a file that was missed in last commit Signed-off-by: Fejgin, Roy <[email protected]>

Signed-off-by: Fejgin, Roy <[email protected]>

blisc · 2025-12-30T21:25:25Z

nemo/collections/tts/metrics/frechet_codec_distance.py

+        # Construct a length tensor: one batch element, all frames.
+        x_len = torch.tensor(x.shape[0], device=x.device, dtype=torch.long).unsqueeze(0)  # (1,)
+        tokens = x.permute(1, 0).unsqueeze(0)  # (1, C, B*T)
+        embeddings = self.codec.dequantize(tokens=tokens, tokens_len=x_len)  # (1, D, B*T)
+        # we treat each time step as a separate example
+        embeddings = rearrange(embeddings, 'B D T -> (B T) D')
+        return embeddings


Don't we need some sort of masking here? If we are reducing B and T into one dimension, how do we ensure that no padding gets passed to the model?

If this only works on batch 1, we should add a check that x has a batch dimension of 1 or output an error or a warning
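To make the masking concern concrete, here is a minimal sketch of dropping padded frames with a length mask before merging the batch and time axes. It is written with NumPy to stay dependency-light (boolean-mask indexing behaves the same way on PyTorch tensors). The function name `flatten_valid_frames` and the `(B, D, T)` plus per-item `lengths` layout are assumptions for illustration. Whether the PR needs this depends on what the calling code passes to the feature extractor.

```python
import numpy as np


def flatten_valid_frames(embeddings: np.ndarray, lengths: np.ndarray) -> np.ndarray:
    """Flatten padded (B, D, T) embeddings to (sum(lengths), D), skipping padding.

    lengths holds the number of valid frames for each batch item, shape (B,).
    """
    batch, _, max_len = embeddings.shape
    if lengths.shape != (batch,):
        raise ValueError("lengths must have one entry per batch item")
    # mask[b, t] is True when frame t of item b is real data rather than padding.
    mask = np.arange(max_len)[None, :] < lengths[:, None]  # (B, T)
    frames = embeddings.transpose(0, 2, 1)                 # (B, T, D)
    return frames[mask]                                    # (sum(lengths), D)


# Two items, D = 3, padded to T = 4; the second item has only 2 valid frames.
embeddings = np.arange(24, dtype=float).reshape(2, 3, 4)
embeddings[1, :, 2:] = -1.0  # mark the padded positions
lengths = np.array([4, 2])

valid = flatten_valid_frames(embeddings, lengths)
```

Here `valid` has 6 rows (4 + 2), and none of them come from the padded positions, so padding cannot skew the mean and covariance statistics that the Frechet distance is computed from.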

github-actions bot added the TTS label Dec 23, 2025

github-advanced-security bot found potential problems Dec 23, 2025

nemo/collections/tts/modules/magpietts_inference/evaluate_generated_audio.py (fixed)

rfejgin marked this pull request as ready for review December 23, 2025 06:58

rfejgin marked this pull request as draft December 23, 2025 07:11

rfejgin added 5 commits December 22, 2025 23:15

Add metric: Freceht Distance in codec embedding space

db86b81

Signed-off-by: Fejgin, Roy <[email protected]>

Frechet Codec Distance API change

c91dd16

Instead of taking a codec instance, accept a codec name: local path or HF/NGC name. This simplifies the metric's integration in calling code. Signed-off-by: Fejgin, Roy <[email protected]>

Integrate Frechet Codec Distance in inference scripts

85fcb09

Signed-off-by: Fejgin, Roy <[email protected]>

Add a __init__.py package marker to test directory

14a9a27

Signed-off-by: Fejgin, Roy <[email protected]>

Cleanup and add missing files

3fc5f37

* address some CI linting issues * include a file that was missed in last commit Signed-off-by: Fejgin, Roy <[email protected]>

rfejgin force-pushed the magpietts_frechet_codec_distance branch from 8d997ac to 3fc5f37 Compare December 23, 2025 07:15

rfejgin added the Run CICD label Dec 23, 2025

chtruong814 added Run CICD and removed Run CICD labels Dec 23, 2025

chtruong814 had a problem deploying to test December 23, 2025 07:18 — with GitHub Actions Error

Comments and cleanup

78d64ed

Signed-off-by: Fejgin, Roy <[email protected]>

chtruong814 added Run CICD and removed Run CICD labels Dec 23, 2025

rfejgin marked this pull request as ready for review December 23, 2025 18:27

Merge branch 'main' into magpietts_frechet_codec_distance

570a806

rfejgin requested a review from blisc December 23, 2025 18:28

chtruong814 added Run CICD and removed Run CICD labels Dec 23, 2025

rfejgin requested a review from subhankar-ghosh December 23, 2025 18:28

chtruong814 temporarily deployed to test December 23, 2025 18:29 — with GitHub Actions Inactive

blisc requested changes Dec 30, 2025
