Add cross-attention to output hypotheses #15229

mgaido91 · 2025-12-24T14:07:28Z

Important

The Update branch button must only be pressed on very rare occasions.
An outdated branch is never blocking the merge of a PR.
Please reach out to the automation team before pressing that button.

What does this PR do ?

The PR adds the encoder-decoder cross-attention to the output hypotheses returned by ASR models.

Collection: ASR

Changelog

Returns the cross-attention scores in the output of the greedy generator
Returns the cross-attention scores in the output of the beam search generator

Usage

You can potentially add a usage example below

from nemo.collections.asr.models import ASRModel
from nemo.collections.asr.models.aed_multitask_models import MultiTaskTranscriptionConfig
model = ASRModel.from_pretrained(model_name="nvidia/canary-1b-v2")
config = MultiTaskTranscriptionConfig(
    batch_size=4,
    return_hypotheses=True,
    num_workers=0,
    verbose=False,
    prompt={'source_lang': 'en', 'target_lang': 'en'},
    enable_chunking=False
)
output = model.transcribe("/Users/mgaido/Downloads/vp-test/aa.wav", override_config=config)
assert output[0].xatt_scores is not None
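
One way to inspect the returned scores is to average them over layers and heads and look at the most-attended encoder frame for each output token. The sketch below assumes that, for a single hypothesis, xatt_scores is a list with one entry per decoder layer and that each entry can be indexed as [head][decoder_step][encoder_frame]. That layout is an assumption, not something this description states. Plain nested lists stand in for tensors, and both helper names are made up for illustration.

```python
from typing import List

# Assumed layout for one hypothesis: [layer][head][decoder_step][encoder_frame].
LayerScores = List[List[List[float]]]


def average_attention(layers: List[LayerScores]) -> List[List[float]]:
    """Average the scores over all layers and heads (hypothetical helper)."""
    total_heads = sum(len(layer) for layer in layers)
    num_steps = len(layers[0][0])
    num_frames = len(layers[0][0][0])
    avg = [[0.0] * num_frames for _ in range(num_steps)]
    for layer in layers:
        for head in layer:
            for step, row in enumerate(head):
                for frame, value in enumerate(row):
                    avg[step][frame] += value / total_heads
    return avg


def peak_frames(avg: List[List[float]]) -> List[int]:
    """Return, for each decoder step, the index of the most-attended frame."""
    return [max(range(len(row)), key=row.__getitem__) for row in avg]


# Toy input: 2 layers, 1 head each, 2 decoder steps, 3 encoder frames.
toy_scores = [
    [[[0.7, 0.2, 0.1], [0.1, 0.1, 0.8]]],
    [[[0.5, 0.4, 0.1], [0.2, 0.2, 0.6]]],
]
peaks = peak_frames(average_attention(toy_scores))
print(peaks)
```

With real model output, the same two calls would be applied to output[0].xatt_scores after converting each layer to nested lists, provided the layout assumption above holds.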

GitHub Actions CI

The Jenkins CI system has been replaced by GitHub Actions self-hosted runners.

The GitHub Actions CI will run automatically when the "Run CICD" label is added to the PR.
To re-run CI remove and add the label again.
To run CI on an untrusted fork, a NeMo user with write access must first click "Approve and run".

Before your PR is "Ready for review"

Pre checks:

Make sure you read and followed Contributor guidelines
Did you write any new necessary tests?
Did you add or update any necessary documentation?
Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
- Reviewer: Does the PR have correct import guards for all optional libraries?

PR Type:

New Feature
Bugfix
Documentation

If you haven't finished some of the above items you can still open a "Draft" PR.

Who can review?

Anyone in the community is free to review the PR once the checks have passed.
Contributor guidelines contains specific people who can review PRs to various areas.

@nithinraok @andrusenkoau

Additional Information

Related to Encoder-decoder attention extraction in ASR transcribe #14961.

Signed-off-by: Marco Gaido <[email protected]>

…_to_output_hypo

Signed-off-by: mgaido91 <[email protected]>

nithinraok

Thanks Marco, great work. I've added comments. Also, could you add an option, something like preserve_xattn_scores, so that xattn_scores are stored and returned only when it is enabled, for example through

decoding_cfg = MultiTaskDecodingConfig(
    strategy="beam",  # or "greedy"
    preserve_xattn_scores=True,
)

Leaving it off by default would save memory.
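
A minimal sketch of how such a flag could gate what ends up in the hypothesis. The class and function names are hypothetical stand-ins, not NeMo's actual MultiTaskDecodingConfig or Hypothesis classes, and the default of False is an assumption that follows the "save memory by default" request.

```python
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass
class DecodingConfigSketch:
    """Hypothetical stand-in for the decoding config, not the NeMo class."""

    strategy: str = "greedy"
    preserve_xattn_scores: bool = False  # assumed default: off, to save memory


@dataclass
class HypothesisSketch:
    """Hypothetical stand-in for an output hypothesis."""

    text: str
    xatt_scores: Optional[List[Any]] = None


def build_hypothesis(
    text: str, xatt_scores_list: Optional[List[Any]], cfg: DecodingConfigSketch
) -> HypothesisSketch:
    """Attach the cross-attention scores only when the flag is enabled."""
    kept = xatt_scores_list if cfg.preserve_xattn_scores else None
    return HypothesisSketch(text=text, xatt_scores=kept)


default_hyp = build_hypothesis("hello", [[0.5, 0.5]], DecodingConfigSketch())
opt_in_hyp = build_hypothesis(
    "hello", [[0.5, 0.5]], DecodingConfigSketch(strategy="beam", preserve_xattn_scores=True)
)
```

In a real generator the flag would ideally also stop the scores from being accumulated during decoding, since dropping them only at the end would not reduce peak memory.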

nithinraok · 2026-01-02T19:21:48Z

nemo/collections/asr/parts/utils/rnnt_utils.py


    last_frame (Optional): Index of the last decoding step hypothesis was updated including blank token prediction.
+
+    xatt_scores (Optional): List of cross-attention scores for each decoder layer. Each element of the list


Shouldn't the shape be List[BxHxT1xT2]? Also, it would be best to add that this is used with AED models.

nithinraok · 2026-01-02T19:22:54Z

nemo/collections/asr/modules/transformer/transformer_generators.py

            )
+            if xatt_scores_list is not None:
+                for layer in range(len(xatt_scores_list)):
+                    xatt_scores_list[layer] = torch.cat((xatt_scores_list[layer], new_xatt_scores_list[layer]), dim=2)


What about the case when new_xatt_scores_list is None? torch.cat would fail.
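
One possible shape for that guard, as a sketch only. Nested lists indexed as [batch][head][decoder_step][encoder_frame] stand in for BxHxT1xT2 tensors so the example runs without PyTorch; with real tensors the list concatenation would be the torch.cat(..., dim=2) call from the diff. The helper names are hypothetical, and keeping the previous scores when the new ones are None is just one plausible policy.

```python
from typing import List, Optional

# Stand-in for a B x H x T1 x T2 tensor: [batch][head][decoder_step][encoder_frame].
Scores = List[List[List[List[float]]]]


def cat_steps(old: Scores, new: Scores) -> Scores:
    """Concatenate along the decoder-step axis (dim=2 in the tensor layout)."""
    return [
        [old_head + new_head for old_head, new_head in zip(old_b, new_b)]
        for old_b, new_b in zip(old, new)
    ]


def append_xatt_scores(
    xatt_scores_list: Optional[List[Scores]],
    new_xatt_scores_list: Optional[List[Scores]],
) -> Optional[List[Scores]]:
    """Extend the per-layer scores with the newest step, tolerating None."""
    if new_xatt_scores_list is None:
        # Nothing was produced at this step: keep what was accumulated so far.
        return xatt_scores_list
    if xatt_scores_list is None:
        # First step that produced scores: start accumulating from here.
        return new_xatt_scores_list
    return [cat_steps(old, new) for old, new in zip(xatt_scores_list, new_xatt_scores_list)]


# One layer, batch of 1, one head, one decoder step, two encoder frames.
step0 = [[[[[0.9, 0.1]]]]]
step1 = [[[[[0.2, 0.8]]]]]

acc = append_xatt_scores(None, step0)
acc = append_xatt_scores(acc, step1)
unchanged = append_xatt_scores(acc, None)
```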

nithinraok · 2026-01-02T19:27:30Z

nemo/collections/asr/modules/transformer/transformer_generators.py

        pos=0,
        return_scores: bool = True,
    ):
        log_probs, decoder_mems_list, _ = super()._one_step_forward(


Could you update this as well, and also include it in the returned tuple?

nithinraok · 2026-01-02T19:38:06Z

nemo/collections/asr/modules/transformer/transformer_generators.py


+            # select xatt scores corresponding to chosen hypotheses
+            if next_xatt_scores_list is not None:
+                num_heads = xatt_scores_list[0].shape[1]


Please also check whether xatt_scores_list is None.

nithinraok · 2026-01-02T19:39:37Z

nemo/collections/asr/modules/transformer/transformer_generators.py

-            return prefixes, scores * len_penalties, tgt
+            return prefixes, scores * len_penalties, tgt, xatt_scores_list
        else:
            return tgt


We might also return xatt_scores_list here, since return_beam_scores is independent of returning the cross-attention scores.
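
A sketch of what returning the scores from both branches could look like. The function name and the placeholder values are hypothetical, scores is assumed to be already multiplied by the length penalties, and the real generator may prefer a different return structure.

```python
from typing import Any, List, Optional, Tuple


def pack_generator_outputs(
    tgt: Any,
    prefixes: Any,
    scores: Any,
    xatt_scores_list: Optional[List[Any]],
    return_beam_scores: bool,
) -> Tuple[Any, ...]:
    """Return xatt_scores_list regardless of return_beam_scores (hypothetical helper)."""
    if return_beam_scores:
        return prefixes, scores, tgt, xatt_scores_list
    # The cross-attention scores are appended here too, so callers that do not
    # ask for beam scores can still receive them.
    return tgt, xatt_scores_list


with_beam = pack_generator_outputs("tgt", "prefixes", "scores", ["layer0"], True)
without_beam = pack_generator_outputs("tgt", "prefixes", "scores", ["layer0"], False)
```

Note that the second branch then returns a tuple rather than a bare tgt, so existing callers of that branch would need to unpack it.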

github-actions bot added the ASR label Dec 24, 2025

Add cross-attention to output hypotheses

21d5bb8

Signed-off-by: Marco Gaido <[email protected]>

mgaido91 force-pushed the add_attention_to_output_hypo branch from 2de6160 to 21d5bb8 on December 24, 2025 14:09

mgaido91 and others added 2 commits December 24, 2025 15:16

Merge branch 'main' of github.com:NVIDIA-NeMo/NeMo into add_attention…

bdca576

…_to_output_hypo

Apply isort and black reformatting

8839bd1

Signed-off-by: mgaido91 <[email protected]>

nithinraok added the Run CICD label Jan 2, 2026

nithinraok temporarily deployed to test January 2, 2026 14:19 — with GitHub Actions Inactive

nithinraok requested changes Jan 2, 2026

nithinraok requested a review from andrusenkoau January 2, 2026 20:01
