
BLT model can't pass training_ci #42629

@3outeille

Description


System Info

  • transformers version: 5.0.0.dev0
  • Platform: Linux-5.15.0-1048-aws-x86_64-with-glibc2.31
  • Python version: 3.12.9
  • Huggingface_hub version: 1.1.5
  • Safetensors version: 0.6.2
  • Accelerate version: 1.11.0
  • Accelerate config: not found
  • DeepSpeed version: 0.18.2
  • PyTorch version (accelerator?): 2.9.0+cu128 (CUDA)
  • Using distributed or parallel set-up in script?:
  • Using GPU in script?:
  • GPU type: NVIDIA H100 80GB HBM3

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

As part of the training_ci (#42597), we noticed that the BLT model can't overfit a single sentence (and thus can't generate that same sentence back). There must be a bug somewhere; raising this issue to flag it.

```bash
pytest tests/models/blt/test_modeling_blt.py::BltModelTest::test_training_overfit -s
```
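
For context, a single-sentence overfit check is roughly the sketch below. This is a hedged illustration of the idea, not the actual test code: the checkpoint name, sentence, step count, and learning rate are all illustrative assumptions.

```python
# Hedged sketch of a single-sentence overfit check.
# The checkpoint name, sentence, step count, and learning rate are
# illustrative assumptions, NOT values taken from the actual test.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "facebook/blt-1b"  # hypothetical checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id).train()

sentence = "The quick brown fox jumps over the lazy dog."
inputs = tokenizer(sentence, return_tensors="pt")
labels = inputs["input_ids"].clone()

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
first_loss = first_grad_norm = None

for step in range(100):  # repeatedly train on the single sentence
    optimizer.zero_grad()
    loss = model(**inputs, labels=labels).loss
    loss.backward()
    # clip_grad_norm_ returns the total grad norm before clipping
    grad_norm = torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
    optimizer.step()
    if first_loss is None:
        first_loss, first_grad_norm = loss.item(), grad_norm.item()

final_loss, final_grad_norm = loss.item(), grad_norm.item()

# An overfit model should reproduce the sentence from a short prefix.
model.eval()
prefix = inputs["input_ids"][:, :4]
generated = model.generate(prefix, max_new_tokens=labels.shape[1])
decoded_generation = tokenizer.decode(generated[0], skip_special_tokens=True)
```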


Expected behavior

The generation should match the training sentence, and both the loss and the grad norm should drop by at least 90%.
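
Concretely, the pass criteria can be expressed as the checks below (variable names refer to the sketch above and are illustrative, not the test's own names):

```python
# Illustrative pass criteria, following the sketch above:
assert decoded_generation == sentence            # generation matches
assert final_loss <= 0.1 * first_loss            # >= 90% loss reduction
assert final_grad_norm <= 0.1 * first_grad_norm  # >= 90% grad-norm reduction
```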
