System Info
- transformers version: 5.0.0.dev0
- Platform: Linux-5.15.0-1048-aws-x86_64-with-glibc2.31
- Python version: 3.12.9
- Huggingface_hub version: 1.1.5
- Safetensors version: 0.6.2
- Accelerate version: 1.11.0
- Accelerate config: not found
- DeepSpeed version: 0.18.2
- PyTorch version (accelerator?): 2.9.0+cu128 (CUDA)
- Using distributed or parallel set-up in script?:
- Using GPU in script?:
- GPU type: NVIDIA H100 80GB HBM3
Who can help?
No response
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
As part of the training CI (#42597), we noticed that the BLT model can't overfit a single sentence (and thus can't generate the same sentence back). There must be a bug somewhere; raising this issue to flag it.
```
pytest tests/models/blt/test_modeling_blt.py::BltModelTest::test_training_overfit -s
```
Expected behavior
The generation should match the training sentence, and the loss and grad_norm should each be reduced by at least 90% over the course of the overfit run.
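
For context, here is a minimal sketch of the kind of check such an overfit test performs, assuming a generic causal-LM + tokenizer interface; the checkpoint id, learning rate, step count, and prompt length below are illustrative placeholders, not the actual test code in `test_modeling_blt.py`.

```python
# Illustrative sketch only: repeatedly train on one sentence, then check that
# loss and grad_norm collapse and that greedy decoding reproduces the sentence.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

sentence = "The quick brown fox jumps over the lazy dog."
model_id = "facebook/blt-1b"  # hypothetical checkpoint id, for illustration

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
model.train()

inputs = tokenizer(sentence, return_tensors="pt")
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

initial_loss, initial_grad_norm = None, None
for step in range(100):
    outputs = model(**inputs, labels=inputs["input_ids"])
    outputs.loss.backward()
    grad_norm = torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
    if step == 0:
        initial_loss, initial_grad_norm = outputs.loss.item(), grad_norm.item()
    optimizer.step()
    optimizer.zero_grad()

# Expected: at least a 90% reduction in both loss and grad_norm ...
assert outputs.loss.item() <= 0.1 * initial_loss
assert grad_norm.item() <= 0.1 * initial_grad_norm

# ... and greedy decoding from a short prefix should reproduce the sentence.
model.eval()
prefix_len = 4  # illustrative prompt length
prompt_ids = inputs["input_ids"][:, :prefix_len]
generated = model.generate(
    prompt_ids,
    max_new_tokens=inputs["input_ids"].shape[1] - prefix_len,
    do_sample=False,
)
assert tokenizer.decode(generated[0], skip_special_tokens=True) == sentence
```

The reported failure is that the BLT model never reaches this state: the loss/grad_norm do not drop as expected and the generated text does not match the training sentence.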