Fix Evoformer arch filtering consistency for mixed targets (#7863) by tohtana · Pull Request #7872 · deepspeedai/DeepSpeed

tohtana · 2026-02-24T19:25:41Z

The root cause of #7863 was a compile-target consistency gap: DS_EVOFORMER_GPU_ARCH selected one Evoformer kernel family, while TORCH_CUDA_ARCH_LIST could still add -gencode targets from a lower family.
The fix adds shared arch normalization in EvoformerAttnBuilder and uses it in both nvcc_args() (-DGPU_ARCH) and filter_ccs() (-gencode pruning).
For example, with DS_EVOFORMER_GPU_ARCH=80, 7.x targets are now filtered out for Evoformer builds.
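The idea of sharing one normalization between the define and the filter can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the helper names `normalize_arch`, `filter_ccs`, and `gpu_arch_define`, the set of family floors (70, 75, 80), and the accepted input spellings are all assumptions made for the example.

```python
# Hypothetical sketch; names and family floors are assumptions, not DeepSpeed's API.
_FAMILY_FLOORS = (70, 75, 80)  # assumed Evoformer kernel families


def _cc_to_int(cc):
    """Turn '8.0', '80', or '8.6+PTX' into an int such as 80 or 86 (None if invalid)."""
    text = cc.strip().split("+")[0].replace(".", "")
    return int(text) if text.isdigit() else None


def normalize_arch(value):
    """Map a requested arch to the highest assumed family floor not above it."""
    if not value:
        return None
    arch = _cc_to_int(value)
    if arch is None:
        return None
    eligible = [floor for floor in _FAMILY_FLOORS if floor <= arch]
    return max(eligible) if eligible else None


def gpu_arch_define(floor):
    """Build the -DGPU_ARCH flag from the normalized floor."""
    return [] if floor is None else [f"-DGPU_ARCH={floor}"]


def filter_ccs(ccs, floor):
    """Drop compute capabilities below the floor; keep everything if no floor is set."""
    if floor is None:
        return list(ccs)
    kept = []
    for cc in ccs:
        num = _cc_to_int(cc)
        if num is not None and num >= floor:
            kept.append(cc)
    return kept


floor = normalize_arch("80")
print(gpu_arch_define(floor))                            # ['-DGPU_ARCH=80']
print(filter_ccs(["7.0", "7.5", "8.0", "8.6+PTX"], floor))  # ['8.0', '8.6+PTX']
```

Because both the define and the filter read the same normalized value, a mixed TORCH_CUDA_ARCH_LIST cannot produce targets below the family that -DGPU_ARCH selects.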

This PR also adds unit tests to validate the new behavior.

…ai#7863)

Normalize DS_EVOFORMER_GPU_ARCH values and reuse that normalization for both -DGPU_ARCH emission and Evoformer compute-capability filtering. This enforces a single Evoformer kernel-family floor during build and prunes lower cross-family TORCH_CUDA_ARCH_LIST entries.

Add unit coverage for normalization/filter behavior and update the Evoformer tutorial to clarify the distinct roles of TORCH_CUDA_ARCH_LIST and DS_EVOFORMER_GPU_ARCH.

Signed-off-by: Masahiro Tanaka <mtanaka@anyscale.com>

tohtana requested review from loadams and tjruwase as code owners February 24, 2026 19:25

tohtana mentioned this pull request Feb 24, 2026

Multi-GPU-Arch pre-compilation of operators not supported #7863

Open

tohtana wants to merge 1 commit into deepspeedai:master from tohtana:tohtana/fix-issue7863-multi-arch-odr-violation
