
Fix Evoformer arch filtering consistency for mixed targets (#7863) #7872

Open
tohtana wants to merge 1 commit into deepspeedai:master from tohtana:tohtana/fix-issue7863-multi-arch-odr-violation

@tohtana tohtana commented Feb 24, 2026

Fixes #7863

The root cause of #7863 was a compile-target consistency gap: DS_EVOFORMER_GPU_ARCH selected one Evoformer family, while TORCH_CUDA_ARCH_LIST could still emit lower-family slices.
The fix adds shared arch normalization in EvoformerAttnBuilder and uses it in both nvcc_args() (-DGPU_ARCH) and filter_ccs() (-gencode pruning).
For example, with DS_EVOFORMER_GPU_ARCH=80, 7.x targets are now filtered out for Evoformer builds.
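
To make the filtering rule concrete, here is a minimal sketch of the idea, not the code in this PR. The helper names `normalize_arch_sketch` and `filter_arch_list_sketch` are hypothetical, and the rule "drop every entry below the selected arch" is an assumption based on the description above. Suffixed targets other than `+PTX` are not handled.

```python
def normalize_arch_sketch(value: str) -> int:
    """Hypothetical helper: map '80' or '8.0' to the integer 80."""
    text = value.strip()
    if "." in text:
        major, minor = text.split(".", 1)
        return int(major) * 10 + int(minor)
    return int(text)


def filter_arch_list_sketch(arch_list: str, evoformer_arch: str) -> list:
    """Hypothetical helper: keep only entries at or above the Evoformer floor.

    arch_list uses the TORCH_CUDA_ARCH_LIST style, e.g. '7.0;7.5;8.0;8.6+PTX'.
    """
    floor = normalize_arch_sketch(evoformer_arch)
    kept = []
    # Entries may be separated by ';' or by whitespace.
    for entry in arch_list.replace(";", " ").split():
        base = entry.split("+")[0]  # ignore a trailing '+PTX' when comparing
        if normalize_arch_sketch(base) >= floor:
            kept.append(entry)
    return kept


remaining = filter_arch_list_sketch("7.0;7.5;8.0;8.6+PTX", "80")
print(remaining)
```

With the floor set to 80, the 7.0 and 7.5 entries are dropped and the 8.x entries pass through unchanged, which matches the example above.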

This PR also adds unit tests to validate the new behavior.

…ai#7863)

Normalize DS_EVOFORMER_GPU_ARCH values and reuse that normalization for
both -DGPU_ARCH emission and Evoformer compute-capability filtering.
This enforces a single Evoformer kernel-family floor during build and
prunes lower cross-family TORCH_CUDA_ARCH_LIST entries.

Add unit coverage for normalization/filter behavior and update the
Evoformer tutorial to clarify the distinct roles of TORCH_CUDA_ARCH_LIST
and DS_EVOFORMER_GPU_ARCH.

Signed-off-by: Masahiro Tanaka <mtanaka@anyscale.com>

Development

Successfully merging this pull request may close these issues.

Multi-GPU-Arch pre-compilation of operators not supported